Index not being used (2023-04-05 02:16 by kishima #94871)
Hello,
I am creating an index on a table with a column that might have very large text (over 3k characters).
```
CREATE TABLE big_better_text (
id SERIAL PRIMARY KEY,
text TEXT
);
CREATE INDEX idx_big_text ON big_better_text USING gin(text gin_bigm_ops);
```
I wanted to use this extension because regular indexes are not used in this scenario, and I thought this extension would enable this behavior. Did I understand it correctly?
I want to do an exact string match on this column.
Re: Index not being used (2023-04-06 02:05 by fujii_masao #94893)
メッセージ #94871 への返信
> Hello,
>
> I am creating an index on a table with a column that might have very large text (over 3k characters).
> ```
> CREATE TABLE big_better_text (
> id SERIAL PRIMARY KEY,
> text TEXT
> );
>
> CREATE INDEX idx_big_text ON big_better_text USING gin(text gin_bigm_ops);
> ```
> I wanted to use this extension because regular indexes are not used in this scenario, and I thought this extension would enable this behavior. Did I understand it correctly?
> I want to do an exact string match on this column.
Yes, you can use pg_bigm to create a bi-gram index on a column with large text data. This allows the index to be used by a LIKE query such as the one below.
SELECT * FROM big_better_text WHERE text LIKE '...';
Re: Index not being used (2023-04-06 04:03 by kishima #94895)
Reply To Message #94893
> メッセージ #94871 への返信
> > Hello,
> >
> > I am creating an index on a table with a column that might have very large text (over 3k characters).
> > ```
> > CREATE TABLE big_better_text (
> > id SERIAL PRIMARY KEY,
> > text TEXT
> > );
> >
> > CREATE INDEX idx_big_text ON big_better_text USING gin(text gin_bigm_ops);
> > ```
> > I wanted to use this extension because regular indexes are not used in this scenario, and I thought this extension would enable this behavior. Did I understand it correctly?
> > I want to do an exact string match on this column.
>
> Yes, you can use pg_bigm to create a bi-gram index on a column with large text data. This allows the index to be used by a LIKE query such as the one below.
>
> SELECT * FROM big_better_text WHERE text LIKE '...';
Would a query in the form
select big_better_text.text, count(*) as ct from big_better_text group by big_better_text.text order by ct desc;
be benefited from it? Tbh this is kind of a large scan, so maybe there's no good index for this. But if I specify some texts for identical string match, it indeed benefits the query.
Re: Index not being used (2023-04-10 20:20 by fujii_masao #94950)
メッセージ #94895 への返信
> Would a query in the form
>
> select big_better_text.text, count(*) as ct from big_better_text group by big_better_text.text order by ct desc;
>
> be benefited from it? Tbh this is kind of a large scan, so maybe there's no good index for this. But if I specify some texts for identical string match, it indeed benefits the query.
I'm sorry to inform you that the pg_bigm GIN index cannot be used for GROUP BY queries.
You could try creating a btree index using a function on the column, e.g., CREATE INDEX testidx ON big_better_text (md5(text)), and then issue a query like "SELECT big_better_text.text, count(*) as ct FROM big_better_text GROUP BY md5(text), text ORDER BY ct DESC". This might allow you to use the index for GROUP BY on the large text column. However, it's not clear if using the index will provide better performance than a sequential scan.
Re: Index not being used (2023-04-06 04:04 by kishima #94896)
Reply To Message #94893
> メッセージ #94871 への返信
> > Hello,
> >
> > I am creating an index on a table with a column that might have very large text (over 3k characters).
> > ```
> > CREATE TABLE big_better_text (
> > id SERIAL PRIMARY KEY,
> > text TEXT
> > );
> >
> > CREATE INDEX idx_big_text ON big_better_text USING gin(text gin_bigm_ops);
> > ```
> > I wanted to use this extension because regular indexes are not used in this scenario, and I thought this extension would enable this behavior. Did I understand it correctly?
> > I want to do an exact string match on this column.
Thanks for the answer!
>
> Yes, you can use pg_bigm to create a bi-gram index on a column with large text data. This allows the index to be used by a LIKE query such as the one below.
>
> SELECT * FROM big_better_text WHERE text LIKE '...';