mysql - What is the best way to implement a substring search in SQL?


We have a simple SQL problem here. In a varchar column, we want to search for a string anywhere in the field. What is the best way to implement this for performance? An index is not going to help here, so are there any other tricks?

We are using MySQL and have 3 million records. We need to execute many of these queries per second, so we are trying to implement them with the best possible performance.

The simplest way so far is:

select * from mytable where mycolumn like '%search%';

I should further specify that the column is a long string like "sadfasdfwerwe", and I have to search for "asdf" in that column. So these are not sentences, and I am not trying to match a word in them. Would full text search still help here?

Check out my presentation Practical Fulltext Search in MySQL.

In it, I compared several search approaches.

Today I would use Apache Solr, which puts Lucene into a service with a bunch of features and tools.


Re your comment: Aha, okay, no. None of the fulltext search capabilities I mentioned are going to help, since they all assume some kind of word boundaries.

The other way to efficiently find arbitrary substrings is the n-gram approach. Basically, you create an index of all possible sequences of n letters and point to the strings in which each respective sequence occurs. This is typically done with n=3, a trigram, because it is a point of compromise between matching longer substrings and keeping the index to a manageable size.

I don't know of any SQL database that supports n-gram indexing transparently, but you can set it up yourself using an inverted index:
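To make the idea concrete, here is a minimal in-memory sketch in Python. The function names (`trigrams`, `build_index`, `search`) are made up for illustration, and matching is case-sensitive, which may differ from your MySQL collation. It shows why a final substring check is still needed: sharing all trigrams with the search term is necessary but not sufficient.

```python
from collections import defaultdict


def trigrams(text, n=3):
    """Return the set of all length-n substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(strings):
    """Map each trigram to the set of ids (list positions) of strings containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(strings):
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def search(index, strings, needle):
    """Return sorted ids of strings that contain needle (3+ characters)."""
    grams = trigrams(needle)
    if not grams:
        raise ValueError("needle must be at least 3 characters long")
    # A matching string must contain every trigram of the needle.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Trigram overlap alone can give false positives, so verify each candidate.
    return sorted(i for i in candidates if needle in strings[i])
```

For example, with the strings "sadfasdfwerwe" and "xxasdxxsdfxx", both contain the trigrams "asd" and "sdf", but only the first actually contains "asdf", so only the first survives the verification step.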

create table trigrams (
  trigram char(3) primary key
);

create table trigram_matches (
  trigram char(3),
  document_id int,
  primary key (trigram, document_id),
  foreign key (trigram) references trigrams(trigram),
  foreign key (document_id) references mytable(document_id)
);

Now populate it the hard way (this assumes the trigrams table already holds every trigram you care about):

insert into trigram_matches
  select t.trigram, d.document_id
  from trigrams t join mytable d
    on d.textcolumn like concat('%', t.trigram, '%');

Of course this will take quite a while! But once it's done, you can search more quickly:

select d.*
from mytable d join trigram_matches t
  on t.document_id = d.document_id
where t.trigram = 'abc';

Of course you could be searching for patterns longer than three characters, but the inverted index still helps to narrow your search a lot:

select d.*
from mytable d join trigram_matches t
  on t.document_id = d.document_id
where t.trigram = 'abc'
  and d.textcolumn like '%abcdef%';
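As a rough end-to-end sketch of the same idea, the Python below uses an in-memory SQLite database as a stand-in for MySQL, so the details would need adapting. It makes two changes that are my own variations rather than part of the queries above. First, it fills trigram_matches by splitting each string into its trigrams in application code, which avoids the slow LIKE join. Second, it requires every trigram of the search term to match, not just the first one. The helper names `build_db` and `substring_search` are hypothetical.

```python
import sqlite3


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_db(rows):
    """Create mytable and trigram_matches in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (document_id integer primary key, textcolumn text)"
    )
    conn.execute(
        "create table trigram_matches ("
        " trigram text, document_id integer,"
        " primary key (trigram, document_id))"
    )
    conn.executemany("insert into mytable values (?, ?)", rows)
    # Populate the inverted index by splitting each string into its trigrams.
    conn.executemany(
        "insert into trigram_matches values (?, ?)",
        [(g, doc_id) for doc_id, text in rows for g in trigrams(text)],
    )
    return conn


def substring_search(conn, needle):
    """Return document_ids whose textcolumn contains needle (3+ characters)."""
    grams = sorted(trigrams(needle))
    if not grams:
        raise ValueError("needle must be at least 3 characters long")
    placeholders = ", ".join("?" for _ in grams)
    sql = (
        "select d.document_id from mytable d"
        " join trigram_matches t on t.document_id = d.document_id"
        f" where t.trigram in ({placeholders})"
        # instr() is the exact check, standing in for like '%...%'.
        " and instr(d.textcolumn, ?) > 0"
        " group by d.document_id"
        # Keep only documents that contain every trigram of the needle.
        " having count(*) = ?"
        " order by d.document_id"
    )
    return [row[0] for row in conn.execute(sql, [*grams, needle, len(grams)])]
```

Usage would look like `conn = build_db([(1, "sadfasdfwerwe"), (2, "zzasdfzz")])` followed by `substring_search(conn, "asdf")`. The exact check is still required, because a string can contain all the trigrams of the search term without containing the term itself.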
