MySQL - What is the best way to implement a substring search in SQL?
We have a simple SQL problem here. In a VARCHAR column, we want to search for a string anywhere in the field. What is the best way to implement this for performance? I assume an index is not going to help here. Are there any other tricks?
We are using MySQL and have 3 million records. We need to execute many of these queries per second, so we are trying to implement them with the best possible performance.
The simplest way so far is:
SELECT * FROM mytable WHERE textcolumn LIKE '%search%';
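A minimal sketch of this baseline, assuming SQLite (Python's built-in `sqlite3`) in place of MySQL and a hypothetical `mytable(document_id, textcolumn)` with made-up rows. The leading `%` means a B-tree index on `textcolumn` cannot be used to seek, so every row is examined. The helper `escape_like` (hypothetical) escapes `%` and `_` in user input so a search term cannot act as a wildcard.

```python
import sqlite3


def escape_like(term, esc="\\"):
    # Escape the escape character first, then the LIKE wildcards % and _.
    return term.replace(esc, esc + esc).replace("%", esc + "%").replace("_", esc + "_")


def naive_substring_search(conn, term):
    # '%term%' with a leading wildcard: the query has to scan every row.
    pattern = "%" + escape_like(term) + "%"
    cur = conn.execute(
        "SELECT document_id FROM mytable "
        "WHERE textcolumn LIKE ? ESCAPE '\\' ORDER BY document_id",
        (pattern,),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (document_id INTEGER PRIMARY KEY, textcolumn TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "sadfasdfwerwe"), (2, "qwerty"), (3, "100%_asdf")],
)

print(naive_substring_search(conn, "asdf"))  # [1, 3]
print(naive_substring_search(conn, "%_"))    # [3]: the wildcards are matched literally
```

This finds substrings correctly, but its cost grows linearly with the table size, which is the problem with 3 million rows and many queries per second.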
I should further specify that the column contains long strings like "sadfasdfwerwe", and I have to search for "asdf" in this column. So they are not sentences, and I'm not trying to match a word in them. Would full-text search still work here?
Check out my presentation Practical fulltext search in MySQL.
I compared:
- LIKE predicates
- Regular expression predicates (no better than LIKE)
- MyISAM FULLTEXT indexing
- Sphinx Search
- Apache Lucene
- Inverted indexing
- Google Custom Search Engine
Today I use Apache Solr, which puts Lucene into a service with a bunch of extra features and tools.
Re your comment: Aha, okay, no. None of the full-text search capabilities I mentioned are going to help, since they assume some kind of word boundaries.
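To see why word boundaries matter, here is a toy word-level inverted index. It is a simplified model, not MyISAM's actual FULLTEXT parser (which also applies minimum word lengths and stopwords), and the sample documents are made up. "sadfasdfwerwe" is indexed as one single token, so a lookup for the word "asdf" misses it, while a plain substring test finds it.

```python
import re
from collections import defaultdict

docs = {1: "sadfasdfwerwe", 2: "find asdf here"}


def build_word_index(docs):
    # Map each word (a run of \w characters) to the set of documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


word_index = build_word_index(docs)
print(sorted(word_index.get("asdf", set())))               # [2]
print(sorted(d for d, t in docs.items() if "asdf" in t))  # [1, 2]
```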
The other way to efficiently find arbitrary substrings is the N-gram approach. Basically, create an index of all possible sequences of N letters, and point to the strings in which each respective sequence occurs. Typically this is done with N=3, or a trigram, because it's a point of compromise between matching longer substrings and keeping the index to a manageable size.
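A minimal in-memory sketch of the trigram idea, with hypothetical helper names and sample data. One deviation from the SQL below: this version intersects the posting sets of *all* of the pattern's trigrams, which narrows the candidates more than a single trigram. Candidates are then verified with a real substring test, because sharing trigrams does not guarantee the trigrams appear in the right order. Patterns shorter than three characters fall back to a scan.

```python
from collections import defaultdict


def trigrams(s):
    """Return the set of all length-3 substrings of s (empty if len(s) < 3)."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def build_index(docs):
    # Inverted index: trigram -> set of document ids in which it occurs.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for g in trigrams(text):
            index[g].add(doc_id)
    return index


def search(index, docs, pattern):
    grams = trigrams(pattern)
    if not grams:
        # Too short to have a trigram: the index cannot help, so scan.
        return sorted(d for d, t in docs.items() if pattern in t)
    # Only documents containing every trigram of the pattern can match.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Verify each candidate, since trigram overlap is necessary but not sufficient.
    return sorted(d for d in candidates if pattern in docs[d])


docs = {1: "sadfasdfwerwe", 2: "qwerty", 3: "asdqwe"}
index = build_index(docs)
print(search(index, docs, "asdf"))  # [1]
print(search(index, docs, "qwe"))   # [2, 3]
print(search(index, docs, "as"))    # [1, 3]
```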
I don't know of any SQL database that supports N-gram indexing transparently, but you could set it up yourself using an inverted index:
CREATE TABLE trigrams (
    trigram CHAR(3) PRIMARY KEY
);
CREATE TABLE trigram_matches (
    trigram CHAR(3),
    document_id INT,
    PRIMARY KEY (trigram, document_id),
    FOREIGN KEY (trigram) REFERENCES trigrams(trigram),
    FOREIGN KEY (document_id) REFERENCES mytable(document_id)
);
Now populate it the hard way:
INSERT INTO trigram_matches
SELECT t.trigram, d.document_id
FROM trigrams AS t
JOIN mytable AS d ON d.textcolumn LIKE CONCAT('%', t.trigram, '%');
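Here is a runnable sketch of the schema and the "hard way" population, under several assumptions. It uses SQLite instead of MySQL, so `'%' || t.trigram || '%'` replaces `CONCAT`. The sample rows are made up. The trigrams table, which the steps above do not say how to fill, is filled with the distinct trigrams that actually occur in the data. Note that a trigram containing `%` or `_` would act as a wildcard in this join, and SQLite's LIKE is case-insensitive for ASCII, so the sample data is lowercase letters only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (document_id INTEGER PRIMARY KEY, textcolumn TEXT NOT NULL);
    CREATE TABLE trigrams (trigram CHAR(3) PRIMARY KEY);
    CREATE TABLE trigram_matches (
        trigram CHAR(3) REFERENCES trigrams(trigram),
        document_id INTEGER REFERENCES mytable(document_id),
        PRIMARY KEY (trigram, document_id)
    );
""")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "sadfasdfwerwe"), (2, "qwerty"), (3, "asdqwe")],
)

# Assumption: the trigram vocabulary is collected from the data itself.
vocab = set()
for (text,) in conn.execute("SELECT textcolumn FROM mytable"):
    vocab.update(text[i:i + 3] for i in range(len(text) - 2))
conn.executemany("INSERT INTO trigrams VALUES (?)", [(g,) for g in sorted(vocab)])

# The "hard way": test every (trigram, document) pair with LIKE.
conn.execute("""
    INSERT INTO trigram_matches
    SELECT t.trigram, d.document_id
    FROM trigrams AS t
    JOIN mytable AS d ON d.textcolumn LIKE '%' || t.trigram || '%'
""")
print(conn.execute("SELECT COUNT(*) FROM trigram_matches").fetchone()[0])  # 19
```

The join evaluates every trigram against every row, which is why it is slow on 3 million rows. Computing each row's trigrams in application code and inserting them directly avoids that cross product.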
Of course, this will take quite a while! But once it's done, you can search much more quickly:
SELECT d.*
FROM mytable AS d
JOIN trigram_matches AS t ON t.document_id = d.document_id
WHERE t.trigram = 'abc';
Of course, you can search for patterns longer than three characters, and the inverted index still helps to narrow the search a lot:
SELECT d.*
FROM mytable AS d
JOIN trigram_matches AS t ON t.document_id = d.document_id
WHERE t.trigram = 'abc' AND d.textcolumn LIKE '%abcdef%';
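The same narrow-then-verify query can be run against SQLite from Python. This is a sketch with several assumptions. To keep it short, `trigram_matches` is filled directly from each row's trigrams and the `trigrams` table is omitted. The sample data is made up. The hypothetical `search` helper assumes patterns are at least three lowercase characters with no `%` or `_`. As in the query above, it looks up one trigram (here the pattern's first three characters) and then checks the whole pattern with LIKE.

```python
import sqlite3


def trigrams(s):
    return {s[i:i + 3] for i in range(len(s) - 2)}


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (document_id INTEGER PRIMARY KEY, textcolumn TEXT NOT NULL);
    CREATE TABLE trigram_matches (
        trigram TEXT,
        document_id INTEGER,
        PRIMARY KEY (trigram, document_id)
    );
""")
docs = [(1, "sadfasdfwerwe"), (2, "qwerty"), (3, "asdqwe")]
conn.executemany("INSERT INTO mytable VALUES (?, ?)", docs)
conn.executemany(
    "INSERT INTO trigram_matches VALUES (?, ?)",
    [(g, doc_id) for doc_id, text in docs for g in trigrams(text)],
)


def search(pattern):
    # The index lookup on one trigram narrows the rows; LIKE verifies the full pattern.
    cur = conn.execute("""
        SELECT d.document_id
        FROM mytable AS d
        JOIN trigram_matches AS t ON t.document_id = d.document_id
        WHERE t.trigram = ? AND d.textcolumn LIKE ?
        ORDER BY d.document_id
    """, (pattern[:3], "%" + pattern + "%"))
    return [row[0] for row in cur]


print(search("asdf"))  # [1]: 'asd' narrows to documents 1 and 3, LIKE keeps 1
print(search("qwe"))   # [2, 3]
```

Choosing the rarest trigram of the pattern, rather than the first, would usually narrow the candidates further.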