Click to See Complete Forum and Search --> : Best string matching algorithm


seemap123
November 2nd, 2006, 01:31 PM
Hi,
I have implemented the Levenshtien's algorithm for string matching to find the insertions, deletions and substitutions for converting one string to another. But, this works fine for small strings about 256 characters.
Can anyone give me a suggestion as to which algorithm will work best for comparing strings(data) of more than 256 characters( about 2GB if it is a clob field)?
Thanks in advance

MikeAThon
November 2nd, 2006, 03:14 PM
There is a fine collection of string searching algorithms over here: "Exact String Matching Algorithms" at http://www-igm.univ-mlv.fr/~lecroq/string/index.html

The site explains each algorithm, gives pseudo-code, and provides java animations of examples of the algorithm's progress. All of the algorithms are designed to work on huge volumes of text.

The examples all involve genetics, i.e., searching for nucleotide sequences ("agct" etc), but it's easy to see how the examples apply to plain text too.

It's my understanding that the best/most efficient algorithms are based on the Boyer-Moore algorithm, which is also explained in Wikipedia: "Boyer–Moore string search algorithm" at http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

Mike