Fuzzy Matching Demo in Access
Longest Common Subsequence
The LCS score is the length of the longest common, not necessarily contiguous, subsequence of characters divided by the average length of the two strings. The score ranges from 0 (nothing in common) to 1 (identical strings); a score of 0.8 or higher generally indicates a good match.
Public Function LCS(ByVal str1, ByVal str2) As Single
    '* *************** Longest Common Subsequence ******************
    '* The LCS score is the length of the longest common,
    '* not necessarily contiguous, subsequence of characters
    '* divided by the average length of the two strings.
    '* In this case: c(m, n) / ((Str1Len + Str2Len) / 2)
    '* LCS is symmetric.
    '*
    '* str1 and str2 are zero-based arrays, one character per element.
    '* UBound of each array is used as its character count.
    '* Minimum and Maximum are helper functions that are not listed here.
    '***************************************************************
    Dim i As Integer, j As Integer, m As Integer, n As Integer
    Dim c() As Integer, b() As Integer, X$(), Y$()
    Dim Str1Len As Integer, Str2Len As Integer

    Str1Len = UBound(str1)
    Str2Len = UBound(str2)
    n = Minimum(Str1Len, Str2Len)   ' length of the shorter string
    m = Maximum(Str1Len, Str2Len)   ' length of the longer string

    ReDim X(m)
    ReDim Y(m)
    ReDim c(m, m)   ' c(i, j) = LCS length of the first i chars of X and first j chars of Y
    ReDim b(m, m)   ' b(i, j) = direction c(i, j) came from; not read in this function

    ' Copy the longer string into X and the shorter one into Y.
    If Str1Len > Str2Len Then
        For i = 0 To Str1Len - 1
            X(i) = str1(i)
        Next i
        For i = 0 To Str2Len - 1
            Y(i) = str2(i)
        Next i
    Else
        For i = 0 To Str2Len - 1
            X(i) = str2(i)
        Next i
        For i = 0 To Str1Len - 1
            Y(i) = str1(i)
        Next i
    End If ' Str1Len > Str2Len

    ' Fill the dynamic programming table; row 0 and column 0 stay at 0.
    For i = 1 To m
        For j = 1 To n
            If X(i - 1) = Y(j - 1) Then
                c(i, j) = c(i - 1, j - 1) + 1
                b(i, j) = 1 ' from north west
            ElseIf c(i - 1, j) >= c(i, j - 1) Then
                c(i, j) = c(i - 1, j)
                b(i, j) = 2 ' from north
            Else
                c(i, j) = c(i, j - 1)
                b(i, j) = 3 ' from west
            End If
        Next j
    Next i

    ' c(m, n) now holds the LCS length; normalize it and round to two decimals.
    If c(m, n) > 0 Then
        LCS = CSng(Format((c(m, n) / ((Str1Len + Str2Len) / 2)), "#.##"))
    Else
        LCS = 0
    End If

    Erase X
    Erase Y
    Erase c
    Erase b
End Function
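The listing takes arrays rather than strings and relies on Minimum and Maximum, which are not shown. The sketch below is one way to call it, under two assumptions: each array is zero-based with one character per element, and its upper bound equals the character count (which is how the function reads UBound). StringToCharArray and DemoLCS are hypothetical names, and the Minimum and Maximum bodies are guesses at what the download supplies, so skip any that your copy of VBAlgorithms.txt already defines.

```vba
' Hypothetical helpers; omit them if your module already defines these names.
Public Function Minimum(ByVal a As Integer, ByVal b As Integer) As Integer
    If a < b Then Minimum = a Else Minimum = b
End Function

Public Function Maximum(ByVal a As Integer, ByVal b As Integer) As Integer
    If a > b Then Maximum = a Else Maximum = b
End Function

' Splits a string into a zero-based array, one character per element.
' The array is sized so that UBound equals Len(s); the last slot stays empty.
Public Function StringToCharArray(ByVal s As String) As Variant
    Dim chars() As String
    Dim i As Long
    ReDim chars(0 To Len(s))
    For i = 1 To Len(s)
        chars(i - 1) = Mid$(s, i, 1)
    Next i
    StringToCharArray = chars
End Function

Public Sub DemoLCS()
    ' "Nelson" and "Nilsen" share the subsequence N-l-s-n (4 characters),
    ' so the score is 4 / ((6 + 6) / 2), roughly 0.67.
    Debug.Print LCS(StringToCharArray("Nelson"), StringToCharArray("Nilsen"))
End Sub
```

Run DemoLCS from the Immediate window. A score of roughly 0.67 falls below the 0.8 guideline, which shows that LCS on raw spelling can miss names that sound alike.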
Double Metaphone
Double Metaphone codes English words (and foreign words often heard in the United States) phonetically by reducing them to 12 consonant sounds. Because differently spelled words that sound alike receive the same or similar codes, it reduces matching problems caused by misspellings.
The Double Metaphone algorithm, developed by Lawrence Philips and published in the June 2000 issue of C/C++ Users Journal, belongs to a class of algorithms known as "phonetic matching" or "phonetic encoding" algorithms. Philips wrote it as a replacement for SOUNDEX.
Phonetic matching algorithms attempt to detect phonetic ("sounds-like") relationships between words. For example, a phonetic matching algorithm should detect a strong phonetic relationship between "Nelson" and "Nilsen," and no phonetic relationship between "Adam" and "Nelson."
Double Metaphone is designed primarily to encode American English names (though it also encodes most English words well) while taking into account the fact that such words can have more than one acceptable pronunciation.
Double Metaphone can compute a primary and a secondary encoding for a given word or name, indicating the most likely pronunciation and an optional alternative pronunciation (hence the "double" in the name).
"United States" would generate this primary phonetic code: ANTTSTTS
"United Statess" would generate this primary phonetic code: ANTTSTTSS
The final step is to translate the two primary codes into a match value. Double Metaphone does not do this itself; the codes must be compared with one of the other algorithms, such as LCS. I will not show the Double Metaphone code here because the main code and supporting code are very extensive. It is all in VBAlgorithms.txt, included in the attachment for this article.
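As an illustration of that final step, here is a minimal, self-contained sketch that scores two phonetic codes with the same LCS ratio described earlier. CodeSimilarity and DemoCodeSimilarity are hypothetical names and are not part of VBAlgorithms.txt. The function works directly on strings instead of arrays and does not round the result.

```vba
' Hypothetical helper: LCS length of two code strings divided by their average length.
Public Function CodeSimilarity(ByVal code1 As String, ByVal code2 As String) As Double
    Dim i As Long, j As Long
    Dim len1 As Long, len2 As Long
    Dim c() As Long

    len1 = Len(code1)
    len2 = Len(code2)
    If len1 = 0 Or len2 = 0 Then Exit Function   ' nothing to compare; returns 0

    ReDim c(0 To len1, 0 To len2)   ' row 0 and column 0 stay at 0
    For i = 1 To len1
        For j = 1 To len2
            If Mid$(code1, i, 1) = Mid$(code2, j, 1) Then
                c(i, j) = c(i - 1, j - 1) + 1
            ElseIf c(i - 1, j) >= c(i, j - 1) Then
                c(i, j) = c(i - 1, j)
            Else
                c(i, j) = c(i, j - 1)
            End If
        Next j
    Next i

    CodeSimilarity = c(len1, len2) / ((len1 + len2) / 2)
End Function

Public Sub DemoCodeSimilarity()
    ' The two primary codes quoted above share 8 characters in order,
    ' so the score is 8 / ((8 + 9) / 2), about 0.94.
    Debug.Print CodeSimilarity("ANTTSTTS", "ANTTSTTSS")
End Sub
```

A score of about 0.94 clears the 0.8 guideline, so the misspelled "United Statess" would still count as a match for "United States".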
Algorithmically speaking, matching is divine and 99% is sweet! What an adventure!