    Fuzzy Matching Demo in Access



    Longest Common Subsequence

    The LCS score is the length of the longest common (not necessarily contiguous) subsequence of characters divided by the average length of the two strings. A score of 0.8 or higher indicates a good match.
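    The score is easy to check outside Access. Below is a minimal Python sketch of the same formula. It is not the article's VBA code. It works on plain strings instead of character arrays, and it keeps a single rolling row instead of the full c(m, n) table. Like the VBA version, it rounds to two decimals. The names lcs_length and lcs_similarity are hypothetical, chosen only for this illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common, not necessarily contiguous, subsequence."""
    # prev[j] = LCS length of the prefix of `a` processed so far and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1          # extend the diagonal ("north west")
            else:
                curr[j] = max(prev[j], curr[j - 1])  # best of "north" and "west"
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """LCS length divided by the average length of both strings, rounded to 2 places."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0  # two empty strings: no characters to compare
    return round(lcs_length(a, b) / (total / 2), 2)


if __name__ == "__main__":
    score = lcs_similarity("Nelson", "Nilsen")
    print(score, score >= 0.8)  # 0.8 is the article's "good match" threshold
```

    For "Nelson" and "Nilsen" the common subsequence is N, l, s, n (length 4). The score is 4 / 6 ≈ 0.67, which falls below the 0.8 threshold.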

    Public Function LCS(ByVal str1 As Variant, ByVal str2 As Variant) As Single
    '* *************** Longest Common Subsequence ******************
    '* The LCS score is the length of the longest common,
    '* not necessarily contiguous, subsequence of characters
    '* divided by the average character length of both strings.
    '* In this case:  c(m, n) / ((Str1Len + Str2Len) / 2)
    '* LCS is symmetric.
    '*
    '* str1 and str2 are 0-based arrays holding one character per
    '* element. The code reads indices 0 To UBound - 1, so it expects
    '* UBound to equal the character count (e.g. ReDim arr(Len(s))).
    '***************************************************************
    
    Dim i As Long, j As Long, m As Long, n As Long
    Dim c() As Long, X() As String, Y() As String
    Dim Str1Len As Long, Str2Len As Long
    
       Str1Len = UBound(str1)
       Str2Len = UBound(str2)
    
       ' m = length of the longer array, n = length of the shorter one.
       If Str1Len > Str2Len Then
          m = Str1Len: n = Str2Len
       Else
          m = Str2Len: n = Str1Len
       End If
    
       ReDim X(m)
       ReDim Y(m)
       ReDim c(m, n)      ' ReDim initialises every cell to 0
    
       ' Copy the longer array into X and the shorter one into Y.
       If Str1Len > Str2Len Then
          For i = 0 To Str1Len - 1
             X(i) = str1(i)
          Next i
          For i = 0 To Str2Len - 1
             Y(i) = str2(i)
          Next i
       Else
          For i = 0 To Str2Len - 1
             X(i) = str2(i)
          Next i
          For i = 0 To Str1Len - 1
             Y(i) = str1(i)
          Next i
       End If ' Str1Len > Str2Len
    
       ' c(i, j) = LCS length of X(0 .. i-1) and Y(0 .. j-1).
       For i = 1 To m
          For j = 1 To n
             If X(i - 1) = Y(j - 1) Then
                c(i, j) = c(i - 1, j - 1) + 1   ' from north-west
             ElseIf c(i - 1, j) >= c(i, j - 1) Then
                c(i, j) = c(i - 1, j)           ' from north
             Else
                c(i, j) = c(i, j - 1)           ' from west
             End If
          Next j
       Next i
    
       ' c(m, n) is the LCS length; divide by the average length
       ' and round to two decimals.
       If c(m, n) > 0 Then
          LCS = Round(c(m, n) / ((Str1Len + Str2Len) / 2), 2)
       Else
          LCS = 0
       End If
    
    End Function
    

    Double Metaphone

    Double Metaphone encodes English words (and foreign words often heard in the United States) phonetically by reducing them to 12 consonant sounds. Because different spellings of the same sound map to the same code, this reduces matching problems caused by misspellings.

    The Double Metaphone algorithm, developed by Lawrence Philips and published in the June 2000 issue of C/C++ Users Journal, belongs to a class of algorithms known as "phonetic matching" or "phonetic encoding" algorithms.

    He wrote it as a replacement for SOUNDEX. These algorithms attempt to detect phonetic ("sounds-like") relationships between words. For example, a phonetic matching algorithm should detect a strong phonetic relationship between "Nelson" and "Nilsen," and no phonetic relationship between "Adam" and "Nelson."

    Double Metaphone is designed primarily to encode American English names (though it also encodes most English words well) while taking into account the fact that such words can have more than one acceptable pronunciation.

    Double Metaphone can compute a primary and a secondary encoding for a given word or name, indicating both the most likely pronunciation and an optional alternative pronunciation (hence the "double" in the name).

    "United States" generates the primary phonetic code ANTTSTTS, while the misspelling "United Statess" generates ANTTSTTSS.

    Double Metaphone only produces the codes. The final step is to turn the two primary codes into a match value, and that must be done with one of the other algorithms, such as LCS. I will not show the Double Metaphone code here because the main routine and its supporting code are very extensive; all of it is in VBAlgorithms.txt, included in the attachment for this article.
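    As a worked illustration, suppose the two primary codes for "United States" (ANTTSTTS) and "United Statess" (ANTTSTTSS) are scored with the LCS formula. The article does not say which algorithm its demo pairs with Double Metaphone, so this pairing is an assumption. The first code is a prefix of the second, so the longest common subsequence is the whole first code, 8 characters:

```latex
\mathrm{LCS}(\texttt{ANTTSTTS},\ \texttt{ANTTSTTSS})
  = \frac{c(m, n)}{(\mathit{Str1Len} + \mathit{Str2Len})/2}
  = \frac{8}{(8 + 9)/2}
  = \frac{16}{17} \approx 0.94
```

    At 0.94 the pair clears the 0.8 threshold, so the misspelling would still be treated as a match.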

    Algorithmically speaking, matching is divine and 99% is sweet! What an adventure!

    About the Author

    MS Access programmer.

    Downloads

  • Fuzzy1.zip
  • Fuzzy2.zip
