CEditDist Abstract Template Class for Edit Distance Calculation on Generic Data Types

Environment: Any modern C++ compiler

Purpose

The CEditDist class performs edit distance calculations
on abstract data types.  The edit distance is defined as
the minimum cost required to convert one string into another,
where the conversion can include changing one character to
another, deleting characters and inserting characters, with
user-defined costs for each basic operation.  The algorithm
is O(nm) where n and m are the
lengths of the two strings compared (this is the fastest known
algorithm).  Edit distance calculations are useful for
finding the degree of similarity between strings, e.g. in
non-exact database queries.

The package is implemented as an abstract template base class. 
The programmer derives from this base class in order to define
the data types used and the cost functions.  The package
includes complete documentation and an example of a simple
implementation.

Example

Overview:

This example will create a class that can be
used to calculate the edit distance between two arrays of integers,
according to the following cost function: Deleting or inserting an
integer will have a cost of 5; Changing from one integer value to
a different value will have a cost of 3.

Step 1: Instantiate the Base Class.

The instantiation must be explicitly performed because of the special
treatment of abstract template classes in C++.  In our case, we
want a base class of integers, so we will use the following statement:

template class CEditDist<int>;

Step 2: Derive from CEditDist.

We must now create a derivation from the base class.  This derivation
will define the cost functions, which are virtual functions called from the
base class during the edit distance calculation.

class CIntEditClass : public CEditDist<int>
{
public:
   int DeleteCost(const int& deleted, int x, int y) { return 5; };
   int InsertCost(const int& inserted,int x, int y) { return 5; };
   int ChangeCost(const int& from, const int& to, int x, int y)
         { return (from==to ? 0 : 3); };
};

Step 3: Using the Class.

To use the class, construct an object from the base class and call its
EditDistance function, like so:

CIntEditDist ed;
int a[7] = {1,1,2,3,1,2,2};
int b[6] = {1,2,2,3,1,1};
int c = ed.EditDistance(a,7,b,6);

In the above example, the edit distance stored in variable c should
be 11.  This is because the least expensive edit replaces the second
character in a from 1 to 2, replaces the sixth character from 2 to 1,
and deletes the last character in a, for a total of two changes and one
deletion, with a cost of 3+3+5=11.

Downloads

Download source – 2 Kb

EditDist
classes and documentation (external link) – 27Kb

More by Author

Must Read