Comparing Large Bodies of Text with Hash Codes

Welcome to this week's installment of .NET Tips & Techniques! Each week, award-winning Architect and Lead Programmer Tom Archer demonstrates how to perform a practical .NET programming task.

While most people associate hash codes with security, they are also a fast, compact means of comparing large text values. Using the standard Windows CryptoAPI directly can be very cumbersome, but the classes defined in the .NET System::Security::Cryptography namespace make hash codes, and other cryptographic functions, easier and more accessible than ever. In this article, I illustrate just how easy it is to compare two text values in a .NET application using hash codes.

Creating a hash code for a body of text is as simple as deciding which hashing algorithm you wish to use (for example, MD5 or SHA1), instantiating the appropriate .NET service provider object, and then calling that object's ComputeHash method. (All hash algorithm classes ultimately derive from the HashAlgorithm class and inherit its ComputeHash method; each derived class supplies its algorithm by overriding the protected HashCore and HashFinal methods.) Other than that, there's just the typical conversion between String objects and Byte (or Char) arrays, and you're done.
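The three steps just described (pick an algorithm, compute a hash, compare the results) can be sketched in portable standard C++ as well. This is only an illustration under stated assumptions: the standard library has no MD5 or SHA1, so the sketch swaps in FNV-1a, a simple non-cryptographic hash, and the names fnv1a64, toHex, and hashesMatch are hypothetical helpers that do not appear in the demo application.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Step 1: pick an algorithm. FNV-1a (64-bit) is a stand-in here; it is NOT
// cryptographic and is not what the article's demo uses (that is MD5).
std::uint64_t fnv1a64(const std::string& text)
{
    std::uint64_t hash = 14695981039346656037ULL;   // FNV-1a offset basis
    for (unsigned char ch : text)
    {
        hash ^= ch;                                  // mix in one byte
        hash *= 1099511628211ULL;                    // FNV-1a 64-bit prime
    }
    return hash;
}

// Step 2: convert the hash to text for display (hypothetical helper).
std::string toHex(std::uint64_t value)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0') << std::setw(16) << value;
    return out.str();
}

// Step 3: compare the hash codes instead of the texts themselves.
bool hashesMatch(const std::string& a, const std::string& b)
{
    return fnv1a64(a) == fnv1a64(b);
}
```

Calling hashesMatch(first, second) returns true for identical inputs. As with any hash, a match indicates equal text only with high probability, because distinct inputs can in principle produce the same hash code.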

Figure 1 contains a screen capture of the demo application included with this article.

Figure 1: Simple C++ Managed Extensions example illustrating the comparison of two text (string) values using hash codes

The application uses the MD5 hash algorithm to compare two input strings. The two fields beneath the input fields display the computed hash codes. Below you'll find the code used to generate those hash codes and compare the results.

The code first calls the Encoding::ASCII->GetBytes method to convert the String values returned from the input controls into Byte arrays. An MD5CryptoServiceProvider object is then instantiated, and its ComputeHash method is called for each Byte array, producing a second Byte array that contains the hash code for the text value. BitConverter::ToString converts each hash value into a String of hyphen-separated hexadecimal pairs, which is displayed on the demo dialog. The two strings are then compared for equality, and the result of the comparison is shown in a message box. That's it: just a few lines of code to compare two text values of virtually any length! (Because Encoding::ASCII cannot represent characters outside the 7-bit ASCII range, text that may contain such characters should be converted with Encoding::UTF8 or Encoding::Unicode instead.)

using namespace System::Security::Cryptography;
using namespace System::Text;

...

private: System::Void btnCompare_Click(System::Object *  sender,
                                       System::EventArgs *  e)
{
  try
  {
    // Convert the text values into Byte arrays
    Byte ba1[] = Encoding::ASCII->GetBytes(txt1->Text);
    Byte ba2[] = Encoding::ASCII->GetBytes(txt2->Text);

    MD5CryptoServiceProvider* md5csp = new MD5CryptoServiceProvider();

    // Get the hash values for each text value using ComputeHash
    Byte baHashCode1[] = md5csp->ComputeHash(ba1);
    Byte baHashCode2[] = md5csp->ComputeHash(ba2);
    
    // Convert the two hash code arrays into strings for display
    // and comparison
    txtHash1->Text = BitConverter::ToString(baHashCode1);
    txtHash2->Text = BitConverter::ToString(baHashCode2);

    // Display the results of the comparisons of the two hash codes
    MessageBox::Show(
      String::Format(S"The two values are {0}",
                     (0 == String::Compare(txtHash1->Text,
                                           txtHash2->Text)
                       ? S"the same" : S"different")));
  }
  catch(Exception* ex)   // named ex to avoid hiding the handler's e parameter
  {
    MessageBox::Show(ex->Message);
  }
}
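
The listing performs its final comparison on the display strings. An alternative is to compare the hash bytes directly and format them only for display. The following standard C++ sketch (not Managed Extensions code) assumes two hypothetical helpers: toDashedHex, which imitates the hyphen-separated uppercase hexadecimal layout that BitConverter::ToString produces, and sameDigest, which compares two hash values byte for byte.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Format bytes as uppercase hex pairs joined by hyphens, for example
// "7F-2C-4A". Hypothetical helper imitating BitConverter::ToString.
std::string toDashedHex(const std::vector<unsigned char>& bytes)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        if (i > 0)
            out += '-';                    // separator between byte pairs
        out += digits[bytes[i] >> 4];      // high nibble
        out += digits[bytes[i] & 0x0F];    // low nibble
    }
    return out;
}

// Compare two digests directly: equal length and equal content.
bool sameDigest(const std::vector<unsigned char>& a,
                const std::vector<unsigned char>& b)
{
    return a == b;
}
```

Comparing the raw bytes avoids building two strings solely for the equality test, while the formatted strings remain available for display.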


About the Author

Tom Archer - MSFT

I am a Program Manager and Content Strategist for the Microsoft MSDN Online team managing the Windows Vista and Visual C++ developer centers. Before being employed at Microsoft, I was awarded MVP status for the Visual C++ product. A 20+ year veteran of programming with various languages - C++, C, Assembler, RPG III/400, PL/I, etc. - I've also written many technical books (Inside C#, Extending MFC Applications with the .NET Framework, Visual C++.NET Bible, etc.) and 100+ online articles.
