Error Detection Based on Check Digit Schemes

Introduction

A Check Digits is an alphanumeric character added to a number to detect human errors. Check Digits are a simple and easy way to neutralize the human element of keying data. Check Digits are not Error Correcting Codes. Should the reader want to investigate robust Error Correcting Codes, he or she is referred to Error-Correcting Codes by Peterson and Weldon.

It is also notable that CRC codes are not a check digit scheme because more than one alphanumeric digit is generated to detect errors.

If the reader desires a system for very long datasets or binary data sets, consider using a CRC, Alder checksums, or a hash-based checksum.

According to Richard Hamming, the two most common errors are:

  • 12 becomes 21 (adjacent characters are transposed)
  • 112 becomes 122 (incorrect doubling of triples)

Jacobus Verhoeff presents the follow statistics in Error Detecting Decimal Codes:

Error Approximate Percentage Comment
a → b 60% - 90% Single Error
ab → aab 10% - 20% Adding a Digit
aab → ab 10% - 20% Omitting a Digit
ab → ba 10% - 20% Transposition Error
aa → bb 0.5% - 1.5% Twin Errors
acb → bca > 1% Jump Twin Error
13 → 30 0.5% - 1.5% Phonetic Error (similar pronunciations)

This article discusses the following Check Digit Schemes:

  • Mod 9 Scheme
  • Mod 7 Scheme
  • Mod 11 Scheme
  • 3-Weight Method
  • IBM Scheme
  • UPC Scheme
  • Verhoeff Algorithm
  • ISO 7064 Mod N/N+1

Error Detection Based on Check Digit Schemes

Basic Sample

Although trivial, checkdigit1 presents the foundation for most of the articles. It has two purposes:

  • Demonstrate UNICODE portability with the C++ Standard Library
  • Demonstrate simple Number/String routines to allow one to work with digits as a unit rather than numbers as a whole

First, in Microsoft tradition, generic defines are created for use with wide and narrow versions of the C++ library.

#ifdef UNICODE
#  define _tcout std::wcout
#  define _tcerr std::wcerr
#  define _tcin  std::wcin
#else
#  define _tcout std::cout
#  define _tcerr std::cerr
#  define _tcin  std::cin
#endif

...

#ifdef UNICODE
#  define _tstring std::wstring
#else
#  define _tstring std::string
#endif

Next, the program enlists two helper functions to convert a number to a string, and vice versa. A sample of running checkdigit1 (which does not check anything) is shown below.

When using the wide version of the program, the user should define both UNICODE and _UNICODE. UNICODE invokes Microsoft's wide implementation; _UNICODE invokes the C++ Standard wide version. Otherwise, the user will encounter char* and wchar_t* issues.

int main(int argc, char* argv[])
{
   int n = 10101;

   _tcout << _T("s: ") << NumberToString( n ) << endl;
   _tcout << _T("n: ") << StringToNumber( _T("10101") ) << endl;

   return 0;
}

Should the user mismatch UNICODE and _UNICODE with cout and wcout, the most likely output will be a memory address rather than a string.

Error Detection Based on Check Digit Schemes

Mod 9 Scheme

The mod 9 system is used by the United States Postal Service on money orders. If a money order number is 123456789 and c = 0, the number imprinted on the money order is 1234567890. The system is c = n % 9. This system is also known as "casting out nines". Casting Out Nines suffers from the fact that 0 → 9 and 9 → 0 are not detected; and no transposition errors are detected (unless it involves the check digit).

Mod 7 Scheme

A more robust system than Casting Out Nines is a mod 7 system. Bressoud and Wagon prove the following in Computational Number Theory:

  • Single Error Detection Rate is 93.81%
  • Transposition Error Detection Rate is 93.87%

Similar to the Mod 9 Scheme, c = n % 7. However, only three transposition errors are undetectable: 70 ↔ 07, 81 ↔ 18, and 92 ↔ 29.

Mod 11 Scheme

The mod 11 scheme is a weighted scheme also known as International Standard Book Number (ISBN). The first nine digits represent the unique identifier; the 10th digit is the check digit. Because the scheme is mod 11, the check digit can be 0-9 or X.

This mod 11-based system operates on a dot product (in other words, the individual digits of the number n). n · (10, 9, ..., 2, 1) mod 10. Stated another way, let ai be the ith digit of n (most significant to least significant). Then

c = 10a1 + 9a2 + 8a3 + 7a4 + 6a5 + 5a6 + 4a7 + 3a8 + 2a9 mod 11

The 10th (a10) digit would be -c, so the following equation holds true.

0 = 10a1 + 9a2 + 8a3 + 7a4 + 6a5 + 5a6 + 4a7 + 3a8 + 2a9 + a10 mod 11

This scheme detects all single errors and the transposition of any two digits at any distance (assuming the overall number is 10 or fewer digits long).

// Check Digit Sample 3

_tstring isbn = _T("073560753");    // c should be 2
_tcout << _T("ISBN: ") << isbn << endl;

int i = 10, c = 0;
for( _tstring::const_iterator it  = isbn.begin();
                              it != isbn.end(); i--, it++ )
{
   if( _T('X') == *it )
      { c += 10 * i; }
   else
      { c += CharToNumber( *it ) * i; }

   c %= 11;
}

// a_10 = -c
c = 11 - c;

if( 10 == c )
   { _tcout << _T("Check digit is: ") << _T("X") << endl; }
else
   { _tcout << _T("Check digit is: ") << c << endl; }

Error Detection Based on Check Digit Schemes

3-Weight Method

The 3-Weight Method is also known as the Electronic Funds Transfer Routing Number Scheme used by the banking industry. It is a mod 10-based system, but rather than operating on n as a whole number, it operates on a dot product (in other words, the individual digits of the number n).

So, given an 8 digit routing number, the check digit c = n · (7, 3, 9, 7, 3, 9, 7, 3), mod 10. Stated another way, let ai be the ith digit of n (most significant to least significant). Then

c = 7a1 + 3a2 + 9a3 + 7a4 + 3a5 + 9a6 + 7a7 + 3a8 mod 10

c is then concatenated to the 8 digit route, forming a 9 digit routing number.

This scheme is based on the fact that multiplication modulo 10 yields a permutation of the 10 decimal digits if the multiplication factor is a digit of the set { 1, 3, 7, 9 }, but only a subset of the decimal digits if the factor is 5 or an even digit. This system cannot detect adjacent transpositions of digits that differ by 5.

The 3-Weight method does not use simple modular reductions, so c = 10 - c is not needed to hold the equality. This is basically what the paragraph above states.

The reader should find the following equation holds true (where c is now a9):

0 = 7a1 + 3a2 + 9a3 + 7a4 + 3a5 + 9a6 + 7a7 + 3a8 + a9 mod 10

The 3-Weight Scheme is presented below. As the reader can see, it generalizes to a arbitrary length of digits easily. However, if the reader desires a system for very long datasets or binary data sets, consider using a CRC or Alder checksums or a hash-based checksum.

// Check Digit Sample 2

_tstring number = "12345678";

int i = 0, c = 0;
for( _tstring::const_iterator it  = number.begin();
                              it != number.end(); i++, it++ )
{
   switch( i % 3 + 1 )
   {
      case 1:
         c += 7 * CharToNumber( *it );
         break;
      case 2:
         c += 3 * CharToNumber( *it );
         break;
      case 3:
         c += 9 * CharToNumber( *it );
         break;
      default:
         assert( false );
         break;
   }

   c %= 10;
}

Error Detection Based on Check Digit Schemes

IBM Check Digit Scheme

The IBM Check Digit Scheme is as close to a perfect system as can be obtained when working over the Field of Integers (Z). This scheme is used by companies such as MasterCard and Visa, and Canadian Social Insurance Numbers. The system is also known as Luhn's Algorithm. The scheme detects all Single Errors, and all Transposition Errors except 0 ↔ 9.

The check digit c is defined as the dot product:

-(a1, a2, a3, a4, ..., an-1, an) · (2, 1, 2, 1, ..., 1, 2) - r (mod 10)

r is the number of odd indexed digits greater than 5. This means r = r + 1 if and only if ai > 5, where i is odd.

In practice, the algorithm is implemented quite differently.

// Check Digit Sample 4

int i = 1, c = 0;
for( _tstring::reverse_iterator it = ccnumber.rbegin();
                                it != ccnumber.rend(); it++, i++)
{
   if( 0 == i % 2 )
   {
      // Add even indexes
      c += CharToNumber( *it );
   }
   else
   {
      // Cast Out Nines on the odd indexed digits after
      //   multiplying by 2
      int t = 2 * CharToNumber( *it );
      if( t > 9 ) t -= 9;

      c += t;
   }
}

UPC Scheme

The Universal Product Code system is similar to IBM's system, except that a weight of 3 is given for the even indexed digits. The implementation is left as an exercise to the reader.

Error Detection Based on Check Digit Schemes

Verhoeff Algorithm

Unlike previously examined methods (which operate over Z), Verhoeff's Algorithm operates over the dihedral group D5.

The algorithm is in use by companies such as ESB Networks of the Republic of Ireland.

The algorithm is perfect in that it detects all Single Errors and all Transposition Errors. Additionally, it detects most (but not all) Twin Errors, Jump Twin Errors, Jump Transpositions Errors (abc → cba), and Phonetic Errors.

A Group is a set of objects and an associated operation. Groups are used every day. For example, counting people in a room. The Group is over the Integers, and the operation is addition, denoted +. There are other operations that can be performed, such a as multiplication, denoted x.

Under addition, there is an identity element: 0. This provides the expected result: 1 + 0 = 1, 2 + 3 = 5, and so forth. Under multiplication, the identity element is 1. The same is true with multiplication: 1 x 1 = 1, 5 x 1 = 5, and so on.

Just as one can operate over the Integers, operations can be performed over D5. The operations will be defined as follows. Notice that the dihedral has been partitioned into 10. This is not coincidence.

The first operation that will be defined is ρ. This is the Greek letter rho, and it denotes rotate in D5. ρ0 is the identity element under rotate because it leaves the element unchanged. ρ1 rotates the dihedral 72°. ρ2 rotates the dihedral 144°. And finally, ρ5 = ρ0.

The next operation that will be defined is σ. This is the Greek letter sigma. It follows rho in the alphabet. It will be used to denote the operation reflection. σ0 leaves the element unchanged, whereas σ0 'flips' the element over its horizontal axis. So, σ2 = σ0. The perceptive reader should realize that ρ5 = σ2.

To visualize the observation, number the partitions of the dihedral as follows.

Now, perform operations on the element. For example, ρ7σ5. ρ7 = ρ2, so:

And next, σ5 = σ1:

The final visual representation is pictured below. Note that this was presented to the user as a visual demonstration. It is not meant to visually demonstrate Verhoeff 's Algorithm. Green denotes the new position of partition 2.

Error Detection Based on Check Digit Schemes

Verhoeff Algorithm (continued)

Verhorff's brilliant observation was as follows: One can use the 10 symmetries to encode the 10 digits by using the identity for 0; the rotations (in order) for 1, 2, 3, 4; and the reflections for 5, 6, 7, 8, 9.

The final step in preparation for the implementation is defining the permutation of digits.

Let δ (the Greek letter delta) be the permutation of digits as follows:

  • δ performs no transformation on 0
  • δ switches 1 and 4
  • δ switches 2 and 3
  • δ cyclically permutes 5, 8, 6, 9, 7

δ(0) = 0
δ(1) = 4
δ(4) = 1
δ(2) = 3
δ(3) = 2
δ(5) = 8
δ(8) = 6
δ(6) = 9
δ(9) = 7
δ(7) = 5

Unlike Z, D5 is not communative: 8 ° 7 ≠ 7 ° 8.

° 0 1 2 3 4 5 6 7 8 9
0 0 1 2 3 4 5 6 7 8 9
1 1 2 3 4 0 6 7 8 9 5
2 2 3 4 0 1 7 8 9 5 6
3 3 4 0 1 2 8 9 5 6 7
4 4 0 1 2 3 9 5 6 7 8
5 5 9 8 7 6 0 4 3 2 1
6 6 5 9 8 7 1 0 4 3 2
7 7 6 5 9 8 2 1 0 4 3
8 8 7 6 5 9 3 2 1 0 4
9 9 8 7 6 5 4 3 2 1 0

The reader should realize that D5 is a complex entity (dysfunctional) whose elements are somewhat arbitrarily mapped to the integers from 0 to 9, and the group operation is not at all like addition or multiplication. Also, there is no concept of order for D5: < and > have no meaning. The red shading of the table denotes the loss of the Communative Property when multiplying elements.

A small digression: should one desire to perform 6 ° 1, first locate the 6th row, and then the 1st column. 6 ° 1 = 5.

Error Detection Based on Check Digit Schemes

Verhoeff Algorithm (continued)

c = δn(a1) ° δn-1(a2) ° δn-2(a2) ° ... ° δ2(an-1) ° δ(an)

So, to encode 1793, the reader would:

c = δ4(1) ° δ3(7) ° δ2(9) ° δ1(3)

δ4(1) = δ3(4) = δ2(1) = δ1(4) = 1
δ3(7) = δ2(5) = δ1(8) = 6
δ2(9) = δ1(7) = 5
δ1(3) = 2

1 ° 6 ° 5 ° 2

1 ° (6 ° (5 ° 2)) = 4

As with other examples, the inverse of 4 is required so that the check equation holds true. Therefore, the check digit equals 1 (1 ° 4 = 0).

Below is the result of running checkdigit5 against 1000372996. Due to Multiplicative, Permutation, and Inverse Tables, the code is rather easy. Note that c is no longer a sum from a previous stage; it is used as a lookup, and then effectively discarded.

// Check Digit 5

int i = 0, c = 0;
for( _tstring::reverse_iterator it  = number.rbegin();
                                it != number.rend(); it++, i++)
{
    // c = M[ c ][ P[ i % 8 ][ CharToNumber( *it ) ]];
    unsigned int n = CharToNumber( *it );
    unsigned int p = P[ i % 8 ][ n ];
                 c = M[ c ][ p ];

}

c = Inv[ c ];

Error Detection Based on Check Digit Schemes

ISO 7064 Mod N/N+1

This section presents an overview of a family of check digits. Implemetations are presented in the Downloads.

The ISO specifications are designed for the following alphabets:

  • numeric (10 digits: 0 to 9)
  • alphabetic (26 letters: A to Z)
  • alphanumeric (letters and digits)

The ISO specifications are designed to detect the following (taking from the ISO website):

  • All single substitution errors (the substitution of a single character for another; for example, 4234 for 1234)
  • All or nearly all single (local) transposition errors (the transposition of two single characters, either adjacent or with one character between them; for example, 12354 or 12543 for 12345)
  • All or nearly all shift errors (shifts of the whole string to the left or right)
  • A high proportion of double substitution errors (two separate single substitution errors in the same string; for example, 7234587 for 1234567)
  • A high proportion of all other errors
  Comment
Mod 10/11 European Blood Banks,
Unique Identifier for People (U.S. NIH)
Mod 16/17 European Bank Routing
Mod 36/37 U.S. Livestock Identification

Summary

Although a perfect check digit system was presented, academia has provided much more. In a remarkable 1978 paper, A. S. Sethi, V. Rajaraman, and P. S. Kenjale described a check system that can be used to correct any Single Error or Transposition Error. It is noteworthy that the system requires two check digits rather than one, and is only effective for certain bases.

The reader is encouraged to visit Error Detecting Decimal Digits (included with this article) for Wagner and Putter's experience in developing a check digit system for a commercial mail order house.

Revisions

  • 11.26.2006 Initial Release


About the Author

Jeffrey Walton

In the past, I have worked as an IT consultant for County Government (Anne Arundel County), the Nuclear Energy Institute, the Treasury Department, and Social Security Administration as a Network Engineer and System Administrator. Primary Administration experience includes Microsoft Windows and Novell Netware, with additional exposure and familiarity with Mac and Linux OSes. Previous to the US government, I was a programmer for a small business using Microsoft Visual Languages (Basic 5.0, 6.0, and C++ 5.0, 6.0) and Scripting Languages. An undergraduate degree (BS in Computer Science) was obtained from University of Maryland, Baltimore County. Graduate work includes a Masters of Science (Computer Science) from Johns Hopkins University (expected before 2009). Training and Certifications include Microsoft, Checkpoint, and Cisco.

Downloads

Comments

  • There are no comments yet. Be the first to comment!

Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • Live Event Date: December 11, 2014 @ 1:00 p.m. ET / 10:00 a.m. PT Market pressures to move more quickly and develop innovative applications are forcing organizations to rethink how they develop and release applications. The combination of public clouds and physical back-end infrastructures are a means to get applications out faster. However, these hybrid solutions complicate DevOps adoption, with application delivery pipelines that span across complex hybrid cloud and non-cloud environments. Check out this …

  • Due to internal controls and regulations, the amount of long term archival data is increasing every year. Since magnetic tape does not need to be periodically operated or connected to a power source, there will be no data loss because of performance degradation due to the drive actuator. Read this white paper to learn about a series of tests that determined magnetic tape is a reliable long-term storage solution for up to 30 years.

Most Popular Programming Stories

More for Developers

RSS Feeds