I think it was irresponsible of Brian Friesen to write that equal checksums implies 99% probability of equal files. I assume that Brian just pulled that number out of the air. If you don't know a probability, Brian, then don't guess one! I agree with commenter msg555, that the probabilities are FAR better than Brian said. The probability would be 1-2^32, for a 32 bit checksum, if we assume all byte values are equally probable. Brian should fix his article!
Replyhttp://www.cryptopp.com/ Has CRC32 and tons of other crypto and hash codes over there. Open source.
ReplyThank you! Your article, performance chart, sample code help me a lot! Thank you very much!
ReplyThere is a 99.999999977% (assuming a good CRC algorithm) chance that two pieces of data with the same checksum are identicle. I think that is good enough odds to assume the data is indeed the same. If you use a 64 bit CRC there is a 99.99999999999999999458% chance.
ReplyThe project is useful, but I found that the algorithm, at least in Crc32Dynamic, is not PKZip compatible. I had to fix the function CalcCrc32 to the following: [code] { // dwCrc32 = ((dwCrc32) >> 8) ^ m_pdwCrc32Table[(byte) ^ ((dwCrc32) & 0x000000FF)]; dwCrc32 = m_pdwCrc32Table[(byte ^ dwCrc32) & 0x000000FF] ^ (dwCrc32 >> 8); } [/code] Old code contained in the line comment.
I can't explain why you might have gotten differnet results, but I can prove the code is compliant. In a general case (x^y)&z may not be equivalent to x^(y&z), but specifically in my algorithm it works just fine. Let's look closer at each one (I'm going to use '?' to represent bits with data): Mine: (byte) ^ (dwCrc32 & 0x000000FF) 1. 0x000000?? ^ (0x???????? & 0x000000FF) 2. 0x000000?? ^ (0x000000??) 3. 0x000000?? RFC1952: (byte ^ dwCrc32) & 0x000000FF 1. (0x000000?? ^ 0x????????) & 0x000000FF 2. (0x????????) & 0x000000FF 3. 0x000000?? Follow the logic yourself, it evaluates the same.Reply
See the code at the bottom of http://www.faqs.org/rfcs/rfc1952.html. That's where I got my change from. It produces different results. Notice how inside the bracket, it xor's before it and's. BTW, I don't mean my correction as a slight. Your project was useful to me, so thanks are in order. However, I still hold it isn't 100% compliant as posted.
ReplyHow come it didn't work for me then? Also, I'm pretty sure the expressions are different. (x ^ y) & z != x ^ (y & z).
ReplyActually my code is RFC 1952 compliant. If you look carefully at the two lines of code you'll see they are semantically the same, just in a different order. They each generate the same result, which means my code is RFC 1952 compliant. I have checked my CRCs against PKZip many times and never found a discrepancy.
ReplyOriginally posted by: Jamie Whitham
Great code! Thanks Brian :-)ReplyI am using this for a small settings file 300 bytes or so.
I have this in memory before saving to disk when I quit.
In my application power could be ditched at any point so I
always write two copies and use CRC to check them on loading.Here is the code I am using. I have changed Brian's code
as with such a small file I thought there's not much need
to create the CRC table etc. However it will be
inefficient on larger files.So you might find this simpler for smaller files....
Cheers
Jamie
// returns calculated CRC32
// pData points to data for which the checksum is needed
// dwLen is the length of that data in bytes
DWORD CheckSum(BYTE* pData,DWORD dwLen)
{
DWORD dwCrc32=0xFFFFFFFF;
for (DWORD ct=0;ct<dwLen;ct++)
{
DWORD i=(pData[ct]) ^ ((dwCrc32) & 0x000000FF);
for(INT j = 8; j > 0; j--)
{
if(i & 1)
i = (i >> 1) ^ 0xEDB88320;
else
i >>= 1;
}
dwCrc32 = ((dwCrc32) >> 8) ^ i;
}
return ~dwCrc32;
}
Originally posted by: wolf bit
I hope that the comments will be helpful. If you find some bugs or you have a time to test the performens please mail me !Reply
Code is free ... use it whenever you need it. For something more mail me!Enjoy !
unsigned short check_sum32bit(void *data_ptr, unsigned int length)
{
__asm {
mov ecx,dword ptr [ebp+0Ch] // get length
mov ebx,dword ptr [ebp+8] // get data_ptrmov edx, ecx
shr ecx, 2 // set loop counter
and edx, 0x3 // set the rest
xor eax, eax // equal mov eax,0 just is better
cmp ecx, 0 // if loop counter = 0 no loop
je noloopcheck_sum32: // all given data loop
add eax, ds:[ebx]
adc eax, 0
add ebx, 4
loop check_sum32 // end loop
noloop:
sub ebx, 4 // correct last loop variable befor exit ebx+4 should be dec again
cmp edx, 0 // check for the rest
je noxdata
add ebx, edx // !!! positioned the offset to the last data (could be a problem if data is less then 4 bytes)
mov ecx, 4 // get valid bytes from the rest data
sub ecx, edx
mov edx, ds:[ebx]
shrloop:
shr edx, 8 // set no valid bytes to zero
loop shrloop
add eax, edx // add the rest data
adc eax, 0 // add carry flagnoxdata:
mov ebx, eax // make 32 bits check sum to 16 bits chesk sum if you need 32 bits just make the function "unsigned int" and delete next lines !
shr ebx, 16
add ax, bx
adc ax, 0
not ax
}
return ; // returned value is in ax register
}
Originally posted by: krishna
I need to implement a CRC check on file to detect whether it has been modified. I compare it with old CRC. As it is 99% guranteed how do I make sure that the file has changed or not. One way could be that I use two CRC's generated by different algorithms. Could that be useful.
ReplyOriginally posted by: Jonathan
Thanks for the useful tidbit of code, perfect for what I need!
ReplyOriginally posted by: Imran
Hi Guys!
I am having a little bit of problem with understanding the two implementations of source code i have.This is because of the fact that the implementations I have seem quite different from the CRC methods explained in a typical Communication & Networking books.
I would like someone to explain these implementations to me ( reason why we we assume initial value to be ffffffff, why we and crc with 0x80000000 etc). Moreover I will appreciate if you could point out the flaws in both algorithms.
Also please let me know how can i verify the out put of CRC algorithm.
Thanks for your help
Imran
Here is the code:
#define CRC_POLYNOMIAL 0xEDB88320
int CRC_Algorithm1(int *addr)
{
int i, j;
long crc = 0xFFFFFFFF;
int carry;
int maddr[6];
/* Put the ethernet address into a local array */
memcpy(maddr, addr, 6);
/* Cycle through each character of the address */
for (i = 0; i < 6; ++i)
{
/* Cycle through each bit of this character */
for (j = 0; j < 8; ++j)
{
/* Update the CRC for this address */
carry = ( (crc & 0x80000000) ? 0x01 : 0x00) ^ (maddr[i] & 0x01);
crc <<= 1;
maddr[i] >>= 1;
if (carry)
{
crc = ((crc ^ CRC_POLYNOMIAL) | carry);
}
}
}
return (crc);
} /* CRC_Algorithm1 */
SECOND ALGORITHM
static int crc32( char * s, int length ) {
/* indices */
int perByte;
int perBit;
/* crc polynomial for Ethernet */
const unsigned long poly = 0xedb88320;
/* crc value - preinitialized to all 1's */
unsigned long crc_value = 0xffffffff;
for ( perByte = 0; perByte < length; perByte ++ ) {
unsigned char c;
c = *(s++);
for ( perBit = 0; perBit < 8; perBit++ ) {
crc_value = (crc_value>>1)^
(((crc_value^c)&0x01)?poly:0);
c >>= 1;
}
}
return crc_value;
}
Reply