Duplicate Data Finder
TheCPUWizard
January 23rd, 2008, 04:21 PM
I have about 4TB of data. Much of it duplicate. This typically happens from copying a directory for backup purposes, etc. Unfortunately none of the tree structures are the same. Plus there will often be different files (eg "clean" a project in one directory, have a full build in another).
I am looking for a program that can scan my system(s) and point out "definite" and "probable" duplicates.
After much searching on the Net, I have found nothing....so I figured I would ask here......
PeejAvery
January 23rd, 2008, 05:49 PM
Writing one shouldn't be too hard, if there is no acceptable solution found.
Here are some things that I have dug up. Hope they help!
http://noclone.net/Duplicate_File_Finder.asp
http://www.raymond.cc/blog/archives/2007/12/17/find-and-remove-duplicate-files-to-free-up-hard-disk-space/
http://www.tucows.com/preview/373411#MoreInfo
TheCPUWizard
January 24th, 2008, 01:06 AM
Thanks, I will take a look at those.
But writing one IS difficult. I spent about 3 weeks writing one, and although it found the obvious cases, it missed tons!!
Especially difficult are conditions like:
C:\A\B\C has a zip file with 90 pictures in it, all bmp
D:\T\Y\J has 87 jpeg files (which are converted versions of the raw data in the zip).
The second set of files is redundant...
PeejAvery
January 24th, 2008, 07:52 AM
That is why you would use CRC and MD5 to check the files. As well, you can check their byte usage. Either way, no matter what you use or purchase, there is always going to have to be some user initiated cleanup.
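A minimal Python sketch of the size-then-hash idea mentioned above, assuming only exact byte-for-byte duplicates are wanted. The names `file_md5` and `find_exact_duplicates` are hypothetical and do not come from any tool named in this thread.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(roots):
    """Group files under the given roots by (size, MD5); return groups of 2+."""
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    by_size[os.path.getsize(path)].append(path)
                except OSError:
                    continue  # unreadable or vanished file: skip it

    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have an exact duplicate
        for path in paths:
            try:
                by_hash[(size, file_md5(path))].append(path)
            except OSError:
                continue
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_exact_duplicates(["C:\\", "D:\\"])` would return a list of path groups. Grouping by size first means only files that share a size are ever hashed, which matters on a multi-terabyte scan. The results still need manual review before anything is deleted.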
TheCPUWizard
January 24th, 2008, 11:24 AM
PeejAvery wrote: That is why you would use CRC and MD5 to check the files. As well, you can check their byte usage. Either way, no matter what you use or purchase, there is always going to have to be some user initiated cleanup.
CRC/MD5 or anything similar would not help compare a BMP to a JPG, or a WAV to an MP3.
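A small Python sketch of why a checksum cannot match re-encoded media. It uses zlib compression purely as a stand-in for a format conversion (an assumption for illustration; real BMP-to-JPG or WAV-to-MP3 conversions are lossy, which makes the mismatch even harder to work around).

```python
import hashlib
import zlib

# Stand-in "raw" payload; a real case would be BMP pixel data or WAV samples.
raw = bytes(range(256)) * 64

# Stand-in for a re-encoded copy. zlib is lossless, unlike JPG or MP3,
# yet the stored bytes already differ from the original.
encoded = zlib.compress(raw)

raw_md5 = hashlib.md5(raw).hexdigest()
encoded_md5 = hashlib.md5(encoded).hexdigest()

print(raw_md5 == encoded_md5)           # False: the digests do not match
print(len(raw) == len(encoded))         # False: the sizes do not match either
print(zlib.decompress(encoded) == raw)  # True: the underlying content is identical
```

Any comparison that works on the stored bytes (checksum, size, or byte-by-byte) sees two unrelated files here. Matching them requires decoding both to a common representation first.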
About 3TB of the data I am trying to "scrub" is media. "Stills" and Audio are bad enough, but things get really insane when you start dealing with Video :eek: :eek:
I have been looking at some of the "professional" grade tools used by the recording industry and they look VERY promising, but they are NOT cheap (some go for nearly $10K). I am now looking for one of my old contacts from when I did development in that field who will let me bring the server into their shop and run the tools...
PeejAvery
January 24th, 2008, 12:49 PM
TheCPUWizard wrote: CRC/MD5 or anything similar would not help compare a BMP to a JPG, or a WAV to an MP3.
That is why I mentioned byte checking.
AllanBraun
May 9th, 2008, 08:30 AM
You should try Directory Report
http://www.file-utilities.com
It can find duplicate files based on the same
name, size, CRC-32 and/or a byte-by-byte comparison
It can also generate reports as
text, csv, xml and html files
It has many other features to help you in cleaning up your disk such as:
Finding your largest files
Finding your largest directories
Comparing the contents of two directories
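For illustration, the size, CRC-32, and byte-by-byte criteria listed above can be chained from cheapest to most expensive. This is a minimal Python sketch using only the standard library, not a description of how Directory Report works internally; `file_crc32` and `same_file_content` are hypothetical names.

```python
import filecmp
import os
import zlib


def file_crc32(path, chunk_size=1 << 20):
    """Compute the CRC-32 of a file incrementally, one chunk at a time."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def same_file_content(path_a, path_b):
    """Cheap checks first (size, CRC-32), then a full byte-by-byte compare."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    if file_crc32(path_a) != file_crc32(path_b):
        return False
    # shallow=False forces filecmp to compare contents, not just os.stat data.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

For a single pair the CRC step adds little, since the final compare reads both files anyway. It pays off when checksums are computed once per file and stored, so that most candidate pairs are rejected without rereading the data.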
Bonfiglio111
April 28th, 2010, 08:22 AM
Try Duplicate Finder from ashisoft (http://www.ashisoft.com). It finds and deletes duplicates accurately, and it has many features, like Preview and more options to choose from. I use it to remove unwanted files on my PC.
codeguru.com