|Source/Download||git clone https://github.com/gorski/broken-file-matcher.git|
|Version||1.0.0-SNAPSHOT (alpha state) 2016|
|License||GNU GPL v3|
A Java-based app created to recover a huge number of photos from a broken hard drive. It’s really a four-hour coding project and pretty unfinished, but maybe you’ll find the idea interesting.
Let’s assume we have a huge photo library (almost 100 GB of photos), sorted and organized by name, year, tag, etc. The collection gets corrupted on disk (hardware failure). Recovery software manages to restore most of the pictures, but 95% of them are broken after the restore (missing parts, artifacts, unreadable, etc.). That’s our directory A (the recovered photo library structure, but with broken photos).
The client has a backup of the data (that’s our directory B), or at least most of it (over 90 GB of the 100 GB), because the backup was made 3-4 months before the failure. However, the photos in the backup are not organized into folders. They’re in a totally different structure, but all of the photos are fine.
The Goal and The Problem
The goal is to create directory C as a merge of A and B. The merge has to keep:
- the structure of A
- only correct JPEG photos from B
The problem is that the files can’t be matched by names (the names got changed as well), checksums, thumbnails, etc.
Both directories are parsed, and a special hash is created for every file.
For the hash calculation, some particular bits of the file are taken. They’re not transformed in any way, so a little change in the file (when the file gets broken) results in only a little change in the hash.
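The hash calculation described above could be sketched roughly like this. It is only an illustration, not the project’s actual code: the class name `SampledHash`, the `sampleCount` parameter and the choice of evenly spaced byte offsets are all assumptions, since the exact bits the tool picks are not specified here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: a "hash" made of raw bytes copied from evenly spaced offsets. */
final class SampledHash {

    /** Copies sampleCount bytes from evenly spaced positions, without transforming them. */
    static byte[] sample(byte[] data, int sampleCount) {
        byte[] hash = new byte[sampleCount];
        if (data.length == 0) {
            return hash; // empty input: the hash stays all zeros
        }
        for (int i = 0; i < sampleCount; i++) {
            // Assumption: spread the sample positions across the whole file.
            int offset = (int) ((long) i * data.length / sampleCount);
            hash[i] = data[offset];
        }
        return hash;
    }

    /** Convenience overload that reads the whole file into memory first. */
    static byte[] sample(Path file, int sampleCount) throws IOException {
        return sample(Files.readAllBytes(file), sampleCount);
    }
}
```

Because the sampled bytes are copied as-is, damage to one region of a file changes only the hash positions that fall inside that region; a cryptographic checksum would change completely instead. Note that this sketch places offsets relative to the file length, so a truncated file would shift them; fixed absolute offsets would be an alternative.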
Each hash from A is compared with the hashes from B, and when two of them match above a given threshold, the file from directory B is taken as the match and copied over to directory C.
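The comparison step might look something like the sketch below. Again, this is an assumption-laden illustration: `HashMatcher`, `similarity` and `bestMatch` are hypothetical names, and "similarity" is taken to mean the fraction of hash positions holding the same byte, which may differ from the measure the tool really uses.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: picks the backup file whose hash is closest to a broken file's hash. */
final class HashMatcher {

    /** Fraction (0.0 to 1.0) of positions at which two equal-length hashes hold the same byte. */
    static double similarity(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("hashes must have equal length");
        }
        if (a.length == 0) {
            return 0.0;
        }
        int same = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == b[i]) {
                same++;
            }
        }
        return (double) same / a.length;
    }

    /** Returns the best-scoring file from B, or empty if nothing reaches the threshold. */
    static Optional<Path> bestMatch(byte[] brokenHash,
                                    Map<Path, byte[]> backupHashes,
                                    double threshold) {
        Path best = null;
        double bestScore = threshold;
        for (Map.Entry<Path, byte[]> entry : backupHashes.entrySet()) {
            double score = similarity(brokenHash, entry.getValue());
            if (score >= bestScore) {
                best = entry.getKey();
                bestScore = score;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

A caller would compute the hash of every file in B once, then call `bestMatch` for each file in A and copy the returned file into C under the broken file’s path relative to A, which preserves A’s folder structure.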