Two years ago, I was trying to use hashdeep to make sure that my files, particularly JPEGs, were not bit rotting. I've had dozens of JPEGs die in the past because of bit rot. I had a bad RAID controller 10 years ago that killed a bunch of JPEGs, so I've been obsessed with correct copies. I tried using hashdeep to validate at the user level that JPEGs were not corrupted, but it would run for a while on the Mac and then crash because the network connection to the file server was not stable.

So I abandoned this effort and now I'm just trying to get everything copied properly. I have a bunch of different files on different servers now, so hopefully this will be less of a problem, particularly since I'm keeping decades' worth of snapshots now. Even if there is bit corruption in a block, hopefully an older snapshot somewhere else still holds a good copy of it. I should probably just take complete snapshots and stuff them into AWS Glacier at some point as more insurance, but for now, hopefully the storage stays stable. I do have btrfs bit rot checking as well and am running RAID10 drives, so that will help a little bit too.

Using Rsync Dryrun to do the same thing

But in the course of doing all this, I needed to remember how to verify whether two directory trees are the same, so into the hell that is rsync once again. The main flag that is needed is -c, which means don't just check the size and the date and time of modification but actually go through each file and do a checksum to see if they are identical. This catches files that are corrupted on the target drive, but doesn't defend against corruption in the source.
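To see what -c buys you, here is a minimal sketch you can run in a scratch directory. The file names and the temporary directory are made up for illustration, and it assumes rsync, mktemp, and touch -r are available on your machine. It fakes a corrupted copy that has the same size and modification time as the original, which is exactly the case the default check should miss.

```shell
# Throwaway demo tree (hypothetical names): one "photo" and a damaged copy.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dst"
printf 'good data\n' > "$work/src/photo.jpg"
printf 'bad! data\n' > "$work/dst/photo.jpg"            # same length, different bytes

# Give the copy (and its directory) the same mtime as the original.
touch -r "$work/src/photo.jpg" "$work/dst/photo.jpg"
touch -r "$work/src" "$work/dst"

# Default quick check compares only size and mtime, so the damage is not noticed.
quick=$(rsync -an -i "$work/src/" "$work/dst/")

# With -c both files are read and checksummed, so the mismatch is reported.
full=$(rsync -anc -i "$work/src/" "$work/dst/")

echo "quick check reported: [$quick]"
echo "checksum reported:    [$full]"

rm -rf "$work"
```

The first line should not mention photo.jpg and the second should, which is the whole reason to pay the cost of reading every file.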

The second trick is a subtle one: a trailing slash on the source argument (the first one) changes what rsync does. Without the slash, rsync copies the directory itself, so it creates (or looks for) a child directory of the same name inside the destination. With the slash, rsync copies only the contents of the directory, straight into the destination. You will almost always want the trailing slash on the source:

# No trailing slash: rsync targets ./Backup/Personal, so it looks for a child of ./Backup
# (-n makes this a dry run, nothing is copied)
rsync -vnarcP ./Personal ./Backup
# Trailing slash: the contents of ./Personal go directly into ./Backup/Personal-2022-03-04
rsync -vnarcP ./Personal/ ./Backup/Personal-2022-03-04

If you squint sideways you can see why this makes sense. The first command is more convenient because it assumes you are copying into something of the same name, while the second lets you change the name of the directory at the same time. But it is very confusing!
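If the slash rule still feels slippery, a small experiment in a scratch directory makes it concrete. This is only a sketch with made-up names (the temporary directory, a, b, and note.txt are all hypothetical), and unlike the commands above it really copies, so that you can look at the result.

```shell
# Scratch area with one source directory and two empty destinations.
work=$(mktemp -d)
mkdir -p "$work/Personal" "$work/a" "$work/b"
echo hello > "$work/Personal/note.txt"

# No trailing slash: the directory itself is copied, so a/ gains a Personal/ child.
rsync -a "$work/Personal" "$work/a"

# Trailing slash: only the contents are copied, so note.txt lands directly in b/.
rsync -a "$work/Personal/" "$work/b"

# List what ended up where.
layout=$(cd "$work" && find a b -type f | sort)
echo "$layout"
# a/Personal/note.txt
# b/note.txt

rm -rf "$work"
```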

As for the many flags, the most important ones are:

• -n: this means a dry run, which you pretty much always want to try first. It gives a list of the files that it would change but does not make any changes. This is really useful for checking whether two directories are actually the same without doing the copy, so it acts like a poor man's hashdeep but is more reliable.
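The poor man's hashdeep idea can be wrapped in a small helper. This is a sketch, not a polished tool: trees_match is a name I made up, and it assumes that "the same" means same file contents and same set of files, ignoring timestamps and permissions. It uses -i instead of -v so that a clean comparison prints nothing at all, and --delete (harmless under -n) so that extra files on the destination are reported too.

```shell
# trees_match SRC DST (hypothetical helper)
# Returns 0 if a checksum dry run finds nothing to change, 1 if it finds
# differences (which are printed), 2 if rsync itself failed.
trees_match() {
  changes=$(rsync -rlcn --delete -i "$1/" "$2/") || return 2
  if [ -n "$changes" ]; then
    printf '%s\n' "$changes"
    return 1
  fi
  return 0
}

# Small demonstration in a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dst"
echo one > "$work/src/a.txt"
cp "$work/src/a.txt" "$work/dst/a.txt"

trees_match "$work/src" "$work/dst"; same=$?

echo two > "$work/dst/a.txt"          # same size, different bytes
trees_match "$work/src" "$work/dst"; altered=$?

echo "identical trees: $same, altered copy: $altered"
rm -rf "$work"
```

Because nothing is written, this is safe to point at a real backup, although with -c it still has to read every byte on both sides.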