tech: rsync to copy files with -avP, not -c

OK, I’ve made plenty of mistakes with rsync by picking the wrong flag. For a long time, I wanted to use the -c (--checksum) flag because I mistakenly thought it made rsync checksum the files after the copy. That isn’t the case. What -c actually does is change how rsync decides whether a file needs to be copied. Instead of the default quick check of file size and modification time, rsync ignores the modification time and compares a checksum of every file whose size matches.

It turns out that rsync *always* verifies each file it transfers, using a whole-file checksum computed during the transfer, so you only need -c when your modification timestamps are not reliable.
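To make the difference concrete, here is a rough Python model of the per-file decision, based on how the rsync man page describes the quick check and -c. It is a sketch, not rsync’s code: the function names are hypothetical, MD5 stands in for whichever checksum your rsync version negotiates, and modification times are compared to the whole second as a simplification. The demo writes the same bytes to two files and gives the copy a bogus timestamp, which is exactly the case where -c earns its cost.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Checksum the whole file; reading every byte is what makes -c slow."""
    h = hashlib.md5()  # stand-in for rsync's negotiated checksum
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_transfer(src: str, dst: str, checksum: bool = False) -> bool:
    """Rough model of rsync deciding whether src must be sent over dst."""
    if not os.path.exists(dst):
        return True  # nothing at the destination yet
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        return True  # a size mismatch always triggers a transfer
    if checksum:
        # -c: ignore mtime and compare full-file checksums of same-size files
        return file_digest(src) != file_digest(dst)
    # default quick check: same size, so compare modification times
    # (whole seconds here, a simplification)
    return int(s.st_mtime) != int(d.st_mtime)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        for path in (src, dst):
            with open(path, "wb") as f:
                f.write(b"same bytes")
        os.utime(dst, (0, 0))  # simulate an unreliable timestamp on the copy
        print(needs_transfer(src, dst))                 # True: mtimes differ
        print(needs_transfer(src, dst, checksum=True))  # False: contents match
```

The quick check never opens the files, which is why the default run is so much faster on a big dataset.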

One consequence of -c is that rsync has to read the entire set of files and checksum them before it can decide what to send. With a 14TB dataset on a DroboPro connected over USB 2.0, this takes forever. In my case, it was stuck on “sending incremental file list” for two days.
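As a back-of-the-envelope check on why that pass never finished, assume the checksum read runs at about the same 32MBps this DroboPro manages for copies over USB 2.0. That is an assumption; the read rate during the checksum pass itself wasn’t measured. The hypothetical helper below just divides size by rate.

```python
SECONDS_PER_DAY = 86_400


def read_time_days(dataset_bytes: float, bytes_per_sec: float) -> float:
    """Days needed to read dataset_bytes once at a steady bytes_per_sec."""
    return dataset_bytes / bytes_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    TB, MB = 10**12, 10**6  # decimal units, as drive vendors count them
    days = read_time_days(14 * TB, 32 * MB)
    print(f"{days:.1f} days")  # 14e12 / 32e6 = 437,500 s, prints "5.1 days"
```

Under that assumption, two days in, rsync could plausibly still have been less than halfway through reading the source.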

Instead you should use:

rsync -avP "/Volumes/src/" "/Volumes/dst"
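If you script this copy, a minimal Python sketch looks like the following, assuming rsync is on your PATH. The helper names rsync_args and copy_tree are hypothetical. Passing the arguments as a list rather than one shell string means volume names with spaces need no extra quoting, and the trailing slash on the source is passed through exactly as typed.

```python
from __future__ import annotations

import shutil
import subprocess


def rsync_args(src: str, dst: str, checksum: bool = False) -> list[str]:
    """Build the argv for an rsync -avP copy (hypothetical helper)."""
    args = [
        "rsync",
        "-a",  # archive: recurse, preserve times, permissions, symlinks, etc.
        "-v",  # verbose: list each file as it is processed
        "-P",  # --partial --progress: keep partial files, show per-file rate
    ]
    if checksum:
        args.append("-c")  # only when modification times can't be trusted
    return args + [src, dst]


def copy_tree(src: str, dst: str) -> int:
    """Run rsync and return its exit status (0 means success)."""
    if shutil.which("rsync") is None:
        raise FileNotFoundError("rsync is not on PATH")
    return subprocess.run(rsync_args(src, dst)).returncode


if __name__ == "__main__":
    print(" ".join(rsync_args("/Volumes/src/", "/Volumes/dst")))
    # rsync -a -v -P /Volumes/src/ /Volumes/dst
```

Calling copy_tree("/Volumes/src/", "/Volumes/dst") runs the same copy as the command above and hands back rsync’s exit code for your script to check.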

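There is another syntax quirk to beware of: the trailing slash on the source. With /Volumes/src/ as the source, rsync does roughly the equivalent of rsync /Volumes/src/* /Volumes/dst (except that hidden dotfiles are copied too). That is, it doesn’t create a new folder called src inside dst; it puts everything from src directly at the top level of dst. Leave the slash off, and you get /Volumes/dst/src instead.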
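The rule is easy to state as a path mapping. Here is a small Python model of where a file under the source directory ends up; dest_path is a hypothetical helper, and it only covers the simple case of one source directory copied into one destination directory.

```python
import posixpath


def dest_path(src_arg: str, dst_arg: str, rel_file: str) -> str:
    """Model where rsync writes src_arg/rel_file (hypothetical helper).

    With a trailing slash, the contents of src_arg land directly in dst_arg.
    Without one, rsync first creates a directory named after src_arg in dst_arg.
    """
    if src_arg.endswith("/"):
        return posixpath.join(dst_arg, rel_file)
    top = posixpath.basename(src_arg)  # e.g. "src" for "/Volumes/src"
    return posixpath.join(dst_arg, top, rel_file)


if __name__ == "__main__":
    print(dest_path("/Volumes/src/", "/Volumes/dst", "photos/a.jpg"))
    # /Volumes/dst/photos/a.jpg
    print(dest_path("/Volumes/src", "/Volumes/dst", "photos/a.jpg"))
    # /Volumes/dst/src/photos/a.jpg
```

For a directory copy like this one, a trailing slash on the destination makes no difference; only the source slash changes the layout.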

DroboPro on USB 2.0 to ThunderBay 8: 32MBps at best with a single thread

With this command, and -P in particular, you can see the actual performance of the system, and it is much higher than with Finder copies. Specifically, here is what we get:

  1. Rsync alone, with nothing else running on the system, gets 32MBps (versus 3MBps for a Finder copy, so about 10x faster). Compared with the USB 2.0 maximum of 480Mbps, that is an effective 256Mbps, which is not that far off, and it is right in the 30MBps range that Cisco gives as the expected speed of USB 2.0 controllers.
  2. When running two rsyncs, one of them from a Thunderbolt 2 drive, we should see about the same speed, since Thunderbolt 2 runs at 20Gbps (about 2GBps available) and Thunderbolt 3 at 40Gbps. Here the maximum speed of the hard drives is what matters. With the RAID10 array running alone, we see 280-400MBps write and 200MBps read for large 5GB files, so the copies are not saturating the drives at this point. Smaller files will be much slower because the drives have to seek to place them: moving to a 1GB test file, performance falls to 230-320MBps write and 180MBps read, and it will be much worse for small files.
  3. The actual speed we get with two rsync jobs and a Backblaze backup running is about the same 32MBps, which shows that the USB 2.0 link is the limit. Interestingly, running Disk Speed Test during these jobs doesn’t change much: it still reports 463Mbps/260MBps, so there is plenty of performance left in the Thunderbolt 4 drive, which is far from saturated.
  4. The next test is to run other things as well, starting a Synology Drive Client job that streams three files from the NAS over Ethernet. This causes a significant falloff in performance to 20MBps. I speculate this is because we are now writing four huge files at once, which means lots of seeks and file-system fragmentation. There seems to be no way to make the Synology Drive Client transfer just one file at a time, so for the best performance, let a single source lay down the big files so the disk layout has minimal fragmentation.
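The unit conversions behind these numbers are easy to mix up, since USB 2.0 is rated in megabits per second while rsync reports megabytes. A quick Python check, assuming decimal units (1MB = 8Mb) and the 480Mbps USB 2.0 signaling rate; the names are hypothetical:

```python
USB2_SIGNALING_MBPS = 480  # USB 2.0 Hi-Speed signaling rate, in megabits/s


def mbytes_to_mbits(mbytes_per_sec: float) -> float:
    """Convert megabytes per second to megabits per second."""
    return mbytes_per_sec * 8


if __name__ == "__main__":
    for label, mbytes in (("rsync alone", 32), ("with Synology Drive Client", 20)):
        mbits = mbytes_to_mbits(mbytes)
        share = mbits / USB2_SIGNALING_MBPS
        print(f"{label}: {mbytes}MBps = {mbits:.0f}Mbps, {share:.0%} of USB 2.0")
    print(f"rsync vs Finder: {32 / 3:.1f}x")
    # rsync alone: 32MBps = 256Mbps, 53% of USB 2.0
    # with Synology Drive Client: 20MBps = 160Mbps, 33% of USB 2.0
    # rsync vs Finder: 10.7x
```

Real-world USB 2.0 throughput sits well below the 480Mbps signaling rate because of protocol overhead, so using a bit over half of it is a reasonable result for a single copy.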

I’m Rich & Co.

Welcome to Tongfamily, our cozy corner of the internet dedicated to all things technology and interesting. Here, we invite you to join us on a journey of tips, tricks, and traps. Let’s get geeky!
