OK, even with the best hard drives, you will get failures. Right now we run our main disk array with Btrfs on top of SHR2 (Synology Hybrid RAID with two-disk redundancy). Last night one of the 4TB disks failed, and the NAS immediately sent an email alert, which is pretty cool.
Here is how you remove the failed drive and repair the array. It's a little complicated because the Synology NAS software has gotten pretty complicated, but basically the layers of abstraction are: a) disks, b) storage pools, and c) volumes. The storage pool lets you add and remove disks and change its configuration, while the volumes that the operating system actually uses live on top of it.
In this world, the process is to log in to the Synology web interface (DiskStation Manager, or DSM) on port 5001 and open Storage Manager, where you can see which disk has failed in the storage pool. These are hot-swap systems, so you can flip the switch on the failed drive's tray and remove the drive. Flip it, then let the drive spin down before pulling it all the way out, just in case you want to try to recover data from it.
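If the web interface doesn't come up, it can help to confirm that something is listening on the DSM port at all. Below is a minimal Python sketch, assuming the default HTTPS port 5001 mentioned above; `dsm_url` and `is_reachable` are hypothetical helpers, and any hostname you pass in (for example `nas.local`) is your own NAS's name or IP address, not something DSM defines.

```python
import socket


def dsm_url(host: str, port: int = 5001) -> str:
    """Build the HTTPS URL for the DSM web interface (port 5001 assumed by default)."""
    return f"https://{host}:{port}/"


def is_reachable(host: str, port: int = 5001, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and connects; the with-block closes the socket
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and name-resolution failures
        return False
```

For example, `is_reachable("nas.local")` returning True only means the port accepted a TCP connection; it does not prove DSM itself is healthy, so you still open `dsm_url("nas.local")` in a browser to log in.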
Now unscrew the four Phillips screws on the bottom of the tray, take out the old drive, and put in a new one. You can use the Synology site to find out how big a drive your model supports. Right now the DS2413+ is newish and works with 12TB drives, but with the high bit error rates of these large drives, you really want to stay as small as you can. With SHR2, the new drive must be at least as big as the one you are replacing.
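To put a rough number on the bit-error concern: drive datasheets commonly quote an unrecoverable read error (URE) rate on the order of 1 per 10^14 bits for consumer drives (enterprise drives are often rated 1 per 10^15). Treating errors as independent, which is a simplification, the chance of at least one URE while reading a whole drive once is 1 - (1 - p)^bits. This Python sketch assumes that 1e-14 rate (check your own drive's datasheet) and also encodes the SHR2 size rule from the text; both helper names are hypothetical.

```python
import math

TB = 10**12  # drive vendors use decimal terabytes


def p_read_error(capacity_bytes: int, ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error when reading the whole
    drive once, assuming independent errors at the given (assumed) per-bit rate."""
    bits = capacity_bytes * 8
    # 1 - (1 - p)**bits, computed stably for a tiny p and a huge bit count
    return -math.expm1(bits * math.log1p(-ure_per_bit))


def replacement_ok(new_bytes: int, failed_bytes: int) -> bool:
    """SHR2 rule from the text: the new drive must be at least as big as the failed one."""
    return new_bytes >= failed_bytes


for size_tb in (4, 8, 12):
    print(f"{size_tb:>2} TB: {p_read_error(size_tb * TB):.0%} chance of at least one URE per full read")
```

With the assumed rate this prints roughly 27%, 47% and 62% for 4, 8 and 12 TB. The datasheet figure is a specified maximum, so these numbers are pessimistic illustrations of the trend rather than predictions.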
You can now look at the disk list and see that there is a new disk that is not initialized. It is a little confusing, but you don't actually initialize it. Instead, go to the Storage Pool list, select the broken pool, and click Repair.
This shows you the new drive, and you can add it to the pool. With SHR2, you then have to wait quite a long time for the parity to be recalculated: after 2 hours, the array rebuild is only at 11%.
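A rough way to estimate how much longer the rebuild will take is to extrapolate linearly from the progress so far. This Python sketch assumes the rebuild keeps its average speed, which it may not (it can slow down when the NAS is busy, for example); `estimate_remaining` is a hypothetical helper.

```python
from datetime import timedelta


def estimate_remaining(elapsed: timedelta, fraction_done: float) -> timedelta:
    """Linear extrapolation: assumes the rest of the rebuild runs at the average speed so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total = elapsed / fraction_done  # projected total duration
    return total - elapsed


remaining = estimate_remaining(timedelta(hours=2), 0.11)
print(f"about {remaining.total_seconds() / 3600:.1f} more hours")
```

With the numbers above (11% after 2 hours), it estimates about 16 more hours, or roughly 18 hours in total.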
As an aside, with these very large disks SHR2 is really not a great format; you probably want these drives in RAID10 instead. RAID10 mirrors every pair of disks and then stripes the data across the pairs. In theory this is less reliable than SHR2, because if you lose both members of a mirror pair, you are toast. In practice, though, it recovers much faster and a rebuild does not reread the entire array, so you avoid the dreaded scenario where one drive fails, another hits an error during the rebuild, and you lose the entire array.
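To make the RAID10 vs. SHR2 trade-off concrete, here is a back-of-the-envelope Python comparison for an array of equal-sized disks. It assumes that with identical disks SHR2 behaves like RAID6 (two disks' worth of parity) and that a second failure hits a random surviving disk; the example of twelve 4 TB disks is only an illustration (the DS2413+ has 12 bays), and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    name: str
    usable_tb: float               # capacity left for data
    rebuild_read_tb: float         # data read to rebuild one failed disk
    p_second_failure_fatal: float  # chance that one more random disk failure loses the array


def raid10(n: int, size_tb: float) -> Layout:
    if n < 4 or n % 2:
        raise ValueError("RAID10 needs an even number of disks, at least 4")
    # Half the disks hold mirror copies; a rebuild copies only the failed disk's partner.
    # A second failure is fatal only if it hits that partner: 1 of the n - 1 survivors.
    return Layout("RAID10", n / 2 * size_tb, size_tb, 1 / (n - 1))


def shr2_equal(n: int, size_tb: float) -> Layout:
    if n < 4:
        raise ValueError("SHR2 needs at least 4 disks")
    # Equal disks: like RAID6, two disks of parity, the rebuild rereads
    # the surviving disks, and any single additional failure is survivable.
    return Layout("SHR2", (n - 2) * size_tb, (n - 1) * size_tb, 0.0)


for layout in (raid10(12, 4), shr2_equal(12, 4)):
    print(layout)
```

For twelve 4 TB disks, RAID10 gives 24 TB usable and rebuilds by copying a single 4 TB partner, but a second failure before the rebuild finishes is fatal about 1 time in 11. SHR2 gives 40 TB usable and survives any second failure, but its rebuild reads about 44 TB from the surviving disks.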