Backup the Backup
Unfortunately, the growth in storage capacity has not been matched by a growth in reliability. In other words, hard drives, whether mechanical or solid state, still fail alarmingly often. This failure rate is what keeps redundant systems in business: with a RAID disk array, failing drives are managed for you. The array proactively notifies you about imminent failures and lets you replace the (inexpensive) drives on the fly.
The question, though, is what happens if the RAID storage unit itself fails. I'm now in the middle of recovering from such a failure. Trying to be prepared, some time ago I decided to have two RAID units in two separate locations: my city home and the country cabin. Ideally they would synchronize automatically over the Internet. That synchronization never worked, though, because only this year did I finally get (fiber) Internet links fast enough to make it practical. So from time to time I would bring one RAID unit to the other location and run a snapshot sync. But I got lazy over time, and I was more than a year behind exactly when the primary RAID (the one with the most recent copies of everything) started falling apart. About an hour after being powered on, it reports that one drive (a random one) is dead, quickly followed by the same report for another drive. I tried replacing one of the drives with a new one, and it still keeps failing. So the drives clearly do not seem to be the issue.
Then I checked the power supply, and bingo! All the capacitors were bulged, a clear sign that the capacitor plague had got them. Unfortunately, after replacing the power supply the behavior is still the same: drives are randomly reported as dead. So my problem is elsewhere.
This brought me to the last resort: use the secondary RAID hardware to bring the primary unit's hard drives back up, so that I can offload the data to temporary storage. I'm still not sure this will work, as swapping drives between the RAID units may not be seamless; some specific configuration may be stored on the primary (failing) motherboard. So before doing this transplant, I'm in the process of backing up the backup unit. It does not sound like much, but a mere 5 TB of data is now in its second day of being transferred from the RAID NAS server, over gigabit Ethernet, to an external hard drive connected to my computer over a 40 Gbps Thunderbolt interface.
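To put the transfer time in perspective, here is a back-of-the-envelope sketch in Python. It assumes 5 TB means 5 × 10^12 bytes and that the link is the only bottleneck. The function name `transfer_hours` and the 30% efficiency figure are my own illustrative assumptions, not measurements from this transfer.

```python
def transfer_hours(size_bytes: float, link_bits_per_s: float,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move size_bytes over a link.

    efficiency is the fraction of the rated link speed that is actually
    achieved (1.0 = the theoretical maximum).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    bytes_per_s = link_bits_per_s / 8 * efficiency
    return size_bytes / bytes_per_s / 3600


DATA_BYTES = 5e12      # 5 TB, decimal terabytes (assumption)
GIGABIT = 1e9          # gigabit Ethernet, bits per second
THUNDERBOLT = 40e9     # 40 Gbps Thunderbolt, bits per second

if __name__ == "__main__":
    # Best case over gigabit Ethernet: a bit over 11 hours.
    print(f"Gigabit, ideal:   {transfer_hours(DATA_BYTES, GIGABIT):.1f} h")
    # Hypothetical 30% effective throughput (small files, protocol
    # overhead, a slow NAS): well over a day.
    print(f"Gigabit, 30%:     {transfer_hours(DATA_BYTES, GIGABIT, 0.3):.1f} h")
    # The Thunderbolt leg is nowhere near being the bottleneck.
    print(f"Thunderbolt link: {transfer_hours(DATA_BYTES, THUNDERBOLT):.2f} h")
```

Even in the ideal case the gigabit link needs about 11 hours for 5 TB, so a multi-day transfer points at the NAS, the protocol, or the target disk rather than at the Thunderbolt connection.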
In the end, what clearly got me was the assumption that the electronics would run for years. They did run for more than 10 years, but time flies, and then they fail. BTW, there was a time when I had the idea of backing up the data on optical discs. I recorded several of them, and not a single one is readable today. And yes, solid state drives fail too. The reason may be different (software bugs), but the bottom line is: back up the backup. This will reduce the probability of losing it all.
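As a rough illustration of that last point, the sketch below computes the chance of losing every copy at once. It assumes that each copy fails independently with the same probability over some period. The 5% figure is purely hypothetical, and `prob_lose_all` is a name made up for this example.

```python
def prob_lose_all(p_fail_each: float, copies: int) -> float:
    """Probability that all copies fail, assuming independent failures."""
    if not 0 <= p_fail_each <= 1:
        raise ValueError("p_fail_each must be a probability between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return p_fail_each ** copies


if __name__ == "__main__":
    P_FAIL = 0.05  # hypothetical chance that one copy is lost in a given period
    for n in (1, 2, 3):
        print(f"{n} copies: {prob_lose_all(P_FAIL, n):.6f}")
```

The independence assumption is the weak spot. Copies that sit in the same location, run on hardware of the same age, or are synced a year late are not truly independent, so the real-world benefit is smaller than the formula suggests.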