Hard disk drives keep getting bigger, meaning capacity just keeps getting cheaper. But storage capacity is like money: The more you have, the more you use. And this growth in capacity means that data is at risk from a very old nemesis: Unrecoverable Read Errors (URE).
Let’s get one thing out of the way from the start: The only thing protecting your data from corruption is some simple error checking on the disk drive itself and anything built into the software stack on your array or server. RAID doesn’t do any error checking at all, and neither does NTFS in Windows or HFS+ in Mac OS X. And none of those things can correct a read error if they encounter one. When people talk about disk or integrity checks they’re usually talking about the integrity of the file system or RAID set, not of the actual data itself.
Let this sink in: Your data is not protected. It can be corrupted. And you will never know until you need it.
Yes, I’m trying to scare you.
What Protects Your Data?
For most regular people, the only line of defense against random read and write errors is something called error correction coding (ECC), which is built into your hard disk drive’s controller. ECC is essential because magnetic media often has “bad” bits that aren’t readable, especially as information density increases. So hard disk controllers take care of recoverable read errors all the time.
As implemented in most modern hard disk drives, ECC works pretty well, but it’s not perfect. Most manufacturers claim that 1 bad bit will slip through for every 10^14 to 10^16 bits read, which is actually really good. But what about those unrecoverable read errors (UREs)? They’re out of the disk drive’s hands. Hopefully something higher in the stack can recover the data: maybe your filesystem, or maybe the storage array software.
The good news is that every enterprise storage array worthy of the name has data integrity checking built in, including all the big names and most of the smaller companies, too. After all, if an array can’t store data reliably it’s not really worth buying! So if you’re using a storage array, you’re probably good. Drobo apparently does integrity checking, too. So there’s that.
The bad news is that NTFS, ext3, and HFS+ don’t do any kind of data integrity checking. That means that the vast majority of user data is reliant on the ECC in the hard disk drive itself to ensure it meets the prime directive of storage.
The worse news is that unrecoverable read errors do happen, so all this data is at risk. Heavy data users (Greenplum, Amazon, CERN) report that errors really do happen about as often as hard disk drive manufacturers suggest they might. Furthermore, errors often come after the disk controller is done with the data: Faulty firmware, poor connections, bad cables, and even cosmic radiation can induce UREs.
How Common Are UREs?
It’s hard to understand what one error in 10^14, 10^15, or 10^16 really means in the real world. One easier way to think about it is that 10^14 bits equals 12.5 TB, 10^15 bits equals 125 TB, and 10^16 bits is 1.25 PB. But this doesn’t really tell the correct story either. These are error rates, not error guarantees. You can read an exabyte of data and never encounter a URE, just like you can buy a lottery ticket and become a millionaire. The important thing is the probability.
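The conversion is simple arithmetic: divide the bits between expected errors by 8 to get bytes, then scale to decimal terabytes. A quick Python sanity check (the function name is mine):

```python
def tb_per_error(exponent):
    """Data read, in decimal terabytes, per expected error at a
    1-in-10**exponent bit error rate."""
    return 10**exponent / 8 / 1e12   # bits -> bytes -> decimal TB

# tb_per_error(14) -> 12.5, tb_per_error(15) -> 125.0,
# tb_per_error(16) -> 1250.0 (i.e. 1.25 PB)
```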
As Matt Simmons points out, we can easily calculate the probability of a URE based on a given amount of data. The formula is Statistics 101 material, and he does a fine job of laying it out in his blog post, Recalculating Odds of RAID5 URE Failure. But even that was a little hard to grasp.
So here’s my take: Given a number of hard disk drives of a certain size in a set, how likely is a URE? I graphed it out for 1-10 drives of modern sizes, 1-10 TB. And the results are pretty scary.
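The curve behind that graph is easy to reproduce. Here’s a minimal Python sketch of the calculation, assuming a 1-in-10^14 bit error rate and decimal terabytes (the function name is mine):

```python
import math

def p_ure(tb, ber=1e-14):
    """Probability of hitting at least one URE while reading `tb` decimal
    terabytes, given `ber` errors per bit read: 1 - (1 - ber)**bits,
    computed in a numerically stable form."""
    bits = tb * 1e12 * 8                          # TB -> bytes -> bits
    return -math.expm1(bits * math.log1p(-ber))

# p_ure(1) is about 0.077 and p_ure(10) about 0.551 -- matching the
# single-drive figures below.
```

The `expm1`/`log1p` form matters here: naively computing `(1 - 1e-14) ** bits` loses precision because 1e-14 is close to the limit of double-precision rounding.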
Although a single 1 TB drive has less than an 8% chance of a URE, those fancy new 10 TB drives start out over 55%, assuming a URE rate of 1 in 10^14. Throw a few into a RAID set and you’ve got real trouble. If your risk threshold is a 50/50 chance, you can’t have more than three 3 TB drives (9 TB) in a set before you’re there. Even if you’re a crazy risk-taker, five 6 TB drives (30 TB) gets you over 90%. This is not good.
How about a URE rate of 1 in 10^15? You’d reach 50% at around 90 TB, which is admittedly pretty high. But that’s still not a crazy huge amount of data, and it’ll be downright common in just a few years. And when you consider the reasonably likely issues of bad firmware and bad cables, UREs hardly seem like an alien concern.
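That 50% crossover point can be solved for directly rather than read off a graph. A small Python sketch (my function name, same decimal-terabyte assumption as above):

```python
import math

def tb_at_half(ber):
    """Decimal terabytes read at which the chance of at least one URE
    reaches 50%: solve (1 - ber)**bits = 0.5 for bits, then convert."""
    bits = math.log(0.5) / math.log1p(-ber)
    return bits / 8 / 1e12   # bits -> bytes -> decimal TB

# tb_at_half(1e-15) is about 86.6 TB -- the "around 90 TB" figure above.
# At 1e-14 it's only about 8.7 TB.
```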
Play around with the numbers using my Google URE Spreadsheet.
What happens if you lose a bit? Maybe nothing. Video and audio files will probably keep playing. Photos might still look OK. But maybe not. And a faulty cable could wipe out the whole file, not just a bit of it (if you’ll pardon the pun).
Protect Your Data
What can you do about unrecoverable read errors? Simple answer: Use a better storage stack.
As mentioned above, most enterprise storage systems implement serious data integrity checking as part of their storage controller. Many use erasure coding, like the Reed-Solomon codes already used for ECC in the hard disk drive. Others prefer to retain a SHA-1 hash for all data and recover it from an alternate location if it gets corrupted. Either way, the risk is reduced to such an extent that you don’t have to worry about it.
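For illustration, here’s what hash-based verification looks like at file granularity. This is a minimal Python sketch of the idea, not how any particular array implements it (arrays do this at the block level, and the function names here are mine):

```python
import hashlib

def sha1_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-1 and return its hex digest."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected_digest):
    """Compare the file's current digest against one recorded earlier.
    A mismatch means silent corruption; a real system would then fetch
    a good copy from an alternate location."""
    return sha1_of(path) == expected_digest
```

The key point is that the hash is computed over the data itself and stored separately, so corruption anywhere between the platter and the application is detectable on read — exactly what NTFS, ext3, and HFS+ don’t do.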
But what about servers and home users of non-enterprise storage? You’ve got trouble here. You can’t use NTFS, ext, or HFS+. Btrfs and ReFS have integrity features, but they’re not effective out of the box: Btrfs only checks integrity with CRC32 and it’s not clear how it recovers data, while Microsoft engineered “integrity streams” into ReFS but enables them only for metadata by default. All Microsoft needs to do is turn on integrity streams for user data, which seems like a no-brainer in enterprise storage scenarios.
The only real option is ZFS. It has wonderful, robust integrity checking and data recovery. In fact, this was one of the design goals for ZFS! Honestly, if you care about your data and have more than a dozen terabytes of it, you must use something like ZFS or a real storage array.
Maybe not today, and maybe not tomorrow, but soon you’re going to need data integrity checking and error recovery. It’s time for Microsoft, Apple, and Btrfs to step up and provide it.