Why do some data storage solutions perform better than others? What tradeoffs are made for economy and how do they affect the system as a whole? These questions can be puzzling, but there are core truths that are difficult to avoid. Mechanical disk drives can only move a certain amount of data. RAM caching can improve performance, but only until it runs out. I/O channels can be overwhelmed with data. And above all, a system must be smart to maximize the potential of these components. These are the four horsemen of storage system performance, and they cannot be denied.
A Lack of Intelligence
Disks can be made faster (and more added), solid-state storage and cache can be added, and I/O bottlenecks can be removed, but what then? How can storage performance keep up with Moore’s Law over the decades? The answer is intelligence: Storage systems must adapt and tune themselves to changing workloads.
It’s far simpler to slap the label “intelligent” on the storage system than it is to add real smarts to the box. The biggest hurdle has always been a lack of communication between clients and applications (at the extreme top of the stack) and storage devices (at the extreme bottom). I’ve called virtualization “a stack of lies”, and in many ways that’s exactly what it is. At each point in the I/O chain, information is lost that would have helped a truly intelligent storage array to make better decisions.
Consider a very simple case: Your laptop. It probably contains a SATA hard disk drive connected to a basic controller on the PCIe bus addressed by the CPU. An operating system (probably Windows or Mac OS X) runs on the system, and it relies on a file system (NTFS or HFS+, respectively) to organize and access the hard disk drive. But it also has a volume manager (currently unnamed by Microsoft, though Apple internally calls theirs CoreStorage) that virtualizes storage and adds features like encryption and compression. The files seen by the operating system pass through “filter drivers”, then the file system (which chops them into blocks), the volume manager (which organizes these blocks), the laptop’s SATA controller, the disk drive’s own controller (which decides where to place these blocks) and cache, and finally to the magnetic media. Even in this very simple scenario, the operating system literally has no idea where data is stored, and the disk literally has no idea what it is storing.
But applications don’t really “care” about files. Each application has its own semantics for storage and retrieval of data, and the file is simply a universal and convenient metaphor for application data storage. Most applications use a proprietary container format which includes metadata and scratch data along with the actual content. The characteristic pattern of reads and writes to this subfile information varies widely by application. This is why a storage device that excels for video editing may be totally inappropriate for databases or e-mail storage.
Enterprise servers add more layers of translation, with Fibre Channel HBAs, network switches, redundant RAID controllers, and separate caches all performing their magic and discarding valuable meta-information. Many enterprise systems also include independent caching devices in the server, network, or as a gateway to the storage array. Everything in the stack is valuable in one way or another, adding reliability, recoverability, and performance. But the machinations of the stack obscure what goes on above, blocking the ability to add intelligence to the array.
Higher-level applications and server virtualization further obfuscate the storage stack. An operating system may run only a small component of a large enterprise application, so related I/O may come from multiple directions at once. And each operating system may run on a virtual machine, with a hypervisor adding its own file system, volume manager, and storage abstractions. This so-called “I/O blender” purées and randomizes all storage access before it gets anywhere near the array.
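To make the blender effect concrete, here is a minimal Python sketch. It assumes four virtual machines that each read 100 consecutive blocks, and it models the hypervisor's merged queue as a plain round-robin interleave. The function names, block addresses, and the round-robin model are all illustrative assumptions, not a description of any real hypervisor.

```python
from itertools import chain, zip_longest

def sequential_stream(start, length):
    """One VM reading `length` consecutive blocks starting at `start` (hypothetical)."""
    return [start + i for i in range(length)]

def blend(streams):
    """Round-robin interleave: a simple stand-in for a hypervisor's merged I/O queue."""
    merged = chain.from_iterable(zip_longest(*streams))
    return [block for block in merged if block is not None]

def sequential_fraction(blocks):
    """Share of accesses that land directly after the previously accessed block."""
    if len(blocks) < 2:
        return 0.0
    hits = sum(1 for prev, cur in zip(blocks, blocks[1:]) if cur == prev + 1)
    return hits / (len(blocks) - 1)

# Four VMs, each perfectly sequential within its own address range.
vm_streams = [sequential_stream(start, 100) for start in (0, 10_000, 20_000, 30_000)]

per_vm = [sequential_fraction(s) for s in vm_streams]   # what each guest OS sees
blended = sequential_fraction(blend(vm_streams))        # what the array sees

print("per-VM sequential fraction:", per_vm)
print("blended sequential fraction:", blended)
```

Under these assumptions each guest sees a fully sequential workload, while the array sees no two consecutive accesses that are adjacent on disk, even though no individual workload changed.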
De-Multiplex and Communicate
The only way truly to add intelligence to a storage system, from a lowly hard drive to a high-end enterprise array, is to de-multiplex data and add a communications channel through the stack. If the array can untangle the randomized I/O coming from above, and can accept and act on information about that data stream, many things become possible.
Data layout is an often-overlooked topic, but can have a massive impact on system performance. As we pointed out when discussing spindles, the physical placement of data on a disk can have a dramatic impact on I/O performance. But data placement is also critical for RAID systems and those that use automated tiered storage. Depending on system parameters, it may be better to keep data “together” or “apart” to improve performance, but this cannot be accomplished unless the array “knows” which I/O blocks belong together.
As discussed previously, pre-fetch caching can be extremely valuable to accelerate I/O performance. But pre-fetching information is almost impossible on the wrong side of the I/O blender. If an array could de-multiplex the data stream and tag each access by application, pre-fetch algorithms could be much more effective. An array could even work with a cache in the network or the server to pre-fill buffers with the data that would be needed next.
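The value of tagging can be sketched with a toy next-block prefetcher in Python. The per-application tag is a hypothetical label that is assumed to survive the trip down the stack; the application names, block addresses, and one-block look-ahead policy are likewise assumptions chosen only for illustration.

```python
def prefetch_hits(accesses, use_tags):
    """Count accesses that a simple next-block prefetcher would already have cached.

    accesses: list of (tag, block) pairs, where tag is a hypothetical
    per-application label attached to each I/O.
    use_tags: if False, the prefetcher sees one undifferentiated stream.
    """
    predicted = {}  # stream key -> block the prefetcher expects next
    hits = 0
    for tag, block in accesses:
        key = tag if use_tags else "all"
        if predicted.get(key) == block:
            hits += 1
        predicted[key] = block + 1  # prefetch the block right after this one
    return hits

# Three applications, each reading 50 sequential blocks, interleaved round-robin.
apps = (("db", 0), ("mail", 5_000), ("video", 9_000))
accesses = [(app, base + i) for i in range(50) for app, base in apps]

untagged = prefetch_hits(accesses, use_tags=False)
tagged = prefetch_hits(accesses, use_tags=True)

print(f"untagged: {untagged} of {len(accesses)} accesses prefetched")
print(f"tagged:   {tagged} of {len(accesses)} accesses prefetched")
```

In this sketch the untagged prefetcher never guesses correctly, because the next access always belongs to a different application. With tags, it misses only the first access of each stream. Real workloads are far less tidy, so the gap would be smaller in practice, but the direction of the effect is the point.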
A storage system that intelligently manages caches all through the I/O chain is something of a Holy Grail in enterprise storage. Time and again, pundits and system architects have suggested moving data closer to the CPU to improve performance. At the same time, others recommend maintaining a distance to improve manageability, availability, and flexibility. Intelligently managing a set of caches in multiple locations is the ideal solution, but the inherent obfuscation of the current I/O paradigm makes this extremely difficult.
The Four Horsemen of storage system performance cannot be denied, but they do offer a clear path forward. Storage systems must improve in many different areas, from spindles and drives to caching and I/O bottlenecks. But above all else, storage systems must become smarter in order to become faster, and this requires greater insight into the true nature of the data stream being stored. All storage performance developments, from the laptop to the enterprise, boil down to adaptations to the demands of the Four Horsemen.
Gregg Holzrichter says
Great series of posts! One concept you highlight – “If the array can untangle the randomized I/O coming from above, and can accept and act on information about that data stream, many things become possible.” – made me think of the way Virsto (www.virsto.com) works. Their server-side software intercepts the random I/O from each VM, while maintaining a virtual machine level map to intelligently place (deduped) data on the array. Speeds up performance and allows you to get much more out of existing storage capacity. Just entered the VMware space – were just shipping on Hyper-V for the past year.
Luciano Dalle Ore says
Right on! A couple of thoughts on the subject:
I spoke to Greg Ganger (CMU parallel data lab) on this subject last year in Portland at the Hot Storage meeting, and he mentioned that they had done some work on this very subject a few years back but could not get any traction, as the players did not seem to be interested at the time. My best guess is that the reason this is so difficult is that it would require changes throughout the stack before a complete solution is implemented.
Interestingly enough, this is where NAS can have some advantages over SANs as NFS packets have a lot more information than raw SAN blocks. I would also expect that an integrated NAS server would be able to do a better job in “not blending” the information before it gets to the disks.
Higher-level protocols have an advantage when it comes to defeating the IO Blender, but mechanisms like VASA/VADM from VMware are probably the long-term “solution” we’ll see implemented. Then there’s cloud and object storage protocols, which skirt the issue entirely!
Thanks for the comments!