Why do some data storage solutions perform better than others? What tradeoffs are made for economy and how do they affect the system as a whole? These questions can be puzzling, but there are core truths that are difficult to avoid. Mechanical disk drives can only move a certain amount of data. RAM caching can improve performance, but only until it runs out. I/O channels can be overwhelmed with data. And above all, a system must be smart to maximize the potential of these components. These are the four horsemen of storage system performance, and they cannot be denied.
Overcoming the Limits of Spindles
Perhaps the previous discussion of spindles left you exhausted, imagining a spindly-legged centipede of a storage system, trying and failing to run on stilts. The Rule of Spindles would be the end of the story were it not for the second horseman: Cache. He stands in front of the spindles, quickly dispatching requests using solid state memory rather than spinning disks. Cache also acts as a buffer, allowing writes to queue up without forcing the requesters to wait in line.
Cache may be quick, but practical concerns limit its effectiveness. Solid state memory is available in many types, but all are far more expensive per gigabyte than magnetic hard disk media. DRAM has historically cost 400 times as much as disk capacity, and even NAND flash (the current darling of the industry) is more than 40 times as expensive. Practically speaking, this means that disk devices, from the drives themselves to large enterprise storage arrays, usually include a very small amount of cache relative to their total capacity.
When specifying a storage system, the mathematics of cache and spindles follow a simple rule: more is better for performance but worse for the budget. This leads to a trade-off, where a point of diminishing returns tells us to stop adding both spindles and cache and accept the storage system as it is.
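To see why cache stays so small, here is a minimal back-of-the-envelope sketch in Python. The per-gigabyte prices and the 100 TB array are illustrative assumptions chosen only to match the rough 400x and 40x ratios mentioned above, not quoted figures.

```python
# Illustrative arithmetic only: the per-gigabyte prices below are assumptions,
# chosen to match the rough 400x (DRAM) and 40x (NAND) ratios in the text.
DISK_PER_GB = 0.05                  # magnetic disk, $/GB (assumed)
DRAM_PER_GB = DISK_PER_GB * 400     # ~$20/GB
NAND_PER_GB = DISK_PER_GB * 40      # ~$2/GB

array_capacity_gb = 100 * 1024      # a hypothetical 100 TB array
cache_fraction = 0.01               # cache sized at 1% of capacity

disk_cost = array_capacity_gb * DISK_PER_GB
dram_cache_cost = array_capacity_gb * cache_fraction * DRAM_PER_GB
nand_cache_cost = array_capacity_gb * cache_fraction * NAND_PER_GB

print(f"Disk capacity: ${disk_cost:,.0f}")
print(f"1% DRAM cache: ${dram_cache_cost:,.0f} ({dram_cache_cost / disk_cost:.0%} of the disk spend)")
print(f"1% NAND cache: ${nand_cache_cost:,.0f} ({nand_cache_cost / disk_cost:.0%} of the disk spend)")
```

Even a cache sized at just 1% of capacity costs several times the disks it fronts when built from DRAM, which is why real systems carry so little cache relative to their capacity.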
A History of Cache
Cache was not always as common as it is today. When even a small amount of DRAM cost hundreds of dollars, adding a single RAM chip to a hard disk drive would have broken the bank, so many drives had no cache at all well into the mid-1990s. Operating systems of the time used expensive system memory as a buffer for storage operations rather than expecting cache in the disk controller or drive – remember setting BUFFERS in CONFIG.SYS?
This was not as bad as it seems, at least in theory. The operating system stands a fighting chance of “knowing” what data will be requested next and can therefore request it ahead of time. It might also get a hint about data that will never be used again and can flush that data from the so-called buffer cache. Although MS-DOS wasn’t very good at this, modern systems have advanced greatly in this respect using a technique called demand paging.
Caching at the array was the key differentiator for early enterprise RAID systems, overcoming the punishing slowdowns caused by parity calculations when data was written. EMC adapted their DRAM-based solid-state storage systems to become a cache in front of hard disk drives and the Symmetrix was born. The Data General (now EMC) CLARiiON was notable as well, bringing a large intelligent write cache to the vast market of midrange systems that could never justify the high price of a Symmetrix. Today, all vendors, from IBM to HP to NetApp to HDS, have vast and clever caches.
The importance of cache to enterprise storage performance cannot be overstated. Mix together rotational latency, seek time, and the RAID penalty and you get seriously compromised I/O response time. But cache can eliminate this penalty entirely, provided it has free capacity, by confirming the write and queueing it for later destaging (a concept known as write-back caching). Busy shared storage systems would be simply unusable without cache.
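As a rough sketch of how bad the uncached case can get, the arithmetic below uses assumed (not measured) figures for a 7,200 RPM drive behind RAID 5 and compares them with an assumed sub-millisecond cache acknowledgement.

```python
# Back-of-the-envelope write latency; all figures are typical assumptions, not measurements.
avg_seek_ms = 8.5                                # typical 7,200 RPM drive seek time (assumed)
rotational_latency_ms = 0.5 * (60_000 / 7200)    # half a revolution, about 4.2 ms
raid5_write_ios = 4                              # read data, read parity, write data, write parity

disk_io_ms = avg_seek_ms + rotational_latency_ms
uncached_write_ms = raid5_write_ios * disk_io_ms  # every back-end I/O pays seek + rotation

cache_ack_ms = 0.5                               # DRAM cache acknowledgement, order of magnitude (assumed)

print(f"Single disk I/O:        {disk_io_ms:.1f} ms")
print(f"RAID 5 write, no cache: {uncached_write_ms:.1f} ms")
print(f"Write-back cache ack:   {cache_ack_ms:.1f} ms "
      f"(~{uncached_write_ms / cache_ack_ms:.0f}x faster, as long as the cache can destage in time)")
```

With these assumed numbers, a cached acknowledgement is roughly two orders of magnitude faster than waiting for the full RAID 5 write to complete on disk.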
Five Uses for Disk Buffers
Hard disk drives today normally contain a small amount of RAM to use as a buffer for I/O requests. This serves the following needs, though not all are found on all drives:
- A read cache, allowing frequently-requested data to be read from memory rather than involving mechanical disk operations
- An I/O-matching mechanism, allowing slower disks and faster interfaces to work together
- A read-around (ahead or behind) pre-fetch cache, saving a few blocks around any requested read on the assumption that they will also be requested soon
- A read-after-write cache, saving recently-written data to serve later read requests
- A command queue, allowing read and write commands to be reordered (much as an elevator batches its stops) rather than serviced strictly in arrival order, as early hard disk drives had to; a minimal sketch of this reordering follows the list
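To illustrate the last item, here is a minimal sketch of elevator-style (SCAN) reordering of a queued batch of commands. The track numbers and the pending queue are invented for illustration; real drives do this in firmware with far more sophistication.

```python
# Minimal sketch of elevator-style (SCAN) reordering of queued disk commands.
# The track numbers and the request queue are invented for illustration.

def elevator_order(queue, head):
    """Service requests at or above the head first (sweeping outward),
    then sweep back down for the rest -- like an elevator batching stops."""
    ahead = sorted(t for t in queue if t >= head)
    behind = sorted((t for t in queue if t < head), reverse=True)
    return ahead + behind

def total_seek_distance(order, head):
    """Sum of head movement (in tracks) to service requests in the given order."""
    distance = 0
    for track in order:
        distance += abs(track - head)
        head = track
    return distance

pending = [98, 183, 37, 122, 14, 124, 65, 67]   # queued requests (track numbers)
head = 53                                        # current head position

fifo = pending                       # first-come, first-served
scan = elevator_order(pending, head)

print("FIFO order:", fifo, "-> seek distance:", total_seek_distance(fifo, head))
print("SCAN order:", scan, "-> seek distance:", total_seek_distance(scan, head))
```

With these made-up numbers, the reordered queue roughly halves the head travel compared with servicing requests in arrival order, which is exactly why command queuing pays off.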
Disk buffer size has expanded rapidly in recent years, with some devices including 64 MB or more of DRAM. Seagate’s Momentus XT drive even includes 4 GB of NAND flash as a massive read cache!
Write-Through and Write-Back Cache
Just about every modern storage array uses caching, and most employ the write-back method to accelerate writes as well as reads. With write-through caching, a write is not acknowledged until it has been committed to disk; with write-back caching, the array acknowledges the write as soon as it lands in cache and destages it to disk later. Some arrays have very smart controllers that perform other tricks, but Smart is another Horseman for another day. As mentioned before, RAID systems would be nearly unusable without a write-back cache allowing the disks to catch up with random writes.
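As a toy illustration of the difference, the sketch below simulates both policies in Python. The class, its method names, and the sleep standing in for disk latency are all assumptions made for illustration, not any real array’s implementation.

```python
import time
from collections import deque

class ToyCache:
    """Toy model only: 'disk' is simulated with a sleep, and the policy names
    mirror the text above, not any real array's firmware."""

    def __init__(self, policy="write-back", disk_latency_s=0.01):
        self.policy = policy
        self.disk_latency_s = disk_latency_s
        self.cache = {}          # block -> data held in 'DRAM'
        self.dirty = deque()     # blocks waiting to be destaged
        self.backing_store = {}  # stands in for the spindles

    def _write_to_disk(self, block, data):
        time.sleep(self.disk_latency_s)   # pretend seek + rotation + RAID penalty
        self.backing_store[block] = data

    def write(self, block, data):
        self.cache[block] = data
        if self.policy == "write-through":
            self._write_to_disk(block, data)   # the caller waits for the disk
        else:
            self.dirty.append(block)           # acknowledge now, destage later

    def destage(self):
        """Flush dirty blocks in the background (or when cache pressure demands it)."""
        while self.dirty:
            block = self.dirty.popleft()
            self._write_to_disk(block, self.cache[block])

for policy in ("write-through", "write-back"):
    cache = ToyCache(policy)
    start = time.perf_counter()
    for i in range(100):
        cache.write(i, b"x")
    acked = time.perf_counter() - start
    cache.destage()
    print(f"{policy:13s}: 100 writes acknowledged in {acked * 1000:6.1f} ms")
```

Run it and the write-through loop takes roughly a second while the write-back loop returns almost instantly; the catch is that the acknowledged data sits in volatile cache until it is destaged.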
Onward: I/O and Smarts
The horseman of spindles is harsh, but he does not rule the day. There are many ways to overcome his limits, and his three brothers often come into play. These are cache, which bypasses the spindle altogether; I/O, which can constrain even the fastest combination of disk and cache; and the intelligence of the whole system, which limits or accelerates all the rest. We will examine the remaining horsemen in the future!
I’ve been meaning to write this up for a long time. Thanks for listening and commenting!