
Stephen Foskett, Pack Rat

Understanding the accumulation of data


The Four Horsemen of Storage System Performance: Never Enough Cache

October 7, 2010 By Stephen

The Four Horsemen of Storage System Performance: These four ugly gentlemen stand between you and your data.

Why do some data storage solutions perform better than others? What tradeoffs are made for economy and how do they affect the system as a whole? These questions can be puzzling, but there are core truths that are difficult to avoid. Mechanical disk drives can only move a certain amount of data. RAM caching can improve performance, but only until it runs out. I/O channels can be overwhelmed with data. And above all, a system must be smart to maximize the potential of these components. These are the four horsemen of storage system performance, and they cannot be denied.

Overcoming the Limits of Spindles

Perhaps the previous discussion of spindles left you exhausted, imagining a spindly-legged centipede of a storage system, trying and failing to run on stilts. The Rule of Spindles would be the end of the story were it not for the second horseman: Cache. He stands in front of the spindles, quickly dispatching requests using solid state memory rather than spinning disks. Cache also acts as a buffer, allowing writes to queue up without forcing the requesters to wait in line.

Cache may be quick, but practical concerns limit its effectiveness. Solid state memory is available in many types, but all are far more expensive per gigabyte than magnetic hard disk media. DRAM has historically cost 400 times as much as disk capacity, and even NAND flash (the current darling of the industry) is more than 40 times as expensive. Practically speaking, this means that disk devices, from the drives themselves to large enterprise storage arrays, usually include a very small amount of cache relative to their total capacity.

When specifying a storage system, the mathematics of cache and spindles adhere to a simple rule: more is better for performance but worse for the budget. This leads to a trade-off, where a point of diminishing returns tells us to stop adding spindles and cache and accept the storage system as it is.
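This trade-off can be put in rough numbers. Here is a minimal back-of-the-envelope sketch, assuming illustrative latencies (0.01 ms for a DRAM cache hit, 8 ms for a disk miss; both numbers are assumptions, not measurements), showing why even a modest cache hit ratio pays off:

```python
# Back-of-the-envelope model of average read latency versus cache hit
# ratio. All latency numbers are illustrative assumptions.

CACHE_LATENCY_MS = 0.01   # DRAM access time, assumed
DISK_LATENCY_MS = 8.0     # seek + rotational latency, assumed

def average_latency_ms(hit_ratio):
    """Weighted average of cache hits and disk misses."""
    return hit_ratio * CACHE_LATENCY_MS + (1 - hit_ratio) * DISK_LATENCY_MS

for hit_ratio in (0.0, 0.5, 0.9, 0.99):
    print(f"hit ratio {hit_ratio:.0%}: {average_latency_ms(hit_ratio):.3f} ms")
```

Note how the curve flattens: going from no cache to a 90% hit ratio cuts latency by an order of magnitude, but each further point of hit ratio costs disproportionately more cache capacity. That is the point of diminishing returns in action.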

A History of Cache

Cache was not always as common as it is today. When even a small amount of DRAM cost hundreds of dollars, adding a single RAM chip to a hard disk drive would have broken the bank. As a result, many drives had no cache at all well into the mid-1990s. Operating systems of the time used expensive system memory as a buffer for storage operations rather than expecting cache in the disk controller or drive – remember setting the Buffers command in config.sys?

This was not as bad as it seems, at least in theory. Operating systems stand a fighting chance of “knowing” what data will be requested next, and could therefore request it ahead of time. They also might get a hint about data that will never be used again and can thus flush that from the so-called buffer cache. Although MS-DOS wasn’t very good at this, modern systems have greatly advanced in this respect using a technology called demand paging.

Caching at the array was the key differentiator for early enterprise RAID systems, overcoming the punishing slowdowns caused by parity calculations when data was written. EMC adapted their DRAM-based solid-state storage systems to become a cache in front of hard disk drives and the Symmetrix was born. The Data General (now EMC) CLARiiON was notable as well, bringing a large intelligent write cache to the vast market of midrange systems that could never justify the high price of a Symmetrix. Today, all vendors, from IBM to HP to NetApp to HDS, have vast and clever caches.

The importance of cache to enterprise storage performance cannot be overstated. Mix together rotational latency, seek time, and the RAID penalty and you get seriously compromised I/O response time. But cache can eliminate this penalty entirely, provided there is capacity, by confirming the write and queueing it for later (a concept known as write-back caching). Busy shared storage systems would be simply unusable without cache.

Five Uses for Disk Buffers

Hard disk drives today normally contain a small amount of RAM to use as a buffer for I/O requests. This serves the following needs, though not all are found on all drives:

  1. A read cache, allowing frequently-requested data to be read from memory rather than involving mechanical disk operations
  2. An I/O-matching mechanism, allowing slower disks and faster interfaces to work together
  3. A read-around (ahead or behind) pre-fetch cache, saving a few blocks around any requested read on the assumption that they will also be requested soon
  4. A read-after-write cache, saving recently-written data to serve later read requests
  5. A command queue, allowing write commands to be reordered, avoiding the “elevator seeking” common to early hard disk drives

Disk buffer size has expanded rapidly in recent years, with some devices including 64 MB or more of DRAM. Seagate’s Momentus XT drive even includes 4 GB of NAND flash as a massive read cache!

Write-Through and Write-Back Cache

There are two basic methods of caching data:

The earliest systems used read-only or write-through caches. All I/O requests pass through the cache, which saves the most recently used data and serves it up when a read is requested. Write requests are not buffered at all, simply passed through to the storage system to process. This approach is safe, since the storage device always has a consistent set of committed writes, but it does nothing to offset the RAID penalty.

Most modern storage systems instead use a write-back (also called “write-behind”) cache, which acknowledges writes before they are committed to disk. These systems use non-volatile RAM, battery-backed DRAM, or NAND flash to ensure that data is not lost in the event of a power outage. Though far more effective, this type of memory is also far more costly.
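The difference between the two policies can be sketched as a toy model. The latency numbers are illustrative assumptions, and destaging is reduced to a single method call; real arrays coalesce and reorder queued writes in ways this sketch ignores:

```python
# Toy contrast of write-through vs. write-back caching. Latency numbers
# are illustrative assumptions, not measurements of any real device.

DISK_WRITE_MS = 10.0   # commit to spinning disk, assumed
CACHE_WRITE_MS = 0.02  # commit to NVRAM/DRAM cache, assumed

class WriteThroughCache:
    """Acknowledge only after the disk has the data: safe but slow."""
    def write(self, block, data):
        # The caller waits for both the cache and the disk.
        return CACHE_WRITE_MS + DISK_WRITE_MS

class WriteBackCache:
    """Acknowledge as soon as protected cache has the data; destage later."""
    def __init__(self):
        self.dirty = []  # blocks queued for destaging to disk

    def write(self, block, data):
        self.dirty.append(block)
        return CACHE_WRITE_MS  # the caller waits only for the cache

    def destage(self):
        """Flush queued writes to disk in the background."""
        flushed, self.dirty = self.dirty, []
        return len(flushed) * DISK_WRITE_MS

print(WriteThroughCache().write(1, b"x"))  # 10.02
wb = WriteBackCache()
print(wb.write(1, b"x"))                   # 0.02
```

The write-back path is hundreds of times faster from the requester’s point of view, which is exactly why the dirty data must live in non-volatile or battery-backed memory: until `destage()` runs, the cache holds the only copy.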

Just about every modern storage array uses caching, and most employ the write-back method to accelerate writes as well as reads. Some have very smart controllers that perform other tricks, but Smart is another Horseman for another day. As mentioned before, RAID systems would be nearly unusable without write-back cache allowing the disks to catch up with random writes.

Onward: I/O, and Smarts

The horseman of spindles is harsh, but he does not rule the day. There are many ways to overcome his limits and his three brothers often come into play. These are cache, which bypasses the spindle altogether; I/O, which can constrain even the fastest combination of disk and cache; and the intelligence of the whole system, which limits or accelerates all the rest. We will examine these horsemen in the future!

I’ve been meaning to write this up for a long time. Thanks for listening and commenting!

Note: Some of these links include affiliate codes that help pay for this blog. For example, buying an Amazon Kindle with this link sends a few bucks my way! But I don't write this blog to make money, and am happy to link to sites and stores that don't pay anything. I like Amazon and buy tons from them, but you're free to buy whatever and wherever you want.


