March 21, 2014

The I/O Blender Part 1: Ye Olde Storage I/O Path

The I/O Blender Series

Back in the day, when data was smaller and servers were machines, I/O followed a predictable pattern. Storage arrays could anticipate requests and RAID was beautiful. Then came virtualization, and with it an end to ye olde storage I/O path.

In the good old days, I/O was predictable

Server = HBA = LUN

It was a simpler time back in the 1990′s. Each server had a SCSI host bus adapter (HBA) of its own. Maybe two, if failover was in order. This card transmitted block I/O requests from the operating system “over the wire” to a hard disk drive or storage array controller. And that wire was dedicated just for this purpose: Parallel SCSI or point-to-point Fibre Channel.

The storage array controller had a number of SCSI ports of its own; each was cabled to one of those server HBA’s. The storage array took requests from these “front-end” ports and translated them into internal requests. Usually this meant addressing a certain LUN carved from a single RAID set, though some smarter systems included a DRAM cache to accelerate performance.

The “back-end” of the storage array was a simple SCSI connection to a tray of hard disk drives. Most used parallel SCSI or copper FC, dual-ported and daisy chained from shelf to shelf. The RAID sets were statically mapped to 2, 5, or perhaps a few more disk drives. And that was that.

You should probably also read Storage Arrays Do A Few Things Very Well

Pre-Filling the Cache

The storage array “knew” that any I/O on the first port of controller A belonged to a unique server, and the same for every other port. This allowed the array controller to “learn” the I/O pattern of each port, and thus each server. Smart arrays would begin to predict the next read request to pre-fill the cache with likely data.

Even less-smart arrays got into the game. They could “read around” incoming I/O, and this worked fairly well for prefetch. This worked because the array also “knew” which data blocks belonged to a given host: A LUN was a complete and indivisible unit of storage and could be treated as such.

Copying and Moving Data

Since each LUN was a logical data set, arrays could copy and move data in a consistent manner. If the array copied an entire LUN as a single atomic operation, the data it contained would be consistent. This was the fundamental concept behind EMC Time Finder and many other “business continuance volume” (BCV) products.

In fact, in the 1990′s and early 2000′s, the main challenge in implementing BCV’s was creating “consistency groups” of multiple LUNs belonging to the same server or application. Once these groups were established, scripts could be used to pause an application while the storage array initiated data copies or replication.

Sharing and Not Sharing

Here, for no reason but nostalgia, I present a classic Gadzoox FCL1063TW FC-AL hub!

The advent of Fibre Channel meant that shared access to storage was finally possible. A Fibre Channel SAN allowed multiple servers to access the same front end ports and even the same LUN. But Fibre Channel’s use of World Wide Names meant that the storage array could still uniquely identify I/O and map it to a single server. Everything still worked in a SAN just as it had in a direct attached environment.

If a LUN was to be shared, the servers would use SCSI reservations to avoid conflicting writes and stale buffers. A golden age of SAN filesystems dawned around the year 2000, with Fibre Channel poised to be the high-end, high-performance storage interconnect of choice.

Not all operating systems played nicely in this environment, however. Microsoft Windows was notorious for “assuming” ownership of every LUN it could see. Even worse, Windows would write a disk signature on each, potentially corrupting data belonging to other operating systems. But even this was simple to address in classical Fibre Channel SANs using zoning or on the array with “LUN masking” technology.

Stephen’s Stance

This old-fashioned, predictable storage I/O path was deterministic and decipherable: The server, the switch, and the array all had enough information to do their jobs effectively and efficiently. But server virtualization changes everything, as we will see in the next entry in this series.