Data storage has always been one of the most conservative areas of enterprise IT. There is little tolerance for risk, and rightly so: Storage is persistent, long-lived, and must be absolutely reliable. Lose a server or network switch and there is the potential for service disruption or transient data corruption, but lose a storage array (and thus the data on it) and there can be serious business consequences.
Perhaps the only area more conservative than storage is data protection (backup and archiving), and for much the same reasons. Data backups are the lifeline of modern businesses. Truly, every company is a digital company today, and without the digits there is no company!
Data storage and data protection also illustrate what my friend Dave McCrory would call data gravity: It is much more difficult to move large volumes of data than it is to move compute and network resources, so compute and network tend to cluster physically wherever the storage is placed.
Today’s typical enterprise storage architecture looks remarkably like what I saw in the first years of my career: Specialized yet simplistic storage arrays at the center, connected with proprietary protocols and networks to a host of compute resources. And these storage arrays tend to rely on the same basic methods to drive reliability: Data is duplicated across multiple hard disk drives in a fixed configuration behind twin controllers, which determine where to place data and arbitrate access.
Storage arrays also offer data services of various sorts. The earliest of these features focused on shared access to data, including cloning and snapshots. Then we saw capacity optimization added, including deduplication, thin provisioning, and compression. With the advent of solid-state storage, much of the focus of mainstream storage development centered on performance through tiering and caching. Now, integration points like VAAI and VVOL have become increasingly important as servers are virtualized.
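To make one of those capacity-optimization services concrete, here is a minimal sketch of content-hash deduplication on fixed-size blocks. The class, block size, and method names are hypothetical for illustration, not any vendor's implementation, which would do this inline in firmware with far more care.

```python
import hashlib

# Toy block store: identical 4 KiB blocks are stored once and shared by hash.
class DedupStore:
    BLOCK_SIZE = 4096  # hypothetical fixed block size

    def __init__(self):
        self.blocks = {}   # SHA-256 digest -> block contents, stored once
        self.volume = []   # the logical volume: an ordered list of digests

    def write(self, data: bytes) -> None:
        # Chop incoming data into fixed-size blocks; keep only unique blocks.
        for off in range(0, len(data), self.BLOCK_SIZE):
            block = data[off:off + self.BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # duplicate blocks share one copy
            self.volume.append(digest)

    def read(self) -> bytes:
        return b"".join(self.blocks[d] for d in self.volume)


store = DedupStore()
store.write(b"A" * 8192 + b"B" * 4096 + b"A" * 4096)  # four logical blocks, two unique
print(len(store.volume), "logical blocks,", len(store.blocks), "unique blocks stored")
```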
https://www.youtube.com/watch?v=I1wg1DNHbNU
But through it all, mainstream storage remains fairly rudimentary, with architectures and protocols that reflect the 1980s rather than this decade. Most data is still stored on simple RAID sets, and most access still uses the SCSI protocol, emulating a long-gone disk-centric system architecture. Even NAS, the other popular storage paradigm, is stuck with RAID and network protocols designed for Microsoft Windows and Sun UNIX networks in the 1990s.
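For readers who have not looked under the hood of those simple RAID sets, here is a minimal sketch of the XOR-parity idea behind a RAID-5 style layout; the strip sizes and layout are illustrative assumptions, not a real array's geometry.

```python
def xor_strips(strips):
    # Parity (or a rebuilt strip) is just the byte-wise XOR of the inputs.
    out = bytearray(len(strips[0]))
    for strip in strips:
        for i, b in enumerate(strip):
            out[i] ^= b
    return bytes(out)

data_strips = [b"AAAA", b"BBBB", b"CCCC"]  # data spread across three disks
parity = xor_strips(data_strips)           # parity strip written to a fourth disk

# The disk holding the second strip fails: rebuild it from the survivors plus parity.
rebuilt = xor_strips([data_strips[0], data_strips[2], parity])
assert rebuilt == b"BBBB"
```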
That’s pretty much where we are today, but things are finally changing. Over the next few weeks, I’ll be writing about the future of storage, transformed and re-thought. But I’ll also consider the bridge to this future, and ask if we’ll ever get there.
Steve C says
Well said. I think it is important to distinguish between implementations (which can come and go, although best practices emerge and can stabilize an implementation over a decade or more) and key architectural interfaces (over which layer upon layer of ecosystem is built, to the point where replacing the interface becomes hard if not economically infeasible).
SCSI as the interface between server (or disk array controller) and disk was actually a watershed change. Responsibility shifted from the host operating system identifying a physical location on a physical disk (which cylinder the heads seek to, which platter and head, and rotationally which sector) to the disk itself, which simply presented a single linear array of then-512-byte sectors. This block-storage abstraction has been used not only for physical disks for 30 years, but also for a wide range of logical disks.
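As a worked illustration of that shift, here is a minimal sketch using the classic geometry formula the host once had to compute itself; the geometry numbers are made up for the example, not any real drive's.

```python
SECTOR_SIZE = 512  # the "then-512-byte sectors" described above

def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    # The geometry arithmetic the host OS once performed itself; with SCSI,
    # the drive hides it and simply exposes logical block addresses (LBAs).
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)

# Made-up geometry for illustration: 16 heads, 63 sectors per track.
lba = chs_to_lba(cylinder=2, head=3, sector=4,
                 heads_per_cylinder=16, sectors_per_track=63)
byte_offset = lba * SECTOR_SIZE  # all later software needs: a flat offset into one linear array
print(lba, byte_offset)          # 2208 1130496
```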
More importantly, an enormous amount of software has been layered over this linear-array-of-sectors model over the decades. It is this layering which makes the SCSI abstraction as timeless as (say) TCP/IP in networking, or as its variable-length counterpart, the linear array of bytes we call the “file”.
As interesting as new ways to access linear arrays of sectors (NVMe over fabric, anyone?) and linear arrays of bytes (objects, anyone?) are, they are simply new access methods for the same age-old abstractions, and for the most part can be slipped in under the many layers of software written over the last 50 years to use those abstractions.
@FStevenChalmers (speaking for self, work at Hewlett Packard)