Storage arrays are big, expensive, and difficult to manage. Plus, concentrating storage in a single device puts everything at risk if there is an outage. So why buy a storage array at all? Because arrays do a few things very well, and on balance these strengths often outweigh the drawbacks.
Storage Arrays Accelerate Performance
One of the most important advancements in data storage in the 1990’s was the application of advanced caching algorithms to enterprise storage arrays. DRAM cache (later augmented with flash) makes a massive difference in responsiveness. And this is especially important when it comes to shared, networked storage devices.
One of the first arrays to really “get it right” in my own experience was the Data General CLARiiON. Although comparatively small in size, price, and power, the little CLARiiON really stood up and performed when it was configured correctly. I started using these little wonders while working at Texaco’s Star Enterprise subsidiary in the late 1990’s and admired their ability to keep up with our heavy workloads even though they only had a few megabytes of cache.
The fundamental “cool thing” that these arrays do is cache frequently accessed data in high-performance memory rather than going back to the disks every time. It is fairly straightforward to implement a read cache, though intelligently pre-filling it is substantially more difficult. Enterprise storage devices also implemented a “write-back cache”, meaning they acknowledged incoming writes from clients as soon as data was in cache, rather than waiting for it to be written to disk. The combination of read- and write-cache technology made up for the comparatively slow random I/O capabilities of hard disk drives.
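To make the idea concrete, here is a minimal Python sketch of those two mechanisms. It is a toy model, not any vendor's actual caching algorithm: an LRU read cache plus a write-back buffer that acknowledges writes before destaging them to disk.

```python
from collections import OrderedDict

class TinyCache:
    """Toy model of an array controller cache: an LRU read cache plus
    a write-back buffer that acknowledges writes before they hit disk."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.reads = OrderedDict()   # block -> data, kept in LRU order
        self.dirty = {}              # blocks acknowledged but not yet on disk
        self.disk = backing_store    # dict standing in for the spindles

    def read(self, block):
        if block in self.dirty:               # newest copy may still be in cache
            return self.dirty[block]
        if block in self.reads:               # cache hit: no disk I/O needed
            self.reads.move_to_end(block)
            return self.reads[block]
        data = self.disk.get(block)           # cache miss: slow path to the disks
        self._remember(block, data)
        return data

    def write(self, block, data):
        self.dirty[block] = data              # acknowledge immediately (write-back)
        self._remember(block, data)

    def flush(self):
        for block, data in self.dirty.items():  # destage dirty blocks to disk later
            self.disk[block] = data
        self.dirty.clear()

    def _remember(self, block, data):
        self.reads[block] = data
        self.reads.move_to_end(block)
        if len(self.reads) > self.capacity:
            self.reads.popitem(last=False)    # evict the least-recently-used block
```

The key point is in `write()`: the client gets its acknowledgement as soon as the data lands in cache, and the slow trip to the spindles happens later in `flush()`.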
See also The Four Horsemen of Storage System Performance: Never Enough Cache
In the 2000’s, “wide-striping” concepts really took hold. Where traditional RAID used only a few disks in a set, modern arrays from 3PAR, XIV, and pretty much every other vendor spread data across every spindle they can, as the sketch below illustrates. This accelerates I/O and can reduce the amount of time required to rebuild a failed data protection set. It also increases the risk of data loss to some extent, but this is usually made up for by the advanced capabilities of the array itself.
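A rough illustration of the idea, with made-up disk names rather than any real array's layout engine: logical chunks are scattered round-robin across every available spindle, so reads, writes, and rebuilds all fan out across the whole pool.

```python
def place_chunks(num_chunks, spindles):
    """Toy wide-striping layout: scatter logical chunks round-robin
    across every available spindle instead of a small RAID set."""
    layout = {disk: [] for disk in spindles}
    for chunk in range(num_chunks):
        disk = spindles[chunk % len(spindles)]
        layout[disk].append(chunk)
    return layout

# A 1000-chunk volume spread over 48 spindles: every disk shares the I/O,
# and a rebuild can read surviving chunks from 47 drives in parallel.
layout = place_chunks(1000, [f"disk{i:02d}" for i in range(48)])
print({disk: len(chunks) for disk, chunks in list(layout.items())[:4]})
```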
Lately, storage arrays have implemented automated sub-LUN tiering, which places active data on high-performance flash or disk. This also does quite a bit to accelerate performance and is, in many ways, similar to caching approaches pioneered two decades earlier. Again, most modern arrays now offer some form of storage tiering, though implementations vary greatly in detail and effectiveness.
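Here is a toy illustration of a heat-based tiering pass, not any particular vendor's algorithm: extents are ranked by recent access counts, and only the hottest ones are kept on the flash tier while the rest stay on disk.

```python
def plan_tiering(access_counts, flash_extents):
    """Toy sub-LUN tiering pass: rank extents by recent access count and
    keep only the hottest ones on the flash tier; the rest stay on disk."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    flash = set(ranked[:flash_extents])
    disk = set(ranked[flash_extents:])
    return flash, disk

# Hypothetical per-extent access counts gathered over the last interval
heat = {"ext0": 900, "ext1": 12, "ext2": 450, "ext3": 3, "ext4": 700}
flash, disk = plan_tiering(heat, flash_extents=2)
print("promote to flash:", sorted(flash))   # ext0 and ext4 are the hot extents
```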
Offloaded Data Movement
The “killer app” of the NetApp filer I purchased in 1996 was its ability to snapshot data throughout the day. Similarly, the winning feature of the EMC Symmetrix I purchased in 1997 was TimeFinder, which allowed the array to make perfect copies of entire LUNs on command. We later purchased a second Symmetrix and implemented SRDF, a remote equivalent.
Offloading data movement and copying operations to the storage array was perhaps the greatest selling feature of high-end enterprise devices. This feature really “separated the men from the boys” since it proved extremely difficult to implement well. It also took a burly, “manly” array to move data around without impacting primary access.
Today, nearly every storage array offers some form of automated data movement. From snapshots to mirrors to replication, offloaded data movement remains a key selling point for shared storage arrays.
VMware’s VAAI has many compelling benefits, but most users talk about offloaded data copying first and foremost. Similarly, one of the most exciting features of Microsoft Windows Server 2012 is ODX, which allows the operating system (as well as Hyper-V) to offload data movement to a compatible array.
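Conceptually, the difference looks something like the sketch below, where the `array` object and its `copy_extent` call are purely hypothetical stand-ins rather than the actual VAAI or ODX interfaces: a host-mediated copy pushes every block across the network twice, while an offloaded copy is a single request that the array satisfies internally.

```python
def host_copy(array, src_lun, dst_lun, num_blocks):
    """Traditional copy: every block travels over the SAN to the host and back."""
    for block in range(num_blocks):
        data = array.read(src_lun, block)      # array -> host
        array.write(dst_lun, block, data)      # host -> array

def offloaded_copy(array, src_lun, dst_lun, num_blocks):
    """Offloaded copy: one request asks the array to move the data internally,
    in the spirit of VAAI's copy primitive or Windows ODX."""
    array.copy_extent(src_lun, 0, dst_lun, 0, num_blocks)
```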
Shared Storage Made Real
The third reason that many people buy a storage array is the simple ability to share storage between multiple servers. NAS arrays obviously excel at sharing with many clients, but many SAN storage arrays are used this way as well. Some systems even allow multi-protocol access to the same data.
File-based protocols like NFS and SMB were designed to allow many clients to access a single pool of storage. This was initially intended only for client access, but NFS has seen massive uptake for servers. This is especially true for VMware storage, where NFS has become the second-most popular method of accessing shared storage. Microsoft is responding by promoting SMB for applications and Hyper-V as part of the launch of Windows Server 2012.
Today, most clustered applications use shared SCSI LUNs over Fibre Channel, iSCSI, or SAS. Most advanced features in VMware vSphere require shared storage, and this has done much to promote adoption of networked storage arrays in smaller and midsize businesses. SCSI is perhaps not the ideal protocol for shared storage access, but as long as “persistent reservations” are supported, it will do.
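As a rough illustration of why persistent reservations matter, here is a toy model of the arbitration semantics (register, reserve, preempt), not the actual SCSI-3 command set or any real initiator library: cluster nodes register a key against the shared LUN, one node at a time holds the reservation, and a survivor can preempt a failed holder to take over.

```python
class ToyReservation:
    """Toy model of SCSI persistent reservation arbitration on a shared LUN:
    cluster nodes register a key, and one holder at a time owns the reservation."""

    def __init__(self):
        self.registered = set()   # keys of nodes allowed to compete for the LUN
        self.holder = None        # key currently holding the reservation

    def register(self, key):
        self.registered.add(key)

    def reserve(self, key):
        if key in self.registered and self.holder in (None, key):
            self.holder = key
            return True
        return False              # another node already holds the reservation

    def preempt(self, key, victim):
        # A surviving node ejects a failed holder and takes over the LUN.
        if key in self.registered and self.holder == victim:
            self.registered.discard(victim)
            self.holder = key
            return True
        return False
```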
Stephen’s Stance
The most common objection to shared storage is a concern about “all my eggs in one basket”. While this is a very real consideration, most companies eventually accept the risk in return for better performance, offloaded data movement, or sharing of storage resources.