October 31, 2014

Scaling Storage Is Hard To Do

Data storage isn’t as easy as it sounds, especially at enterprise or cloud scale. It’s simple enough to read and write a bit of data, but much harder to build a system that scales to store petabytes. That’s why I’m keenly focused on a new wave of storage systems built from the ground up for scaling!

This is one scaly anteater! No, really, he’s a scaly anteater!

No More Monoliths

Traditional storage arrays are integrated monoliths: A single “brain” that knows where every bit of data is stored. This approach worked well in the heyday of RAID and block protocols, since performance was paramount and scaling meant adding another autonomous system. But monolithic systems just don’t work in modern, dynamic environments.

Although they offered large-scale capacity and performance, there was really nothing scalable about these arrays. Monolithic storage arrays like the old-fashioned EMC Symmetrix could handle a fixed number of disk drives, controllers, interfaces, cache cards, and such. You bought it, filled it up, ran it as-is, and decommissioned it. The best they could offer was “buy as you grow” purchasing of in-place assets.

Lately, purveyors of these devices have turned to virtualization (the storage kind, not server virtualization) to modernize them. The brain behind a monolithic array can now virtualize other storage systems, from JBOD to older arrays. This scale-up approach is better than nothing, but it’s not where the industry is headed.

The Limits of Modular Scaling

Customers wanted more. They demanded flexibility, matching the number of disk drives to the required capacity and performance. They were also drawn to the idea of buying disk drives as capacity was needed rather than all at once.

The market responded with scale-up modular storage, exemplified by the NetApp Filer and DG CLARiiON. These featured a controller “head” and one or more disk shelves that could be added later. NetApp parlayed a combination of file-level protocols and (surprisingly rare) RAID-4 data layout to enable on-the-fly capacity expansion, while later systems were often more traditional, with block protocols and fixed RAID-5 and -10 data layout.
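Why would RAID-4 help with on-the-fly expansion? One commonly cited reason is that parity lives on a dedicated disk, so adding a zero-filled data disk leaves the existing parity valid. Here is a minimal Python sketch of that XOR property. The block contents and the `xor_parity` helper are made up for illustration, and this is not a claim about how any particular filer implements it.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, one 4-byte block each.
data_disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity_before = xor_parity(data_disks)

# Add a fourth, zero-filled disk: XOR with zero changes nothing,
# so the dedicated parity block does not need to be rewritten.
data_disks.append(bytes(4))
parity_after = xor_parity(data_disks)

# A single lost disk (here disk 1) is rebuilt from the survivors plus parity.
recovered = xor_parity([data_disks[0], data_disks[2], data_disks[3], parity_after])
```

With rotating parity, as in RAID-5, adding a disk changes which disk holds parity for each stripe, which is part of why those layouts tend to be fixed.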

Regardless, modular storage offers a mere approximation of scale, since “scaling up” only goes so far before the “head” can’t handle the load. Then there is the daisy-chained interconnect to consider: Many systems use a pair of controllers with just a few Fibre Channel loops or SAS ports for every disk shelf to share.

You can only scale up so far before you exhaust your “head”

Clusters Only Scale So Far

How do you add capacity and performance without drastically re-architecting storage systems? For the last decade, enterprise storage vendors have relied on clustering as a means to scale. Each clustered controller is locked in a “mind meld” with every other, sharing precious data maps and caches in real time, enabling customers to add whole arrays for greater capacity and performance.

But clustering only goes as far as the interconnect will allow. Systems that relied on Fibre Channel, IP/Ethernet, and iSCSI for inter-node communication could only scale to a handful of nodes before node coordination latency got in the way. This is why pleasant and useful arrays from the likes of EqualLogic (now Dell) and LeftHand (HP) struggle to grow to even a dozen nodes. And clustering has proved devilish for NetApp to implement.
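To get a feel for why coordination becomes the bottleneck, here is a rough sketch that counts the pairwise relationships in a cluster. It assumes every controller keeps state in sync with every other one (the "mind meld" model). Real products use various topologies and shortcuts, so treat the numbers as an upper bound, not a measurement.

```python
def mesh_links(nodes):
    """Number of node pairs that must stay in sync in a full mesh."""
    return nodes * (nodes - 1) // 2

# Pairwise relationships grow roughly with the square of the node count.
for n in (2, 4, 8, 12, 100):
    print(f"{n:>3} nodes -> {mesh_links(n):>4} pairs")
```

Going from 4 nodes to 12 triples the hardware but multiplies the pairs to keep consistent by eleven (6 versus 66), and every one of those exchanges pays the interconnect's latency.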

InfiniBand was a white knight, bringing RDMA and microsecond-scale latency and enabling hundred-node monsters like Isilon to thrive. Look “under the skirt” of any reasonably scalable clustered storage solution and you’re likely to find InfiniBand HCAs and a Mellanox switch in the middle. It’s as if Mr. Spock could mind-meld with the whole crew!

Clustering sounds great, but it’s awfully taxing to keep all the nodes consistent!

Hyper-Scale Storage

But web services and “big data” need something more than a massive cluster. Hyper-scale workloads need massive-scale storage with real flexibility, and that requires breaking free of the tightly-coupled model of yore. Hyper-scale storage is built of autonomous nodes, each handling a piece of the dataset and I/O workload.
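How can autonomous nodes each own a piece of the dataset without a central "brain" holding the map? One common technique (my choice for illustration, not something any vendor named here has confirmed) is to compute the owner from a hash of the object key. The sketch below uses rendezvous hashing; the node names and the `owner` function are hypothetical.

```python
import hashlib

def owner(key, nodes):
    """Pick the node with the highest hash score for this key."""
    def score(node):
        digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)

nodes = ["node-a", "node-b", "node-c"]
keys = [f"object-{i}" for i in range(1000)]

# Any client can compute placement on its own; no shared map is consulted.
before = {k: owner(k, nodes) for k in keys}

# Add a node: only the objects that now score highest on it have to move.
after = {k: owner(k, nodes + ["node-d"]) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every object in `moved` lands on the new node and nothing shuffles between the old ones, which is the kind of behavior that lets a loosely-coupled system grow one node at a time.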

Traditional storage protocols like SCSI (for FC and iSCSI block access) and NFS or SMB (for NAS file access) don’t work well in a thousand-node web-scale model, however. Sure, these old standbys are great for accessing a single node or cluster, but it takes a modern object-based storage protocol to truly leverage hyper-scale storage.

This is exactly the architecture that the latest storage arrays are adopting: Object storage inside, with loosely-coupled nodes offering truly dynamic scaling. Although many allow native API access, most of these products also include an integrated object-to-file gateway, with VMware-friendly NFS or Windows-oriented SMB as the front-end protocol. These aren’t the ideal protocols for scaly-storage access, but at least they’re compatible with existing applications.
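What does an object-to-file gateway actually do? At its simplest, it maps file paths onto keys in a flat object namespace and fakes directories by key prefix. The toy below uses an in-memory dictionary as a stand-in for the object store. The `ObjectFileGateway` class and its methods are invented for this sketch and bear no relation to any product's real API.

```python
import posixpath

class ObjectFileGateway:
    """Toy file-style front end over a flat key/value object store."""

    def __init__(self):
        self._objects = {}  # stand-in for the back-end object store

    @staticmethod
    def _key(path):
        # Normalize the path and drop leading slashes to form a flat key.
        return posixpath.normpath("/" + path).lstrip("/")

    def write_file(self, path, data):
        self._objects[self._key(path)] = data

    def read_file(self, path):
        return self._objects[self._key(path)]

    def listdir(self, path):
        # Directories do not exist as objects; they are inferred from prefixes.
        prefix = self._key(path)
        prefix = prefix + "/" if prefix else ""
        names = set()
        for key in self._objects:
            if key.startswith(prefix):
                names.add(key[len(prefix):].split("/", 1)[0])
        return sorted(names)

gateway = ObjectFileGateway()
gateway.write_file("/vm/disk1.vmdk", b"disk image bytes")
gateway.write_file("/vm/snap/s1", b"snapshot bytes")
gateway.write_file("/logs/a.log", b"log bytes")
```

Listing a directory means scanning keys, and file semantics like locking or in-place updates have no direct equivalent here at all, which hints at why NFS and SMB are an imperfect fit for this kind of back end.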

By finally divorcing data storage from legacy RAID, these systems offer compelling advantages. Many include integrated tiering, with big, slow disks and flash storage acting in concert. Some have even talked of adding tape to the mix, with Amazon’s Glacier perhaps the only one of these in production. They can also include remote replication, data protection features, and more.

The “new architecture” is hyper-scaly: Loosely-coupled nodes teaming up to massive proportions

Stephen’s Stance

Loosely-coupled object storage is the future: No more monoliths or clusters. The new wave of startups recognizes this, with companies and projects as diverse as Amazon S3, Cleversafe, Ceph, Exablox, Gridstore, and Convergent.io moving rapidly to offer object-based storage. Object storage is nothing new (hello Centera, HCP, Caringo, etc.) but perhaps it has finally found its place at the center of the enterprise storage universe!

Disclaimer: Exablox, Cleversafe, NetApp, EMC, Dell (EqualLogic and Caringo), and HP (LeftHand) have sponsored my Tech Field Day event, and I recorded a video series for TrueBit.tv focused on Convergent.io.

I found the fabulous pangolin wallpaper on BlenderArtists

  • Gridstore

    Thanks for the shout-out, Stephen. Another spot-on analysis of the shift in storage to address the limits of decades-old technology. We’re not only focused on being able to scale effortlessly and massively, but also on ensuring that you don’t have to compromise performance while scaling.

  • Marc Villemade

    Hey Stephen,

    Great article. Object Storage has certainly been given a second life with the new generation of vendors who are bringing true scalability, storage efficiency and, for some of those, high performance as well.

    What’s missing in your article is the important idea that object storage can, and in my opinion should, be software-based and hardware-agnostic. Otherwise you still end up with the pitfalls of hardware vendor lock-in and lack of flexibility.

    At Scality, that’s exactly our philosophy, and our approach allows our customers to choose their preferred HW vendors or mix and match multiple vendors. It gives a lot of flexibility in terms of hardware refresh, picking the best vendors for different features (performance, density, power-efficiency…) without having to care about the object storage layer.

    We do have some hardware recommendations, reference architectures and partnerships with hardware vendors, but our software is totally agnostic. And that’s a compelling point for every company managing petascale storage systems.

    -Marc

  • Kannan Subbiah

    Stephen,

    Nice article and enjoyed reading it. Sharing it in my daily digest to my network.
