July 29, 2014

Scaling Storage In Conventional Arrays

Clustering sounds great, but it’s awfully taxing to keep all the nodes consistent!

It is amazing that something as simple-sounding as making an array get bigger can be so complex, yet scaling storage is notoriously difficult. Our storage protocols just weren’t designed with scaling in mind, and they lack the flexibility needed to dynamically address multiple nodes. Data protection is extremely difficult and data movement is always time-consuming.

This post is part of a series on “scale-out” storage from Storage Field Day 4.

Three Ways To Scale Storage

Traditionally, there have been three ways to scale storage, each with its own pros and cons:

  1. Scale-up array: Add more storage behind a single controller “head”, which acts as the termination point for client I/O. Since anything behind the head is invisible to the client anyway, scaling up can be non-disruptive. But there are limits to the amount of data a single head can control, and performance can suffer as the front- and back-end controllers, CPU, and memory become saturated.
  2. Scale-out cluster: As the array scales, the controller “head” becomes increasingly critical, leading most vendors to cluster arrays for high availability. Most use a “shared-everything” cluster design, with each controller able to “peek into” every other controller’s RAM and caches, ensuring data consistency. But it’s difficult to maintain this “mind meld” between clustered heads beyond a handful of members.
  3. Scale-out gateway: Recently, many storage vendors have adopted a two-tier architecture with a true scale-out object store on the back end and one or more protocol gateways in the middle. The client talks to the gateway using a conventional protocol like iSCSI or NFS; the gateway handles data distribution; and the back end provides scale, data protection, and consistency.
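To make the two-tier gateway idea concrete, here is a minimal sketch under some loud assumptions: `ObjectNode`, `Gateway`, the chunk size, and the hash-modulo placement are all hypothetical names and choices of mine, not any vendor’s design. The client only ever sees a path, while the gateway decides which back-end node holds each chunk.

```python
import hashlib


class ObjectNode:
    """Hypothetical back-end node: a dict stands in for a real object store."""

    def __init__(self, name):
        self.name = name
        self.objects = {}


class Gateway:
    """Hypothetical protocol gateway: clients see paths, nodes see objects."""

    def __init__(self, nodes, chunk_size=4):
        self.nodes = nodes
        self.chunk_size = chunk_size
        self.index = {}  # path -> number of chunks written

    def _node_for(self, key):
        # Assumption: simple hash-modulo placement. Adding a node would
        # reshuffle most keys, so a real system would use something smarter.
        digest = hashlib.sha256(key.encode()).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def write(self, path, data):
        size = self.chunk_size
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        for n, chunk in enumerate(chunks):
            key = f"{path}#{n}"
            self._node_for(key).objects[key] = chunk
        self.index[path] = len(chunks)

    def read(self, path):
        keys = [f"{path}#{n}" for n in range(self.index[path])]
        return b"".join(self._node_for(key).objects[key] for key in keys)


nodes = [ObjectNode(f"node-{i}") for i in range(3)]
gateway = Gateway(nodes)
gateway.write("/vol/report.txt", b"hello scale-out")
print(gateway.read("/vol/report.txt"))  # prints b'hello scale-out'
```

The sketch leaves out everything that makes the back end hard (data protection, consistency, node failure), which is exactly the work the gateway architecture hands off to the object store.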

All three of these traditional scaling architectures have proven their worth in production over the past decade, and all are under active development by “next-generation” storage vendors.

Four Examples From Storage Field Day 4

At Storage Field Day 4, the delegates heard about scale-out clusters from CloudByte, Overland Storage, and Nimble Storage, all of which can scale up as well as out.

There are many other scale-out cluster solutions, with market leaders like NetApp, EMC Isilon, and Dell’s Compellent and EqualLogic serving as familiar examples.

CloudByte

CloudByte’s scale-out architecture joins multiple nodes into a cluster, sharing storage in the cluster using ZFS. But CloudByte adds an additional layer of abstraction via what they call a tenant storage machine (TSM), which can be moved from cluster to cluster on demand. In this way, they transcend traditional clustering limitations, but data locality and client access are limited to a single cluster within the larger pool.

Overland Storage

Overland Storage scales out using a gateway driver running on their scale-up SnapScale storage nodes. Client I/O is wide-striped across “peer sets” of disks located on nodes throughout the cluster. This novel approach allows the SnapScale cluster to grow while balancing storage across every node in the cluster, though it is not clear to me how they rebalance data as the cluster grows. Client I/O is balanced using round-robin DNS to distribute client connections, a simple but inflexible approach.
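To show why I call round-robin DNS simple but inflexible, here is a small simulation. It is only a sketch: the node names, client names, and load figures are made up for illustration, and it models generic round-robin rotation rather than anything specific to SnapScale. Connections come out perfectly even, but the rotation knows nothing about how busy each client is.

```python
from collections import Counter
from itertools import cycle


def assign_round_robin(clients, nodes):
    """Hand each new client the next node in a fixed rotation."""
    rotation = cycle(nodes)
    return {client: next(rotation) for client in clients}


# Hypothetical cluster and per-client load (arbitrary units).
nodes = ["node-a", "node-b", "node-c"]
client_load = {"backup": 900, "web1": 10, "web2": 10,
               "db": 800, "web3": 10, "web4": 10}

assignment = assign_round_robin(client_load, nodes)

connections = Counter(assignment.values())
load = Counter()
for client, node in assignment.items():
    load[node] += client_load[client]

print(dict(connections))  # {'node-a': 2, 'node-b': 2, 'node-c': 2}
print(dict(load))         # {'node-a': 1700, 'node-b': 20, 'node-c': 20}
```

Every node gets two connections, yet one node ends up carrying nearly all the I/O, and nothing in the rotation can move a heavy client once it has connected.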

Nimble Storage

Nimble Storage also scales up and out by slicing all data and distributing it equally across the storage in a pool of nodes. Data is rebalanced in the background as the cluster grows, using what I would call a “lazy” algorithm to avoid performance impact. Currently, Nimble supports only four arrays in a single pool, but they promise that this number will grow over time. Since Nimble uses the iSCSI protocol exclusively, they rely on a host-side MPIO driver to allow parallel and highly available client access across nodes.
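What I mean by a “lazy” rebalance can be sketched as follows. This is my own illustration, not Nimble’s actual algorithm: the function name, the `max_moves` throttle, and the array names are all assumptions. The idea is that each background pass moves only a few slices, so client I/O is never starved while the pool converges.

```python
def rebalance_step(pool, max_moves=2):
    """One background pass: move at most max_moves slices from the fullest
    array to the emptiest. Returns the number of slices moved."""
    moved = 0
    while moved < max_moves:
        fullest = max(pool, key=lambda name: len(pool[name]))
        emptiest = min(pool, key=lambda name: len(pool[name]))
        if len(pool[fullest]) - len(pool[emptiest]) <= 1:
            break  # close enough to balanced; stop moving data
        pool[emptiest].append(pool[fullest].pop())
        moved += 1
    return moved


# Hypothetical pool: two full arrays, plus an empty one that just joined.
pool = {
    "array-1": list(range(0, 6)),
    "array-2": list(range(6, 12)),
    "array-3": [],
}

passes = 0
while rebalance_step(pool):
    passes += 1

print(passes)                                   # 2
print({name: len(s) for name, s in pool.items()})
# {'array-1': 4, 'array-2': 4, 'array-3': 4}
```

Raising `max_moves` converges faster at the cost of more background traffic per pass, which is the trade-off a lazy scheme is tuning.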

Avere Systems and Cleversafe

Storage Field Day 4 delegates also learned about the scale-out gateway offering from Avere Systems and Cleversafe, who are working together to deliver such a solution. In the past, Storage Field Day 3 saw the launch of Exablox, which sells an integrated solution that includes both a scale-out object store and a NAS gateway, but Avere and Cleversafe are focused solely on the gateway and object store, respectively. The Cleversafe dispersed storage network (dsNet) platform is a massively scalable object store quite unlike any conventional scale-out storage array. Placing an Avere NAS gateway in front of the dsNet allows high-performance NFS and SMB access to the scalable, distributed object store.

Note that Avere’s gateways can be used in front of any NFS or SMB storage system, so this concept isn’t limited to Cleversafe.

Coho Data

There was one more conventional-protocol scalable storage system at Storage Field Day 4: Coho Data. Coho pushes the scaling work into the network in a rather clever way. I’ll cover that in more detail in the future, but for now here’s their tech presentation so you can see how it works!

Stephen’s Stance

Scaling storage is hard. Really hard. Especially when you’re not able to change the client driver or protocol. So my hat is off to these companies and others who have come up with clever ways to maintain compatibility while scaling out beyond the bounds of a single storage array. Next time I’ll write about another approach: Scaling storage by changing the client or protocol!