
Scaling Storage Is Hard To Do

June 4, 2013 by Stephen

Data storage isn’t as easy as it sounds, especially at enterprise or cloud scale. It’s simple enough to read and write a bit of data, but much harder to build a system that scales to store petabytes. That’s why I’m keenly focused on a new wave of storage systems built from the ground up for scaling!

This is one scaly anteater! No, really, he’s a scaly anteater!

No More Monoliths

Traditional storage arrays are integrated monoliths: A single “brain” that knows where every bit of data is stored. This approach worked well in the heyday of RAID and block protocols, since performance was paramount and scaling meant adding another autonomous system. But monolithic systems just don’t work in modern, dynamic environments.

Although they offered large-scale capacity and performance, there was really nothing scalable about these arrays. Monolithic storage arrays like the old-fashioned EMC Symmetrix could handle a fixed number of disk drives, controllers, interfaces, cache cards, and such. You bought it, filled it up, ran it as-is, and decommissioned it. The best they could offer was “buy as you grow” purchasing of in-place assets.
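
To make the "single brain" idea concrete, here is a toy Python sketch (my own illustration, not any vendor's actual design): one controller-resident map answers every lookup, and the frame's fixed slot count plus the controller's memory and CPU cap the whole system.

    class MonolithicController:
        def __init__(self, shelves, disks_per_shelf):
            # Fixed at purchase time: the frame only holds so many disks.
            self.max_disks = shelves * disks_per_shelf
            self.block_map = {}  # logical block -> (disk, physical block)

        def write(self, logical_block, disk, physical_block):
            self.block_map[logical_block] = (disk, physical_block)

        def read(self, logical_block):
            # Every lookup funnels through this one "brain".
            return self.block_map[logical_block]

    array = MonolithicController(shelves=8, disks_per_shelf=32)
    array.write(42, disk=3, physical_block=1001)
    print(array.read(42))   # -> (3, 1001)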

Lately, purveyors of these devices have turned to virtualization (the storage kind, not server virtualization) to modernize them. The brain behind a monolithic array can now virtualize other storage systems, from JBOD to older arrays. This scale-up approach is better than nothing, but it’s not where the industry is headed.

The Limits of Modular Scaling

Customers wanted more. They demanded flexibility, matching the number of disk drives to the required capacity and performance. They were also drawn to the idea of buying disk drives as capacity was needed rather than all at once.

The market responded with scale-up modular storage, exemplified by the NetApp Filer and DG CLARiiON. These featured a controller “head” and one or more disk shelves that could be added later. NetApp parlayed a combination of file-level protocols and (surprisingly rare) RAID-4 data layout to enable on-the-fly capacity expansion, while later systems were often more traditional, with block protocols and fixed RAID-5 and -10 data layout.
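
It is easy to see why RAID-4's dedicated parity disk makes on-the-fly expansion cheap. A simplified, single-stripe sketch (ignoring real block layout): a zero-filled new data disk XORs into the parity as a no-op, so nothing needs to be restriped, whereas RAID-5's rotated parity would have to be rebuilt across every drive.

    from functools import reduce

    def parity(stripe_blocks):
        # RAID-4 keeps parity on a dedicated disk: the XOR of the same
        # stripe across every data disk.
        return reduce(lambda a, b: a ^ b, stripe_blocks)

    data_disks = [0b1011, 0b0110, 0b1100]   # one stripe from each data disk
    p = parity(data_disks)                  # parity disk holds 0b0001

    # Add a brand-new, zero-filled data disk: XOR with zero changes nothing,
    # so the existing parity is still valid and nothing has to be restriped.
    data_disks.append(0b0000)
    assert parity(data_disks) == p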

Regardless, modular storage offers a mere approximation of scale, since “scaling up” only goes so far before the “head” can’t handle the load. Then there is the daisy-chained interconnect to consider: Many systems use a pair of controllers with just a few Fibre Channel loops or SAS ports for every disk shelf to share.

You can only scale up so far before you exhaust your “head”

Clusters Only Scale So Far

How do you add capacity and performance without drastically re-architecting storage systems? For the last decade, enterprise storage vendors have relied on clustering as a means to scale. Each clustered controller is locked in a “mind meld” with every other, sharing precious data maps and caches in real time, enabling customers to add whole arrays for greater capacity and performance.

But clustering only goes as far as the interconnect will allow. Systems that relied on Fibre Channel, IP/Ethernet, and iSCSI for inter-node communication could only scale to a handful of nodes before node coordination latency got in the way. This is why pleasant and useful arrays from the likes of EqualLogic (now Dell) and LeftHand (HP) struggle to grow to even a dozen nodes. And clustering has proved devilish for NetApp to implement.

InfiniBand was a white knight, bringing RDMA and microsecond-class latency and enabling hundred-node monsters like Isilon to thrive. Look “under the skirt” of any reasonably scalable clustered storage solution and you’re likely to find InfiniBand HCAs and a Mellanox switch in the middle. It’s as if Mr. Spock could mind-meld with the whole crew!

Clustering sounds great, but it’s awfully taxing to keep all the nodes consistent!
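
A crude back-of-envelope model shows why the interconnect caps cluster size. The numbers and the model below (every write serially updates metadata on each peer node) are deliberately simplistic illustrations, not benchmarks, but they capture how coordination cost grows with node count times round-trip time.

    def coordination_overhead_us(nodes, rtt_us):
        # Assume each write serially updates metadata on every peer node.
        return (nodes - 1) * rtt_us

    interconnects = [("IP/iSCSI, ~500 us round trip (illustrative)", 500.0),
                     ("InfiniBand RDMA, ~2 us round trip (illustrative)", 2.0)]

    for name, rtt_us in interconnects:
        for nodes in (4, 12, 100):
            print(f"{name}: {nodes:>3} nodes -> "
                  f"{coordination_overhead_us(nodes, rtt_us):8.1f} us added per write")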

Hyper-Scale Storage

But web services and “big data” need something more than a massive cluster. Hyper-scale workloads need massive-scale storage with real flexibility, and that requires breaking free of the tightly-coupled model of yore. Hyper-scale storage is built of autonomous nodes, each handling a piece of the dataset and I/O workload.
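
Consistent hashing is one common way to spread data across autonomous nodes without a central brain (an illustration of the loosely-coupled idea, not any particular product's placement algorithm). A minimal sketch:

    import hashlib
    from bisect import bisect

    def ring_position(name):
        return int(hashlib.md5(name.encode()).hexdigest(), 16)

    class ObjectRing:
        def __init__(self, nodes):
            # Each node owns a slice of the hash ring; there is no central map.
            self.ring = sorted((ring_position(n), n) for n in nodes)

        def node_for(self, key):
            points = [p for p, _ in self.ring]
            idx = bisect(points, ring_position(key)) % len(self.ring)
            return self.ring[idx][1]

    ring = ObjectRing([f"node{i}" for i in range(8)])
    print(ring.node_for("photos/2013/pangolin.jpg"))
    # Adding a ninth node moves only the keys adjacent to it on the ring,
    # which is what lets these systems grow a node at a time.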

Traditional storage protocols like SCSI (for FC and iSCSI block access) and NFS or SMB (for NAS file access) don’t work well in a thousand-node web-scale model, however. Sure, these old standbys are great for accessing a single node or cluster, but it takes a modern object-based storage protocol to truly leverage hyper-scale storage.
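
For contrast, object access looks like this: whole objects in and out, addressed by flat keys over HTTP, with no LUNs or filesystem paths tied to a single controller. A minimal sketch against an S3-compatible endpoint; the boto3 client, endpoint, bucket, and key names are made up for illustration.

    import boto3  # the endpoint, bucket, and key below are illustrative

    s3 = boto3.client("s3", endpoint_url="https://objects.example.com")

    # Whole objects in and out, addressed by flat keys over HTTP.
    s3.put_object(Bucket="archive", Key="2013/06/scaling.txt",
                  Body=b"whole object in, whole object out")
    print(s3.get_object(Bucket="archive", Key="2013/06/scaling.txt")["Body"].read())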

This is exactly the architecture that the latest storage arrays are adopting: Object storage inside, with loosely-coupled nodes offering truly dynamic scaling. Although many allow native API access, most of these products also include an integrated object-to-file gateway, with VMware-friendly NFS or Windows-oriented SMB as the front-end protocol. These aren’t the ideal protocols for scaly-storage access, but at least they’re compatible with existing applications.
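
At its core, the gateway is a translation layer: it presents a file path to NFS or SMB clients and maps it to a bucket and object key underneath. A hypothetical sketch of that mapping (real gateways also handle locking, metadata, and partial writes):

    def path_to_object(path, bucket="filer"):
        # "/exports/projects/report.doc" -> ("filer", "exports/projects/report.doc")
        return bucket, path.lstrip("/")

    def write_file(object_store, path, data):
        bucket, key = path_to_object(path)
        object_store.put_object(Bucket=bucket, Key=key, Body=data)

    print(path_to_object("/exports/projects/report.doc"))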

By finally divorcing data storage from legacy RAID, these systems offer compelling advantages. Many include integrated tiering, with big, slow disks and flash storage acting in concert. Some have even talked of adding tape to the mix, with Amazon’s Glacier perhaps the only such system in production. They can also include remote replication, data protection features, and more.

The “new architecture” is hyper-scaly: Loosely-coupled nodes teaming up to massive proportions

Learn more about Exablox and Cleversafe in these Tech Field Day videos:

  • Exablox Presents at Storage Field Day 3
  • Cleversafe Presents at Storage Field Day 3

Stephen’s Stance

Loosely-coupled object storage is the future: No more monoliths or clusters. The new wave of startups recognizes this, with companies and projects as diverse as Amazon S3, Cleversafe, Ceph, Exablox, Gridstore, and Convergent.io moving rapidly to offer object-based storage. Object storage is nothing new (hello Centera, HCP, Caringo, etc.) but perhaps it has finally found its place at the center of the enterprise storage universe!

Disclaimer: Exablox, Cleversafe, NetApp, EMC, Dell (EqualLogic and Caringo), and HP (LeftHand) have sponsored my Tech Field Day event, and I recorded a video series for TrueBit.tv focused on Convergent.io.

I found the fabulous pangolin wallpaper on BlenderArtists

Filed Under: Computer History, Enterprise storage, Features, Virtual Storage Tagged With: Amazon S3, Caringo, Centera, Ceph, CLARiiON, Cleversafe, convergent.io, EqualLogic, Exablox, Gridstore, HCP, hyper-scale, InfiniBand, Isilon, LeftHand, Mellanox, modular arrays, monolithic arrays, NetApp, object storage, RAID 4, scalability, scale-out, Scale-up, Symmetrix
