I stepped into a hornet’s nest this week when I posted a write-up about a new flash storage array from Pure Storage. The controversy had nothing to do with the underlying technology, which seems quite sound. Rather, it was all about pricing, with competitors crying foul over Pure’s price comparisons.
You’re Not Buying Gigabytes
In a world of 3 TB drives, storage capacity is almost irrelevant. Capacity is what people think they are getting when they buy enterprise storage devices, but capacity is only one aspect of the purchase, and it’s not a very significant one in most cases.
So what are enterprise storage buyers buying?
- Performance, especially I/O operations (IOPS), is much more critical than capacity in most cases, and it takes lots of spindles or expensive flash chips to deliver it.
- Data protection features like snapshots are increasingly important, and often cost extra.
- Compatibility is paramount, as is long-term supportability from all vendors involved.
- Integration and management features are often the deciding factor in purchases, especially when it comes to server virtualization applications.
- High availability and product support are “must-haves” that can multiply the cost of a solution.
- Power, cooling, and floor space can be very important for some applications and entirely inconsequential in others.
- Capacity is sometimes important, but many applications require just a few TB or less, and thin provisioning, data deduplication, and compression are really blurring the lines here.
So although a typical customer will say “I need 200 GB for this application,” they likely need nothing of the sort. They really need 100 IOPS, snapshots, a line on the HCL, VAAI and vCenter plugins, and redundant everything. Even the capacity number is questionable: most applications grow over time, and few need much capacity to begin with.
Since you can’t buy a 1 GB storage array and can’t fill a 10 TB device to 100%, pricing per GB is complete nonsense. Plain old storage space just sort of tags along for the ride once you build a system capable of meeting all these other needs.
Data Reduction or Pricing Obfuscation?
Utilization of storage capacity has always been terrible, but improving capacity efficiency is worthless. The best you can do is over-tax your array or put all your data “eggs” in too few drive “baskets”. Achieving impressive capacity utilization just concentrates I/O, and this is the last thing you want to do with spinning hard disk drives.
This is why I suggest redirecting the conversation away from capacity requirements. The number of GB to be used and the efficiency of that storage don’t matter all that much except for certain massive and rare applications. Once the array is big enough to handle the data, everything else is a wash.
This is also why I’m skeptical of data reduction technologies. Most applications would be better off optimizing for performance rather than reducing the capacity used. And data reduction techniques like compression and deduplication quickly lead down the “your mileage may vary” rat hole.
Comparing Apples to Apples
Also read Grapples and Tangelos: Why it’s Impossible to Compare Fairly
There is only one way to do a real fair comparison between different storage devices: Specify all the requirements and let each vendor put forward whatever they have that meets all of them. Who really cares if vendor A’s disk-based solution is 10% utilized while vendor B’s flash array needs 1/5 the capacity? As long as you have a place to put it (and enough power to feed it) it’ll still work fine.
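To make the idea concrete, here is a minimal sketch of a requirements-first comparison in Python. The requirement values echo the example earlier in this post (200 GB, 100 IOPS, snapshots, redundant everything), but the three vendor proposals and their prices are made-up placeholders, not real quotes, and the field names are my own assumptions.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    capacity_gb: float
    iops: int
    needs_snapshots: bool
    needs_ha: bool


@dataclass
class Proposal:
    vendor: str
    price: float        # total price of the proposed configuration
    capacity_gb: float
    iops: int
    snapshots: bool
    ha: bool


def meets(p: Proposal, r: Requirements) -> bool:
    """True only if the proposal satisfies every stated requirement."""
    return (
        p.capacity_gb >= r.capacity_gb
        and p.iops >= r.iops
        and (p.snapshots or not r.needs_snapshots)
        and (p.ha or not r.needs_ha)
    )


def cheapest_qualifying(proposals, req):
    """Compare on total price, but only among proposals that qualify."""
    qualifying = [p for p in proposals if meets(p, req)]
    return min(qualifying, key=lambda p: p.price, default=None)


req = Requirements(capacity_gb=200, iops=100, needs_snapshots=True, needs_ha=True)

# Hypothetical proposals: every number here is a placeholder.
proposals = [
    Proposal("Vendor A (disk)", 90_000, 20_000, 5_000, True, True),
    Proposal("Vendor B (flash)", 80_000, 4_000, 100_000, True, True),
    Proposal("Vendor C (no HA)", 40_000, 4_000, 100_000, True, False),
]

best = cheapest_qualifying(proposals, req)
print(best.vendor if best else "No proposal meets the requirements")
```

Notice that price per GB never enters the picture. Vendor C has the lowest price but is filtered out for lacking high availability, and Vendor A’s much larger capacity earns it nothing once the capacity requirement is met.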
One serious challenge in enterprise storage is the rise of flash memory as a storage medium. Flash chips are expensive on a data capacity basis but amazingly cheap in terms of performance and environmental efficiency. Put another way, an SSD can’t store as much data as a hard disk, but it delivers massive I/O capability in a tiny, rugged, low-power footprint.
Since most enterprise applications need only a few hundred GB of capacity, a few SSDs can be a compelling alternative to a “refrigerator” full of disks. It can be hard to convince the boss, but you really can fit a whole datacenter’s worth of storage I/O into a few rack units!
Pure and Nimbus
This is the issue facing flashy solid state devices from many companies, and the root of my headaches this week. Pure Storage hasn’t finalized pricing yet, but they claim that their new device costs $5 per usable gigabyte. This is incredibly cheap for an array that can blow the doors off most enterprise gear!
Nimbus Data, on the other hand, sells their all-flash enterprise storage array for about $10 per GB. But this is not the end of the story, and Pure might even be more expensive than Nimbus! Or maybe not. It all depends on what you’re comparing.
Pure claims that their cost is half that of most comparable flash storage arrays, but this is where the questions start to appear. Is that $5 gigabyte usable or raw? Does it include high availability? And can I really store any old gigabyte of data there or is that a compressed/deduplicated gigabyte?
It turns out that the real cost of Pure Storage capacity is $20 per GB including RAID and an extra mirrored array for high availability. But since every byte written to the array is thin provisioned, deduplicated, and compressed, many customers will pay much less for actual data stored. And since it’s an all-SSD array, it’ll perform way better than a disk-based system, too.
Muddying the Waters
So why not just call it $5 per GB and be done with it? It’s confusing, that’s why, and your mileage will vary widely. Pure’s own slides show some applications getting 4:1 data reduction and others all the way up to 17:1. So these applications would be paying as low as $1.18 per GB or as high as $5.
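The arithmetic behind those two numbers is simple enough to sketch. This assumes the effective price is just the $20 per GB figure divided by the data reduction ratio, which matches the numbers above; the function name is mine.

```python
def effective_cost_per_gb(cost_per_gb: float, reduction_ratio: float) -> float:
    """Price per GB of data actually stored, given an N:1 data reduction ratio."""
    if reduction_ratio <= 0:
        raise ValueError("reduction ratio must be positive")
    return cost_per_gb / reduction_ratio


# $20 per GB before data reduction, at the two ratios from Pure's slides
for ratio in (4, 17):
    cost = effective_cost_per_gb(20.0, ratio)
    print(f"{ratio}:1 reduction -> ${cost:.2f} per GB")
```

An application that gets no reduction at all (1:1) would pay the full $20, which is why a single headline number hides so much.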
But you can’t buy just 1 GB of storage from Pure. Their smallest array (which includes one controller and one shelf of SSDs) provides 5.5 TB of raw capacity, presumably using 24 SSDs of 256 GB each. A high-availability configuration would include two controllers and two shelves of SSDs for 11 TB of raw storage. At $20 per GB, that’s going to cost almost a quarter of a million dollars according to my calculator. That’s one expensive gigabyte!
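Here is roughly how that calculator math works out, as a sketch. It assumes the $20 per GB figure applies to the full 11 TB of the high-availability pair and that 1 TB means 1,024 GB; neither assumption comes from Pure, and the function name is mine.

```python
GB_PER_TB = 1024  # assumption: binary terabytes; pass 1000 for decimal


def array_price(capacity_tb: float, cost_per_gb: float, gb_per_tb: int = GB_PER_TB) -> float:
    """Total price of an array sold at a flat price per GB."""
    return capacity_tb * gb_per_tb * cost_per_gb


ha_price = array_price(11, 20.0)
print(f"HA pair: ${ha_price:,.0f}")

# If you stored only a single gigabyte on it, that gigabyte carries the whole bill.
print(f"Cost of the first GB stored: ${ha_price / 1:,.0f}")
```

With decimal terabytes the total comes to $220,000 instead, so either way it lands just under a quarter of a million dollars.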
Of course no one would buy this array to store just a thousand megabytes. They would buy it to support a bunch of applications that need capacity and performance and efficiency and integration and everything else. And they can buy a mirrored pair of arrays from Pure Storage or Nimbus or Violin Memory or Texas Memory Systems or others at a variety of price points.
The only way to really compare these products is to spec them out on equal footing and see what the price tag looks like. These comparisons would include data reduction, but they would also have to bring in high availability and every other requirement of the applications they will support.
It’s way too difficult for me to do the pricing math for these systems, so I’m throwing in the towel. I’m thrilled to see all-flash arrays made available to IT buyers. This wouldn’t be possible without clever use of thin provisioning and data reduction, as well as smart software to overcome the limits of SSD.
I’m going to guess that Pure and Nimbus will cost about the same for similar configurations, though I’ll bet each believes they’re cheaper. Rather than get in the middle, I invite each company to post a comment below stating their case. I’ll even embed their responses into a future blog post on the subject if they get too long. Just don’t ask me to be the referee.
Update: Pure Storage responds with an outline of their pricing:
Image credit: Rotten Apple by Wappas
Dave Wright says
This is a great and timely article. It’s amazing to see how many analysts have been throwing around $/GB numbers from different vendors that are not only irrelevant to many customers (because capacity isn’t their limiting factor), they aren’t even the same metric (e.g. raw vs usable vs effective). Trying to say any system is cheaper or more expensive based on a random $/GB number is impossible.
$/GB numbers used to be a lot more comparable. When everyone ran on disk, and everyone had basically the same RAID options, the performance and capacity you got out of any given system was basically just a function of the number of spindles, and the conversion from raw to usable gigabytes didn’t vary much. Importantly, usable was always lower than raw.
However, with the introduction of data reduction technologies, suddenly comparing raw numbers makes no sense at all. Usable space can actually be larger than raw, but as you point out it can vary depending on the data set.
The only way to even start to make sense is to look at some actual efficiency numbers with some real data, which is where tools like SolidFire’s eScanner ( http://solidfire.com/products/tools/escanner/ ) or Pure’s PRE tool come in. They suddenly make the numbers a lot less squishy.
At the end of the day however, none of this apples to oranges comparison issue should detract from the reality of the situation – all-flash based arrays with data reduction technology CAN be competitive on a $/GB basis with disk based systems, while offering dramatically better metrics in almost every other area ($/IOPS, GB/watt, etc). The myth that flash is too expensive for most data needs to go away (see http://solidfire.com/blog/just-how-expensive-is-flash/ )
Matt Kixmoeller says
Stephen – thanks for this great post, you are dead-on in articulating why $/GB raw is no longer the way to look at the cost of storage for modern disk arrays.
We posted a response on the Pure Storage Blog that gives more details on how we derive our pricing, and some of the technology innovations that make delivering all-flash storage at under the cost of spinning disk possible.
Thank you for responding in such a long and thoughtful way on your blog. I’ve added the link to the post above!
Apparently this comment was auto-flagged by Disqus as spam. I’ve fixed that and whitelisted you!
I agree that these numbers used to be more meaningful when RAID5 was RAID5. Flash changes everything, and can be competitive as long as customers aren’t looking for a “100 TB array” to hold their 5 TB data set!