Flash memory is awesome, but SSD isn’t a drop-in replacement for disk. Flash is totally different from spinning disk, yet most storage arrays still treat it as disk-like block storage. That’s why I was pleased to hear from startup Pure Storage, who are asking the core question, “what should an all-flash storage device look like?”
Another Storage Startup?
Pure Storage is a storage startup led by veterans of Veritas, NetApp, Sun, and Apple. The company has raised $55 million from key investors, including flash giant Samsung.
Pure is working on what they call “the first all-flash enterprise array”, a slogan that will certainly draw the daggers of Nimbus Data, which has been selling an all-flash enterprise array for over a year. Like the Nimbus S-Class, the Pure array relies on inline data reduction and compression to optimize the cost of storage, bringing flash capacity within reach of enterprise customers.
The Pure Storage FlashArray FA-300 array features active/active controllers with 12 CPU cores. These controllers attach to 24-bay drive shelves full of 2.5″ SSDs, and multiple units are clustered with 40 Gb InfiniBand. Pure Storage boasts that this combination allows 300,000 read and 180,000 write IOPS with less than one millisecond of latency. These are impressive performance numbers, but that’s not really the focus of the company.
Most beta customers use Pure Storage for a combination of performance and capacity. They are pleased to not have to worry about storage performance, but are also interested in reducing floor space and power demands of many-spindle storage arrays.
The smallest Pure Storage array (which includes one controller and one shelf of SSDs) provides 5.5 TB of raw capacity, presumably using twenty-four 256 GB SSDs. A high-availability configuration would include two controllers and two shelves of SSDs for 11 TB of raw storage.
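As a quick sanity check on those capacity figures, here is a small sketch of the arithmetic. The 256 GB drive size is my guess from above, not a published spec, and the helper names are made up for illustration. One plausible reading is that the 5.5 TB figure is quoted in binary units (TiB) while the drives are labeled in decimal gigabytes.

```python
DRIVES_PER_SHELF = 24   # 24-bay shelf, as described above
DRIVE_GB = 256          # assumed drive size in decimal gigabytes (a guess)

def raw_capacity_bytes(shelves: int) -> int:
    """Raw bytes across the given number of fully populated shelves."""
    return shelves * DRIVES_PER_SHELF * DRIVE_GB * 10**9

def to_tb(n_bytes: int) -> float:
    """Decimal terabytes (10^12 bytes)."""
    return n_bytes / 10**12

def to_tib(n_bytes: int) -> float:
    """Binary tebibytes (2^40 bytes)."""
    return n_bytes / 2**40

for shelves in (1, 2):
    b = raw_capacity_bytes(shelves)
    print(f"{shelves} shelf/shelves: {to_tb(b):.2f} TB decimal, {to_tib(b):.2f} TiB binary")
```

Under these assumptions, one shelf works out to a little over 6 TB decimal, or just under 5.6 TiB, which lines up with the quoted 5.5 TB if the vendor is counting in binary units.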
You should also read my follow-up piece on Pure’s pricing, When Pricing Gets Squishy Competition Heats Up
Optimized I/O
Pure also relies on MLC flash, a technical choice that will likely be the target of competitors. But the company insists they can overcome the limitations of inexpensive MLC flash (slower writes, reduced longevity) through intelligent software optimized for just this storage medium.
All I/O is thin provisioned and zero-detected, de-duplicated, and compressed in-line before it hits the SSDs. Since flash excels at random reads, data de-duplication does not have the performance impact most folks assume. In fact, de-duplicating data actually improves performance since less writing is required. Pure Storage uses a 512 B chunk rather than the larger chunks used by competitors, and they claim this gives a capacity advantage.
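To make the inline data-reduction pipeline a bit more concrete, here is a toy sketch of zero detection, de-duplication, and compression over 512-byte chunks. This is not Purity's actual design, which I haven't seen; the `DedupStore` class, the SHA-256 fingerprint, and zlib compression are all assumptions chosen for illustration.

```python
import hashlib
import zlib

CHUNK_SIZE = 512  # bytes, the chunk size cited above

class DedupStore:
    """Toy inline data-reduction store (hypothetical, for illustration only)."""

    def __init__(self):
        self.chunks = {}          # SHA-256 digest -> compressed chunk
        self.physical_bytes = 0   # bytes that would actually reach flash

    def write(self, data: bytes) -> list:
        """Reduce data chunk by chunk; return a recipe for reading it back."""
        recipe = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            if not any(chunk):
                # Zero detection: record the length, store nothing.
                recipe.append((None, len(chunk)))
                continue
            digest = hashlib.sha256(chunk).digest()
            if digest not in self.chunks:
                # First time this content is seen: compress and store it.
                packed = zlib.compress(chunk)
                self.chunks[digest] = packed
                self.physical_bytes += len(packed)
            # Duplicates only add a reference, never a second write.
            recipe.append((digest, len(chunk)))
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild the original data from a recipe."""
        out = bytearray()
        for digest, length in recipe:
            if digest is None:
                out += bytes(length)
            else:
                out += zlib.decompress(self.chunks[digest])
        return bytes(out)

store = DedupStore()
data = b"A" * 1024 + bytes(1024) + b"A" * 512
recipe = store.write(data)
assert store.read(recipe) == data
print(len(data), "logical bytes ->", store.physical_bytes, "physical bytes")
```

Note how the duplicate and all-zero chunks never generate a write at all, which is the intuition behind the claim that de-duplication can help write performance on flash.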
Pure’s array was designed from the ground up around flash, with minuscule latency and no tiering to spinning disk. The Pure Storage array does not use raw NAND flash but still relies on SSD (likely 470-series SATA drives sourced from partner Samsung). The Pure Storage “Purity” software optimizes I/O for these SSDs rather than relying on the in-drive software. They “cook” the data to the optimum chunk size, so the drive never needs to re-arrange I/O internally and performance does not diminish over time. Another optimization is I/O scheduling so drives are never written to and read from at the same time. Pure Storage also moves data over time for wear leveling, though it’s not clear how this interacts with the similar functionality already present in the SSDs.
Since flash memory has unique failure patterns, Pure Storage designed their own “RAID 3D” system to protect data. SSDs sometimes fail entirely, but unrecoverable read errors (UREs) are much more common. And as NAND flash ages (and device generations get finer geometry), the error rate increases. Data is collected into a segment before writing, and each segment is striped across all available drives. If a drive fails, data is re-protected in the background with at least dual parity. Since flash is so fast, rebuilding parity is much quicker than on spinning-disk systems. This wide striping also makes performance more consistent as drives fail and the system fills.
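Pure hasn't published the internals of RAID 3D, so the following is only a simplified illustration of the general idea of striping a segment across drives with parity. It uses a single XOR parity strip, which is a deliberate simplification: the array described above keeps at least dual parity, which requires a second, different code. All function names here are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def stripe_segment(segment: bytes, n_data_drives: int):
    """Split a segment into one strip per data drive, plus one parity strip."""
    strip_len = -(-len(segment) // n_data_drives)  # ceiling division
    padded = segment.ljust(strip_len * n_data_drives, b"\0")
    strips = [padded[i * strip_len:(i + 1) * strip_len]
              for i in range(n_data_drives)]
    return strips + [xor_blocks(strips)]

def rebuild(strips, lost_index: int) -> bytes:
    """Recover a single lost strip by XOR-ing all surviving strips."""
    survivors = [s for i, s in enumerate(strips) if i != lost_index]
    return xor_blocks(survivors)

strips = stripe_segment(b"hello flash world", 4)
# Losing any one strip (data or parity) is recoverable from the rest.
assert all(rebuild(strips, i) == strips[i] for i in range(len(strips)))
```

With single parity only one lost strip per segment can be rebuilt, which is exactly why a system worried about UREs turning up during a rebuild would want dual parity or better.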
The Pure Storage Use Case
Pure Storage’s pricing is based on a 5:1 capacity reduction target, though the company claims that they often beat this in production, especially for VMware environments. Real-world data reduction cited by Pure Storage ranges from 4:1 for an Oracle environment to 17:1 for VMware.
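To see what those ratios mean in practice, here is a back-of-the-envelope sketch of effective capacity for the smallest array. It simply multiplies raw capacity by the reduction ratio and ignores parity and metadata overhead, which Pure hasn't detailed, so treat the results as optimistic upper bounds. The function name is made up for illustration.

```python
RAW_TB = 5.5  # raw capacity of the smallest array, as quoted above

def effective_tb(raw_tb: float, reduction_ratio: float) -> float:
    """Logical capacity if data shrinks by the given ratio (overheads ignored)."""
    return raw_tb * reduction_ratio

for label, ratio in [("Oracle", 4), ("pricing target", 5), ("VMware", 17)]:
    print(f"{label:>14}: {ratio}:1 -> about {effective_tb(RAW_TB, ratio):.1f} TB effective")
```

The spread between the Oracle and VMware cases is wide, which is a useful reminder that pricing built on a 5:1 target will look very different depending on the workload.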
The Pure Storage array is in its final beta round and the company expects to ship GA product by the end of the year. Key use cases are VMware and database environments – two high-I/O applications that have traditionally benefitted from flash storage.
Stephen’s Stance
It’s great to see fresh thinking in storage, and Pure Storage comes out of the gate with some impressive credentials: a top-tier team, excellent technical capabilities, and reasonable pricing. But it takes more than a great product to succeed in storage, and building awareness and sales is the next challenge for the company.