November 1, 2014

Thinking About Storage In a New Way, From Cloud to Flash, with Dropbox and Fusion-io

I’ve been a storage revolutionary for quite a while, looking for new ways to store data rather than technologies that perpetuate the same old approaches. That’s why I’m excited about the implications of two very different API access methods, one announced by Dropbox at DBX and the other by Fusion-io today at OSCON.

We need smarter, more integrated storage.

Dropbox App Storage

On the consumer side, there’s Dropbox, which announced Sync and Datastore APIs at its DBX conference this month. These APIs are interesting in and of themselves, but more so when one takes a big-picture look at them: Dropbox is challenging the whole concept of mobile device storage, suggesting that temporary caches of a cloud datastore are more relevant.

It would be foolish to argue that distributed sync and key/value datastores aren’t taking over the mobile world, but these things are devilishly difficult to get right. Witness Apple’s continuing stumbles with iCloud, for example. But Dropbox has done a fantastic job of making sync work in the real world, and now they’re opening those systems to developers.

But Dropbox is also explicitly challenging the whole notion of file-based storage. This is another huge, and overdue, revelation. Why bother with files when what today’s apps really need is key/value lookups and blob storage? Watch the DBX keynote and consider what they’re saying: It’s a new world, and this is what storage ought to look like.

Fusion-io NVM Access

After taking a week to digest the implications of Dropbox’s mobility-enhancing storage strategy, I was contacted by Fusion-io regarding their announcements at Open Source Convention (OSCON) 2013. After a discussion with Brent Compton, Sr. Director of Product Management, I am convinced that this seemingly unrelated announcement actually has a lot in common with Dropbox’s!

Fusion-io is announcing three contributions to open source:

  1. A key-value interface to flash, NVMKV – This is an API spec and library, along with source code, enabling applications to store data directly in non-volatile memory (e.g. Fusion-io flash cards) as key/value pairs rather than through conventional file or block I/O.
  2. A modification of the Linux VM subsystem enabling better use of demand paging from non-volatile memory. Paging was basically given up for dead among Linux server guys, but makes a whole lot more sense in an NVM world!
  3. API specs for the Fusion-io flash translation layer enabling atomic access to non-volatile memory, including vectored atomic writes. This has already been implemented in MariaDB 5.5.31 and Percona Server 5.5.31-30.
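
To make the second item a little more concrete, here is a minimal Python sketch of demand paging using the standard mmap module. It is only an illustration of the general idea: the file is an ordinary temporary file rather than a flash device, the helper name touch_one_page is made up for this example, and none of it reflects Fusion-io’s actual kernel changes.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def touch_one_page(path: str, page_no: int, data: bytes) -> bytes:
    """Map a file and modify a single page of it (hypothetical helper)."""
    with open(path, "r+b") as f:
        # Mapping the file does not read it; the kernel faults pages in
        # only when they are first touched.
        with mmap.mmap(f.fileno(), 0) as m:
            start = page_no * PAGE
            m[start:start + len(data)] = data   # touches only this page
            m.flush()                           # write dirty pages back
            return bytes(m[start:start + len(data)])


# Build a 64-page scratch file standing in for data on non-volatile memory.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (64 * PAGE))
    scratch = tmp.name

seen_in_memory = touch_one_page(scratch, 10, b"hello")

# Read the same bytes back through ordinary file I/O.
with open(scratch, "rb") as f:
    f.seek(10 * PAGE)
    seen_on_disk = f.read(5)

os.remove(scratch)
```

The application never issues a read or write call for the mapped region; it just touches memory and lets the VM subsystem move pages, which is the access pattern that becomes attractive when the backing store is fast.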

Let’s take a moment to consider the Fusion-io announcement in a larger context: They’re not talking about conventional storage paradigms anymore. Instead, Fusion-io is pushing specific enhancements to real-world applications to use storage in a new way. This is the Internet datacenter flipside of the Dropbox announcement!

Here’s an example: MySQL databases (with InnoDB) use a doublewrite technique to ensure consistency: Each page is written twice, first to a doublewrite buffer and then to its final location, with fsync() calls to make sure the data is safely on disk before proceeding. This is necessary because block storage can result in partial writes: Perhaps the system crashes after one or two 4K blocks were written, losing the remaining blocks and leaving the data in an inconsistent state. But Fusion-io’s atomic write API pushes responsibility for data consistency to the storage device: Issue an atomic write and the ioMemory card will ensure the data is written either completely or not at all. So MySQL can skip the doublewrite and just commit data directly.
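
The torn-write problem is easy to simulate. The following Python sketch is a toy model under stated assumptions, not InnoDB’s or Fusion-io’s code: the BlockDevice class, the stamp-at-both-ends page layout, and the atomic_write function are all invented for illustration, and fsync is only mentioned in a comment.

```python
PAGE_SIZE = 16 * 1024    # InnoDB's default page size
BLOCK_SIZE = 4 * 1024    # largest unit this simulated disk writes atomically
STAMP = 8                # bytes of sequence stamp at each end of a page
DOUBLEWRITE, HOME = 0, 1 # page slots on the simulated device


def make_page(seq: int, fill: bytes) -> bytes:
    """Hypothetical page layout: stamp | payload | stamp."""
    stamp = seq.to_bytes(STAMP, "big")
    return stamp + fill * (PAGE_SIZE - 2 * STAMP) + stamp


def is_intact(page: bytes) -> bool:
    """A page counts as torn if its head and tail stamps disagree."""
    return page[:STAMP] == page[-STAMP:]


class BlockDevice:
    """In-memory 'disk' that can crash partway through a page write."""

    def __init__(self, n_pages: int):
        self.data = bytearray(n_pages * PAGE_SIZE)

    def write(self, slot: int, page: bytes, crash_after_blocks=None) -> None:
        total = PAGE_SIZE // BLOCK_SIZE
        n = total if crash_after_blocks is None else crash_after_blocks
        start = slot * PAGE_SIZE
        self.data[start:start + n * BLOCK_SIZE] = page[:n * BLOCK_SIZE]

    def read(self, slot: int) -> bytes:
        start = slot * PAGE_SIZE
        return bytes(self.data[start:start + PAGE_SIZE])


def doublewrite(dev: BlockDevice, page: bytes, crash_after_blocks=None) -> None:
    dev.write(DOUBLEWRITE, page)               # copy 1; a real engine fsyncs here
    dev.write(HOME, page, crash_after_blocks)  # copy 2; may tear in a crash


def recover(dev: BlockDevice) -> bytes:
    """After a crash, repair a torn home page from the doublewrite copy."""
    home = dev.read(HOME)
    if is_intact(home):
        return home
    copy = dev.read(DOUBLEWRITE)
    dev.write(HOME, copy)
    return copy


def atomic_write(dev: BlockDevice, page: bytes, crash: bool = False) -> None:
    """Stand-in for a device-level atomic write: the whole page lands or none of it."""
    if not crash:
        dev.write(HOME, page)
```

In this model the doublewrite path costs two full page writes for every update, while the atomic path costs one and needs no recovery step, which is the saving described above.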

Atomic writes are practical for non-volatile memory devices because this is how they already function internally: Flash memory is handled in this way by the controller, so it’s a simple matter to expose this to the application and enable atomic writes. This has never been possible before because spinning disks just don’t work this way!

Fusion-io can even do vectored I/O, enabling multiple disjoint buffers to be updated in one atomic operation. Known as scatter-gather, this is another new paradigm for data storage since disks simply could not do this.
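
For readers who haven’t met vectored I/O, the closest everyday analogue is the POSIX writev() call, which Python exposes as os.writev on Unix-like systems. The sketch below shows only the “gather” half of the idea: several separate buffers handed to the operating system in one call. It is not Fusion-io’s API, and ordinary writev() makes no promise that the write survives a crash intact; the gather_write helper and the buffer names are made up for this example.

```python
import os
import tempfile


def gather_write(path: str, buffers: list) -> int:
    """Write several separate in-memory buffers with one writev() call."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        return os.writev(fd, buffers)   # returns the number of bytes written
    finally:
        os.close(fd)


# Three buffers that live in different places in memory.
header = b"HDR:"
body = bytearray(b"row-data")
footer = b":END"

with tempfile.TemporaryDirectory() as scratch_dir:
    target = os.path.join(scratch_dir, "vectored.bin")
    written = gather_write(target, [header, body, footer])
    with open(target, "rb") as f:
        contents = f.read()
```

What Fusion-io adds on top of this familiar shape is the atomicity guarantee: either every buffer in the vector is committed or none is.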

Then there’s the key/value API. This has exciting implications for databases, sure, but it goes well beyond that. Just like mobile applications, today’s “big data” systems (yeah, I said it) need key/value access, not just block storage. This has long been handled by intermediary applications (databases), and that will likely continue. But instead of managing data on block storage, databases can commit a shard to an NVM key/value interface like the one Fusion-io just offered.
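
To see what changes for the application, compare two toy stores in Python. Both are in-memory stand-ins written for this post: BlockStore mimics an application that must lay out values on fixed-size sectors and keep its own index, while KeyValueDevice mimics a device that accepts keys and values directly. Neither class is the NVMKV API, and every name here is hypothetical.

```python
SECTOR = 512


class BlockStore:
    """Application-managed layout: the app maps keys to sectors itself."""

    def __init__(self):
        self.device = bytearray()   # stand-in for a block device
        self.index = {}             # key -> (first sector, value length)

    def put(self, key: str, value: bytes) -> None:
        first = len(self.device) // SECTOR
        padded = -(-len(value) // SECTOR) * SECTOR   # round up to whole sectors
        self.device += value.ljust(padded, b"\0")
        self.index[key] = (first, len(value))

    def get(self, key: str) -> bytes:
        first, length = self.index[key]
        start = first * SECTOR
        return bytes(self.device[start:start + length])


class KeyValueDevice:
    """Hypothetical device-side key/value interface: no layout logic in the app."""

    def __init__(self):
        self._cells = {}            # stand-in for the device's own mapping

    def put(self, key: str, value: bytes) -> None:
        self._cells[key] = bytes(value)

    def get(self, key: str) -> bytes:
        return self._cells[key]


# Both stores answer the same questions; only one makes the app do the bookkeeping.
block_store = BlockStore()
kv_device = KeyValueDevice()
for store in (block_store, kv_device):
    store.put("user:1", b"alice")
    store.put("user:2", b"x" * 600)
```

The sector padding and the index in BlockStore are exactly the kind of work a database does today on behalf of its users; with a key/value device, that bookkeeping moves below the API line.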

Stephen’s Stance

It’s not that we need a new storage array or company. Rather, we need a whole new way of “doing storage” that reflects the changing reality of both clients (mobile devices, web applications, etc.) and storage targets (flash vs. disk). We already live in this new world of key/value datastores, but storage technology has been slow to adapt. That’s why I’m excited about Dropbox, Fusion-io, and so many other “new ideas” companies!

It’s important to recognize that the “old school” companies aren’t completely unaware of this shift. The SNIA NVM Programming working group includes lots of familiar names in storage (Dell, EMC, HP, IBM, Intel, NetApp, Oracle, etc.), all looking beyond today’s spinning disks. Just like me, they recognize that non-volatile memory is the future.