Although many discussions in the storage industry focus on the relative merits of one protocol or another, the conversation occasionally turns to the core issue at hand: we continue to patch together a system based on outdated concepts. Most storage protocols still mimic direct-attached storage, and most of our so-called networks act as point-to-point channels. An ultra-modern virtualized storage infrastructure with all the latest bells and whistles still holds the concepts of block and file at its core. Whenever the storage industry has tried to bring about real storage management, it has been stymied by a lack of context for data.
No amount of virtualization, and no new protocol, will fix this. Put simply, we need a storage revolution.
Channels, Blocks, and Files
Most innovation in the 1980s and early 1990s focused on moving storage out of the server. SCSI allowed disks to live in a separate cabinet, RAID allowed multiple physical disks to act as a single virtual one, and the two were combined to create the prototype storage array. Although SCSI allowed one-to-many connectivity, it was never a true peer-to-peer network, even once it was blended with networking concepts in the form of Fibre Channel.
Even today, SAN storage is focused on providing faster, more flexible, feature-packed direct-attached storage. A modern virtual SAN hides a complex arrangement of caching, data protection, tiered storage, replication, and deduplication, presenting the lot as a simple, lowly disk drive. It is sad but true that all of our work as an industry has been dedicated to recreating what we started with.
Networked file-based storage is no better. Although NAS devices have all the advanced features of their SAN cousins, they must present a simple file tree to the host to retain compatibility. File virtualization merely presents a larger homogeneous tree.
Inside the server, too, features and complexity are hidden to retain a familiar file system format. Volume managers can do anything a virtualization device can, but must present their output as a simple (though virtual) disk drive. File systems, too, have added features but still present a familiar tree of mount points, inodes, and files. Even ZFS, possibly the most advanced combination of volume management and file system technology yet, must present a simple tree of storage to applications.
The Metadata Roadblock
This outdated paradigm, of disks and file trees, is ill-suited to today’s storage challenges. Data must be categorized so actions can be taken to preserve or destroy it based on policies. Data must be searchable so users and applications can find what they want. Data must be flexible so it can be used in new ways. Our antiquated notions are not capable of meeting these challenges.
One simple problem is that we lack context for our data. Most file systems assign a file little more than a name, location, owner, and a set of security attributes. The most advanced can hold extended metadata, but this is rarely used in practice because applications cannot agree on how to share it. Microsoft’s Office suite can store and share extended file attributes, for example, but these live inside the file rather than in the file system. The promise of expanded Office attributes is only realized in conjunction with a content management system like SharePoint, which sits above the lowly file system.
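For a sense of what file-system-level metadata looks like where it does exist, here is a minimal sketch using POSIX extended attributes, which many Linux file systems already support. The file name and attribute keys are hypothetical, and the snippet assumes a file system mounted with xattr support.

```python
import os

path = "q3-audit.docx"  # hypothetical file

# Create the file if it does not already exist.
with open(path, "a"):
    pass

# Attach arbitrary metadata in the file system itself (the "user." namespace).
os.setxattr(path, "user.project", b"Acme Q3 Audit")
os.setxattr(path, "user.client", b"Acme Corp")

# Any application could list and read these attributes without opening the file.
for attr in os.listxattr(path):
    print(attr, os.getxattr(path, attr).decode())
```

The attributes travel with the file on that file system, but as noted above, few applications agree on a common schema, so the capability goes largely unused.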
What if the storage system could keep this data instead? What if it could logically group files by project or client, mine keywords and authors, and maintain revisions? These concepts are not new: they have been implemented in content management systems for years, and certain elements have appeared in file systems, such as Apple’s HFS and VMS’ Files-11, for decades.
Cut Down the Tree
File metadata would enable advanced features, but truly taking advantage of them requires a more fundamental shift in the way applications access files. Rather than sticking to a traditional hierarchy of directories in a tree (which was, after all, simply a primitive metadata system), we should remove the tree altogether: allow files to become data objects, identified by arbitrary attributes and managed according to an overarching policy.
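To make the contrast concrete, here is a hypothetical sketch of what attribute-based retrieval might look like next to the familiar path-based call. The query helper and field names are invented for illustration and do not describe any shipping product.

```python
# Traditional model: the application must know exactly where the file lives in
# the tree, e.g. open("/projects/acme/reports/2008/q3-audit-v7.doc", "rb").

# Object model (hypothetical): the application describes what it wants, and the
# storage system resolves the request against metadata and policy.
def find_objects(store, **attributes):
    """Return objects whose metadata matches every requested attribute."""
    return [obj for obj in store
            if all(obj["meta"].get(k) == v for k, v in attributes.items())]

store = [
    {"meta": {"client": "Acme", "doc_type": "audit", "revision": 7}, "data": b"..."},
    {"meta": {"client": "Acme", "doc_type": "invoice", "revision": 2}, "data": b"..."},
]

audit = find_objects(store, client="Acme", doc_type="audit")[0]
```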
This future vision is decidedly different from our current notion of storage, but is not so far off. Many organizations now rely on central data warehouses based on SQL-language relational databases. As many storage managers have grumbled, databases tend to ignore storage management concepts entirely, managing their own content independently.
But not all applications need a database back-end, so another initiative seeks to provide generic object storage for wider use. Called content-addressable storage or CAS, these devices have traditionally been used only for archival purposes, since that was their first market application. As vendors break free of proprietary interfaces in favor of open ones like XAM, CAS could transform storage itself by eliminating both file and block storage at once.
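The core idea behind content addressing is simple enough to show in a few lines: an object’s identifier is derived from its content rather than from its location. The toy store below is only a sketch of the principle, not the XAM interface or any vendor’s API.

```python
import hashlib

class ToyCAS:
    """Toy content-addressable store: the hash of the content is its address."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data  # identical content is stored only once
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]

cas = ToyCAS()
addr = cas.put(b"quarterly report, final revision")
assert cas.get(addr) == b"quarterly report, final revision"
```

Because the address changes whenever the content changes, such a store naturally suits fixed, archival content, which is exactly the market CAS devices first served.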
Similar concepts are already at work in the so-called Web 2.0 world. Non-traditional data stores like Google BigTable, Amazon S3, and Hadoop allow massive scalability for object storage, and the APIs shared among Web 2.0 companies can be seen as prototypes of similar object storage frameworks. Any of these could be leveraged to provide a new world of data storage, and many are gaining traction even now.
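As a brief illustration of how such services expose object storage, here is a sketch using the boto3 client against Amazon S3 or an S3-compatible endpoint. The bucket and key names are hypothetical, and the snippet assumes credentials and a default region are already configured in the environment.

```python
import boto3

s3 = boto3.client("s3")  # assumes credentials/region are configured externally

# Objects are addressed by bucket and key, with arbitrary metadata attached,
# rather than by a position in a directory tree.
s3.put_object(
    Bucket="example-archive",        # hypothetical bucket
    Key="acme/q3-audit",             # hypothetical key
    Body=b"...document bytes...",
    Metadata={"client": "Acme", "doc-type": "audit", "revision": "7"},
)

obj = s3.get_object(Bucket="example-archive", Key="acme/q3-audit")
print(obj["Metadata"])  # the metadata comes back with the object
```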
Stephen’s Stance
Although traditional block storage is here to stay for disk drives, and tree-type file systems are likely to remain the foundation of operating system storage, new object-based concepts could change the world in fundamental ways. As applications become “web aware”, they also become object aware, increasing the likelihood of such a storage revolution. For the majority of applications, this new world would be a welcome one indeed.
stevetodd says
Stephen,
One concern I have with CAS eliminating block/file is the transactional semantics of changing data. Block and file approaches handle changing data quite well; CAS (and XAM) are geared towards fixed content (unchanging data). But I do agree with the stance that you are taking and the direction that your thoughts are going: concepts like CAS solve problems in ways that block and file cannot.
Steve
Obdurodon says
That’s exactly my concern too, Steve. I think what the world really needs is a system in which objects are named or found in a more sophisticated way than a POSIX-style hierarchy, but once found have POSIX-like semantics such as efficient random-byte updates in the middle of large objects. In more specific terms, something better than open() and mkdir() but exactly the same read() and write(). In fact, I think you and I and Mich discussed this idea at some length back in 2001 or so. Should’ve gone out and worked on it. 😉
Jay Sharp says
 Amen!
Caitlin Bestler says
I totally agree on eliminating the POSIX hierarchy as an official part of the file system. The “/” separator can be demoted to a human convention that has no meaning to the file system itself, just as “.extension” has been since Unix became dominant.
When hierarchical directories were created they served a real purpose: they limited how much directory information had to be read at once. Well, systems today have a lot more than 64KB of RAM to work with, so they can support gigantic flat directories with long file names.
I am less certain about Jeff Darcy’s comment. I think the other POSIX element that S3, GFS, HDFS, Swift, etc. have eliminated is partial file update. I think the Facebook Haystack designers hit the nail on the head when they championed a post-POSIX paradigm “where data is written once, read often, never modified, and rarely deleted.”