I’ve never been a fan of thin provisioning as a storage management tool. Don’t get me wrong, I love having thin provisioning in my toolkit to overcome the limitations of conventional filesystems. Thin provisioning just gets under my skin when folks try to use it to solve business problems like long deployment times and slow purchasing cycles. If you attended any of the thin provisioning sessions I’ve presented at Storage Decisions, Interop, E-Storm, or elsewhere, then you’ve heard my wistful dreaming of real automatic provisioning without the hackery of thin provisioning systems. But perhaps I didn’t mention that real automatic provisioning exists today! It’s one of the many things I love about API-driven cloud storage!
Why Shouldn’t People Use Thin Provisioning?
I’ve gone into this many times, but I probably said it most succinctly in my post, Use Process Solutions For Process Problems, Technical Solutions For Technical Ones:
I have a long-standing love/hate relationship with thin provisioning, one of the many proposed technical solutions to the utilization problem. Thin provisioning eliminates many technical challenges: It simplifies adding capacity to the Drobo that serves as my home office storage center; the ability to automatically grow VMware images makes virtualization practical in the tight confines of a laptop; and it contributes to the usefulness of advanced solid-state storage systems like the new Nimbus S-Class. But I have serious reservations about using thin provisioning to over-subscribe enterprise storage systems due to failures of capacity planning and IT-to-business communication. Thin provisioning will only make process issues worse.
This isn’t always the case, of course. Some people (including me) use thin provisioning to solve technical issues relating to inflexible filesystems. As Mr. Backup notes, you shouldn’t go around saying that everyone using thin provisioning is stupid. It’s just that some people (and vendors) over-rely on thin provisioning and use it to cover up a more serious business problem.
So What’s Really Wrong With Thin Provisioning?
The biggest problem with most thin provisioning implementations is that they’re not really all that thin most of the time. I don’t blame the storage vendors: It’s really, really hard to “do” thin provisioning with conventional filesystems and block storage.
I spend lots of time talking about this in my “thin session” presentation, but I’ll sum it up here:
- There’s a lack of information exchange between the application, filesystem, volume manager, and array controller, so no one “knows” what to thin and when
- De-allocating on delete is a pain because most filesystems don’t really delete data: they just mark blocks as free in their own metadata, and the array never hears about it
- There are two options for reclaiming deleted capacity, neither of which is simple or bulletproof:
  - You can add smarts to the server, reporting back to the array when data is deleted
  - Or you can add smarts to the array, snooping on the filesystem or reclaiming zeroes
- Thin provisioning granularity has an impact on effectiveness, but not as much as you might think
- Most arrays use lazy and ineffective thin reclamation; it takes real engineering to have in-line reclamation that doesn’t kill performance
- Filesystems just weren’t designed to be thinned – they fragment, have alignment issues, and so on
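To make the reclamation and granularity points concrete, here is a toy Python model of a thin-provisioned volume. It is a sketch under stated assumptions, not how any particular array works: backing space is allocated in fixed-size chunks on first write, a filesystem “delete” changes nothing on the array, and capacity comes back only through the two paths listed above, an explicit unmap hint from the server or an array-side scan for all-zero chunks. The class and method names (`ThinVolume`, `unmap`, `reclaim_zero_chunks`) are hypothetical.

```python
KiB = 1024


class ThinVolume:
    """Toy thin volume (hypothetical): chunks are allocated on first write."""

    def __init__(self, size_bytes: int, chunk_bytes: int) -> None:
        self.size = size_bytes
        self.chunk = chunk_bytes
        # Sparse allocation map: chunk index -> that chunk's backing bytes.
        self.chunks: dict[int, bytearray] = {}

    def allocated_bytes(self) -> int:
        return len(self.chunks) * self.chunk

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write past end of volume")
        pos = 0
        while pos < len(data):
            idx, off = divmod(offset + pos, self.chunk)
            n = min(self.chunk - off, len(data) - pos)
            # Touching any byte of a chunk allocates the whole chunk.
            buf = self.chunks.setdefault(idx, bytearray(self.chunk))
            buf[off:off + n] = data[pos:pos + n]
            pos += n

    def unmap(self, offset: int, length: int) -> None:
        """Server-side approach: the host reports a freed range.

        Only chunks lying entirely inside the range can be released.
        """
        first = -(-offset // self.chunk)        # round up to a chunk boundary
        last = (offset + length) // self.chunk  # round down (exclusive)
        for idx in range(first, last):
            self.chunks.pop(idx, None)

    def reclaim_zero_chunks(self) -> int:
        """Array-side approach: release allocated chunks holding only zeros."""
        zeroed = [idx for idx, buf in self.chunks.items() if not any(buf)]
        for idx in zeroed:
            del self.chunks[idx]
        return len(zeroed)


# Sixteen 4 KiB records written at 16 KiB strides (64 KiB of real data).
fine = ThinVolume(1024 * KiB, 4 * KiB)
coarse = ThinVolume(1024 * KiB, 64 * KiB)
for vol in (fine, coarse):
    for rec in range(16):
        vol.write(rec * 16 * KiB, b"\xab" * (4 * KiB))
print(fine.allocated_bytes() // KiB, coarse.allocated_bytes() // KiB)  # 64 256

# The host frees record 0 and sends an unmap for its 4 KiB range.
for vol in (fine, coarse):
    vol.unmap(0, 4 * KiB)
print(fine.allocated_bytes() // KiB, coarse.allocated_bytes() // KiB)  # 60 256

# A zero-fill utility overwrites records 0-3, so coarse chunk 0 is all zeros.
for rec in range(4):
    coarse.write(rec * 16 * KiB, bytes(4 * KiB))
print(coarse.reclaim_zero_chunks(), coarse.allocated_bytes() // KiB)  # 1 192
```

In this model a 4 KiB unmap frees nothing on the coarse volume because it never covers a whole chunk, so the array has to fall back on hunting for zeroes. The record layout is deliberately sparse; with densely packed data the two chunk sizes allocate nearly the same amount, which is why granularity matters less in practice than this worst case suggests.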
It all boils down to a simple fact: Conventional systems expect to be stored “fat” on local disks, not thinned and virtualized and mangled. It’s possible to make it work, but it takes so much engineering and processing power that you start wondering if it’s all really worth the bother.
We Need Automatic Provisioning Instead
I dream of automatic provisioning instead of reverse-engineering filesystem layouts, adding shims and semaphores, and hunting for zeroes. I want a “storage platform” that understands when data is stored and deleted and allows applications to communicate needs beyond basic provisioning. In short, I want cloud storage.
Most people get so bound up thinking about the “cloud” part of cloud storage (service providers, REST, public/private, etc.) that they overlook the obvious “storage” benefits! Cloud storage protocols enable applications to do amazing things with storage, decoupling them from the old assumptions about “my disk” and “my filesystem”. Yes, the Internet has a speed limit (both for throughput and latency). Yes, cloud storage is more expensive on a per-used-bit basis than on-site hardware. But these limits largely evaporate when one looks at total cost of ownership or deploys local equipment like a cloud gateway.
Real automatic provisioning is enabled by cloud storage access methods, made real by cloud gateways, and goes way beyond what any conventional thin provisioning system is capable of:
- Imagine actually paying for exactly and only the storage capacity you are using! I use cloud storage services from Amazon and Nirvanix to host all of the images on this blog as well as the Field Day Roundtable Podcast video files. There is zero waste here: I’m paying only for used capacity and data transfer, and not a dime for empty space.
- Imagine having unlimited scalability with no migrations! I don’t care where my service providers store my data and what equipment they use. They have an SLA to meet, and they’ve always met it. Internal or private clouds could do the same, liberated from the vendor support matrix lock-in game.
- Imagine having your “bill” immediately reduced as soon as you delete data! I can temporarily use all the capacity I want, delete what I no longer need, and even switch providers with zero capacity “inertia” and clean-up. Public and private providers can immediately repurpose that capacity without any kind of reclamation process.
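The pay-for-what-you-use behavior described above can be sketched with a toy metered object store in Python. The assumptions are hypothetical and stated plainly: a made-up rate of $0.10 per GB-month, decimal gigabytes, and usage sampled once a day. Real providers publish their own rates and metering rules, and `MeteredBucket` is not any provider’s API. The point is structural: a delete stops the meter immediately, with no reclamation pass.

```python
GB = 10**9
PRICE_PER_GB_MONTH = 0.10  # hypothetical rate, not any real provider's price


class MeteredBucket:
    """Toy object store (hypothetical) that meters only bytes actually stored."""

    def __init__(self) -> None:
        self.objects: dict[str, int] = {}  # object key -> size in bytes
        self.byte_days = 0                 # running sum of daily usage samples

    def put(self, key: str, size: int) -> None:
        self.objects[key] = size

    def delete(self, key: str) -> None:
        # No reclamation step: the capacity stops counting right away.
        self.objects.pop(key, None)

    def used_bytes(self) -> int:
        return sum(self.objects.values())

    def end_of_day(self) -> None:
        self.byte_days += self.used_bytes()

    def monthly_charge(self, days: int = 30) -> float:
        # Bill on the average bytes stored over the month (GB-months).
        return self.byte_days / days / GB * PRICE_PER_GB_MONTH


bucket = MeteredBucket()
bucket.put("videos/archive.tar", 400 * GB)
bucket.put("images/site.tar", 100 * GB)
for day in range(1, 31):
    if day == 11:
        bucket.delete("videos/archive.tar")  # cleaned up after ten days
    bucket.end_of_day()

print(bucket.used_bytes() // GB)          # 100
print(round(bucket.monthly_charge(), 2))  # 23.33

# For comparison: a provisioned 1,000 GB volume bills the same whether full or empty.
provisioned_charge = 1000 * PRICE_PER_GB_MONTH
print(provisioned_charge)
```

With 500 GB stored for ten days and 100 GB for the remaining twenty, the month averages about 233 GB, or roughly $23.33 at the assumed rate, while the provisioned 1,000 GB volume bills the full $100 regardless of what was written to it.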
One of the dings against cloud storage is that it’s hard to start using. The argument goes that since it uses a “proprietary protocol,” it can’t be used by real-world applications. Well, more and more applications are supporting cloud providers directly. And cloud gateway products from companies like Nasuni, Cirtas, StorSimple, and others abound, allowing regular applications to go there, too. As Nasuni’s Rob Mason says, “there’s always more space available.”
We need to “get past” thin provisioning and the rest of the technical cruft that comes from using outdated concepts like block storage and filesystems. We need revolutionary storage that lets applications communicate in more effective ways about capacity usage, retention requirements, replication, and similar needs. In short, we need cloud storage for a whole lot of reasons other than outsourcing of management. And it’s getting easier and easier to use cloud storage thanks to integration, cloud gateways, and the fact that the APIs are really simple!
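What might it look like for applications to communicate retention and replication needs directly? Here is one hypothetical shape for it, again a toy Python sketch rather than any real cloud API: each write carries a `Policy` (the name and its fields are assumptions for illustration), the store derives its physical footprint from the declared replica count, and it expires objects itself when their retention period ends.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Policy:
    """Hypothetical per-object hints an application passes with each write."""
    replicas: int = 1                  # copies the platform should keep
    retain_days: Optional[int] = None  # expire after this many days, if set


class PolicyStore:
    """Toy store (hypothetical) that acts on application-declared policies."""

    def __init__(self) -> None:
        # key -> (size in bytes, policy, day the object was written)
        self.objects: Dict[str, Tuple[int, Policy, int]] = {}

    def put(self, key: str, size: int, policy: Policy, today: int) -> None:
        self.objects[key] = (size, policy, today)

    def logical_bytes(self) -> int:
        return sum(size for size, _, _ in self.objects.values())

    def raw_bytes(self) -> int:
        # Physical footprint follows the declared replica count.
        return sum(size * p.replicas for size, p, _ in self.objects.values())

    def expire(self, today: int) -> List[str]:
        # The platform, not an administrator, enforces retention.
        expired = [key for key, (_, p, written) in self.objects.items()
                   if p.retain_days is not None
                   and today - written >= p.retain_days]
        for key in expired:
            del self.objects[key]
        return expired


store = PolicyStore()
store.put("logs/day0.gz", 2_000, Policy(replicas=1, retain_days=7), today=0)
store.put("db/backup.dump", 50_000, Policy(replicas=3), today=0)
print(store.logical_bytes(), store.raw_bytes())  # 52000 152000
print(store.expire(today=7))                     # ['logs/day0.gz']
print(store.logical_bytes(), store.raw_bytes())  # 50000 150000
```

Nothing here requires the store to reverse-engineer a filesystem: capacity, copies, and lifetime are all stated up front by the application.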
Watch my presentation, Extreme Tiered Storage: Flash, Disk, and Cloud, for more detail on this topic!
Disclaimer: Although this is not a paid, sponsored, or for-hire article, Nasuni is a client of my consulting company, Foskett Services. I worked for Nirvanix and remain their customer. I am also a customer of Amazon and former customer of Rackspace. Symantec, maker of the Veritas storage products including the Thin API referenced here, is also a client and sponsor of Gestalt IT activities. I have lots of storage array companies as occasional clients and sponsors, including many like 3PAR, HP, Data Robotics, and EMC who make thin provisioning arrays.
Image Credit: “Push Pad to Open Automatic door. Right…” by @davestone