Processing and Scheduling Thin Provisioning

February 22, 2011 by Stephen

Although the core issues with thin provisioning revolve around communication, it presents unique challenges to the storage array as well. We talked about granularity of pages, and the comments for that piece were extremely enlightening. Now let’s consider another key factor: Scheduling.

Note that the “provisioning” part is relatively easy to do on the fly: An array just has to allocate additional capacity as writes come in, which is something it does anyway. It’s the thin reclamation that poses a challenge, since this involves zero detection across a whole page of data in many cases.
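To make "zero detection across a whole page" concrete, here is a minimal Python sketch. It is not how any particular array implements it: the 4 KB `PAGE_SIZE` and the function names are assumptions for illustration, and real pages are far larger. The point is that a page can only be reclaimed once every byte in it has been inspected.

```python
from typing import List

PAGE_SIZE = 4096  # hypothetical page size; real arrays use much larger pages


def is_zero_page(page: bytes) -> bool:
    """Return True if every byte in the page is zero."""
    # any() stops at the first non-zero byte, so only a truly empty page
    # costs a full pass over the data.
    return not any(page)


def reclaimable_pages(volume: bytes, page_size: int = PAGE_SIZE) -> List[int]:
    """Return the indexes of pages that hold nothing but zeros."""
    return [
        offset // page_size
        for offset in range(0, len(volume), page_size)
        if is_zero_page(volume[offset:offset + page_size])
    ]
```

Note that a single non-zero byte anywhere in a page keeps the whole page allocated, which is why page granularity matters so much.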

Just like de-duplication, thin provisioning challenges the resources of the storage array to do background number crunching. And just like dedupe, the array engineers have a choice of when to do the reclamation processing: Well after writing or “in-line”. The extreme ends of this spectrum fall into two equally disappointing categories: Wholly ineffective or ridiculously intensive.

Let’s start with the “intensive” side: You could have the controller do thin reclamation automatically, in-line; that’s kind of what IBM does with SVC, for example, and 3PAR claims to do this too. The trouble is that the controller has to literally watch everything, and it’s got to reassemble whole pages, perhaps 42 MB or even one GB, in cache. If it didn’t have all that data, it would have to go fetch it, put it into cache, look at it, make sure it was all zeros, then get rid of it. It’s really, really difficult to do automatic, in-line thin reclamation. It’s a good thing to do, but it’s a hard thing to do.
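Here is a rough Python sketch of what in-line zero detection asks of a controller. This is a toy model, not IBM’s or 3PAR’s design: the `ThinVolume` class, its tiny default page size, and the in-memory dictionary standing in for the page map are all assumptions. Zeros written to an unallocated page are simply dropped, but zeros written into an allocated page force a check of the entire page, which is the expensive part.

```python
class ThinVolume:
    """Toy thin volume that detects zeros on the write path (hypothetical)."""

    def __init__(self, page_size=4096):
        self.page_size = page_size
        self.pages = {}  # page index -> bytearray; only allocated pages appear

    def write(self, offset, data):
        pos = 0
        while pos < len(data):
            # Split the write at page boundaries.
            index, start = divmod(offset + pos, self.page_size)
            chunk = data[pos:pos + self.page_size - start]
            self._write_chunk(index, start, chunk)
            pos += len(chunk)

    def _write_chunk(self, index, start, chunk):
        page = self.pages.get(index)
        if page is None:
            if not any(chunk):
                return  # zeros to an unallocated page: nothing to store
            page = self.pages[index] = bytearray(self.page_size)
        page[start:start + len(chunk)] = chunk
        # Only a zero chunk can empty a page, but confirming it means
        # inspecting the whole page, not just the bytes that arrived.
        if not any(chunk) and not any(page):
            del self.pages[index]

    def read(self, offset, length):
        out = bytearray()
        while len(out) < length:
            index, start = divmod(offset + len(out), self.page_size)
            take = min(self.page_size - start, length - len(out))
            page = self.pages.get(index)
            # Unallocated pages read back as zeros.
            out += page[start:start + take] if page is not None else bytes(take)
        return bytes(out)
```

In this toy the page is always in memory. On a real array the page might be 42 MB and mostly on disk, so that final `any(page)` check is exactly the "go fetch it, put it into cache, look at it" step.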

So most vendors schedule thinning for later. In the “10 terabytes of zeros” example, they’re actually going to write 10 terabytes to disk, or at least through to cache. Then, at some point in the future, they’ll go back and reclaim that space. Some are pretty aggressive and reclaim capacity very frequently. Others are fairly lazy: The Drobo seems to reclaim only once or twice a day. A lot of people who have them are surprised when the thing springs to life and starts going, “Bada-bada-bada-bada-bada-bada.” Apparently it’s reclaiming storage at that time.
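The scheduled approach can be sketched in a few lines of Python. Again, this is an illustration under assumptions rather than any vendor’s code: `reclaim_zero_pages` and the dictionary used as a page map are hypothetical. Writes land on disk as-is, and a background pass later walks the allocated pages and frees the ones that turn out to be all zeros.

```python
def reclaim_zero_pages(pages):
    """Free every allocated page that contains only zeros.

    `pages` maps page index -> page contents (a hypothetical page map).
    Returns the number of pages reclaimed.
    """
    freed = 0
    for index in list(pages):  # copy the keys so entries can be deleted
        if not any(pages[index]):
            del pages[index]
            freed += 1
    return freed


# A scaled-down "10 terabytes of zeros": ten zero pages plus one real one.
pages = {i: bytes(8) for i in range(10)}
pages[10] = b"data" + bytes(4)

# Until the scheduled pass runs, all eleven pages consume capacity.
reclaimed = reclaim_zero_pages(pages)
```

How often that pass runs is the whole scheduling question: an aggressive array calls it constantly, a lazy one once or twice a day.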

Some thin provisioning systems are even manually initiated, and this is really pretty ineffective. The storage administrator has better things to do than reclaim storage all the time, so they are probably going to set a cron job to do it regularly at a specified time. If the system only does it on demand, that means that it doesn’t have the horsepower to do it automatically. Ergo, it’s sometimes going to conflict with “real work” and cause a problem.

I would look for a system that is fairly aggressive with thin reclamation. I was talking to the guys at Nimbus Data, for example, and they claim to do thin provisioning in-line all the time. I hope that we see more storage arrays that are doing that, and fewer that are doing it manually, on demand, because that’s just not as useful.

But considering that thin provisioning used to be almost useless, the fact that it’s now at least somewhat useful is gratifying.

Comments

Anthony Vandewerdt says

February 23, 2011 at 9:08 pm

Hi Steve, great post (as always).
You’re correct in that the IBM SVC (model CF8) and Storwize V7000 do ‘zero detect’ on write (at the point of ingress). This is possible when you have plenty of CPU power and fast memory throughput.
They also do zero detect if you want to create a volume copy (if you want the secondary to be thin provisioned). This is great for converting thick to thin on the fly.

The IBM XIV also does zero detect on the fly during migrations (when we are pulling data off old storage and moving it into the XIV) and during replication (it doesn’t send zeros to its mirror partner). It also does zero detect during scrubbing (the process that runs to ensure data is confirmed to be readable and have good ‘parity’), to ensure no empty blocks get reported as used space. The scrubbing process runs constantly, working its way through the entire machine over the course of several days.

sfoskett says

February 23, 2011 at 10:37 pm

Glad to have the confirmation about SVC. Thin on the fly is really an unusual feature, something that surprised me during my research.

And thanks about XIV too. Good to know.

Basil says

February 23, 2011 at 10:56 pm

ThP on the fly is one of the impressive features of 3PAR. It can be done with almost no impact, thanks to a specialized ASIC with zero detection in silicon. So, with 3PAR you’ll get thinly provisioned writes, migrations, replications, and physical copies. You’ll also get deep integration with a number of reclamation/ThP frameworks in Oracle ASM, Veritas API, VMware, etc.
And all of this with 16K blocks!
Yes, I’m really impressed by the 3PAR ThP technologies :)
