Although the core issues with thin provisioning revolve around communication, it presents unique challenges to the storage array as well. We talked about granularity of pages, and the comments for that piece were extremely enlightening. Now let’s consider another key factor: Scheduling.
Note that the “provisioning” part is relatively easy to do on the fly: An array just has to allocate additional capacity as writes come in, which is something it does anyway. It’s the thin reclamation that poses a challenge, since this involves zero detection across a whole page of data in many cases.
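To make "zero detection across a whole page" concrete, here is a minimal Python sketch. It is not any vendor's implementation: the names `is_zero_page` and `reclaimable_pages`, and the idea of treating a volume as one flat byte string, are assumptions made purely for illustration.

```python
def is_zero_page(page: bytes) -> bool:
    """Return True only if every byte in the page is zero."""
    # any() stops at the first non-zero byte, so dirty pages are rejected quickly,
    # but a truly empty page must be scanned from end to end.
    return not any(page)


def reclaimable_pages(data: bytes, page_size: int) -> list[int]:
    """Return the indices of pages that contain nothing but zeros."""
    return [
        offset // page_size
        for offset in range(0, len(data), page_size)
        if is_zero_page(data[offset:offset + page_size])
    ]
```

Note that a single non-zero byte anywhere in a page keeps the whole page allocated, which is why page granularity and reclamation are so closely tied together.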
Just like de-duplication, thin provisioning challenges the resources of the storage array to do background number crunching. And just like dedupe, the array engineers have a choice of when to do the reclamation processing: Well after writing or “in-line”. The extreme ends of this spectrum fall into two equally disappointing categories: Wholly ineffective or ridiculously intensive.
Let’s start with the “intensive” side: You could have the controller do thin provisioning automatically; that’s kind of what IBM does with SVC, for example, and 3PAR claims to do this too. The trouble is that the controller has to literally watch everything, and it’s got to reassemble whole pages, perhaps 42 MB or even one GB in cache. If it didn’t have all that data, it would have to go fetch it, put it into cache, look at it, make sure it was all zeros, then get rid of it. It’s really, really difficult to do automatic, in-line, thin provisioning. It’s a good thing to do, but it’s a hard thing to do.
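Here is a rough sketch of what in-line zero detection might look like, just to show where the cost comes from. The `ThinVolume` class, its tiny default page size, and the rule that a write never crosses a page boundary are all simplifying assumptions of mine, not a description of how SVC, 3PAR, or any other array works internally.

```python
class ThinVolume:
    """Toy thin volume that allocates pages on write and frees them when they turn all-zero."""

    def __init__(self, page_size: int = 16 * 1024):
        self.page_size = page_size
        self.pages = {}  # page index -> bytearray holding that page's data

    def write(self, offset: int, data: bytes) -> None:
        index, start = divmod(offset, self.page_size)
        if start + len(data) > self.page_size:
            raise ValueError("this sketch assumes a write stays within one page")
        page = self.pages.get(index)
        if page is None:
            if not any(data):
                return  # zeros written to an unallocated page: nothing to store
            page = self.pages[index] = bytearray(self.page_size)
        page[start:start + len(data)] = data
        # The expensive part: the whole page must be examined, not just the
        # incoming write, before the page can be declared empty and freed.
        if not any(page):
            del self.pages[index]

    def read(self, offset: int, length: int) -> bytes:
        index, start = divmod(offset, self.page_size)
        page = self.pages.get(index)
        if page is None:
            return bytes(length)  # unallocated pages read back as zeros
        return bytes(page[start:start + length])
```

With 16 KB pages that full-page check is cheap. With 42 MB or 1 GB pages, the same check on every write is exactly the "watch everything" problem described above.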
So most vendors schedule thinning for later. In the “10 terabytes of zeros” example, they’re actually going to write 10 terabytes to disk, or at least through to cache. Then, at some point in the future, they’ll go back and reclaim that space. Some are pretty aggressive and reclaim capacity very frequently. Others are fairly lazy: The Drobo seems to reclaim only once or twice a day. A lot of people who have them are surprised when the thing springs to life and starts going, “Bada-bada-bada-bada-bada-bada.” Apparently it’s reclaiming storage at that time.
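The "reclaim later" approach can be sketched as a background pass that scans a limited number of pages each time it runs, so it does not monopolize the controller. Everything here is hypothetical: `reclaim_pass`, the `budget` parameter, and the dictionary of pages are stand-ins I made up for illustration, and a real array would obviously work against disk and cache rather than a Python dict.

```python
def reclaim_pass(pages: dict, cursor: int = 0, budget: int = 1000):
    """Scan up to `budget` allocated pages starting at `cursor` and free the all-zero ones.

    Returns (freed_indices, next_cursor). A next_cursor of 0 means the scan
    reached the end of the volume and the next pass starts over.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    candidates = sorted(index for index in pages if index >= cursor)
    batch = candidates[:budget]
    freed = [index for index in batch if not any(pages[index])]
    for index in freed:
        del pages[index]
    # Resume after the last page scanned, or wrap around if nothing is left.
    next_cursor = batch[-1] + 1 if len(candidates) > budget else 0
    return freed, next_cursor
```

An aggressive system would call something like this every few seconds with a small budget; a lazy one would run it once or twice a day with a huge budget, which is presumably when the disks suddenly spring to life.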
Some thin provisioning systems are even manually initiated, and this is really pretty ineffective. The storage administrator has better things to do than reclaim storage all the time, so they are probably going to set a cron job to do it regularly at a specified time. If the system only does it on demand, that means that it doesn’t have the horsepower to do it automatically. Ergo, it’s sometimes going to conflict with “real work” and cause a problem.
I would look for a system that was fairly aggressive with thin reclamation. I was talking to the guys at Nimbus Data, for example, and they claim to do thin provisioning in-line all the time. I hope that we see more storage arrays that are doing that, and fewer that are doing it manually, on demand, because that’s just not as useful.
But considering that thin provisioning used to be almost useless, the fact that it’s now at least somewhat useful is gratifying.
Anthony Vandewerdt says
Hi Steve, great post (as always).
You’re correct in that the IBM SVC (model CF8) and Storwize V7000 do ‘zero detect’ on write (at point of ingress). This is possible when you have plenty of CPU power and fast memory throughput.
It also does zero detect if you want to create a volume copy (if you want the secondary to be thin provisioned). This is great for converting thick to thin on the fly.
The IBM XIV also does zero detect on the fly during migrations (when we are pulling data off old storage and moving it into the XIV) and during replication (it doesn’t send zeros to its mirror partner). It also does zero detect during scrubbing (the process that runs to ensure data is confirmed to be readable and have good ‘parity’), to ensure no empty blocks get reported as used space. The scrubbing process runs constantly, working its way through the entire machine over the course of several days.
sfoskett says
Glad to have the confirmation about SVC. Thin on the fly is really an unusual feature, something that surprised me during my research.
And thanks about XIV too. Good to know.
Basil says
ThP on the fly is one of the impressive features of 3PAR. It can be done with almost no impact thanks to a specialized ASIC with zero detection in silicon. So, with 3PAR you’ll get thinly provisioned writes, migrations, replications, and physical copies. You’ll also get deep integration with a number of reclamation/ThP frameworks in Oracle ASM, Veritas API, VMware, etc.
And all of this with the 16K blocks!
Yes, I’m really impressed by the 3PAR ThP technologies:)