Granularity of Thin Provisioning Approaches

January 10, 2011 By Stephen 12 Comments

Although I consider it the main stumbling block for thin provisioning, communication (or lack thereof) is being addressed with metadata monitoring, WRITE_SAME, the Veritas Thin API, and other ideas. But communication isn’t the only issue.

Let’s talk about page sizes. You’ll often see vendors tossing this “softball” objection at their competitors, claiming that their (smaller) page size makes for more effective thin provisioning. And that’s true, to some extent, but perhaps not the end of the story.

Look at the top block in this stack. The light background box is the page, and the colored boxes represent data. If your storage is written in “pages” of this size, you can’t thin it.

What if we used a smaller page? What if my page is a quarter of that size, as in the second row? I still can’t thin it out, because my data is spread all over the place.

Remember worrying about fragmentation back in the days of DOS and Windows and FAT filesystems? It’s kind of like this.

Because we’re using zero page reclaim, the whole page has to be zero to be reclaimed. If your data is all over the place, if there’s even one bit that’s not zero on a page, we’re not going to reclaim that whole page.

Now let’s return to our illustration. If we use a little bit smaller page, as in the bottom two rows, we can reclaim some space. If we use a really tiny page, we can reclaim half the space even.
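To make the effect concrete, here is a rough Python sketch of zero page reclaim on a toy 64-byte "volume" that is half data and half zeros. The data layout, the page sizes, and the function name `reclaimable_fraction` are all invented for illustration; real arrays work on vastly larger pages and their actual mechanics are not described here.

```python
def reclaimable_fraction(volume: bytes, page_size: int) -> float:
    """Return the fraction of the volume made up of all-zero pages."""
    if page_size <= 0 or len(volume) % page_size != 0:
        raise ValueError("page_size must evenly divide the volume size")
    zero_page = bytes(page_size)  # a page consisting only of zero bytes
    reclaimed = 0
    for offset in range(0, len(volume), page_size):
        # A page is reclaimed only if every byte in it is zero.
        if volume[offset:offset + page_size] == zero_page:
            reclaimed += page_size
    return reclaimed / len(volume)


# Hypothetical layout: 32 bytes of data scattered across a 64-byte volume.
layout = bytearray(64)
for start, length in ((2, 6), (12, 8), (26, 6), (38, 4), (46, 8)):
    layout[start:start + length] = b"\xff" * length
volume = bytes(layout)

for page_size in (64, 16, 4, 1):
    print(page_size, reclaimable_fraction(volume, page_size))
```

With this particular layout, the 64-byte and 16-byte pages reclaim nothing, the 4-byte pages reclaim a bit under a third of the volume, and only the 1-byte pages reclaim the full half that is empty.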

We’re still not reclaiming all the space, though. At the beginning of this series, I showed the “simplified perfect-world” thin provisioning illustration. In that picture, the half-empty barrel was perfectly reclaimed thanks to this technology. We will never get there unless we are using really minuscule pages. But we can get somewhat close. Maybe we can thin out three-quarters of the empty space.

But some vendors use really big pages. Some folks made fun of Hitachi for using 42 megabyte pages, since, if there’s even one nonzero bit in 42 megabytes of potential ones or zeros, the Hitachi will not thin that page. It also won’t migrate anything smaller than that for automated storage tiering. But others use even bigger pages, up to a gigabyte in size. And 42 MB isn’t that bad in practice.

I know of a company that’s doing four-kilobyte pages. And EMC actually allocates one-gigabyte slices of storage for writing on the CLARiiON, even though their thin size is 8 KB. So is the CLARiiON page size 8 KB or 1 GB? It’s very confusing to me (and probably the customer too)…

The trouble with 4 K or 8 K pages is that they make for an awful lot of pages to keep track of. Consider the analogy of hard disk drive sector sizes. An ATA disk could only get to about 2.2 terabytes until recently, because they still used 512-byte sectors. And 512 bytes times the biggest 32-bit number (about 4.29 billion) is 2048 GB, counting in binary gigabytes. So 512 bytes makes for greater efficiency in theory, but hurts scalability in practice. So, the disk drive industry is moving to 4 K sectors.
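The sector arithmetic is easy to check. This small Python sketch assumes a 32-bit sector address, as the paragraph above does; the function name `max_capacity_bytes` is just a label for this example.

```python
def max_capacity_bytes(sector_size: int, address_bits: int = 32) -> int:
    """Largest capacity addressable with the given sector size and address width."""
    return sector_size * 2 ** address_bits


for sector_size in (512, 4096):
    capacity = max_capacity_bytes(sector_size)
    # Show the same limit in binary gigabytes and in decimal terabytes.
    print(sector_size, capacity // 2 ** 30, "GiB", round(capacity / 10 ** 12, 2), "TB")
```

With 512-byte sectors the limit works out to 2048 binary gigabytes, or roughly 2.2 decimal terabytes. Moving to 4 K sectors raises the same 32-bit limit by a factor of eight.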

It’s exactly the same thing with thin provisioning: you’ve got to keep track of all these gazillions and gazillions of pages. From a vendor perspective, you can save a lot of horsepower and make it a lot easier to implement if you have bigger pages. It also means you’re not moving stuff around as much when using these big pages for automated tiering.
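For a sense of scale, here is a back-of-the-envelope Python sketch of how many pages an array would have to track at the page sizes mentioned in this post. The 100 TiB pool and the 8 bytes of metadata per page are assumptions picked purely for illustration; no vendor’s real metadata format is implied.

```python
KIB, MIB, GIB, TIB = 2 ** 10, 2 ** 20, 2 ** 30, 2 ** 40

POOL_BYTES = 100 * TIB        # assumed pool size, for illustration only
METADATA_BYTES_PER_PAGE = 8   # assumed bookkeeping cost per page


def pages_to_track(capacity_bytes: int, page_size_bytes: int) -> int:
    """Number of pages needed to cover the capacity, rounding up."""
    return -(-capacity_bytes // page_size_bytes)  # integer ceiling division


for label, page_size in (("4 KB", 4 * KIB), ("8 KB", 8 * KIB),
                         ("42 MB", 42 * MIB), ("1 GB", 1 * GIB)):
    pages = pages_to_track(POOL_BYTES, page_size)
    metadata_mib = pages * METADATA_BYTES_PER_PAGE / MIB
    print(f"{label:>6} pages: {pages:>14,} to track, ~{metadata_mib:,.1f} MiB of metadata")
```

Under these assumptions, 4 KB pages mean tens of billions of entries and hundreds of gigabytes of bookkeeping, while 42 MB pages need only a few million entries.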

I’m not going to throw rocks at HDS or anyone else over page sizes. I actually don’t think 42 MB is that bad, because the biggest problem with underutilization is not inside a file system. In my experience, the big problem is storage that’s not used at all.

When I used to do storage assessments, it was very common to find LUNs that were allocated and not used at all; not even touched. Your page size doesn’t matter if a LUN is not even touched: It’s going to be thinned out no matter what. So, regardless of the page size, thin provisioning will probably save more space outside a filesystem than within one, especially if your systems administrators are doing a reasonably good job of storage management. And even if they’re not doing a good job, there’s probably 42 megs of zeros that can be thinned out anyway.

So, I’m not as worried about the size of the pages. Granularity is an architectural decision, and larger pages are not the end of the world. Ask your vendor if they support thin provisioning and what the granularity or page size is, and think about how that’s going to affect you. At the end of the day, it’s probably going to yield about the same result no matter what the page size is.

Comments

Storagezilla says

January 10, 2011 at 6:13 pm

Since slices and extents are a function of FAST I’ll address it like this: 8KB is currently the smallest unit of FAST granularity; the next smallest is the 64KB FAST Cache page, which is 8x8KB from storage as selected via the Clariion caching algorithms and then copied into FAST Cache.

A slice is 125000 8KB extents grouped together for data movement between storage pools. A slice could be twice the size or half the size of what it is now but it’ll still be a multiple of 8KB.

So by definition of what you’re covering and until I see otherwise, a “Page” is 8KB.
sfoskett says

January 10, 2011 at 6:36 pm

Does this mean that the granularity of thin provisioning on CLARiiON is 8 KB then? That’s the core question posed by this article, and I’m not sure of the answer still.

I really appreciate the additional info, by the way!
Fabio Rapposelli says

January 10, 2011 at 7:05 pm

You’re saying that a slice is 125000 * 8KB (which is roughly a GB) and that it can be twice or half the size. Does that mean that FAST can move 500MB (or 2000MB) slices between tiers?
Storagezilla says

January 10, 2011 at 7:54 pm

I said a slice could be twice or half the size, or any size that’s a multiple of 8KB. But it isn’t, it’s 1GB.
Storagezilla says

January 10, 2011 at 8:03 pm

That’s an awkward way of saying it, but since we’re only looking at the tip of the architecture iceberg, the majority of which is non-public, let me put it like this.

Since extents are only allocated as they’re written I’d say the granularity is 8KB but we gather those 8KB extents into 1GB slices and it’s the 1GB slice which is shown to the end user when they’re looking at things in Unisphere.

To cut this down to the core, when Clariion goes about reclaiming zeros it’s at a granularity of 8KB.
sfoskett says

January 10, 2011 at 8:52 pm

So if a Clariion (I detest the “correct” capitalization) finds an entire 8KB page of zeroes, it’s removed and available for 8 KB of real data. But it also groups those (intelligently, according to the whitepaper I read and linked) 8 KB pages into 1 GB slices on various media. So an array will always be “using” 1 GB of capacity on any tier, but almost all that space could be available for data anyway.

This is confusing, but it doesn’t really matter HOW the Clariion does it. What matters is the result: It’ll have post-thin utilization like the bottom row of my illustration, not the top row. Or at least, that’s what it looks like to me!
Storagezilla says

January 10, 2011 at 11:35 pm

Out of the options in your chart, yes the bottom row.

But we deal with fragmentation by combining the free space into slices and deallocating all of it into the pool as free space.

And we’re now done talking about the mechanics.
the storage anarchist says

January 12, 2011 at 2:17 am

For reference/completeness:

* the Symmetrix Virtual Provisioning chunk/page size is 12x64KB tracks, or 768KB. VMAX FAST VP relocates as little as 7.68MB at a time (10 VP chunk/pages) – less, actually, if all the VP pages haven’t been written yet.

* I think the DS8700 chunk/page size is 1GB – at least, 1GB is the Easy Tier unit of relocation.

* I don’t think the DS8800 supports thin provisioning yet (and if it does, it does NOT yet support Easy Tier)

* the SVC/DS7000 supports variable sized chunk/pages, but the smaller they are, the less total capacity (as the max number of pages is fixed).

It is possible and even efficient to track utilization stats for very small chunk/page/extents, but you have to be innovative on how you allocate the meta-data.
