March 21, 2014

What is WRITE_SAME? Green Eggs and Ham!

State of the Art Thin Provisioning Series

One of the topics I've often written and spoken about is thin provisioning. This series of 11 articles is an edited version of my thin provisioning presentation from Interop New York 2010. I hope you enjoy it!

One of the sticky wickets that holds back thin provisioning is the need to communicate when capacity is no longer needed. Enterprise storage arrays can reclaim zeroed pages, but writing all those zeros can really fill up an I/O queue. This is where WRITE_SAME comes into the picture.

This is a really terrible name. It’s all-capital letters and has an underscore in the middle of it. We sound like engineers.

But WRITE_SAME is an interesting idea. Imagine you wanted to delete a terabyte of data using a storage system with zero page reclaim. You’d have to write a terabyte of zeros. Well, that’s a lot of IO. You’re basically pouring zeros across your PCI bus, HBA, network, and array.

Instead, imagine we could just say, “You know that page of zeros that I just wrote? Can you please write that a million more times for me? Hey, thanks a lot.”

You could do it in one command. That’s what WRITE_SAME is. It’s a SCSI command that hands the array a single block of data and says, “Can you please write this again, and again, and again? Can you please write it a thousand times? Can you please write it over here, over there?” I sound like Dr. Seuss: You can write it in a car. You can write it at the bar. You can write it on a bike. You can write it with a pike.
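For the engineers in the audience, here is a rough sketch of what that one command looks like on the wire. It packs a WRITE SAME(16) command descriptor block using the field layout from the SCSI block command set as I understand it: operation code 0x93, a 64-bit starting block address, and a 32-bit block count. The function name and the choice to leave every flag at zero are my own assumptions for illustration, and actually sending the command to a device is left out.

```python
import struct

WRITE_SAME_16 = 0x93  # operation code for WRITE SAME(16)


def build_write_same_16(lba: int, num_blocks: int) -> bytes:
    """Pack a 16-byte WRITE SAME(16) CDB (hypothetical helper).

    The flags byte, group number and control byte are all left at zero,
    which is an assumption made to keep the sketch short.
    """
    if not 0 <= lba < 2**64:
        raise ValueError("LBA must fit in 64 bits")
    # A count of 0 is treated specially by the standard, so reject it here.
    if not 1 <= num_blocks < 2**32:
        raise ValueError("block count must be between 1 and 2**32 - 1")
    # Big-endian: opcode, flags, 8-byte LBA, 4-byte count, group, control.
    return struct.pack(">BBQIBB", WRITE_SAME_16, 0, lba, num_blocks, 0, 0)


# The data-out payload is just one logical block, here 512 bytes of zeros.
zero_block = bytes(512)
cdb = build_write_same_16(lba=0, num_blocks=1_000_000)
```

The point to notice is the size: a 16-byte command plus one block of data asks the array to fill a million blocks.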

This conserves IO, and is a really good thing. WRITE_SAME makes zero page reclaim that much more effective. Now if only we had a system that would actually use this command!
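To put a rough number on "conserves IO", here is a back-of-the-envelope sketch. It assumes 512-byte blocks, a decimal terabyte, and a 16-byte command, and it ignores all transport framing. The `max_blocks_per_cmd` parameter is a stand-in for whatever per-command limit a given array advertises; the values used here are not taken from any real device.

```python
TERABYTE = 10**12   # decimal terabyte, an assumption
BLOCK_SIZE = 512    # bytes per logical block, an assumption
CDB_SIZE = 16       # bytes in a WRITE SAME(16) command


def write_same_cost(total_bytes: int,
                    block_size: int = BLOCK_SIZE,
                    max_blocks_per_cmd: int = 2**32 - 1) -> tuple[int, int]:
    """Return (commands needed, bytes sent from the host) to zero a range."""
    blocks = total_bytes // block_size
    commands = -(-blocks // max_blocks_per_cmd)  # ceiling division
    # Each command carries its CDB plus exactly one block of data.
    return commands, commands * (CDB_SIZE + block_size)


commands, host_bytes = write_same_cost(TERABYTE)
print(f"plain writes: {TERABYTE:,} bytes of zeros from the host")
print(f"WRITE_SAME:   {commands:,} command(s), {host_bytes:,} bytes from the host")
```

Even if an array capped each command at a far smaller block count, the host would still send a tiny fraction of the terabyte it would otherwise push through the stack.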

It’s popular with array vendors, because all they have to do is say, “Hey look, I already support zero page reclaim. It’s up to you guys up there in the stack to implement the rest of this problem. It’s not our problem. It’s your problem.”

As an aside, consider that, if you’re an array vendor, anything that reduces the use of disk capacity is a problem for you. So, they may not all be that eager to have this work, I think, but I’m sure they’ll come around.

But imagine if you did this to an un-thin array. Imagine if the array didn’t support zero page reclaim on ingest and instead was post-processing. You could end up writing a terabyte of zeros on the back end of your storage system, or 10 terabytes or 100 terabytes of data, only to reclaim it later that day, or later in the week or later in the month. And what if your system didn’t support it at all? Suddenly, you’re flooded with IO requests on the storage-array side. So, basically, you’re conserving IO across the host and the network, but you’re potentially generating massive IO on the storage side – which is kind of a problem.
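The three cases in that paragraph can be laid out as a toy model. This is purely illustrative: the category names and the function are invented here, and real arrays behave in more nuanced ways. It only captures the argument that the host-side saving is the same in every case while the back-end cost is not.

```python
from enum import Enum


class ZeroHandling(Enum):
    INLINE_RECLAIM = "reclaims zeroed pages on ingest"
    POST_PROCESS = "writes zeros now, reclaims them later"
    NONE = "no zero page reclaim at all"


def backend_effect(zeroed_bytes: int, handling: ZeroHandling) -> tuple[int, int]:
    """Return (bytes written to back-end disk, bytes eventually reclaimed)."""
    if handling is ZeroHandling.INLINE_RECLAIM:
        return 0, zeroed_bytes             # zeros never hit the disks
    if handling is ZeroHandling.POST_PROCESS:
        return zeroed_bytes, zeroed_bytes  # written first, freed afterwards
    return zeroed_bytes, 0                 # written and never freed


one_terabyte = 10**12
for handling in ZeroHandling:
    written, reclaimed = backend_effect(one_terabyte, handling)
    print(f"{handling.name:15} written={written:,} reclaimed={reclaimed:,}")
```

In the last two cases the terabyte of zeros still lands on the back end, even though the host sent only a command and a single block.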

So, there are some issues here with this as well. But, we’re getting there.

  • the storage anarchist

    The suspense is killing me!

    WRITE_SAME itself is really NOT for Zero Page Reclaim…at least, it’s not an efficient approach since it can actually write the zeros on targets that aren’t “thin” (and “thin” is virtually transparent to hosts, file systems and applications). The real utility of WRITE_SAME is to reduce SAN traffic, something very helpful (for example) when VMware needs to re-initialize a VMDK for reuse (which is why it is now part of VAAI).

    I know you’re probably setting up for the next page to discuss WRITE_SAME (UNMAP) and the new UNMAP commands – two capabilities that can be advertised by the targets so that the host software can specifically say “I don’t need these blocks anymore.”

    Leaving your blog audience with the impression that WRITE_SAME(0x0000) is a useful approach to space reclamation is a bit misleading. Such cliffhangers work in live presentations where there are mere seconds until the next slide…here, not so much (IMHO).