<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:series="http://unfoldingneurons.com/"
	>

<channel>
	<title>Stephen Foskett, Pack Rat &#187; data deduplication Archives  &#8211; Stephen Foskett, Pack Rat</title>
	<atom:link href="http://blog.fosketts.net/tag/data-deduplication/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.fosketts.net</link>
	<description>Understanding the accumulation of data</description>
	<lastBuildDate>Fri, 10 Feb 2012 17:40:43 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com" />
	<atom:link rel="hub" href="http://superfeedr.com/hubbub" />
			<item>
		<title>Microsoft Adds Data Deduplication to NTFS in Windows 8</title>
		<link>http://blog.fosketts.net/2012/01/03/microsoft-adds-data-deduplication-ntfs-windows-8/</link>
		<comments>http://blog.fosketts.net/2012/01/03/microsoft-adds-data-deduplication-ntfs-windows-8/#comments</comments>
		<pubDate>Tue, 03 Jan 2012 21:59:06 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Everything]]></category>
		<category><![CDATA[Gestalt IT]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Terabyte home]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[CSV]]></category>
		<category><![CDATA[data deduplication]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[file system]]></category>
		<category><![CDATA[Hyper-V]]></category>
		<category><![CDATA[I/O]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[NTFS]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Rick Vanover]]></category>
		<category><![CDATA[server]]></category>
		<category><![CDATA[Windows 8]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=6475</guid>
		<description><![CDATA[The next version of Microsoft Windows Server includes integrated data deduplication technology. Microsoft is positioning this as a boon for server virtualization and claims it has very little performance impact. But how exactly does Microsoft's de-duplication technology work?]]></description>
			<content:encoded><![CDATA[<div id="attachment_6628" class="wp-caption aligncenter" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><a href="http://static.fosketts.net/wp-content/uploads/2012/01/Microsoft-Windows-8-Dedupe-Stack.jpg" ><img class="size-medium wp-image-6628 " title="Microsoft Windows 8 Dedupe Stack" src="http://static.fosketts.net/wp-content/uploads/2012/01/Microsoft-Windows-8-Dedupe-Stack-300x225.jpg" alt="" width="300" height="225" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Windows 8 server editions will include a filter driver for NTFS for data deduplication</p></div>
<p>The next version of Microsoft Windows Server includes <strong>integrated data deduplication technology</strong>. Microsoft is positioning this as a boon for server virtualization and claims it has very little performance impact. But how exactly does Microsoft&#8217;s de-duplication technology work?</p>
<h3>Introducing Windows 8 Deduplication</h3>
<p>Let&#8217;s make one thing clear right from the start: Microsoft started from a clean sheet and invented their own deduplication technology. This is not a licensed, cloned, or copied feature as far as I can tell. There are some clever aspects to it, along with a few head scratchers for folks like me who&#8217;ve seen lots of different deduplication approaches.</p>
<p><strong>Microsoft&#8217;s deduplication is layered onto NTFS in Windows 8</strong>, and will be a feature add-on for Server users. It is implemented as a filter driver on a per-volume basis, with each volume a complete, self-describing unit. It is cluster-aware and fully crash-consistent on all operations. This is a pretty neat trick: As is typical for Microsoft, deduplication will be a simple, transparent feature.</p>
<p>Now let&#8217;s talk for a moment about what Windows 8 deduplication is not.</p>
<ul>
<li>It is a <strong>server-only</strong> feature, like so many of Microsoft&#8217;s storage developments. But perhaps we might see it deployed in low-end or home servers in the future.</li>
<li>It is <strong>not supported on boot or system volumes</strong>.</li>
<li>Although it should work just fine on removable drives, <strong>deduplication requires NTFS</strong> so you can forget about FAT or exFAT. And of course the connected system must be running a server edition of Windows 8.</li>
<li>Although <strong>deduplication does not work with Cluster Shared Volumes</strong> (CSV), it is supported in Hyper-V configurations that do not use CSV.</li>
<li>Finally, deduplication does not function on encrypted files, files with extended attributes, tiny (less than 64 KB) files, or reparse points.</li>
</ul>
<h3>Some Technical Details on Deduplication in Windows 8</h3>
<p>Microsoft Research spent two years experimenting with algorithms to find the &#8220;cheapest&#8221; in terms of overhead. <strong>They select a chunk size for each data set</strong>. This is typically between 32 KB and 128 KB, but smaller chunks can be created as well. Microsoft claims that chunks average about 80 KB in most real-world use cases. The system processes all the data looking for &#8220;fingerprints&#8221; of split points and selects the &#8220;best&#8221; on the fly for each file.</p>
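<p>To make the idea of fingerprint-selected split points concrete, here is a minimal sketch of content-defined chunking. It is not Microsoft&#8217;s algorithm, which was not described in detail. The fingerprint function, the <code>GEAR</code> table, the <code>DIVISOR</code> value, and the function names are all assumptions chosen for illustration; only the 32 KB and 128 KB limits come from the figures quoted above.</p>

```python
import hashlib
import random

# Illustrative parameters; only the two size limits come from the text above.
MIN_CHUNK = 32 * 1024     # never cut a chunk before this many bytes
MAX_CHUNK = 128 * 1024    # always cut a chunk by this many bytes
DIVISOR = 2 ** 16         # hypothetical: a split point is found when fingerprint % DIVISOR == 0

# Hypothetical table that maps each byte value to a pseudo-random 32-bit number.
_rng = random.Random(2012)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk_boundaries(data, min_size=MIN_CHUNK, max_size=MAX_CHUNK, divisor=DIVISOR):
    """Return the end offset of every chunk, chosen from the content itself."""
    ends = []
    start = 0
    while start < len(data):
        limit = min(start + max_size, len(data))
        pos = min(start + min_size, limit)
        fingerprint = 0
        while pos < limit:
            # Toy rolling fingerprint: older bytes shift out as new ones arrive.
            fingerprint = (fingerprint * 2 + GEAR[data[pos]]) % 2 ** 32
            pos += 1
            if fingerprint % divisor == 0:
                break  # the content itself says "split here"
        ends.append(pos)
        start = pos
    return ends


def chunk_hashes(data, **sizes):
    """Hash each chunk so that identical chunks can be recognized later."""
    hashes = []
    start = 0
    for end in chunk_boundaries(data, **sizes):
        hashes.append(hashlib.sha256(data[start:end]).hexdigest())
        start = end
    return hashes
```

<p>Because split points depend on the bytes near them rather than on fixed offsets, data appended to a file leaves every earlier chunk except the last one unchanged, and chunking tends to fall back into step after an insertion in the middle of a file.</p>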
<p>After data is de-duplicated, Microsoft compresses the chunks and stores them in a special &#8220;chunk store&#8221; within NTFS. This is actually part of the System Volume Information folder in the root of the volume, so dedupe is volume-level. The entire setup is self-describing, so a deduplicated NTFS volume can be read by another server without any external data.</p>
<p>There is some redundancy in the system as well. Any chunk that is referenced more than x times (100 by default) will be kept in a second location. All data in the filesystem is checksummed and will be proactively repaired. The same is done for the metadata. The deduplication service includes a scrubbing job as well as a file system optimization task to keep everything running smoothly.</p>
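<p>The chunk store described above can be sketched in a few lines. This is a toy model under stated assumptions, not the on-disk format: the <code>ChunkStore</code> class, its method names, the use of SHA-256 and zlib, and the in-memory &#8220;mirror&#8221; are all hypothetical. Only the ideas of compressing chunks, checksumming them, and keeping a second copy of chunks referenced more than 100 times come from the briefing.</p>

```python
import hashlib
import zlib


class ChunkStore:
    """Toy chunk store: compressed chunks, reference counts, and a second
    copy of heavily referenced chunks (all names here are hypothetical)."""

    def __init__(self, hot_threshold=100):
        self.chunks = {}   # digest -> compressed chunk
        self.refs = {}     # digest -> number of references
        self.mirror = {}   # digest -> second copy of "hot" chunks
        self.hot_threshold = hot_threshold

    def put(self, chunk):
        """Store a chunk once, count the reference, and return its digest."""
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in self.chunks:
            self.chunks[digest] = zlib.compress(chunk)
        self.refs[digest] = self.refs.get(digest, 0) + 1
        if self.refs[digest] > self.hot_threshold:
            # Referenced more than the threshold: keep a second copy.
            self.mirror.setdefault(digest, self.chunks[digest])
        return digest

    def get(self, digest):
        """Return a chunk, checking it against its digest and falling back
        to the second copy if the first is damaged."""
        for copy in (self.chunks.get(digest), self.mirror.get(digest)):
            if copy is None:
                continue
            try:
                chunk = zlib.decompress(copy)
            except zlib.error:
                continue
            if hashlib.sha256(chunk).hexdigest() == digest:
                return chunk
        raise IOError("chunk %s is missing or corrupt" % digest)

    def read_file(self, recipe):
        """Rebuild a file from its ordered list of chunk digests."""
        return b"".join(self.get(digest) for digest in recipe)
```

<p>A file in this model is just an ordered list of digests, which is why losing a popular chunk would damage many files at once and why a second copy of such chunks is worth the space.</p>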
<p>Windows 8 deduplication cooperates with other elements of the operating system. <strong>The Windows caching layer is dedupe-aware</strong>, and this will greatly accelerate overall performance. Windows 8 also includes a new &#8220;express&#8221; library that makes compression &#8220;20 times faster&#8221;. Files that are already compressed are recognized by file type and not re-compressed, so zip files, Office 2007+ files, etc. will be skipped and just deduped.</p>
<p>New writes are not deduped &#8211; <strong>this is a post-process technology</strong>. The data deduplication service can be scheduled or can run in &#8220;background mode&#8221; and wait for idle time. Therefore, I/O impact is between &#8220;none and 2x&#8221; depending on type. Opening a file adds less than 3% I/O overhead and can be faster if it&#8217;s cached. Copying a large file (e.g. a 10 GB VHD) can take longer since deduplication adds disk seeks, but multiple concurrent copies that share data can actually improve performance.</p>
<h3>Stephen&#8217;s Stance</h3>
<p>Although I am intrigued by Microsoft&#8217;s new deduplication technology in Windows 8 server, I still have many questions about its usefulness and impact on performance. Concentrating duplicate data in the system volume makes sense from a technical perspective, but could lead to an I/O hotspot on the disk. This is especially true for external caching storage systems, since there is no integration between Microsoft deduplication and storage array features. I am particularly concerned about the use of deduplication with VHD files in Hyper-V, since it could eat up valuable system RAM and impact I/O performance.</p>
<p>If you would like to try Microsoft deduplication for yourself, I am happy to report that it is included in <a rel="nofollow" href="http://msdn.microsoft.com/en-us/windows/br229518" >the developer preview of Windows 8 that is available on Dev Center</a>. Here are <a rel="nofollow" href="http://social.msdn.microsoft.com/Forums/zh/windowsdeveloperpreviewgeneral/thread/3f601771-1400-47c4-9aec-bb9bc45b2d85" >a few commands</a> to get you started, and read <a href="http://www.techrepublic.com/blog/networking/configuring-windows-server-8-deduplication/4918" >Rick Vanover&#8217;s post</a> too!</p>
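<pre># Install the deduplication feature, then load its cmdlets
Import-Module ServerManager
Add-WindowsFeature -Name FS-Data-Deduplication
Import-Module Deduplication
# Enable deduplication on volume E: and check its status
Enable-DedupVolume E:
Get-DedupVolume</pre>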
<blockquote><p>Note: I am a Microsoft MVP and Microsoft briefs me on upcoming technologies under NDA. This post is based on a Microsoft briefing from November which was said at the time not to be covered by any NDA. All of this information could be gleaned by experimenting with the Windows 8 developer preview, but it&#8217;s much easier to just go to the source.</p></blockquote>
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2012. |
<a href="http://blog.fosketts.net/2012/01/03/microsoft-adds-data-deduplication-ntfs-windows-8/">Microsoft Adds Data Deduplication to NTFS in Windows 8</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/" title="View all posts in Everything" rel="category tag">Everything</a>, <a href="http://blog.fosketts.net/category/gestaltit/" title="View all posts in Gestalt IT" rel="category tag">Gestalt IT</a>, <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>, <a href="http://blog.fosketts.net/category/everything/terabytehome/" title="View all posts in Terabyte home" rel="category tag">Terabyte home</a>, <a href="http://blog.fosketts.net/category/everything/virtualstorage/" title="View all posts in Virtual Storage" rel="category tag">Virtual Storage</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2012/01/03/microsoft-adds-data-deduplication-ntfs-windows-8/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>How Does Dropbox Store Data?</title>
		<link>http://blog.fosketts.net/2011/07/11/dropbox-data-format-deduplication/</link>
		<comments>http://blog.fosketts.net/2011/07/11/dropbox-data-format-deduplication/#comments</comments>
		<pubDate>Mon, 11 Jul 2011 16:30:09 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Apple]]></category>
		<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Everything]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Terabyte home]]></category>
		<category><![CDATA[cloud storage]]></category>
		<category><![CDATA[data deduplication]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[Dropbox]]></category>
		<category><![CDATA[Mac OS X]]></category>
		<category><![CDATA[MD5]]></category>
		<category><![CDATA[SHA-1]]></category>
		<category><![CDATA[TrueCrypt]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=5863</guid>
		<description><![CDATA[Dropbox recently clarified (via their blog and privacy policy) that they "de-duplicate" user files. This has been known for quite a while, and is obvious to anyone who's had a large file "upload" instantly. But how exactly does Dropbox store files? Are they really de-duplicated or just single-instanced? I set out to discover the answer.]]></description>
			<content:encoded><![CDATA[<p>Dropbox recently clarified (via their <a href="http://blog.dropbox.com/?p=846" >blog</a> and <a href="https://www.dropbox.com/terms#privacy" >privacy policy</a>) that they &#8220;de-duplicate&#8221; user files. This has been known for quite a while, and is obvious to anyone who&#8217;s had a large file &#8220;upload&#8221; instantly. But how exactly does Dropbox store files? Are they really de-duplicated or just single-instanced? I set out to discover the answer.</p>
<h3>Single Instance Storage</h3>
<p>It&#8217;s fairly simple for a system to eliminate duplicate data by storing only a single instance of multiple identical files. In other words, if you and I both upload &#8220;Presentation.pptx&#8221; and it&#8217;s bit-for-bit identical, it would be a simple matter to store just one copy.</p>
<p>Dropbox definitely does this. I proved it with a simple experiment:</p>
<ol>
<li>Create a new 10 MB encrypted disk image in TrueCrypt (so it&#8217;ll be 100% unique, random data)</li>
<li>Move it to the Dropbox folder and wait a few minutes as it uploads</li>
<li>Copy the file with a new name to the folder and notice that it &#8220;uploads&#8221; instantly</li>
</ol>
<p>Dropbox is at least single-instancing storage. This helps users, since it speeds uploads and reduces bandwidth usage. It helps Dropbox in the same way, but goes further since they still &#8220;charge&#8221; files against your account whether they&#8217;re single-instanced or not.</p>
<p>Note that this single-instancing works across users and geographies. I gave a file to a friend to upload to a different Dropbox account, and saw the same &#8220;acceleration effect.&#8221; This would be quite useful to users and the company for files like iTunes songs which are identical and widespread.</p>
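<p>The behavior observed above can be modeled with a very small sketch. This is only an illustration of single-instance storage, not Dropbox&#8217;s implementation: the <code>SingleInstanceStore</code> class, its fields, and the choice of SHA-256 as the fingerprint are assumptions. For brevity it also ignores overwriting an existing file name.</p>

```python
import hashlib


class SingleInstanceStore:
    """Toy model of global single-instance storage (hypothetical names)."""

    def __init__(self):
        self.blobs = {}   # digest -> file contents, shared by all users
        self.files = {}   # (user, name) -> digest
        self.usage = {}   # user -> bytes charged against the account

    def add(self, user, name, data):
        """Add a file and return True only if its bytes had to be uploaded."""
        digest = hashlib.sha256(data).hexdigest()
        uploaded = digest not in self.blobs
        if uploaded:
            self.blobs[digest] = data
        self.files[(user, name)] = digest
        # The account is charged in full whether or not anything was uploaded.
        self.usage[user] = self.usage.get(user, 0) + len(data)
        return uploaded
```

<p>In this model the first copy of a file is uploaded, every later copy by any user &#8220;uploads&#8221; instantly, and each user is still charged for the full size.</p>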
<h3>Clashing MD5 Hashes?</h3>
<div id="attachment_5866" class="wp-caption aligncenter" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><a href="http://static.fosketts.net/wp-content/uploads/2011/07/HashClash.png" ><img class="size-medium wp-image-5866" title="HashClash" src="http://static.fosketts.net/wp-content/uploads/2011/07/HashClash-300x64.png" alt="" width="300" height="64" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Three files with identical sizes and MD5 hashes but different names? Creepy!</p></div>
<p>A global single-instance storage system sounds great, but it opens the door to hash collision issues. Imagine if you and I both uploaded identical files. Both would have the same &#8220;fingerprint&#8221; and Dropbox would only store it once. Now imagine instead that, out of coincidence or malice, I uploaded a file with the same fingerprint as yours but different contents. This is <a href="http://www.schneier.com/blog/archives/2005/02/cryptanalysis_o.html" >not so far-fetched as it seems</a>, and could lead to all sorts of security nightmares.</p>
<p>A common and compromised file checksum method is MD5, so I decided to test how Dropbox handles files of identical size, name, and MD5 hash using the &#8220;<a href="http://www.win.tue.nl/hashclash/Nostradamus/" >Nostradamus Attack</a>&#8221; PDFs generated by Marc Stevens. My tests show that Dropbox correctly handled the files I tried, and no combination of uploading and naming could force it to store the wrong file. So Dropbox either doesn&#8217;t use MD5 or uses a combination of hashing and other mechanisms. Testing other schemes is left as an exercise to the reader.</p>
<p>One more thought: The fact that de-duplication is mentioned in the &#8220;privacy&#8221; section of the Dropbox policies raises my eyebrows, since it indicates that they see hash collisions as a matter of privacy rather than data corruption. This suggests that Dropbox is both aware of and susceptible to hash collision attacks generally, though obviously not as simply as creating a bogus MD5 match.</p>
<blockquote><p>Note: Dropbox is well aware of this issue, having <a href="http://razorfast.com/2011/04/25/dropbox-attempts-to-kill-open-source-project/" >recently squashed</a> an open-source exploit called <a href="http://forwardfeed.pl/index.php/2011/04/24/dropship-successor-to-torrents-eng/" >Dropship</a>!</p></blockquote>
<h3>Sub-File De-Duplication</h3>
<p>Data de-duplication is like single-instancing, but it applies to pieces of files rather than whole files. Some enterprise storage systems de-duplicate at multi-megabyte levels, while others are far more granular.</p>
<p>To test whether Dropbox de-duplicates data, I devised a simple experiment:</p>
<ol>
<li>Create a new local copy of my existing random TrueCrypt file</li>
<li>Add a single byte to the end using the &#8220;cat&#8221; command</li>
<li>Copy the resulting file to Dropbox</li>
<li>Watch as Dropbox takes just a few seconds to upload the new file</li>
</ol>
<p>This test proves that Dropbox does indeed de-duplicate at the sub-file level. Since it took a bit longer to upload than would be expected for a single byte, we can see that Dropbox &#8220;chunks&#8221; files for hashing and uploading.</p>
<h3>De-Duplication Granularity</h3>
<p>The next question is just what size chunks or blocks Dropbox uses to de-duplicate data. To test this, I created various blocks of random data using TrueCrypt and experimented to see where the &#8220;stair-steps&#8221; were in terms of upload time.</p>
<p>My tests used four basic building blocks of 512 KB, 1024 KB, 2048 KB, and 4096 KB in size. Guessing that Dropbox used one of these sizes for their chunking system, I assumed these would quickly demonstrate the answer.</p>
<div id="attachment_5870" class="wp-caption aligncenter" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><a href="http://static.fosketts.net/wp-content/uploads/2011/07/Comparison-of-Dropbox-Transfer-Time-for-Various-Concatenated-Object-Sizes.jpg" ><img class="size-medium wp-image-5870" title="Comparison of Dropbox Transfer Time for Various Concatenated Object Sizes" src="http://static.fosketts.net/wp-content/uploads/2011/07/Comparison-of-Dropbox-Transfer-Time-for-Various-Concatenated-Object-Sizes-300x202.jpg" alt="" width="300" height="202" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">On my Mac, Dropbox clearly uses a 4 MB &quot;chunk&quot; size for deduplication</p></div>
<p>First, I uploaded each file individually and watched as Dropbox took about 30 seconds per MB. This will vary greatly, of course, but the absolute performance doesn&#8217;t matter. Only relative performance matters for demonstrating chunking.</p>
<p>Next, I concatenated each file with itself to create a new file twice as large. This would be ideally &#8220;chunkable&#8221; since it consists of exactly identical data with a nice, clean, evenly-divisible &#8220;border&#8221;. I uploaded each of these and noticed that the &#8220;4096 KB x 2&#8221; file uploaded nearly instantly, while all others took the expected amount of time.</p>
<p>I repeated this with &#8220;x 3&#8221;, &#8220;x 4&#8221;, and &#8220;x 8&#8221; files and noticed that the 4096 KB (4 MB) &#8220;barrier&#8221; was very obvious. Whenever a file contained a full 4096 KB block of data Dropbox had seen before, it single-instanced that block. Any time it saw a unique &#8220;block&#8221;, even one smaller than this, it uploaded it fresh.</p>
<p>This proves, at least in the case of my own Mac OS X install of Dropbox, that a 4 MB chunk size is used for de-duplication.</p>
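<p>The experiments above are consistent with simple fixed-size block de-duplication, which can be sketched as follows. This is a model of the observed behavior rather than Dropbox&#8217;s actual client code: the function names, the <code>seen</code> set standing in for the server, and the use of SHA-256 are assumptions. Only the 4 MB block size comes from the tests.</p>

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB, the size suggested by the tests above


def block_digests(data, block_size=BLOCK_SIZE):
    """Cut data at fixed offsets and fingerprint each block."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def upload(data, seen, block_size=BLOCK_SIZE):
    """Record a file's blocks in `seen` and return how many had to be sent."""
    sent = 0
    for digest in block_digests(data, block_size):
        if digest not in seen:
            seen.add(digest)
            sent += 1
    return sent
```

<p>In this model a 4096 KB file concatenated with itself consists of two blocks that have already been seen, so nothing is sent, while a 2048 KB file concatenated with itself forms one 4 MB block that has never been seen and must be sent in full. Appending a single byte to a previously uploaded file changes only its final block.</p>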
<h3>Stephen&#8217;s Stance</h3>
<p>Dropbox is a very useful service, and I appreciate the technology they use to make it work. By single-instancing storage, the company is able to keep costs and transfer time in check and offer a basic service for free for many users. Despite the recent security issue, I continue to use Dropbox myself and would not hesitate to recommend it. But I do suggest using your own encryption for any sensitive data, as demonstrated in my recent post, <a href="http://blog.fosketts.net/2011/07/05/mac-dropbox-encrypted-volume/" >Mac Users, Secure Your Stuff in Dropbox</a>.</p>
<p>I remain somewhat concerned about the privacy and security implications of global de-duplication of shared random data. If they use SHA-1 hashes alone, which I suspect, there is a chance that an object will not be stored correctly once about 2^80 objects are stored, which is the birthday bound for a 160-bit hash. A deliberate collision may take far less work to construct (perhaps <a href="http://www.schneier.com/blog/archives/2005/02/sha1_broken.html" >2^69</a> or even <a rel="nofollow" href="http://lukenotricks.blogspot.com/2009/05/cost-of-sha-1-collisions-reduced-to-252.html" >2^52</a> operations). Either case would lead to issues of data corruption or inadvertent disclosure. This is a very remote chance indeed, but &#8220;<a rel="nofollow" href="http://en.wikipedia.org/wiki/Birthday_problem" >birthday problems</a>&#8221; like this work against hashing systems. I would love to hear from Dropbox regarding how they prevent this from happening, including disclosure of their methods of hashing data. It&#8217;s nice to see the company taking responsibility by disclosing this in their privacy policy, though!</p>
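<p>For a sense of scale, the birthday problem can be estimated with the standard approximation p &#8776; 1 &#8722; e^(&#8722;n(n&#8722;1)/2N), where n is the number of stored objects and N is the number of possible hash values. The sketch below assumes ideal, uniformly random hashes and only covers accidental collisions, not deliberate attacks; the function name is my own.</p>

```python
import math


def collision_probability(objects, hash_bits):
    """Approximate chance that at least two of `objects` random hashes match.

    Uses the birthday approximation 1 - exp(-n(n-1) / 2N) with N = 2**hash_bits.
    """
    exponent = objects * (objects - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)


# 2**80 objects under a 160-bit hash such as SHA-1: roughly a 39% chance.
sha1_at_bound = collision_probability(2 ** 80, 160)

# A trillion objects (about 2**40) under the same hash: vanishingly unlikely.
sha1_at_trillion = collision_probability(2 ** 40, 160)
```

<p>The point of the approximation is that the risk grows with the square of the number of objects, which is why a global store shared by every user is the interesting case.</p>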
<blockquote><p>Update: Dropbox apparently does indeed use raw SHA-256 hashes to &#8220;uniquely&#8221; identify data, and <a href="http://news.ycombinator.com/item?id=2478567" >this can be exploited in a number of ways</a>.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2011/07/11/dropbox-data-format-deduplication/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>The Three Requirements To Overcome Inertia</title>
		<link>http://blog.fosketts.net/2011/01/12/requirements-overcome-inertia/</link>
		<comments>http://blog.fosketts.net/2011/01/12/requirements-overcome-inertia/#comments</comments>
		<pubDate>Wed, 12 Jan 2011 14:36:34 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Computer History]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[10 GbE]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[data deduplication]]></category>
		<category><![CDATA[Ethernet]]></category>
		<category><![CDATA[FCoE]]></category>
		<category><![CDATA[Fibre Channel]]></category>
		<category><![CDATA[inertia]]></category>
		<category><![CDATA[Isaac Newton]]></category>
		<category><![CDATA[Mac OS 9]]></category>
		<category><![CDATA[MS-DOS]]></category>
		<category><![CDATA[Palm]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[return on investment]]></category>
		<category><![CDATA[Token Ring]]></category>
		<category><![CDATA[x86]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=4741</guid>
		<description><![CDATA[In Philosophiæ Naturalis Principia Mathematica, Sir Isaac Newton defined inertia. Although he was referring to physical objects, the power of inertia affects companies, markets, and relationships in the same manner.  Humans are creatures of habit, and change is challenging.  When faced with a choice of continuing along the same road or branching off in a new direction, most will choose familiarity.]]></description>
			<content:encoded><![CDATA[<div id="attachment_4742" class="wp-caption aligncenter" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><a href="http://static.fosketts.net/wp-content/uploads/2011/01/Balanced-Rock-by-softwareguy888.jpg" ><img class="size-full wp-image-4742" title="Balanced Rock by softwareguy888" src="http://static.fosketts.net/wp-content/uploads/2011/01/Balanced-Rock-by-softwareguy888.jpg" alt="" width="300" height="335" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Once something is in place, it&#39;s hard to get it to move again</p></div>
<p>In Philosophiæ Naturalis Principia Mathematica, Sir Isaac Newton defined inertia as follows:</p>
<blockquote><p>The vis insita, or innate force of matter, is a power of resisting by which every body, as much as in it lies, endeavours to preserve its present state, whether it be of rest or of moving uniformly forward in a straight line.</p></blockquote>
<p>Although Newton was referring to physical objects, the power of inertia affects companies, markets, and relationships in the same manner.  Humans are creatures of habit, and change is challenging.  When faced with a choice of continuing along the same road or branching off in a new direction, most will choose familiarity.</p>
<h3>Inertia in IT Architecture</h3>
<p>Consider the impact of inertia on IT architecture: once a solution is in place, it tends to remain there for a very long time. This rule applies to practices, architectures, solutions, and hardware and software. It explains the continued presence of Token Ring, MS-DOS, Mac OS 9, and Palm organizers in so many companies. It also explains the curious devotion IT pros feel toward solutions that are backward compatible: Ethernet, Windows, Intel x86 architecture, and so on.</p>
<p>Once, while visiting the data center of a midsize financial institution, I spotted a stack of old IBM PC desktop computers in the corner. The company had purchased another company, which itself had purchased a bank many years ago. The loans from that long-ago and far-off institution were still serviced by this archaic hardware and software. The company&#8217;s IT staff had squirreled away half a dozen replacement computers so they could migrate the application to new old hardware in the event of a failure. If this isn’t inertia, I don’t know what is.</p>
<h3>Overcoming Inertia</h3>
<p>An external force is required to overcome inertia, and one must desire to initiate a change. New products and solutions must not merely be attractive; they must also be compelling enough to overcome this inertia. In my experience, there are three reasons that companies change direction when it comes to IT architecture:</p>
<ol>
<li>A noticeable, irrefutable <strong>return on investment (ROI)</strong></li>
<li>A tangible and necessary <strong>performance benefit</strong></li>
<li>A unique and desirable <strong>function</strong></li>
</ol>
<p>Many new technologies show promise in all three areas, including 10 Gigabit Ethernet, Fibre Channel over Ethernet (FCoE), server virtualization, and data deduplication.  But these potential benefits are not necessarily compelling in all IT environments.</p>
<p>A company with a substantial investment in Fibre Channel SAN hardware may find that upgrading to 8 Gb Fibre Channel is more compelling than a switch to converged networking and FCoE.  Many companies have found it hard to justify the additional cost of data compression or deduplication technology when compared with the decreasing cost of capacity or the benefits of improved utilization through better storage management.  The growth of server virtualization has been steady, but the hold-outs indicate that many companies find it hard to justify the technology.</p>
<h3>Stephen&#8217;s Stance</h3>
<p>Contrary to our nerd dreams, mere technical superiority does not guarantee the success of a new product or solution.  It must be better, faster, and cheaper to achieve widespread success.  In short, it must demonstrate a compelling case, or inertia will set in and derail its progress.</p>
<div><em>Image credit: Balanced Rock by </em><a rel="nofollow" href="http://www.flickr.com/photos/30452074@N06/" ><em>softwareguy888</em></a></div>
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2011. |
<a href="http://blog.fosketts.net/2011/01/12/requirements-overcome-inertia/">The Three Requirements To Overcome Inertia</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/computerhistory/" title="View all posts in Computer History" rel="category tag">Computer History</a>, <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>, <a href="http://blog.fosketts.net/category/everything/virtualstorage/" title="View all posts in Virtual Storage" rel="category tag">Virtual Storage</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2011/01/12/requirements-overcome-inertia/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>IBM&#8217;s Storwize V7000: 100% SVC; 0% Storwize</title>
		<link>http://blog.fosketts.net/2010/10/07/ibm-storwize-v7000-svc/</link>
		<comments>http://blog.fosketts.net/2010/10/07/ibm-storwize-v7000-svc/#comments</comments>
		<pubDate>Thu, 07 Oct 2010 21:16:34 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Everything]]></category>
		<category><![CDATA[Gestalt IT]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[data deduplication]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[scale-out]]></category>
		<category><![CDATA[Storwize]]></category>
		<category><![CDATA[SVC]]></category>
		<category><![CDATA[Tony Pearson]]></category>
		<category><![CDATA[V7000]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=3852</guid>
		<description><![CDATA[Today, IBM alerted the world that they had not fallen asleep at the wheel by kicking out an awfully impressive midrange storage array, the Storwize V7000. This seems like an excellent device, filled with proven engineering borrowed from the successful SAN Volume Controller (SVC) line of storage virtualization products. But closer examination (and IBM's own Tony Pearson) reveals that it contains exactly nothing from their Storwize acquisition apart from the name.]]></description>
			<content:encoded><![CDATA[<p>Today, IBM alerted the world that they had not fallen asleep at the wheel by kicking out an awfully impressive midrange storage array, the Storwize V7000. This seems like an excellent device, filled with proven engineering borrowed from the successful SAN Volume Controller (SVC) line of storage virtualization products. But closer examination (and IBM&#8217;s own <a href="http://twitter.com/az990tony/status/26653205787"  target="_blank">Tony Pearson</a>) reveals that it contains exactly nothing from their Storwize acquisition apart from the name.</p>
<h3>SVC 6.1 + Disk Hardware = V7000</h3>
<p>Let&#8217;s get one thing out of the way immediately: As I&#8217;ve said many times (including on stage at Storage Decisions last month), SVC is about the only IBM storage product I genuinely like. It&#8217;s well-engineered and useful, and it performs well. It&#8217;s just too bad its native habitat is a jungle of weird and expensive IBM gear.</p>
<p>SVC is really an enterprise storage array without any disks, just as HDS&#8217; USP VSP is a storage virtualization engine with disks. It does all sorts of great things, from thin provisioning to replication to automatic tiered storage to painless migration (once you&#8217;re migrated to it, at least). Fibre Channel comes in, magic happens, and Fibre Channel comes out. And it runs on commodity servers, which surely gives IBM a healthy profit margin but doesn&#8217;t seem to translate into lower cost for customers.</p>
<div id="attachment_3855" class="wp-caption aligncenter" style="width: 236px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><a href="http://static.fosketts.net/wp-content/uploads/2010/10/v700-parentage4.png" ><img class="size-medium wp-image-3855" title="v700-parentage4" src="http://static.fosketts.net/wp-content/uploads/2010/10/v700-parentage4-226x300.png" alt="" width="226" height="300" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Green = SVC 5; Pink = SVC 6.1. No Storwize.</p></div>
<p>The new Storwize V7000 is <a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/storagevirtualization/?lang=en"  target="_blank">essentially</a> the SVC software running on <a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/InsideSystemStorage/?lang=en"  target="_blank">server hardware</a> that includes both dual controllers and a bunch of internal hard disk drives. This can connect to up to nine &#8220;dumb&#8221; expansion storage enclosures. Hardware-wise, it&#8217;s very like the typical midrange <a href="http://www.thestoragearchitect.com/2010/08/24/choosing-between-monolithic-and-modular-architectures-part-i/"  target="_blank">modular</a> storage systems sold by EMC (CLARiiON), HP (EVA), HDS (AMS), and NetApp.</p>
<p>Software-wise, <a rel="nofollow" href="http://storagebuddhist.wordpress.com/2010/10/07/ibms-new-midrange-v7000-with-easy-tier-external-virtualization/"  target="_blank">the V7000 is all SVC</a>. Much of the software is directly derived from SVC 5.1 (green stuff in IBM&#8217;s diagram), while some new tech is mixed in, too. But pretty much everything other than the hardware (green, blue, pink) is shared with SVC 6.1. It&#8217;s just incredible what advanced software running on commodity hardware can do, and IBM is right up there with folks like HP and EMC who are adopting this engineering model.</p>
<h3>Where&#8217;s the Storwize?</h3>
<p>Then there&#8217;s that name. This isn&#8217;t just the V7000, it&#8217;s the <a href="http://www-03.ibm.com/systems/storage/disk/storwize_v7000/index.html"  target="_blank">Storwize V7000</a>. When I heard the name, I expected it to include some data reduction/optimization/compression/whatever technology from Storwize, the company IBM <a href="http://www.networkcomputing.com/deduplication/ibm-acquires-storwize.php"  target="_blank">acquired</a> in July. This would match EMC&#8217;s acquisition of Data Domain, Dell&#8217;s buy of Ocarina, and HP&#8217;s rollout of their cool StoreOnce software.</p>
<p>But there&#8217;s no Storwize in the V7000 apart from the name. This is a straight-ahead midrange storage system with no special bit-crunching powers apart from the thin provisioning already offered by SVC. I asked the IBM folks about this, and they confirmed that they needed a name and thought Storwize was fitting.</p>
<div id="attachment_3856" class="wp-caption aligncenter" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><a href="http://static.fosketts.net/wp-content/uploads/2010/10/Screen-shot-2010-10-07-at-4.35.48-PM.png" ><img class="size-medium wp-image-3856" title="Screen shot 2010-10-07 at 4.35.48 PM" src="http://static.fosketts.net/wp-content/uploads/2010/10/Screen-shot-2010-10-07-at-4.35.48-PM-300x144.png" alt="" width="300" height="144" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Right from the horse&#39;s mouth. No Storwize software here (yet).</p></div>
<h3>Stephen&#8217;s Stance</h3>
<p>With everyone and their brother (well, EMC, HP, Dell, and NetApp) rolling out primary storage deduplication, I expect this situation will change. Perhaps &#8220;Storwize&#8221; will become the IBM equivalent of &#8220;StorageWorks&#8221; &#8211; sprayed across every product. Or maybe it will become IBM&#8217;s midrange brand. But sooner or later I expect IBM will include their compression technology, too (I dare not call it &#8220;data reduction&#8221; or face <a href="http://twitter.com/az990tony/status/26653737309"  target="_blank">The Wrath of Tony</a>).</p>
<p>So the Storwize V7000 is a really nice midrange product built on proven software and ought to compete nicely with EMC, HP, and HDS. It&#8217;s maybe even a little better than the competing modular storage products. My interest would be piqued, however, by news of a larger scale-out cluster of V7000 systems. The SVC can already scale out like this, clustering four I/O groups of paired nodes.</p>
<p>But even without compression and scale-out, I could see myself recommending the V7000 to midrange storage buyers. Good work, IBM! Now, let&#8217;s talk about the rest of your storage products&#8230;</p>
<p><em>V7000 Diagram courtesy of IBM</em></p>
<div id="crp_related"><h3>You might also want to read these other posts...</h3><ul><li><a href="http://blog.fosketts.net/2011/05/09/ibm-adds-vaai-support-xiv-svc/"  rel="bookmark" class="crp_title">IBM Adds VAAI Support to XIV and SVC</a></li><li><a href="http://blog.fosketts.net/2010/10/17/back-from-the-pile-interesting-links-october-17-2010/"  rel="bookmark" class="crp_title">Back From the Pile: Interesting Links,  October 17, 2010</a></li><li><a href="http://blog.fosketts.net/2011/02/08/vmware-vaai-storage-array-support-plain-english/"  rel="bookmark" class="crp_title">VMware VAAI Storage Array Support in Plain English</a></li><li><a href="http://blog.fosketts.net/2010/09/29/hp-product-line-decoder-ring/"  rel="bookmark" class="crp_title">Stephen&#8217;s HP Product Line Decoder Ring</a></li><li><a href="http://blog.fosketts.net/2010/08/23/3par-bidding-war/"  rel="bookmark" class="crp_title">Everyone Loves 3Par &#8211; Here&#8217;s Why!</a></li></ul></div><script src="http://feeds.feedburner.com/~s/sfoskett?i=http://blog.fosketts.net/2010/10/07/ibm-storwize-v7000-svc/" type="text/javascript" charset="utf-8"></script><hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2010. |
<a href="http://blog.fosketts.net/2010/10/07/ibm-storwize-v7000-svc/">IBM&#8217;s Storwize V7000: 100% SVC; 0% Storwize</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/" title="View all posts in Everything" rel="category tag">Everything</a>, <a href="http://blog.fosketts.net/category/gestaltit/" title="View all posts in Gestalt IT" rel="category tag">Gestalt IT</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2010/10/07/ibm-storwize-v7000-svc/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Is Deduplication Ready for Prime Time?</title>
		<link>http://blog.fosketts.net/2008/09/25/deduplication-ready-prime-time/</link>
		<comments>http://blog.fosketts.net/2008/09/25/deduplication-ready-prime-time/#comments</comments>
		<pubDate>Thu, 25 Sep 2008 21:38:53 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[data deduplication]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[Enterprise Storage Forum]]></category>
		<category><![CDATA[greenBytes]]></category>
		<category><![CDATA[NetApp]]></category>
		<category><![CDATA[Ocarina]]></category>
		<category><![CDATA[Riverbed]]></category>
		<category><![CDATA[Storage Decisions]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=781</guid>
		<description><![CDATA[Deduplication is here for backup, but it is not yet ready for prime time in primary storage applications]]></description>
			<content:encoded><![CDATA[<p>In an article for Enterprise Storage Forum, Paul Shread comments on the <a href="http://www.enterprisestorageforum.com/continuity/news/article.php/3774031"  target="_blank">positive reviews that various deduplication technologies</a> got at <a href="http://blog.fosketts.net/2008/09/24/storage-decisions-new-york-2008-feedback/"  target="_self">Storage Decisions</a> from analysts and end users. He suggests that less than 10% of attendees were using deduplication already, but that others were inspired by their experience and would be using it soon.</p>
<p>Paul goes on to quote me, saying I &#8220;didn&#8217;t think primary data de-duplication technology was ready for prime time just yet.&#8221; I absolutely did say these words, but I am not sure if my point came across.</p>
<p>I&#8217;ve recently expounded about <a href="http://blog.fosketts.net/2008/09/16/deduplication-primary-storage/"  target="_self">the benefits of deduplication technology</a>, but have warned that it might not be all it&#8217;s cracked up to be in <em>primary storage</em> environments. By &#8220;primary&#8221; I mean those storage environments serving mission-critical applications. Although dedupe works great for backup and archiving, the random I/O, low latency, and high throughput of primary storage (and especially virtualized servers) might be too much for current systems. And as of now, only <a href="http://www.netapp.com/us/products/platform-os/dedupe.html"  target="_blank">NetApp</a>, <a href="http://www.riverbed.com/company/news/press_releases/press_091508.php"  target="_blank">Riverbed</a> (<a href="http://www.byteandswitch.com/document.asp?doc_id=163827"  target="_blank">soon</a>), and startups <a href="http://green-bytes.com"  target="_blank">greenBytes</a> (see <a href="http://blog.fosketts.net/2008/09/15/greenbytes-embraces-extends-zfs/"  target="_self">my story</a>) and <a href="http://www.ocarinanetworks.com/"  target="_blank">Ocarina</a> (more on them another time) were willing to go on record with me as supporting deduplication of primary storage.</p>
<p>So what I <em>meant</em> was that deduplication is not yet ready for prime time <em>in primary storage applications</em>. No one should hesitate to use the technology for backup or archiving at this point, but do a thorough evaluation of the specific product you are selecting to <a href="http://www.backupcentral.com/content/view/192/47/"  target="_blank">make sure it delivers the performance you require</a>!</p>
<div id="crp_related"><h3>You might also want to read these other posts...</h3><ul><li><a href="http://blog.fosketts.net/2008/09/16/deduplication-primary-storage/"  rel="bookmark" class="crp_title">Deduplication Coming to Primary Storage</a></li><li><a href="http://blog.fosketts.net/2008/03/12/de-duplication-goes-mainstream/"  rel="bookmark" class="crp_title">De-Duplication Goes Mainstream</a></li><li><a href="http://blog.fosketts.net/2011/05/27/storage-decisions-chicago/"  rel="bookmark" class="crp_title">Storage Decisions Chicago: All About Capacity Optimization</a></li><li><a href="http://blog.fosketts.net/2011/09/22/data-reduction-condensed-version/"  rel="bookmark" class="crp_title">Data Reduction: the Condensed Version</a></li><li><a href="http://blog.fosketts.net/2011/09/02/storage-decisions-york-capacity-optimization/"  rel="bookmark" class="crp_title">Storage Decisions New York: Capacity Optimization</a></li></ul></div><script src="http://feeds.feedburner.com/~s/sfoskett?i=http://blog.fosketts.net/2008/09/25/deduplication-ready-prime-time/" type="text/javascript" charset="utf-8"></script><hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2008. |
<a href="http://blog.fosketts.net/2008/09/25/deduplication-ready-prime-time/">Is Deduplication Ready for Prime Time?</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/virtualstorage/" title="View all posts in Virtual Storage" rel="category tag">Virtual Storage</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2008/09/25/deduplication-ready-prime-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Deduplication Coming to Primary Storage</title>
		<link>http://blog.fosketts.net/2008/09/16/deduplication-primary-storage/</link>
		<comments>http://blog.fosketts.net/2008/09/16/deduplication-primary-storage/#comments</comments>
		<pubDate>Tue, 16 Sep 2008 19:28:37 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Computer History]]></category>
		<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Features]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[Atari]]></category>
		<category><![CDATA[Byte]]></category>
		<category><![CDATA[capacity optimization]]></category>
		<category><![CDATA[CAS]]></category>
		<category><![CDATA[Centera]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[data deduplication]]></category>
		<category><![CDATA[Data Domain]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[DR-DOS]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[FilePool]]></category>
		<category><![CDATA[greenBytes]]></category>
		<category><![CDATA[Huffman coding]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[NetApp]]></category>
		<category><![CDATA[Riverbed]]></category>
		<category><![CDATA[single-instance storage]]></category>
		<category><![CDATA[Stacker]]></category>
		<category><![CDATA[VMware]]></category>
		<category><![CDATA[VTL]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=626</guid>
		<description><![CDATA[Although deduplication of storage is nothing new, with Data Domain and others making hay with the technique for years, it has never been ready for prime time - reduction of active primary storage applications like email and databases. Instead, deduplication has been relegated to second- or third-tier status, deduplicating archives and backup data. But change is in the air, and deduplication vendors are starting to bustle towards the bright lights of primary storage.]]></description>
			<content:encoded><![CDATA[<p style="padding-left: 30px;"><em>This is a follow-up to my story, <a href="http://blog.fosketts.net/2008/03/12/de-duplication-goes-mainstream/"  target="_self">De-Duplication Goes Mainstream</a></em></p>
<p>Although deduplication of storage is nothing new, with Data Domain and others making hay with the technique for years, it has never been ready for prime time &#8211; reduction of active primary storage applications like email and databases. Instead, deduplication has been relegated to second- or third-tier status, deduplicating archives and backup data. But change is in the air, and deduplication vendors are starting to bustle towards the bright lights of primary storage.</p>
<h3>Stone Knives and Bear Skins</h3>
<p>We have all been here before, of course. Back at the dawn of the personal computer era, data compression was a hot topic of conversation. I recall being so impressed by an article in <a rel="nofollow" href="http://en.wikipedia.org/wiki/Byte_(magazine)"  target="_blank">Byte</a> (1986:5, p99) outlining <a rel="nofollow" href="http://en.wikipedia.org/wiki/Huffman_coding"  target="_blank">Huffman coding</a> that I tried cooking up an implementation in Atari BASIC. Lossless compression has a magical pull to the geek in many of us &#8211; redundant data just <em>wants</em> to be eliminated!</p>
<div id="attachment_630" class="wp-caption alignright" style="width: 254px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><a href="http://blog.fosketts.net/wp-content/uploads/2008/09/sc0003b3d4.png" ><img class="size-full wp-image-630 " title="Stacker" src="http://blog.fosketts.net/wp-content/uploads/2008/09/sc0003b3d4.png" alt="Stacker dominated the disk compression world - until Microsoft introduced DOS 6.0" width="244" height="254" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Stacker dominated the disk compression world - until Microsoft introduced DOS 6.0</p></div>
<p>Companies soon applied <a href="http://www.zisman.ca/Articles/1993/DOS6.html"  target="_blank">compression to primary storage</a>, especially the limited storage in personal computers. <a rel="nofollow" href="http://en.wikipedia.org/wiki/Stac_Electronics#Microsoft_lawsuit"  target="_blank">Stacker</a> was a hit after 1990, until Microsoft built a workalike, called DoubleSpace, into DOS 6.0 in 1993, leading to a historic lawsuit. I personally used the ADDSTOR disk compression built into DR-DOS 6.0 to stretch two more years out of the 20 MB MFM hard drive in my AT&amp;T PC6300 at <a href="http://wpi.edu"  target="_blank">WPI</a>.</p>
<p>But something funny happened in the late 1990s: Compression began to lose its luster. Compressing data always takes quite a bit of CPU power, but this was offset somewhat by the truncated data transfers and more-efficient file system layout afforded in early PCs. But as disks got larger and faster, using precious CPU time to save space seemed less and less compelling. Today, although nearly every operating system includes built-in compression of files, folders, or perhaps disks, these features are rarely used. And compression was never popular in the performance-sensitive enterprise space.</p>
<h3>Deduplication Has a Nice Ring</h3>
<p>Although traditional fine-grained compression has not been very successful in the enterprise, its lanky cousin, single-instance storage, has long found niche jobs. Applications from databases to email systems to file servers have long been able to recognize requests to store the exact same file or record, and to store just a single instance in that case. Even file systems can do single-instance storage through the use of links, though this is initiated by the user rather than automated.</p>
<p>In the late 1990s, FilePool began developing a <a rel="nofollow" href="http://en.wikipedia.org/wiki/Content-addressable_storage"  target="_blank">content-addressable storage</a> device, which was acquired by EMC in 2001. This device, later known as the Centera, was one of a number of storage platforms targeted at the archiving market introduced this decade. At the same time, <a rel="nofollow" href="http://en.wikipedia.org/wiki/Virtual_tape_library"  target="_blank">virtual tape libraries</a> made the jump from the mainframe to open systems. Both devices, being outside the critical path of performance but offering massive capacity, were well-suited to implement advanced <a rel="nofollow" href="http://en.wikipedia.org/wiki/Capacity_optimization"  target="_blank">capacity optimization</a> technologies that combined the concepts of compression with single-instance storage. Thus was created the modern world of data deduplication.</p>
<p>What we think of as deduplication is neither fish nor fowl: It assesses larger &#8220;chunks&#8221; of data than compression technologies, delivering greater capacity savings and potentially reducing performance impact, but is more flexible than single-instancing, recognizing the similarities within files or objects.</p>
<p>But it is still maddeningly difficult to scale deduplication while maintaining performance. Rather than fight to maintain reasonable write throughput, most deduplication products have switched to post-processing, deferring their work to quieter times.</p>
<h3>It&#8217;s Not Just for Breakfast</h3>
<p>Regardless of their methods or underlying technology, however, no deduplication vendor has stood up to support challenging low-latency or high-throughput production applications. <a href="http://blog.fosketts.net/2008/03/12/de-duplication-goes-mainstream/"  target="_self">NetApp was the first to raise the issue of support for production applications</a>, but although they tout the technology for VMware, they haven&#8217;t exactly been shouting from the rooftops to get their A-SIS deduplication technology deployed in other high-I/O applications. And I haven&#8217;t seen Hifn&#8217;s card yet.</p>
<p>Yesterday, I mentioned that greenBytes was adding deduplication to their ZFS-based storage array for primary data. And now <a href="http://www.theregister.co.uk/2008/09/16/deduplicating_primary_storage/"  target="_blank">Riverbed has fired another shot</a> over the bow, repurposing their (deduplicating) WAN accelerator product for primary (file) storage. They might be able to pull it off, too, since they have a long list of customers who are already enjoying the technology in production. It&#8217;s not a stretch to suggest that Riverbed&#8217;s appliances can scale to handle production data loads. Although it&#8217;s file-only, I can imagine quite a few scenarios where this tech could really yield benefits. Could we come full-circle, with deduplication finally reaching the enterprise storage world?</p>
<div id="crp_related"><h3>You might also want to read these other posts...</h3><ul><li><a href="http://blog.fosketts.net/2008/09/25/deduplication-ready-prime-time/"  rel="bookmark" class="crp_title">Is Deduplication Ready for Prime Time?</a></li><li><a href="http://blog.fosketts.net/2011/09/22/data-reduction-condensed-version/"  rel="bookmark" class="crp_title">Data Reduction: the Condensed Version</a></li><li><a href="http://blog.fosketts.net/2008/09/15/greenbytes-embraces-extends-zfs/"  rel="bookmark" class="crp_title">greenBytes Embraces and Extends ZFS</a></li><li><a href="http://blog.fosketts.net/2009/02/05/compression-encryption-deduplication-replication/"  rel="bookmark" class="crp_title">Compression, Encryption, Deduplication, and Replication: Strange Bedfellows</a></li><li><a href="http://blog.fosketts.net/2011/05/27/storage-decisions-chicago/"  rel="bookmark" class="crp_title">Storage Decisions Chicago: All About Capacity Optimization</a></li></ul></div><script src="http://feeds.feedburner.com/~s/sfoskett?i=http://blog.fosketts.net/2008/09/16/deduplication-primary-storage/" type="text/javascript" charset="utf-8"></script><hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2008. |
<a href="http://blog.fosketts.net/2008/09/16/deduplication-primary-storage/">Deduplication Coming to Primary Storage</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/computerhistory/" title="View all posts in Computer History" rel="category tag">Computer History</a>, <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/features/" title="View all posts in Features" rel="category tag">Features</a>, <a href="http://blog.fosketts.net/category/everything/virtualstorage/" title="View all posts in Virtual Storage" rel="category tag">Virtual Storage</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2008/09/16/deduplication-primary-storage/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>greenBytes Embraces and Extends ZFS</title>
		<link>http://blog.fosketts.net/2008/09/15/greenbytes-embraces-extends-zfs/</link>
		<comments>http://blog.fosketts.net/2008/09/15/greenbytes-embraces-extends-zfs/#comments</comments>
		<pubDate>Mon, 15 Sep 2008 15:13:29 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[CDP]]></category>
		<category><![CDATA[Copan]]></category>
		<category><![CDATA[data deduplication]]></category>
		<category><![CDATA[Data Domain]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[greenBytes]]></category>
		<category><![CDATA[MAID]]></category>
		<category><![CDATA[snapshot]]></category>
		<category><![CDATA[spin-down]]></category>
		<category><![CDATA[Sun]]></category>
		<category><![CDATA[thin provisioning]]></category>
		<category><![CDATA[Thumper]]></category>
		<category><![CDATA[ZFS]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=622</guid>
		<description><![CDATA[I&#8217;ve long hollered that ZFS is a real storage revolution in the making, but recognized that it still had a way to go before replacing UFS, HFS+, and most volume managers. Well, a little Rhode Island company called greenBytes comes out of stealth today to announce that they&#8217;re doing just that &#8211; taking the solid [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve long hollered that <a href="http://blog.fosketts.net/2008/02/27/zfs-super-file-system/"  target="_self">ZFS is a real storage revolution in the making</a>, but recognized that it still had a way to go before replacing UFS, HFS+, and most volume managers. Well, a little Rhode Island company called <a href="http://www.green-bytes.com/"  target="_blank">greenBytes comes out of stealth today</a> to announce that they&#8217;re doing just that &#8211; taking the solid ZFS core and adding some serious enterprise storage features to it. And they&#8217;re rolling the lot into a multi-protocol storage array using commodity (<a href="http://www.sun.com/servers/x64/x4500/"  target="_blank">Sun Thumper</a>) hardware. These guys have cooked up a seriously interesting entrant in the storage market, though I can&#8217;t say much for the <a rel="nofollow" href="http://en.wikipedia.org/wiki/CamelCase"  target="_blank">decapitated camel-case spelling</a> of their (<a href="http://greenbytes.de/"  target="_blank">already in use</a>) name!</p>
<p><span id="more-622"></span><strong>Spun Down</strong></p>
<p>Although <a rel="nofollow" href="http://en.wikipedia.org/wiki/ZFS#Features"  target="_blank">ZFS&#8217; universal storage pool with non-RAID</a> is a great concept, it stands in the way of at least one (sometimes) desirable storage technique: disk spin-down. Put simply, since every disk contains metadata, all disks must always be spinning. This issue is by no means a ZFS-only problem, though &#8211; certain vendors tout the (laughable) greenness of their storage systems, while hoping that the average user won&#8217;t notice the truth: That a disk simply cannot spin down while any part of it is in use. This means that tacking spin-down onto a regular storage array is like painting it a different color: There is no benefit whatsoever to the average user. Sure, a few non-provisioned drives might spin down, but what are you doing buying a lot of non-provisioned drives anyway?</p>
<p>The solution has always been right in front of everyone: Develop <a href="http://blog.fosketts.net/2008/09/14/turning-the-page-on-raid/"  target="_self">a new type of non-RAID</a> with enough intelligence to allow drives to spin down when not used. This is what <a href="http://www.copansystems.com/index.php?"  target="_blank">COPAN Systems</a> did with their <a rel="nofollow" href="http://en.wikipedia.org/wiki/Massive_array_of_idle_disks"  target="_blank">MAID</a> technology: Invent an entirely new storage array, with integrated data protection and management techniques that allow <em>alive but not active</em> drives to spin down. Spin-down is not MAID any more than a bicycle is a Ducati.</p>
<p>Let&#8217;s make one thing clear: It&#8217;s <em>really hard</em> to reduce the power demands of storage devices. Disks guzzle watts like few other data center devices, and enterprise storage uses lots of disks. Lots of vendors are looking to hop onto the green storage bandwagon, and they all seem to realize that bringing some <a href="http://storageio.com/blog/?p=72"  target="_blank">intelligence to power management by enabling spin-down</a> is an open door. But it&#8217;s awfully hard to maintain performance and data protection when disks are spinning up and down all the time.</p>
<p>One element of the greenBytes story is the way in which they have tweaked ZFS to allow disks to spin down. They limit the metadata updates to just a few disks, so the others can be idled when no access to them is made. The company suggests scheduling this for off hours to minimize latency as drives are brought back online, an approach that is less than optimal from an energy perspective but demonstrates that they understand just how difficult this problem is to crack. The core is there, however: They have integrated the data protection and storage management elements to enable spin-down to be practical.</p>
<p><strong>Compressed</strong></p>
<p>Another major storage industry theme of the last few years is deduplication of data. An advanced (or devolved, depending on your perspective) form of compression, deduplication allows a storage array to store duplicate data more efficiently, reducing the amount of capacity required for some applications. <a href="http://www.datadomain.com/"  target="_blank">Data Domain</a> is top-of-mind in this space, but just about everyone now offers some form of deduplication technology.</p>
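<p>To make the idea concrete, here is a minimal sketch of block-level deduplication. It assumes fixed-size blocks and SHA-256 fingerprints, and the names <code>dedupe</code>, <code>restore</code>, and <code>BLOCK_SIZE</code> are invented for illustration. This is not how Data Domain, greenBytes, or any other vendor actually does it; real products tend to use variable-size chunking and far more careful indexing.</p>

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, in bytes


def dedupe(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep one copy of each unique block."""
    store = {}    # fingerprint -> block contents, one entry per unique block
    recipe = []   # ordered fingerprints needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)  # only the first copy is stored
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the unique blocks and the recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)
```

<p>Feed this four blocks of which three are identical and the store holds two blocks while the recipe still lists four. That gap between logical and physical size is the capacity saving everyone is selling.</p>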
<p>One major roadblock on the way to deduplication (or compression) nirvana is performance. Simply put, it&#8217;s <em>really really hard</em> to process data on the fly without affecting performance, especially as data scales up to the multi-terabyte range or as systems scale out to include multiple devices. One approach to tackling this issue is post-processing dedupe, which accepts incoming data in the normal way but goes back and processes it later to remove duplicates. This is the method <a href="http://netapp.com"  target="_blank">NetApp</a> uses, and they have leveraged it to become <a href="http://blog.fosketts.net/2008/03/12/de-duplication-goes-mainstream/"  target="_self">the first vendor to support deduplication of production applications</a>.</p>
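<p>The post-processing approach can be sketched as a toy block store whose write path does no deduplication work at all, with a separate pass that collapses duplicates afterwards. The class and method names below are hypothetical, and this is an illustration of the general technique only, not a description of NetApp&#8217;s implementation.</p>

```python
import hashlib


class PostProcessStore:
    """Toy block store: writes land untouched, duplicates are removed later."""

    def __init__(self):
        self.blocks = {}   # block id -> contents
        self.volume = []   # logical layout: ordered list of block ids
        self.next_id = 0

    def write(self, block: bytes) -> None:
        # Ingest path: no hashing and no index lookups, so writes stay fast.
        self.blocks[self.next_id] = block
        self.volume.append(self.next_id)
        self.next_id += 1

    def dedupe_pass(self) -> int:
        """Background pass: point duplicates at one copy, return blocks freed."""
        seen = {}   # fingerprint -> id of the copy that is kept
        freed = 0
        for position, block_id in enumerate(self.volume):
            fingerprint = hashlib.sha256(self.blocks[block_id]).digest()
            keeper = seen.setdefault(fingerprint, block_id)
            if keeper != block_id:
                self.volume[position] = keeper
                del self.blocks[block_id]
                freed += 1
        return freed

    def read(self) -> bytes:
        return b"".join(self.blocks[block_id] for block_id in self.volume)
```

<p>The trade-off is visible in the structure: the full, undeduplicated data has to fit on disk until <code>dedupe_pass()</code> runs, which is the price paid for leaving the write path alone.</p>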
<p>Predictably, deduplication is another technology integrated into greenBytes&#8217; &#8220;ZFS+&#8221; technology. They claim that they can handle inline compression at wire speed, and also claim deduplication inline. It&#8217;s not yet clear exactly what the difference between compression and deduplication is to the company, or just what kind of performance their inline technology will yield, but it&#8217;s certainly nice to see this tech integrated with ZFS!</p>
<p><strong>Thin is In (the House!)</strong></p>
<p>greenBytes gets closer to enterprise storage bingo by adding <a href="http://blog.fosketts.net/2008/09/02/3pars-thin-un-provisioning-is-slightly-less-bad/"  target="_self">thin provisioning</a> to the mix. Actually, as the company&#8217;s CTO was quick to point out, they had to offer virtual or thin provisioning to enable the rest of the system to function. When your storage is sliced and diced by their Cypress array, the only way to present storage is with a wink and a promise of capacity to spare. Thankfully this is not the core of their pitch, however.</p>
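<p>For readers who have not met thin provisioning before, a minimal sketch: the volume advertises a promised size, but physical blocks are only taken from a shared pool on first write. Everything here (the <code>ThinVolume</code> class, the list used as a pool) is a hypothetical illustration, not a description of the Cypress array or ZFS+.</p>

```python
class ThinVolume:
    """Toy thin-provisioned volume: capacity is promised, blocks are allocated on write."""

    def __init__(self, promised_blocks: int, pool: list):
        self.promised_blocks = promised_blocks  # the size the host is told about
        self.pool = pool                        # shared list of free physical block ids
        self.mapping = {}                       # logical block -> physical block
        self.data = {}                          # physical block -> contents

    def write(self, logical_block: int, contents: bytes) -> None:
        if logical_block not in range(self.promised_blocks):
            raise IndexError("write beyond the promised capacity")
        if logical_block not in self.mapping:
            if not self.pool:
                raise RuntimeError("physical pool exhausted")
            # First write to this logical block: allocate physical space now.
            self.mapping[logical_block] = self.pool.pop()
        physical = self.mapping[logical_block]
        self.data[physical] = contents

    def read(self, logical_block: int) -> bytes:
        if logical_block not in range(self.promised_blocks):
            raise IndexError("read beyond the promised capacity")
        physical = self.mapping.get(logical_block)
        # Never-written blocks consume no space; this toy returns empty bytes for them.
        return self.data[physical] if physical is not None else b""
```

<p>Two volumes that each promise 100 blocks can share a pool of four, which is the wink and the promise in code form. It also shows where the approach bites: once the pool is empty, the next first-time write fails no matter how much capacity was promised.</p>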
<p>The company also promises snapshots and CDP replication, all leveraging ZFS at the core. All they need to add is tier-0 solid state storage to get five chips in a row without even <a rel="nofollow" href="http://en.wikipedia.org/wiki/Bingo_(U.S.)"  target="_blank">using the free space</a>! Although greenBytes is using Sun&#8217;s Thumper chassis currently for their Cypress array, their core technology is the ZFS+ software, and I expect we might see this mixed quite differently in the future. This is a software company, not an array vendor.</p>
<p>All considered, greenBytes has thoroughly broken the link between physical and logical storage, and I applaud them for it. This is exactly the kind of storage revolution the industry needs right now.</p>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2008. |
<a href="http://blog.fosketts.net/2008/09/15/greenbytes-embraces-extends-zfs/">greenBytes Embraces and Extends ZFS</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/virtualstorage/" title="View all posts in Virtual Storage" rel="category tag">Virtual Storage</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2008/09/15/greenbytes-embraces-extends-zfs/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Jargon Watch: EMC 3D = Data Deduplication</title>
		<link>http://blog.fosketts.net/2008/05/21/jargon-watch-emc-3d-data-deduplication/</link>
		<comments>http://blog.fosketts.net/2008/05/21/jargon-watch-emc-3d-data-deduplication/#comments</comments>
		<pubDate>Wed, 21 May 2008 17:46:01 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[3D]]></category>
		<category><![CDATA[blogging]]></category>
		<category><![CDATA[data deduplication]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[EMC World]]></category>
		<category><![CDATA[jargon]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=172</guid>
		<description><![CDATA[Watching the announcements coming out of EMC World today, one bit of jargon stuck out at me:  The EMC bloggers are starting to refer to &#8220;data deduplication&#8221; as &#8220;3D&#8221;.  I had never heard this terminology before yesterday, but the EMCers are all using it, so it must be a popular term inside that company.  So [...]]]></description>
			<content:encoded><![CDATA[<p>Watching the announcements coming out of EMC World today, one bit of jargon stuck out at me:  <a rel="nofollow" href="http://thebackupblog.typepad.com/thebackupblog/2008/05/ritual-of-the-h.html"  target="_blank">The EMC bloggers</a> are starting to refer to <a rel="nofollow" href="http://chucksblog.typepad.com/chucks_blog/2008/05/3d-redux.html"  target="_blank">&#8220;<strong>d</strong>ata <strong>d</strong>e<strong>d</strong>uplication&#8221; as &#8220;3D&#8221;</a>.  I had never heard this terminology before yesterday, but the EMCers are all using it, so it must be a popular term inside that company.  So I&#8217;m just giving my readers a heads-up: 3D is deduplication, at least at EMC.</p>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2008. |
<a href="http://blog.fosketts.net/2008/05/21/jargon-watch-emc-3d-data-deduplication/">Jargon Watch: EMC 3D = Data Deduplication</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2008/05/21/jargon-watch-emc-3d-data-deduplication/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

