<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:series="http://unfoldingneurons.com/"
	>

<channel>
	<title>Stephen Foskett, Pack Rat &#187; deduplication Archives  &#8211; Stephen Foskett, Pack Rat</title>
	<atom:link href="http://blog.fosketts.net/tag/deduplication/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.fosketts.net</link>
	<description>Understanding the accumulation of data</description>
	<lastBuildDate>Fri, 10 Feb 2012 17:40:43 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com" />
	<atom:link rel="hub" href="http://superfeedr.com/hubbub" />
			<item>
		<title>Microsoft Adds Data Deduplication to NTFS in Windows 8</title>
		<link>http://blog.fosketts.net/2012/01/03/microsoft-adds-data-deduplication-ntfs-windows-8/</link>
		<comments>http://blog.fosketts.net/2012/01/03/microsoft-adds-data-deduplication-ntfs-windows-8/#comments</comments>
		<pubDate>Tue, 03 Jan 2012 21:59:06 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Everything]]></category>
		<category><![CDATA[Gestalt IT]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Terabyte home]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[CSV]]></category>
		<category><![CDATA[data deduplication]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[file system]]></category>
		<category><![CDATA[Hyper-V]]></category>
		<category><![CDATA[I/O]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[NTFS]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Rick Vanover]]></category>
		<category><![CDATA[server]]></category>
		<category><![CDATA[Windows 8]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=6475</guid>
		<description><![CDATA[The next version of Microsoft Windows Server includes integrated data deduplication technology. Microsoft is positioning this as a boon for server virtualization and claims it has very little performance impact. But how exactly does Microsoft's de-duplication technology work?]]></description>
			<content:encoded><![CDATA[<div id="attachment_6628" class="wp-caption aligncenter" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><a href="http://static.fosketts.net/wp-content/uploads/2012/01/Microsoft-Windows-8-Dedupe-Stack.jpg" ><img class="size-medium wp-image-6628 " title="Microsoft Windows 8 Dedupe Stack" src="http://static.fosketts.net/wp-content/uploads/2012/01/Microsoft-Windows-8-Dedupe-Stack-300x225.jpg" alt="" width="300" height="225" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Windows 8 server editions will include a filter driver for NTFS for data deduplication</p></div>
<p>The next version of Microsoft Windows Server includes <strong>integrated data deduplication technology</strong>. Microsoft is positioning this as a boon for server virtualization and claims it has very little performance impact. But how exactly does Microsoft&#8217;s de-duplication technology work?</p>
<h3>Introducing Windows 8 Deduplication</h3>
<p>Let&#8217;s make one thing clear right from the start: Microsoft started from a clean sheet and invented their own deduplication technology. This is not a licensed, cloned, or copied feature as far as I can tell. There are some clever aspects to it, along with a few head scratchers for folks like me who&#8217;ve seen lots of different deduplication approaches.</p>
<p><strong>Microsoft&#8217;s deduplication is layered onto NTFS in Windows 8</strong>, and will be a feature add-on for Server users. It is implemented as a filter driver on a per-volume basis, with each volume a complete, self-describing unit. It is cluster-aware and fully crash-consistent on all operations. This is a pretty neat trick: As is typical for Microsoft, deduplication will be a simple, transparent feature.</p>
<p>Now let&#8217;s talk for a moment about what Windows 8 deduplication is not.</p>
<ul>
<li>It is a <strong>server-only</strong> feature, like so many of Microsoft&#8217;s storage developments. But perhaps we might see it deployed in low-end or home servers in the future.</li>
<li>It is <strong>not supported on boot or system volumes</strong>.</li>
<li>Although it should work just fine on removable drives, <strong>deduplication requires NTFS</strong> so you can forget about FAT or exFAT. And of course the connected system must be running a server edition of Windows 8.</li>
<li>Although <strong>deduplication does not work with Cluster Shared Volumes (CSV)</strong>, it is supported in Hyper-V configurations that do not use CSV.</li>
<li>Finally, deduplication does not function on encrypted files, files with extended attributes, tiny (less than 64 KB) files, or reparse points.</li>
</ul>
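<p>The file-level exclusions in that last bullet can be read as a simple eligibility check. The following Python sketch is only an illustration: the function and parameter names are hypothetical, and the 64 KB cutoff is taken from the list above and read as 64 * 1024 bytes.</p>
<pre>MIN_FILE_SIZE = 64 * 1024  # assumed reading of the "tiny file" cutoff above


def is_dedup_candidate(size, encrypted=False,
                       has_extended_attributes=False,
                       is_reparse_point=False):
    """Return True when none of the listed file-level exclusions applies."""
    if encrypted or has_extended_attributes or is_reparse_point:
        return False
    # Files below the cutoff are skipped as too small to be worth chunking.
    return size >= MIN_FILE_SIZE</pre>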
<h3>Some Technical Details on Deduplication in Windows 8</h3>
<p>Microsoft Research spent two years experimenting with algorithms to find the &#8220;cheapest&#8221; in terms of overhead. <strong>They select a chunk size for each data set</strong>. This is typically between 32 KB and 128 KB, but smaller chunks can be created as well. Microsoft claims that chunks in most real-world use cases average about 80 KB. The system processes all the data looking for &#8220;fingerprints&#8221; of split points and selects the &#8220;best&#8221; on the fly for each file.</p>
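<p>To make the idea of content-defined split points concrete, here is a minimal Python sketch. It is not Microsoft&#8217;s algorithm, whose details were not part of the briefing: the rolling polynomial hash, the window size, the divisor, and the toy chunk limits (bytes rather than kilobytes) are all assumptions chosen for illustration.</p>
<pre>WINDOW = 16        # bytes covered by the rolling hash (assumed)
BASE = 257         # polynomial base (assumed)
MOD = 1000000007   # hash modulus (assumed)
DIVISOR = 64       # roughly one split point per 64 bytes of data
MIN_SIZE = 32      # toy limits; the post quotes 32 KB to 128 KB
MAX_SIZE = 256
POW = pow(BASE, WINDOW, MOD)  # BASE**WINDOW, used to drop the oldest byte


def chunk(data: bytes) -> list[bytes]:
    """Split data where a rolling hash of the last WINDOW bytes hits a target value."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Slide the window: add the new byte, then drop the byte that fell out.
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * POW) % MOD
        size = i - start + 1
        at_split_point = h % DIVISOR == DIVISOR - 1
        # Cut at a split point once the chunk is big enough, or force a cut at MAX_SIZE.
        if (size >= MIN_SIZE and at_split_point) or size == MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start != len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks</pre>
<p>Because split points depend only on the last few bytes seen, inserting data near the start of a file tends to shift only the nearby chunk boundaries; later chunks usually come out identical and can be deduplicated.</p>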
<p>After data is de-duplicated, Microsoft compresses the chunks and stores them in a special &#8220;chunk store&#8221; within NTFS. This is actually part of the System Volume store in the root of the volume, so dedupe is volume-level. The entire setup is self-describing, so a deduplicated NTFS volume can be read by another server without any external data.</p>
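<p>Here is a rough Python sketch of what a chunk store does, under heavy simplification. The class and method names are hypothetical, SHA-256 and zlib stand in for whatever fingerprinting and compression Microsoft actually uses, and the on-disk layout is ignored entirely.</p>
<pre>import hashlib
import zlib


class ChunkStore:
    """Toy per-volume chunk store: each unique chunk is compressed and kept once."""

    def __init__(self):
        self.chunks = {}  # SHA-256 digest -> compressed chunk bytes
        self.files = {}   # file name -> ordered list of chunk digests

    def add_file(self, name, chunks):
        digests = []
        for chunk in chunks:
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # First time this content is seen: compress and store it.
                self.chunks[digest] = zlib.compress(chunk)
            digests.append(digest)
        # The file itself is reduced to a list of references into the store.
        self.files[name] = digests

    def read_file(self, name):
        return b"".join(zlib.decompress(self.chunks[d]) for d in self.files[name])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())</pre>
<p>Everything needed to rebuild a file lives in the store and the per-file reference lists, which is the sense in which the volume is self-describing.</p>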
<p>There is some redundancy in the system as well. Any chunk that is referenced more than x times (100 by default) will be kept in a second location. All data in the filesystem is checksummed and will be proactively repaired. The same is done for the metadata. The deduplication service includes a scrubbing job as well as a file system optimization task to keep everything running smoothly.</p>
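<p>The redundancy and scrubbing behavior can be sketched in a few lines of Python. This is an illustration only: the class and method names are hypothetical, SHA-256 stands in for the real checksum, and only the threshold of 100 references comes from the briefing.</p>
<pre>import hashlib

HOT_THRESHOLD = 100  # default reference count quoted above


class RedundantStore:
    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.primary = {}  # digest -> chunk bytes
        self.mirror = {}   # digest -> second copy of heavily referenced chunks
        self.refs = {}     # digest -> reference count

    def add_reference(self, chunk):
        digest = hashlib.sha256(chunk).hexdigest()
        self.primary.setdefault(digest, chunk)
        self.refs[digest] = self.refs.get(digest, 0) + 1
        # Chunks referenced more than the threshold get a second copy.
        if self.refs[digest] > self.threshold:
            self.mirror.setdefault(digest, chunk)
        return digest

    def scrub(self):
        """Verify each primary chunk against its digest; repair from the mirror if possible."""
        repaired, lost = [], []
        for digest, chunk in self.primary.items():
            if hashlib.sha256(chunk).hexdigest() == digest:
                continue  # checksum matches, nothing to do
            if digest in self.mirror:
                self.primary[digest] = self.mirror[digest]
                repaired.append(digest)
            else:
                lost.append(digest)
        return repaired, lost</pre>
<p>The design choice is a trade-off: a corrupted chunk that hundreds of files depend on would be far more damaging than a corrupted chunk used once, so only the popular chunks pay the cost of a second copy.</p>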
<p>Windows 8 deduplication cooperates with other elements of the operating system. <strong>The Windows caching layer is dedupe-aware</strong>, and this will greatly accelerate overall performance. Windows 8 also includes a new &#8220;express&#8221; library that makes compression &#8220;20 times faster&#8221;. Files in already-compressed formats are recognized by file type and not re-compressed, so zip files, Office 2007+ files, etc. will skip compression and just be deduped.</p>
<p>New writes are not deduped &#8211; <strong>this is a post-process technology</strong>. The data deduplication service can be scheduled or can run in &#8220;background mode&#8221; and wait for idle time. Therefore, I/O impact is between &#8220;none and 2x&#8221; depending on the type of operation. Opening a file incurs less than 3% additional I/O and can be faster if it&#8217;s cached. Copying a large file (e.g. a 10 GB VHD) can make some difference since it adds additional disk seeks, but multiple concurrent copies that share data can actually improve performance.</p>
<h3>Stephen&#8217;s Stance</h3>
<p>Although I am intrigued by Microsoft&#8217;s new deduplication technology in Windows 8 server, I still have many questions about its usefulness and impact on performance. Concentrating duplicate data in the system volume makes sense from a technical perspective, but could lead to an I/O hotspot on the disk. This is especially true for external caching storage systems, since there is no integration between Microsoft deduplication and storage array features. I am particularly concerned about the use of deduplication with VHD files in Hyper-V, since it could eat up valuable system RAM and impact I/O performance.</p>
<p>If you would like to try Microsoft deduplication for yourself, I am happy to report that it is included in <a rel="nofollow" href="http://msdn.microsoft.com/en-us/windows/br229518" >the developer preview of Windows 8 that is available on Dev Center</a>. Here are <a rel="nofollow" href="http://social.msdn.microsoft.com/Forums/zh/windowsdeveloperpreviewgeneral/thread/3f601771-1400-47c4-9aec-bb9bc45b2d85" >a few commands</a> to get you started, and read <a href="http://www.techrepublic.com/blog/networking/configuring-windows-server-8-deduplication/4918" >Rick Vanover&#8217;s post</a> too!</p>
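<pre># Load Server Manager and install the deduplication feature
Import-Module ServerManager
Add-WindowsFeature -Name FS-Data-Deduplication
# Load the deduplication cmdlets and enable dedupe on volume E:
Import-Module Deduplication
Enable-DedupVolume E:
# List the volumes that have deduplication enabled
Get-DedupVolume</pre>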
<blockquote><p>Note: I am a Microsoft MVP and Microsoft briefs me on upcoming technologies under NDA. This post is based on a Microsoft briefing from November which was said at the time not to be covered by any NDA. All of this information could be gleaned by experimenting with the Windows 8 developer preview, but it&#8217;s much easier to just go to the source.</p></blockquote>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2012. |
<a href="http://blog.fosketts.net/2012/01/03/microsoft-adds-data-deduplication-ntfs-windows-8/">Microsoft Adds Data Deduplication to NTFS in Windows 8</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/" title="View all posts in Everything" rel="category tag">Everything</a>, <a href="http://blog.fosketts.net/category/gestaltit/" title="View all posts in Gestalt IT" rel="category tag">Gestalt IT</a>, <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>, <a href="http://blog.fosketts.net/category/everything/terabytehome/" title="View all posts in Terabyte home" rel="category tag">Terabyte home</a>, <a href="http://blog.fosketts.net/category/everything/virtualstorage/" title="View all posts in Virtual Storage" rel="category tag">Virtual Storage</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2012/01/03/microsoft-adds-data-deduplication-ntfs-windows-8/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Data Reduction: the Condensed Version</title>
		<link>http://blog.fosketts.net/2011/09/22/data-reduction-condensed-version/</link>
		<comments>http://blog.fosketts.net/2011/09/22/data-reduction-condensed-version/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 19:55:04 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[Balesio]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[data reduction]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[FILEminimizer]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[SearchStorage]]></category>
		<category><![CDATA[Storage Decisions]]></category>
		<category><![CDATA[The Storage Community]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=6208</guid>
		<description><![CDATA[Native Format Optimization (NFO) makes a lot of sense, since it addresses a common user error in a practical way, and allows capacity savings to “trickle-down” to backups, e-mail systems, and archives. But wholesale compression and deduplication of primary storage may not be worth much, especially since the cost of disk keeps dropping dramatically.]]></description>
			<content:encoded><![CDATA[<div id="attachment_6209" class="wp-caption aligncenter" style="width: 445px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><img class="size-full wp-image-6209" title="Warning Do Not Remove Shields" src="http://static.fosketts.net/wp-content/uploads/2011/09/Warning-Do-Not-Remove-Shields.jpg" alt="" width="435" height="276" /><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Data Reduction can be hazardous to your health!</p></div>
<p>I&#8217;m not a big fan of data reduction technology, yet I found myself talking compression and de-duplication all week. Between Storage Decisions and my recent posts over at <a href="http://searchstorage.techtarget.com/tip/Interest-in-data-reduction-methods-needs-to-keep-pace-with-data-growth#" >SearchStorage</a> and <a href="http://storagecommunity.org/blogs/stephenfoskett/archive/2011/09/07/has-the-time-finally-come-for-data-reduction.aspx" >The Storage Community</a>, I&#8217;ve had quite a bit to say on the subject. Funnily enough, my main objection is to data reduction for primary storage: too often, it is more expensive and difficult than just storing raw data.</p>
<blockquote><p>You should also read <a href="http://blog.fosketts.net/2008/09/16/deduplication-primary-storage/" >Deduplication Coming to Primary Storage</a> and <a href="http://blog.fosketts.net/2009/02/05/compression-encryption-deduplication-replication/" >Compression, Encryption, Deduplication, and Replication: Strange Bedfellows</a></p></blockquote>
<h3>Storage Decisions</h3>
<p><a href="http://blog.fosketts.net/2011/09/02/storage-decisions-york-capacity-optimization/" >My Storage Decisions presentation</a> on data reduction was hilarious, if I do say so myself, even though turnout was poor at 8:30 on Tuesday morning. Maybe it was this “intimate” group, but I found myself really getting into the discussion. And the nods and hollers from the audience helped, too!</p>
<p>My basic thesis at Storage Decisions was the same as always: <strong>Don&#8217;t throw good money at technology that will have little ROI</strong>. Considering that disk capacity is incredibly cheap, and dropping all the time, data reduction doesn&#8217;t look like a great fit except in certain situations. Why spend money to reduce utilization? Why put in the effort when most primary storage data reduction technologies don&#8217;t do anything to address the “multiplier effect” of archiving, DR, and backup storage?</p>
<p>This is not to say that all data reduction technology is worthless. In fact, the free compression and de-duplication built into many SSDs and even some enterprise storage devices make perfect sense. I just don&#8217;t understand spending a bunch of money to address storage capacity when most applications are starved for storage performance.</p>
<blockquote><p>You might like reading my two other posts on the subject from last week:</p>
<ul>
<li><a href="http://searchstorage.techtarget.com/tip/Interest-in-data-reduction-methods-needs-to-keep-pace-with-data-growth#" >Interest in data reduction methods needs to keep pace with data growth</a> (SearchStorage.com)</li>
<li><a href="http://storagecommunity.org/blogs/stephenfoskett/archive/2011/09/07/has-the-time-finally-come-for-data-reduction.aspx" >Has the Time Finally Come for Data Reduction?</a> (The Storage Community, sponsored by IBM)</li>
</ul>
</blockquote>
<h3>You&#8217;re Losing Me</h3>
<p>On the other hand, I do see quite a bit of value in something many people would dismiss out of hand: Lossy compression of office files. Every systems administrator knows that end-users do “stupid stuff” like embedding massive photos and videos in PowerPoint presentations and Word documents. But not everyone knows that there are technological means to address this “<a href="http://www.thinkgeek.com/tshirts-apparel/unisex/itdepartment/6692/" >PEBKAC</a>” issue.</p>
<p>Some office applications already automatically reduce the size of embedded content, and operating systems can do the same. One of my more popular blog posts, in fact, is <a href="http://blog.fosketts.net/2008/10/23/reduce-file-size-pdf-mac/" >a technique to create a filter to reduce the size of PDF files in Mac OS X Preview</a>. And the Microsoft “X” Office file formats include lossless compression as well.</p>
<p>An application that recently caught my eye is the <a href="http://balesio.com/fileminimizersuite/eng/index.php" >FILEminimizer Suite</a> by Balesio. This inexpensive application reduces the size of Office and media files while leaving them in their native format. It re-compresses image files, reducing them to optimum size for use in presentations, documents, or printouts. A companion product, <a href="http://balesio.com/fileminimizerserver/eng/index.php" >FILEminimizer Server</a>, can be used on enterprise file servers to perform the same magic across the whole range of users.</p>
<h3>Stephen&#8217;s Stance</h3>
<p>Native Format Optimization (NFO) makes a lot of sense, since it addresses a common user error in a practical way, and allows capacity savings to “trickle-down” to backups, e-mail systems, and archives. But wholesale compression and deduplication of primary storage may not be worth much, especially since the cost of disk keeps dropping dramatically.</p>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2011. |
<a href="http://blog.fosketts.net/2011/09/22/data-reduction-condensed-version/">Data Reduction: the Condensed Version</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>, <a href="http://blog.fosketts.net/category/everything/virtualstorage/" title="View all posts in Virtual Storage" rel="category tag">Virtual Storage</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2011/09/22/data-reduction-condensed-version/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Storage Decisions New York: Capacity Optimization</title>
		<link>http://blog.fosketts.net/2011/09/02/storage-decisions-york-capacity-optimization/</link>
		<comments>http://blog.fosketts.net/2011/09/02/storage-decisions-york-capacity-optimization/#comments</comments>
		<pubDate>Fri, 02 Sep 2011 18:45:05 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[data management]]></category>
		<category><![CDATA[data reduction]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[Storage Decisions]]></category>
		<category><![CDATA[storage virtualization]]></category>
		<category><![CDATA[TechTarget]]></category>
		<category><![CDATA[tiered storage]]></category>
		<category><![CDATA[volume management]]></category>
		<category><![CDATA[volume manager]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=6153</guid>
		<description><![CDATA[Later this month, I will be heading to New York for TechTarget's Storage Decisions conference. I will have two presentations on data reduction and storage virtualization in the main conference track. Registration is free for qualified end-users, and I urge you to attend on September 19 and 20, 2011.]]></description>
			<content:encoded><![CDATA[<div id="attachment_6156" class="wp-caption aligncenter" style="width: 410px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><img class="size-full wp-image-6156" title="Storage Decisions Chicago 2011" src="http://static.fosketts.net/wp-content/uploads/2011/09/SD-Chi-11.jpg" alt="" width="400" height="266" /><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Join me in New York for Storage Decisions, September 19 &amp; 20</p></div>
<p>Later this month, I will be heading to New York for <a href="http://storagedecisions.techtarget.com/newyork/index.html" >TechTarget&#8217;s Storage Decisions conference</a>. This show does a good job on the editorial side, suggesting timely topics and bringing in independent voices like Howard Marks. I will have two presentations on data reduction and storage virtualization in the main conference track. <a href="http://registration.techtarget.com/events/register.do?name=storagedecisionsnewyork" >Registration is free</a> for qualified end-users, and I urge you to attend on September 19 and 20, 2011.</p>
<h3>Reclaim Capacity with Data Reduction for Primary Storage</h3>
<blockquote><p>Depending on which industry study you read, most companies are wasting anywhere from 30% to 50% of their installed disk capacity, which translates into thousands of dollars spent with no effective return on investment. Storage vendors are beginning to provide tools that can help storage managers make the most of the disk they have installed. For example, data reduction for primary storage borrows data deduplication technology developed for backup and classic compression algorithms to help squeeze the air out of nearline and primary data and reduce its footprint. This session&#8217;s topics will include an overview of data reduction technologies and where they will have the greatest impact, what key storage vendors are offering in data reduction and an update on the major players, and the consequences of using primary data dedupe along with dedupe for backups. We&#8217;ll also look at the potential for vendor lock-in and consider why we’re reducing data in the first place.</p>
<p>Topics include:</p>
<ul>
<li>Introducing data reduction technologies
<ul>
<li>Compression: How it works and where it’s found</li>
<li>Deduplication: From single-instancing to variable block</li>
<li>Application-specific: Cracking open files</li>
</ul>
</li>
<li>Overview of data reduction products</li>
<li>Where to use them
<ul>
<li>The capacity conundrum: Store less and reduce utilization</li>
<li>Ideal applications: Justifying the cost of data reduction</li>
<li>Side effects: Considering the impact on backup, replication, I/O workload and vendor lock-in</li>
</ul>
</li>
</ul>
</blockquote>
<h3>Storage Virtualization: Who’s Doing It and Why</h3>
<blockquote><p>Storage virtualization has been around for decades and, although research indicates that 70% of companies have already virtualized at least some of their installed block or file storage, most remain unaware of this technology. Grandiose schemes for comprehensive virtual SANs have given way to more practical host- and array-based virtualization technologies, and server virtualization has created a new opportunity to create a pool of storage. This session will look at the current state of storage virtualization, how to quantify its benefits and describe which approaches are best for particular environments, and also cover how storage virtualization compares to private storage clouds.</p>
<p>Topics include:</p>
<ul>
<li>Defining storage virtualization: What it is and where to find it
<ul>
<li>Abstraction of storage resources</li>
<li>Tiered storage</li>
<li>Flexibility</li>
</ul>
</li>
<li>Popular approaches to storage virtualization
<ul>
<li>SAN controllers</li>
<li>File virtualization</li>
<li>Volume managers</li>
</ul>
</li>
<li>The pool, the hypervisor and the cloud
<ul>
<li>The impact of server virtualization</li>
<li>Is this a private cloud?</li>
</ul>
</li>
</ul>
</blockquote>
<h3>Registration</h3>
<div id="attachment_6155" class="wp-caption aligncenter" style="width: 410px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><img class="size-full wp-image-6155" title="Storage Decisions Chicago 2011" src="http://static.fosketts.net/wp-content/uploads/2011/09/SD-Chi-11-2.jpg" alt="" width="400" height="266" /><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">You can see the future from here!</p></div>
<p>To register for Storage Decisions New York, just go to <a href="http://registration.techtarget.com/events/register.do?name=storagedecisionsnewyork" >the TechTarget registration page</a>.</p>
<p>Disclosure: TechTarget pays my expenses to attend and present at Storage Decisions, and has for many years. But they don&#8217;t pay me to present and I own the copyright on my session content. Happily, I license it all <a href="http://creativecommons.org/licenses/by-nc-sa/3.0/" >CC BY-NC-SA</a> so I can give it out freely!</p>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2011. |
<a href="http://blog.fosketts.net/2011/09/02/storage-decisions-york-capacity-optimization/">Storage Decisions New York: Capacity Optimization</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>, <a href="http://blog.fosketts.net/category/everything/virtualstorage/" title="View all posts in Virtual Storage" rel="category tag">Virtual Storage</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2011/09/02/storage-decisions-york-capacity-optimization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>When Pricing Gets Squishy Competition Heats Up</title>
		<link>http://blog.fosketts.net/2011/08/25/pricing-squishy-competition-heats/</link>
		<comments>http://blog.fosketts.net/2011/08/25/pricing-squishy-competition-heats/#comments</comments>
		<pubDate>Thu, 25 Aug 2011 05:16:06 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[flash]]></category>
		<category><![CDATA[IOPS]]></category>
		<category><![CDATA[Nimbus Data]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[pricing]]></category>
		<category><![CDATA[Pure Storage]]></category>
		<category><![CDATA[SSD]]></category>
		<category><![CDATA[Texas Memory Systems]]></category>
		<category><![CDATA[thin provisioning]]></category>
		<category><![CDATA[utilization]]></category>
		<category><![CDATA[Violin]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=6082</guid>
		<description><![CDATA[I stepped into a hornet's nest this week when I posted a write-up about a new flash storage array from Pure Storage. The controversy had nothing to do with the underlying technology, which seems quite sound. Rather, it was all about pricing, with Pure's competitors calling foul on their price comparisons.]]></description>
			<content:encoded><![CDATA[<div id="attachment_6083" class="wp-caption aligncenter" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><img class="size-medium wp-image-6083" title="Rotten Apple" src="http://static.fosketts.net/wp-content/uploads/2011/08/Rotten-Apple-300x199.jpg" alt="" width="300" height="199" /><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">When is a gigabyte not a gigabyte? When you&#39;re not buying gigabytes!</p></div>
<p>I stepped into a hornet&#8217;s nest this week when I posted a write-up about <a href="http://blog.fosketts.net/2011/08/23/pure-storage-flasharray-ssd-storage-array/" >a new flash storage array from Pure Storage</a>. The controversy had nothing to do with the underlying technology, which seems quite sound. Rather, it was all about pricing, with Pure&#8217;s competitors calling foul on their price comparisons.</p>
<h3>You&#8217;re Not Buying Gigabytes</h3>
<p>In a world of 3 TB drives, storage capacity is almost irrelevant. Capacity is what people think they are getting when they buy enterprise storage devices, but capacity is only one aspect of the purchase, and it&#8217;s not a very significant one in most cases.</p>
<p>So what are enterprise storage buyers buying?</p>
<ul>
<li><a href="http://blog.fosketts.net/2010/10/27/4-horsemen-io/" >Performance</a>, especially I/O operations (IOPS), is much more critical than capacity in most cases, and it takes lots of spindles or expensive flash chips to deliver it.</li>
<li>Data protection features like snapshots are increasingly important, and often cost extra.</li>
<li><a href="http://blog.fosketts.net/2011/04/28/support-matrix-blues/" >Compatibility</a> is paramount, as is long-term supportability from all vendors involved.</li>
<li>Integration and management features are often the deciding factor in purchases, especially when it comes to server virtualization applications.</li>
<li>High availability and product support are &#8220;must-haves&#8221; that can multiply the cost of a solution.</li>
<li>Power, cooling, and floor space can be very important for some applications and entirely inconsequential in others.</li>
<li>Capacity is sometimes important, but many applications require just a few TB or less, and thin provisioning, data deduplication, and compression are really blurring the lines here.</li>
</ul>
<p>So although a typical customer will say &#8220;I need 200 GB for this application,&#8221; they likely need nothing of the sort. They really need 100 IOPS, snapshots, a line on the HCL, VAAI and vCenter plugins, and redundant everything. Even the capacity number is questionable: Most applications grow over time, and few really need much capacity.</p>
<p>Since you can&#8217;t buy a 1 GB storage array and can&#8217;t fill a 10 TB device to 100%, pricing per GB is complete nonsense. Plain old storage space just sort of tags along for the ride once you build a system capable of meeting all these other needs.</p>
<h3>Data Reduction or Pricing Obfuscation?</h3>
<p><a href="http://blog.fosketts.net/2008/10/01/storage-utilization-waterfall-raw-usable/" >Utilization of storage capacity has always been terrible</a>, but <a href="http://blog.fosketts.net/2010/12/27/thin-provisioning-storage-cheaper/" >improving capacity efficiency is worthless</a>. The best you can do is over-tax your array or put all your data &#8220;eggs&#8221; in too few drive &#8220;baskets&#8221;. Achieving impressive capacity utilization just concentrates I/O, and this is the last thing you want to do with spinning hard disk drives.</p>
<p>This is why I suggest redirecting the conversation away from capacity requirements. The number of GB to be used and the efficiency of that storage don&#8217;t matter all that much except for certain massive and rare applications. Once the array is big enough to handle the data, everything else is a wash.</p>
<p>This is also why I&#8217;m skeptical of data reduction technologies. Most applications would be better off optimizing for performance, not reducing capacity used. And data reduction techniques like compression and deduplication quickly lead down the &#8220;your mileage may vary&#8221; rat hole.</p>
<h3>Comparing Apples to Apples</h3>
<blockquote><p>Also read <a href="http://blog.fosketts.net/2008/08/28/grapples-tangelos-impossible-compare-fairly/" >Grapples and Tangelos: Why it’s Impossible to Compare Fairly</a></p></blockquote>
<p>There is only one way to do a real fair comparison between different storage devices: Specify all the requirements and let each vendor put forward whatever they have that meets all of them. Who really cares if vendor A&#8217;s disk-based solution is 10% utilized while vendor B&#8217;s flash array needs 1/5 the capacity? As long as you have a place to put it (and enough power to feed it) it&#8217;ll still work fine.</p>
<p>One serious challenge in enterprise storage is the rise of flash memory as a storage medium. Flash chips are expensive on a data capacity basis but amazingly cheap in terms of performance and environmental efficiency. Put another way, an SSD can&#8217;t store as much data as a hard disk, but it delivers massive I/O capability in a tiny, rugged, low-power footprint.</p>
<p>Since most enterprise applications need only a few hundred GB of capacity, a few SSDs can be a compelling alternative to a &#8220;refrigerator&#8221; full of disks. It can be hard to convince the boss, but you really can fit a whole datacenter&#8217;s worth of storage I/O into a few rack units!</p>
<h3>Pure and Nimbus</h3>
<p>This is the issue facing flashy solid state devices from many companies, and the root of my headaches this week. Pure Storage hasn&#8217;t finalized pricing yet, but is claiming that its new device costs $5 per usable gigabyte. This is incredibly cheap for an array that can blow the doors off most enterprise gear!</p>
<p>Nimbus Data, on the other hand, sells their all-flash enterprise storage array for about $10 per GB. But this is not the end of the story, and Pure might even be more expensive than Nimbus! Or maybe not. It all depends on what you&#8217;re comparing.</p>
<p>Pure claims that their cost is half the price of most comparable flash storage array competitors, but this is where the questions start to appear. Is that $5 gigabyte usable or raw? Does it include high availability? And can I really store any old gigabyte of data there or is that a compressed/deduplicated gigabyte?</p>
<p>It turns out that the real cost of Pure Storage capacity is $20 per GB including RAID and an extra mirrored array for high availability. But since every byte written to the array is thin provisioned, deduplicated, and compressed, many customers will pay much less for actual data stored. And since it&#8217;s an all-SSD array, it&#8217;ll perform way better than a disk-based system, too.</p>
<h3>Muddying the Waters</h3>
<p>So why not just call it $5 per GB and be done with it? It&#8217;s confusing, that&#8217;s why, and your mileage will vary widely. Pure&#8217;s own slides show some applications getting 4:1 data reduction and others all the way up to 17:1. So these applications would be paying as little as $1.18 per GB or as much as $5.</p>
<p>But you can&#8217;t buy just 1 GB of storage from Pure. Their smallest array (which includes one controller and one shelf of SSDs) provides 5.5 TB of raw capacity, presumably using 24 256 GB SSDs. A high-availability configuration would include two controllers and two shelves of SSDs for 11 TB of raw storage. That&#8217;s going to cost almost a quarter of a million dollars according to my calculator. That&#8217;s one expensive gigabyte!</p>
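<p>The arithmetic behind these figures can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function names are made up, 1 TB is assumed to be 1,000 GB, and the $20 per GB figure is simply applied to the 11 TB configuration, which is my reading of how the numbers fit together rather than anything Pure has published.</p>

```python
# Back-of-the-envelope pricing sketch. The function names are made up for
# illustration; the figures come from the post, and 1 TB = 1,000 GB is assumed.

def effective_price_per_gb(price_per_gb, reduction_ratio):
    """Price per GB of data actually stored, after data reduction."""
    return price_per_gb / reduction_ratio


def array_price(capacity_tb, price_per_gb, gb_per_tb=1000):
    """Total price of an array, given its capacity and a per-GB price."""
    return capacity_tb * gb_per_tb * price_per_gb


if __name__ == "__main__":
    # $20 per GB before data reduction, at the two ratios quoted from the slides
    for ratio in (4, 17):
        price = effective_price_per_gb(20, ratio)
        print(f"{ratio}:1 reduction: ${price:.2f} per GB")
    # The high-availability pair of arrays (assumption: $20 applies to all 11 TB)
    print(f"11 TB pair: ${array_price(11, 20):,.0f}")
```

<p>The point of the sketch is that the &#8220;price per GB&#8221; moves by a factor of four depending on nothing but the data reduction ratio assumed, while the check the customer writes stays the same.</p>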
<p>Of course no one would buy this array to store just a thousand megabytes. They would buy it to support a bunch of applications that need capacity and performance and efficiency and integration and everything else. And they can buy a mirrored pair of arrays from Pure Storage or Nimbus or Violin Memory or Texas Memory Systems or others at a variety of price points.</p>
<p>The only way to really compare these products is to spec them out on equal footing and see what the price tag looks like. These comparisons would include data reduction, but they would also have to bring in high availability and every other requirement of the applications they will support.</p>
<h3>Stephen&#8217;s Stance</h3>
<p>It&#8217;s way too difficult for me to do the pricing math for these systems, so I&#8217;m throwing in the towel. I&#8217;m thrilled to see all-flash arrays made available to IT buyers. This wouldn&#8217;t be possible without clever use of thin provisioning and data reduction, as well as smart software to overcome the limits of SSD.</p>
<p>I&#8217;m going to guess that Pure and Nimbus will cost about the same for similar configurations, though I&#8217;ll bet each believes they&#8217;re cheaper. Rather than get in the middle, I invite each company to post a comment below stating their case. I&#8217;ll even embed their responses into a future blog post on the subject if they get too long. Just don&#8217;t ask me to be the referee.</p>
<blockquote><p>Update: Pure Storage responds with an outline of their pricing:</p>
<ul>
<li><a href="http://www.purestorage.com/blog/how-pure-storage-delivers-all-flash-storage-at-below-the-price-of-spinning-disk/" >How Pure Storage Delivers All-Flash Storage at Below the Price of Spinning Disk</a></li>
</ul>
</blockquote>
<p><em>Image credit: Rotten Apple by <a rel="nofollow" href="http://www.flickr.com/photos/wappas/" >Wappas</a></em></p>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2011. |
<a href="http://blog.fosketts.net/2011/08/25/pricing-squishy-competition-heats/">When Pricing Gets Squishy Competition Heats Up</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>, <a href="http://blog.fosketts.net/category/everything/virtualstorage/" title="View all posts in Virtual Storage" rel="category tag">Virtual Storage</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2011/08/25/pricing-squishy-competition-heats/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>How Many Social Media Marketing Campaigns Fit Into a Mini Cooper?</title>
		<link>http://blog.fosketts.net/2011/07/29/social-media-marketing-campaigns-fit-mini-cooper/</link>
		<comments>http://blog.fosketts.net/2011/07/29/social-media-marketing-campaigns-fit-mini-cooper/#comments</comments>
		<pubDate>Fri, 29 Jul 2011 16:45:26 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[marketing]]></category>
		<category><![CDATA[Symantec]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[YouTube]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=5968</guid>
		<description><![CDATA[I've witnessed quite a few publicity stunts from IT industry companies, many of which include over-the-top videos. But it's rare to find one that's actually amusing and informative. That's why I was so pleased to discover a new video from Symantec on YouTube: It's silly and fun, well produced, and actually tells us something about data de-duplication! Take a look yourself, and let me know what you think.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve witnessed quite a few publicity stunts from IT industry companies, many of which include over-the-top videos. But it&#8217;s rare to find one that&#8217;s actually amusing and informative. That&#8217;s why I was so pleased to discover <a rel="nofollow" href="http://www.youtube.com/watch?v=pSeZtPOQ6Fo" >a new video from Symantec</a> on YouTube: It&#8217;s silly and fun, well produced, and actually tells us something about data de-duplication! Take a look yourself, and let me know what you think.</p>
<p><iframe width="439" height="250" src="http://www.youtube.com/embed/pSeZtPOQ6Fo" frameborder="0" allowfullscreen></iframe><br />
<a rel="nofollow" href="http://www.youtube.com/watch?v=pSeZtPOQ6Fo" >A Better Way to Dedupe Data: Dedupe Everywhere</a></p>
<blockquote><p>You might also want to read <a href="http://foskettservices.com/2011/01/when-marketing-becomes-pointless/" >When Marketing Becomes Pointless</a> and <a href="http://foskettservices.com/2010/08/the-epidemiology-of-viral-videos/" >The Epidemiology of Viral Videos</a></p></blockquote>
<h3>Stephen&#8217;s Stance</h3>
<p>If you&#8217;ve been following the IT industry and my blog for a while, you probably recall <a href="http://foskettservices.com/2011/01/when-marketing-becomes-pointless/" >the Mini Cooper stunts</a> EMC pulled earlier in 2011: They packed a bunch of contortionists into a Mini at their New York event, while simultaneously parking a bunch of logo-covered cars in front of NetApp headquarters. This was intended to demonstrate something about storage, but the exact point escaped me. In the end, the whole thing seemed mean-spirited and pointless.</p>
<p>The Symantec video included here is an entirely different animal. It&#8217;s exactly the sort of thing I approve of: A lighthearted look at a serious technical topic with only a gentle poke at the “opposition”. And I bet producing this video was a lot less expensive than hiring an acrobat school! Let&#8217;s hear it for social media!</p>
<blockquote><p>Disclaimer: Symantec is a frequent presenter at <a href="http://TechFieldDay.com" >Tech Field Day</a> and I have <a href="http://FoskettServices.com" >worked with them</a> often on other projects. But this post is my own idea and I&#8217;m getting no compensation related to this video. I just liked it and wanted to spread the word!</p></blockquote>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2011. |
<a href="http://blog.fosketts.net/2011/07/29/social-media-marketing-campaigns-fit-mini-cooper/">How Many Social Media Marketing Campaigns Fit Into a Mini Cooper?</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2011/07/29/social-media-marketing-campaigns-fit-mini-cooper/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>How Does Dropbox Store Data?</title>
		<link>http://blog.fosketts.net/2011/07/11/dropbox-data-format-deduplication/</link>
		<comments>http://blog.fosketts.net/2011/07/11/dropbox-data-format-deduplication/#comments</comments>
		<pubDate>Mon, 11 Jul 2011 16:30:09 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Apple]]></category>
		<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Everything]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Terabyte home]]></category>
		<category><![CDATA[cloud storage]]></category>
		<category><![CDATA[data deduplication]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[Dropbox]]></category>
		<category><![CDATA[Mac OS X]]></category>
		<category><![CDATA[MD5]]></category>
		<category><![CDATA[SHA-1]]></category>
		<category><![CDATA[TrueCrypt]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=5863</guid>
		<description><![CDATA[Dropbox recently clarified (via their blog and privacy policy) that they "de-duplicate" user files. This has been known for quite a while, and is obvious to anyone who's had a large file "upload" instantly. But how exactly does Dropbox store files? Are they really de-duplicated or just single-instanced? I set out to discover the answer.]]></description>
			<content:encoded><![CDATA[<p>Dropbox recently clarified (via their <a href="http://blog.dropbox.com/?p=846" >blog</a> and <a href="https://www.dropbox.com/terms#privacy" >privacy policy</a>) that they &#8220;de-duplicate&#8221; user files. This has been known for quite a while, and is obvious to anyone who&#8217;s had a large file &#8220;upload&#8221; instantly. But how exactly does Dropbox store files? Are they really de-duplicated or just single-instanced? I set out to discover the answer.</p>
<h3>Single Instance Storage</h3>
<p>It&#8217;s fairly simple for a system to eliminate duplicate data by storing only a single instance of multiple identical files. In other words, if you and I both upload &#8220;Presentation.pptx&#8221; and it&#8217;s bit-for-bit identical, it would be a simple matter to store just one copy.</p>
<p>Dropbox definitely does this. I proved it with a simple experiment:</p>
<ol>
<li>Create a new 10 MB encrypted disk image in TrueCrypt (so it&#8217;ll be 100% unique, random data)</li>
<li>Move it to the Dropbox folder and wait a few minutes as it uploads</li>
<li>Copy the file with a new name to the folder and notice that it &#8220;uploads&#8221; instantly</li>
</ol>
<p>Dropbox is at least single-instancing storage. This helps users, since it speeds uploads and reduces bandwidth usage. It helps Dropbox in the same way, but goes further since they still &#8220;charge&#8221; files against your account whether they&#8217;re single-instanced or not.</p>
<p>Note that this single-instancing works across users and geographies. I gave a file to a friend to upload to a different Dropbox account, and saw the same &#8220;acceleration effect.&#8221; This would be quite useful to users and the company for files like iTunes songs which are identical and widespread.</p>
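<p>The behavior I observed can be modeled with a toy content-addressed store. This is a minimal sketch, not Dropbox&#8217;s actual design: the class, its method names, and the choice of SHA-256 as the fingerprint are all assumptions made for illustration.</p>

```python
import hashlib
import os


class SingleInstanceStore:
    """Toy whole-file single-instance store keyed by a content hash.

    Hypothetical illustration only; the real service's internals are not public.
    """

    def __init__(self):
        self.blobs = {}    # fingerprint -> file contents, stored once
        self.catalog = {}  # (user, filename) -> fingerprint

    def upload(self, user, filename, data):
        """Record the file; return True only if the bytes had to be transferred."""
        fingerprint = hashlib.sha256(data).hexdigest()
        transferred = fingerprint not in self.blobs
        if transferred:
            self.blobs[fingerprint] = data
        self.catalog[(user, filename)] = fingerprint
        return transferred

    def charged_bytes(self, user):
        """Quota is charged per catalog entry, shared or not."""
        total = 0
        for (owner, _name), fingerprint in self.catalog.items():
            if owner == user:
                total += len(self.blobs[fingerprint])
        return total


if __name__ == "__main__":
    store = SingleInstanceStore()
    image = os.urandom(1024)  # stand-in for the random TrueCrypt image
    print(store.upload("me", "volume.tc", image))       # first copy is transferred
    print(store.upload("me", "volume-copy.tc", image))  # renamed copy is not
    print(store.upload("friend", "volume.tc", image))   # nor is another user's copy
    print(store.charged_bytes("me"))                    # both copies still count
```

<p>Note that the quota calculation walks the catalog, not the blob table, which is why each user is still &#8220;charged&#8221; for every copy even though only one is stored.</p>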
<h3>Clashing MD5 Hashes?</h3>
<div id="attachment_5866" class="wp-caption aligncenter" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><a href="http://static.fosketts.net/wp-content/uploads/2011/07/HashClash.png" ><img class="size-medium wp-image-5866" title="HashClash" src="http://static.fosketts.net/wp-content/uploads/2011/07/HashClash-300x64.png" alt="" width="300" height="64" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Three files with identical sizes and MD5 hashes but different names? Creepy!</p></div>
<p>A global single-instance storage system sounds great, but it opens the door to hash collision issues. Imagine if you and I both uploaded identical files. Both would have the same &#8220;fingerprint&#8221; and Dropbox would only store it once. Now imagine instead that, out of coincidence or malice, I uploaded a file with the same fingerprint as yours but different contents. This is <a href="http://www.schneier.com/blog/archives/2005/02/cryptanalysis_o.html" >not so far-fetched as it seems</a>, and could lead to all sorts of security nightmares.</p>
<p>A common and compromised file checksum method is MD5, so I decided to test how Dropbox handles files of identical size, name, and MD5 hash using the &#8220;<a href="http://www.win.tue.nl/hashclash/Nostradamus/" >Nostradamus Attack</a>&#8221; PDFs generated by Marc Stevens. My tests show that Dropbox correctly handled the files I tried, and no combination of uploading and naming could force it to store the wrong file. So Dropbox either doesn&#8217;t use MD5 or uses a combination of hashing and other mechanisms. Testing other schemes is left as an exercise to the reader.</p>
<p>One more thought: The fact that de-duplication is mentioned in the &#8220;privacy&#8221; section of the Dropbox policies raises my eyebrows, since it suggests that they see hash collisions as a matter of privacy rather than data corruption. This indicates that Dropbox is both aware of and susceptible to hash collision attacks generally, though obviously not as simply as creating a bogus MD5 match.</p>
<blockquote><p>Note: Dropbox is well aware of this issue, having <a href="http://razorfast.com/2011/04/25/dropbox-attempts-to-kill-open-source-project/" >recently squashed</a> an open-source exploit called <a href="http://forwardfeed.pl/index.php/2011/04/24/dropship-successor-to-torrents-eng/" >Dropship</a>!</p></blockquote>
<h3>Sub-File De-Duplication</h3>
<p>Data de-duplication is like single-instancing, but it applies to some subset of data. Some enterprise storage systems de-duplicate at multi-megabyte levels, while others are far more granular.</p>
<p>To test whether Dropbox de-duplicates data, I devised a simple experiment:</p>
<ol>
<li>Create a new local copy of my existing random TrueCrypt file</li>
<li>Add a single byte to the end using the &#8220;cat&#8221; command</li>
<li>Copy the resulting file to Dropbox</li>
<li>Watch as Dropbox takes just a few seconds to upload the new file</li>
</ol>
<p>This test proves that Dropbox does indeed de-duplicate at the sub-file level. Since it took a bit longer to upload than would be expected for a single byte, we can see that Dropbox &#8220;chunks&#8221; files for hashing and uploading.</p>
<h3>De-Duplication Granularity</h3>
<p>The next question is just what size chunks or blocks Dropbox uses to de-duplicate data. To test this, I created various blocks of random data using TrueCrypt and experimented to see where the &#8220;stair-steps&#8221; were in terms of upload time.</p>
<p>My tests used four basic building blocks of 512 KB, 1024 KB, 2048 KB, and 4096 KB in size. Guessing that Dropbox used one of these sizes for their chunking system, I assumed these would quickly demonstrate the answer.</p>
<div id="attachment_5870" class="wp-caption aligncenter" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><a href="http://static.fosketts.net/wp-content/uploads/2011/07/Comparison-of-Dropbox-Transfer-Time-for-Various-Concatenated-Object-Sizes.jpg" ><img class="size-medium wp-image-5870" title="Comparison of Dropbox Transfer Time for Various Concatenated Object Sizes" src="http://static.fosketts.net/wp-content/uploads/2011/07/Comparison-of-Dropbox-Transfer-Time-for-Various-Concatenated-Object-Sizes-300x202.jpg" alt="" width="300" height="202" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">On my Mac, Dropbox clearly uses a 4 MB &quot;chunk&quot; size for deduplication</p></div>
<p>First, I uploaded each file individually and watched as Dropbox took about 30 seconds per MB. This will vary greatly, of course, but the absolute performance doesn&#8217;t matter. Only relative performance matters for demonstrating chunking.</p>
<p>Next, I concatenated each file with itself to create a new file twice as large. This would be ideally &#8220;chunkable&#8221; since it consists of exactly identical data with a nice, clean, evenly-divisible &#8220;border&#8221;. I uploaded each of these and noticed that the &#8220;4096 KB x 2&#8221; file uploaded nearly instantly, while all others took the expected amount of time.</p>
<p>I repeated this with &#8220;x 3&#8221;, &#8220;x 4&#8221;, and &#8220;x 8&#8221; files and noticed that the 4096 KB (4 MB) &#8220;barrier&#8221; was very obvious. Whenever a file contained a full 4096 KB block of data Dropbox had seen before, it single-instanced that block. Any time it saw a block smaller than this, even a repeated one, it uploaded it fresh.</p>
<p>This proves, at least in the case of my own Mac OS X install of Dropbox, that a 4 MB chunk size is used for de-duplication.</p>
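<p>A small simulation shows why a fixed 4 MB chunk size would produce exactly these results. This is a sketch under stated assumptions: the function names are hypothetical, chunks are assumed to be cut at fixed offsets from the start of the file, and SHA-256 stands in for whatever fingerprint the service really uses.</p>

```python
import hashlib
import os

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the size my tests point to (assumption)


def split_chunks(data, chunk_size=CHUNK_SIZE):
    """Cut data into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def upload(data, seen, chunk_size=CHUNK_SIZE):
    """Return how many bytes must be sent, remembering each chunk hash in seen."""
    sent = 0
    for chunk in split_chunks(data, chunk_size):
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:
            seen.add(digest)
            sent += len(chunk)
    return sent


if __name__ == "__main__":
    seen = set()  # hashes of every chunk the "server" already holds
    for kb in (512, 1024, 2048, 4096):
        block = os.urandom(kb * 1024)
        first = upload(block, seen)
        doubled = upload(block + block, seen)
        print(f"{kb} KB: first upload sent {first} bytes, x 2 file sent {doubled} bytes")
```

<p>In this model only the 4096 KB building block lines up with chunk boundaries when doubled, so only that file sends nothing new. The smaller blocks land inside a single chunk whose hash has never been seen, so the whole doubled file is sent. The same model also accounts for the single-byte test above: appending a byte changes only the final chunk, which must be re-sent in full.</p>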
<h3>Stephen&#8217;s Stance</h3>
<p>Dropbox is a very useful service, and I appreciate the technology they use to make it work. By single-instancing storage, the company is able to keep costs and transfer time in check and offer a basic service for free for many users. Despite the recent security issue, I continue to use Dropbox myself and would not hesitate to recommend it. But I do suggest using your own encryption for any sensitive data, as demonstrated in my recent post, <a href="http://blog.fosketts.net/2011/07/05/mac-dropbox-encrypted-volume/" >Mac Users, Secure Your Stuff in Dropbox</a>.</p>
<p>I remain somewhat concerned about the privacy and security implications of global de-duplication of shared random data. If they use SHA-1 hashes alone, which I suspect, there is a chance that an object will not be stored correctly once 2^80 (or perhaps <a href="http://www.schneier.com/blog/archives/2005/02/sha1_broken.html" >2^69</a> or even <a rel="nofollow" href="http://lukenotricks.blogspot.com/2009/05/cost-of-sha-1-collisions-reduced-to-252.html" >2^52</a>) objects are stored. This would lead to issues of data corruption or inadvertent disclosure. This is a very remote chance indeed, but &#8220;<a rel="nofollow" href="http://en.wikipedia.org/wiki/Birthday_problem" >birthday problems</a>&#8221; like this work against hashing systems. I would love to hear from Dropbox regarding how they prevent this from happening, including disclosure of their methods of hashing data. It&#8217;s nice to see the company taking responsibility by disclosing this in their privacy policy, though!</p>
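<p>To put rough numbers on the birthday problem, here is a minimal sketch using the standard approximation for the chance that any two of <em>n</em> objects share a hash value. It assumes an ideal hash with uniformly random output, so it does not model the cryptanalytic shortcuts linked above; the function name is my own.</p>

```python
import math


def collision_probability(num_objects, hash_bits):
    """Approximate chance that at least two of num_objects share a hash value.

    Uses the birthday approximation 1 - exp(-n(n-1) / 2^(bits+1)), which
    assumes hash outputs are uniformly random (an idealization).
    """
    exponent = -num_objects * (num_objects - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(exponent)


if __name__ == "__main__":
    for bits, name in ((160, "SHA-1"), (256, "SHA-256")):
        for objects in (10 ** 12, 2 ** 80):
            p = collision_probability(objects, bits)
            print(f"{name}, {objects:.3e} objects: p = {p:.3e}")
```

<p>With a 160-bit hash, the probability only becomes substantial (roughly 39%) at around 2^80 stored objects, and it is vanishingly small for any realistic object count. Accidental collisions are therefore not the worry; deliberate ones are.</p>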
<blockquote><p>Update: Dropbox apparently does indeed use raw SHA256 hashes to &#8220;uniquely&#8221; identify data, and <a href="http://news.ycombinator.com/item?id=2478567" >this can be exploited in a number of ways</a>.</p></blockquote>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2011. |
<a href="http://blog.fosketts.net/2011/07/11/dropbox-data-format-deduplication/">How Does Dropbox Store Data?</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/apple/" title="View all posts in Apple" rel="category tag">Apple</a>, <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/" title="View all posts in Everything" rel="category tag">Everything</a>, <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>, <a href="http://blog.fosketts.net/category/everything/terabytehome/" title="View all posts in Terabyte home" rel="category tag">Terabyte home</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2011/07/11/dropbox-data-format-deduplication/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Storage Decisions Chicago: All About Capacity Optimization</title>
		<link>http://blog.fosketts.net/2011/05/27/storage-decisions-chicago/</link>
		<comments>http://blog.fosketts.net/2011/05/27/storage-decisions-chicago/#comments</comments>
		<pubDate>Fri, 27 May 2011 19:05:54 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[data management]]></category>
		<category><![CDATA[data reduction]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[Dell]]></category>
		<category><![CDATA[Storage Decisions]]></category>
		<category><![CDATA[storage virtualization]]></category>
		<category><![CDATA[TechTarget]]></category>
		<category><![CDATA[tiered storage]]></category>
		<category><![CDATA[volume management]]></category>
		<category><![CDATA[volume manager]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=5548</guid>
		<description><![CDATA[Next month, I will be heading to Chicago for TechTarget's Storage Decisions conference. This show does a good job on the editorial side, suggesting timely topics and bringing in independent voices like Howard Marks. I will have three presentations to give: Sessions on data reduction and storage virtualization in the main conference track, as well as a dinner discussion focusing on controlling the growth of data. Registration is free for qualified end-users, and I urge you to attend.]]></description>
			<content:encoded><![CDATA[<div id="attachment_1093" class="wp-caption aligncenter" style="width: 236px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><a href="http://blog.fosketts.net/wp-content/uploads/2008/11/img_0028.jpg" ><img class="size-medium wp-image-1093" title="Storage Decisions" src="http://blog.fosketts.net/wp-content/uploads/2008/11/img_0028-226x300.jpg" alt="" width="226" height="300" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Join me in Chicago for Storage Decisions, June 21</p></div>
<p>Next month, I will be heading to Chicago for <a href="http://storagedecisions.techtarget.com/chicago/index.html?Offer=Foskett" >TechTarget&#8217;s Storage Decisions conference</a>. This show does a good job on the editorial side, suggesting timely topics and bringing in independent voices like Howard Marks. I will have three presentations to give: Sessions on data reduction and storage virtualization in the main conference track, as well as a dinner discussion focusing on controlling the growth of data. <a href="http://registration.techtarget.com/events/register.do?name=storagedecisionschicago&amp;offer=Foskett" >Registration is free</a> for qualified end-users, and I urge you to attend on June 21, 2011.</p>
<h3>Reclaim Capacity with Data Reduction for Primary Storage</h3>
<blockquote><p>Depending on which industry study you read, most companies are wasting anywhere from 30% to 50% of their installed disk capacity, which translates into thousands of dollars spent with no effective return on investment. Storage vendors are beginning to provide tools that can help storage managers make the most of the disk they have installed. For example, data reduction for primary storage borrows data deduplication technology developed for backup and classic compression algorithms to help squeeze the air out of nearline and primary data and reduce its footprint. This session&#8217;s topics will include an overview of data reduction technologies and where they will have the greatest impact, what key storage vendors are offering in data reduction and an update on the major players, and the consequences of using primary data dedupe along with dedupe for backups. We&#8217;ll also look at the potential for vendor lock-in and consider why we’re reducing data in the first place.</p>
<p>Topics include:</p>
<ul>
<li>Introducing data reduction technologies
<ul>
<li>Compression: How it works and where it’s found</li>
<li>Deduplication: From single-instancing to variable block</li>
<li>Application-specific: Cracking open files</li>
</ul>
</li>
<li>Overview of data reduction products</li>
<li>Where to use them
<ul>
<li>The capacity conundrum: Store less and reduce utilization</li>
<li>Ideal applications: Justifying the cost of data reduction</li>
<li>Side effects: Considering the impact on backup, replication, I/O workload and vendor lock-in</li>
</ul>
</li>
</ul>
</blockquote>
<h3>Storage Virtualization: Who’s Doing It and Why</h3>
<blockquote><p>Storage virtualization has been around for decades and, although research indicates that 70% of companies have already virtualized at least some of their installed block or file storage, most remain unaware of this technology. Grandiose schemes for comprehensive virtual SANs have given way to more practical host- and array-based virtualization technologies, and server virtualization has created a new opportunity to create a pool of storage. This session will look at the current state of storage virtualization, how to quantify its benefits and describe which approaches are best for particular environments, and also cover how storage virtualization compares to private storage clouds.</p>
<p>Topics include:</p>
<ul>
<li>Defining storage virtualization: What it is and where to find it
<ul>
<li>Abstraction of storage resources</li>
<li>Tiered storage</li>
<li>Flexibility</li>
</ul>
</li>
<li>Popular approaches to storage virtualization
<ul>
<li>SAN controllers</li>
<li>File virtualization</li>
<li>Volume managers</li>
</ul>
</li>
<li>The pool, the hypervisor and the cloud
<ul>
<li>The impact of server virtualization</li>
<li>Is this a private cloud?</li>
</ul>
</li>
</ul>
</blockquote>
<h3>Cutting Off Data Growth at the Disk</h3>
<blockquote><p>In this special dinner presentation, Stephen Foskett will discuss how to apply key data management technologies to arrest the growth of data. You’ll learn how capacity optimization technologies such as data deduplication and compression can reduce the trajectory of data growth as well as how tiering can reduce the cost of storage. Finally, Stephen will explore why the time may have finally come for active archiving and will leave you with practical ways to help your corporation better manage its data.</p></blockquote>
<p>Note that space is limited for the dinner, which is sponsored by my friends at Dell.</p>
<h3>Registration</h3>
<p>To register for Storage Decisions Chicago, just go to <a href="http://registration.techtarget.com/events/register.do?name=storagedecisionschicago&amp;offer=Foskett" >the TechTarget registration page</a>. Dinner guests will apparently be selected from that same pool of attendees.</p>
<blockquote><p>Disclosure: TechTarget pays my expenses to attend and present at Storage Decisions, and has for many years. I also get a speaker fee for the dinner session.</p></blockquote>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2011. |
<a href="http://blog.fosketts.net/2011/05/27/storage-decisions-chicago/">Storage Decisions Chicago: All About Capacity Optimization</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>, <a href="http://blog.fosketts.net/category/everything/virtualstorage/" title="View all posts in Virtual Storage" rel="category tag">Virtual Storage</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2011/05/27/storage-decisions-chicago/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>You Really Can Have a Complete Backup System Up and Running in 10 Minutes!</title>
		<link>http://blog.fosketts.net/2011/05/26/complete-backup-system-running-10-minutes/</link>
		<comments>http://blog.fosketts.net/2011/05/26/complete-backup-system-running-10-minutes/#comments</comments>
		<pubDate>Thu, 26 May 2011 21:48:19 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Everything]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Terabyte home]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[Druva]]></category>
		<category><![CDATA[Foskett Services]]></category>
		<category><![CDATA[InSync]]></category>
		<category><![CDATA[Phoenix]]></category>
		<category><![CDATA[SafePoint]]></category>
		<category><![CDATA[Tech Field Day]]></category>
		<category><![CDATA[YouTube]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=5531</guid>
		<description><![CDATA[I recently worked with Druva to produce a series of videos documenting the installation and configuration of InSync. As part of this process, I went through the entire rollout myself using virtual machines and real data. The result was eye-opening: InSync really does install in under 10 minutes!]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m not a fan of heavy-duty installs and bloatware, but it seems inevitable these days that any product worth having requires hours of setup time. This is especially true when it comes to client-server applications like backup software: it&#8217;s so difficult to go from nothing to something. This is why people love simple applications like Time Machine. When <a href="http://www.druva.com/" >Druva</a> announced that their <a href="http://www.druva.com/insync/laptop-backup" >InSync</a> remote backup application weighed in at under 50 MB and could be installed in under 10 minutes, they really got my attention.</p>
<h3>Introducing InSync</h3>
<p>Druva InSync is not a general-purpose enterprise backup application, though it is targeted at the business market. Instead of going head-to-head with successful enterprise backup applications like Symantec Backup Exec and NetBackup, EMC Legato, IBM TSM, CommVault, and the like, Druva wisely found a different niche.</p>
<p>Everyone hates their remote backup application, and this is where Druva decided to focus. End-user laptop and desktop backup clients have a reputation for slamming performance and failing to protect data for machines on the go. A few years back, my work laptop had one of the leading remote backup solutions installed by corporate IT. I hated every time a backup kicked off, since I couldn&#8217;t get any work done until it was finished. So I secretly enjoyed the fact that it regularly failed to start. I guess that, like most end-users, I would rather get my work done than protect my data.</p>
<p>Druva InSync tackles these problems by focusing on simple installation and low resource requirements. The client install is only about 20 MB, and is configured to leverage de-duplication and WAN acceleration technologies for quick backups. It also throttles CPU utilization and doesn&#8217;t rely on a fixed schedule like many solutions. The client can be remotely administered and updated once it is installed, keeping everything running smoothly, but users have some flexibility to add additional backups according to system policy.</p>
<p>The InSync server is remarkably tiny, weighing in at under 50 MB and installing in just a few minutes. While most backup applications require installing additional software, including Microsoft&#8217;s heavy-duty SQL Server, Druva InSync is completely self-contained. But it includes many advanced features, including SSD support and memory caching to improve de-duplication performance.</p>
<h3>Druva InSync Installation and Configuration</h3>
<p>I recently <a href="http://foskettservices.com/2011/05/foskett-services-video-content-druva-software/" >worked with Druva</a> to produce <a rel="nofollow" href="http://www.youtube.com/user/stephenfoskett#grid/user/A302B37CEEBF2A27" >a series of videos</a> documenting the installation and configuration of InSync. As part of this process, I went through the entire rollout myself using virtual machines and real data. The result was eye-opening: InSync really does install in under 10 minutes!</p>
<p><object width="500" height="306"><param name="movie" value="http://www.youtube.com/v/T0IspuNqwS0?version=3"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/T0IspuNqwS0?version=3" type="application/x-shockwave-flash" width="500" height="306" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>My demonstration deployment included a Windows Server 2008 machine and a Windows 7 client. Both ran under VMware Fusion but were configured with realistic CPU and memory footprints to approximate a real-world environment. In my first video, I download and install the Druva InSync server and client software, and get everything up and running.</p>
<p><object width="500" height="306"><param name="movie" value="http://www.youtube.com/v/fay8Zfp4nC8?version=3"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/fay8Zfp4nC8?version=3" type="application/x-shockwave-flash" width="500" height="306" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>My next video includes some more advanced configuration topics. I create separate users for administration of accounts and profiles, discuss remote access and adding WAN network ports, and do some advanced storage configuration. I also discuss some of the best practice recommendations that the Druva folks told me about.</p>
<p><object width="500" height="306"><param name="movie" value="http://www.youtube.com/v/trF-0jFhLSQ?version=3"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/trF-0jFhLSQ?version=3" type="application/x-shockwave-flash" width="500" height="306" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Finally, my third video includes a discussion of reporting features as well as some troubleshooting ideas.</p>
<h3>Stephen&#8217;s Stance</h3>
<p>Installation and configuration of Druva InSync is impressive, but what it does once it&#8217;s installed is what really matters. In my tests, I was impressed by the performance of the de-duplicated and accelerated data transfer from the client to the server. According to the Druva InSync dashboard, I was getting almost a two-to-one reduction in data transfer on the very first use, and this looks to get much better over time. Druva&#8217;s internal videos show massive data reduction for clients that have been running for a while.</p>
<p>This got me thinking about the possibilities of using a product like InSync to back up web servers at remote hosting providers, an application I&#8217;m eager to try out. Druva recently introduced Phoenix, a server backup product to do just this. The company also has recently introduced a data loss prevention component for InSync called <a href="http://www.druva.com/safepoint" >SafePoint</a>. It looks like I will get a chance to try this product out very soon!</p>
<blockquote><p>Disclaimer: My company, <a href="http://foskettservices.com" >Foskett Services</a>, was hired by Druva to produce these videos, and Druva sponsored <a href="http://techfieldday.com" >Tech Field Day</a>, an event I organize.</p></blockquote>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2011. |
<a href="http://blog.fosketts.net/2011/05/26/complete-backup-system-running-10-minutes/">You Really Can Have a Complete Backup System Up and Running in 10 Minutes!</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/" title="View all posts in Everything" rel="category tag">Everything</a>, <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>, <a href="http://blog.fosketts.net/category/everything/terabytehome/" title="View all posts in Terabyte home" rel="category tag">Terabyte home</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2011/05/26/complete-backup-system-running-10-minutes/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Processing and Scheduling Thin Provisioning</title>
		<link>http://blog.fosketts.net/2011/02/22/processing-scheduling-thin-provisioning/</link>
		<comments>http://blog.fosketts.net/2011/02/22/processing-scheduling-thin-provisioning/#comments</comments>
		<pubDate>Tue, 22 Feb 2011 15:41:50 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Computer History]]></category>
		<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Everything]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[cron]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[in-line]]></category>
		<category><![CDATA[Nimbus]]></category>
		<category><![CDATA[Nimbus Data]]></category>
		<category><![CDATA[post-processing]]></category>
		<category><![CDATA[SVC]]></category>
		<category><![CDATA[thin provisioning]]></category>
		<category><![CDATA[Thin Reclamation]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=4640</guid>
		<description><![CDATA[Although the core issues with thin provisioning revolve around communication, it presents unique challenges to the storage array as well. We talked about granularity of pages, and the comments for that piece were extremely enlightening. Now let's consider another key factor: Scheduling.]]></description>
			<content:encoded><![CDATA[<p><a href="http://static.fosketts.net/wp-content/uploads/2010/12/Slide01.jpg"><img style=' display: block; margin-right: auto; margin-left: auto;'  class="aligncenter size-medium wp-image-4606" title="Slide01" src="http://static.fosketts.net/wp-content/uploads/2010/12/Slide01-300x225.jpg" alt="" width="300" height="225" /></a>

One of the topics I've often written and spoken about is thin provisioning. This series of 11 articles is an edited version of <a href="http://www.slideshare.net/sfoskett/state-of-the-art-thin-provisioning" target="_blank">my thin provisioning presentation from Interop New York 2010</a>. I hope you enjoy it!</p>
<p>Although the core issues with thin provisioning revolve around communication, it presents unique challenges to the storage array as well. We talked about <a href="http://blog.fosketts.net/2011/01/10/granularity-thin-provisioning-approaches/"  target="_blank">granularity of pages</a>, and the comments for that piece were extremely enlightening. Now let&#8217;s consider another key factor: Scheduling.</p>
<p>Note that the &#8220;provisioning&#8221; part is relatively easy to do on the fly: An array just has to allocate additional capacity as writes come in, which is something it does anyway. It&#8217;s the thin reclamation that poses a challenge, since this involves zero detection across a whole page of data in many cases.</p>
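<p>To make the zero-detection step concrete, here is a minimal Python sketch. It is not any vendor's implementation: the page size, the <code>is_zero_page</code> and <code>reclaimable_pages</code> helpers, and the in-memory "volume" are all assumptions chosen for illustration, and a real array would do this work in the controller against cache or disk.</p>

```python
# Hypothetical illustration of zero detection over fixed-size pages.
# PAGE_SIZE and the helper names are assumptions, not taken from any product.
PAGE_SIZE = 4096  # bytes; real arrays use far larger pages


def is_zero_page(page):
    """Return True only if every byte in the page is zero."""
    return page.count(0) == len(page)


def reclaimable_pages(volume):
    """Return the indexes of pages that contain nothing but zeros."""
    indexes = []
    for start in range(0, len(volume), PAGE_SIZE):
        page = volume[start:start + PAGE_SIZE]
        if is_zero_page(page):
            indexes.append(start // PAGE_SIZE)
    return indexes


# Three pages: all zeros, one stray non-zero byte, all zeros.
volume = bytearray(PAGE_SIZE * 3)
volume[PAGE_SIZE + 10] = 0xFF
zero_pages = reclaimable_pages(volume)
```

<p>In this sketch, a single non-zero byte keeps its whole page allocated, which is why the check has to cover the entire page before anything can be reclaimed.</p>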
<p><a href="http://static.fosketts.net/wp-content/uploads/2010/12/Slide21.jpg" ><img style=' display: block; margin-right: auto; margin-left: auto;'  class="aligncenter size-medium wp-image-4586" title="Slide21" src="http://static.fosketts.net/wp-content/uploads/2010/12/Slide21-300x225.jpg" alt="" width="300" height="225" /></a></p>
<p>Just like de-duplication, thin provisioning challenges the resources of the storage array to do background number crunching. And just like dedupe, the array engineers have a choice of when to do the reclamation processing: Well after writing or &#8220;in-line&#8221;. The extreme ends of this spectrum fall into two equally disappointing categories: Wholly ineffective or ridiculously intensive.</p>
<p>Let&#8217;s start with the &#8220;intensive&#8221; side: You could have the controller do thin provisioning automatically; that&#8217;s kind of what IBM does with SVC, for example, and 3PAR claims to do this too. The trouble is that the controller has to literally watch everything, and it&#8217;s got to reassemble whole pages, perhaps 42 MB or even one GB in cache. If it didn&#8217;t have all that data, it would have to go fetch it, put it into cache, look at it, make sure it was all zeros, then get rid of it. It&#8217;s really, really difficult to do automatic, in-line, thin provisioning. It&#8217;s a good thing to do, but it&#8217;s a hard thing to do.</p>
<p>So most vendors schedule thinning for later. In the &#8220;10 terabytes of zeros&#8221; example, they&#8217;re actually going to write 10 terabytes to disk, or at least through to cache. Then, at some point in the future, they&#8217;ll go back and reclaim that space. Some are pretty aggressive and reclaim capacity very frequently. Others are fairly lazy: The Drobo seems to reclaim only once or twice a day. A lot of people who have them are surprised when the thing springs to life and starts going, &#8220;Bada-bada-bada-bada-bada-bada.&#8221; Apparently it&#8217;s reclaiming storage at that time.</p>
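<p>The scheduled approach can be modeled in a few lines of Python. This is a toy sketch under stated assumptions, not how any particular array works: the <code>ThinVolume</code> class, its <code>write_page</code> and <code>reclaim</code> methods, and the 4 KB page size are all hypothetical names and values invented for the example.</p>

```python
# Hypothetical model of post-process thin reclamation.
# ThinVolume and its methods are invented for this illustration.
class ThinVolume:
    def __init__(self, page_size):
        self.page_size = page_size
        self.pages = {}  # page index: bytes, only for allocated pages

    def write_page(self, index, data):
        """Allocate on write, even if the data is all zeros."""
        if len(data) != self.page_size:
            raise ValueError("data must be exactly one page")
        self.pages[index] = bytes(data)

    def reclaim(self):
        """Scheduled sweep: free every allocated page that is all zeros."""
        zero = bytes(self.page_size)
        freed = [i for i, data in self.pages.items() if data == zero]
        for i in freed:
            del self.pages[i]
        return len(freed)


vol = ThinVolume(page_size=4096)
vol.write_page(0, b"\x01" * 4096)   # real data
vol.write_page(1, bytes(4096))      # zeros still consume a page at write time
vol.write_page(2, bytes(4096))
allocated_before = len(vol.pages)
freed = vol.reclaim()               # the deferred pass runs later
allocated_after = len(vol.pages)
```

<p>The zeros occupy capacity until <code>reclaim</code> runs, so how often that pass is scheduled determines how long the space stays tied up.</p>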
<p>Some thin provisioning systems are even manually initiated, and this is really pretty ineffective. The storage administrator has better things to do than reclaim storage all the time, so they are probably going to set a cron job to do it regularly at a specified time. If the system only does it on demand, that means that it doesn&#8217;t have the horsepower to do it automatically. Ergo, it&#8217;s sometimes going to conflict with &#8220;real work&#8221; and cause a problem.</p>
<p>I would look for a system that was fairly aggressive with thin reclamation. I was talking to the guys at <a href="http://www.nimbusdata.com/"  target="_blank">Nimbus Data</a>, for example, and <a href="http://www.nimbusdata.com/products/halo_benefits.html"  target="_blank">they claim</a> to do thin provisioning in-line all the time. I hope that we see more storage arrays that are doing that, and fewer that are doing it manually, on demand, because that&#8217;s just not as useful.</p>
<p>But considering that thin provisioning used to be almost useless, the fact that it&#8217;s now at least somewhat useful is gratifying.</p>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2011. |
<a href="http://blog.fosketts.net/2011/02/22/processing-scheduling-thin-provisioning/">Processing and Scheduling Thin Provisioning</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/computerhistory/" title="View all posts in Computer History" rel="category tag">Computer History</a>, <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/" title="View all posts in Everything" rel="category tag">Everything</a>, <a href="http://blog.fosketts.net/category/everything/virtualstorage/" title="View all posts in Virtual Storage" rel="category tag">Virtual Storage</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2011/02/22/processing-scheduling-thin-provisioning/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<series:name><![CDATA[State of the Art Thin Provisioning]]></series:name>
	</item>
		<item>
		<title>See W. Curtis Preston&#8217;s Backup Central Live!</title>
		<link>http://blog.fosketts.net/2011/01/31/curtis-prestons-backup-central-live/</link>
		<comments>http://blog.fosketts.net/2011/01/31/curtis-prestons-backup-central-live/#comments</comments>
		<pubDate>Mon, 31 Jan 2011 19:11:00 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Gestalt IT]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[AppAsure]]></category>
		<category><![CDATA[Aptare]]></category>
		<category><![CDATA[archive]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[Backup Central]]></category>
		<category><![CDATA[Backup Central Live!]]></category>
		<category><![CDATA[CDP]]></category>
		<category><![CDATA[Cirtas]]></category>
		<category><![CDATA[cloud backup]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[FalconStor]]></category>
		<category><![CDATA[Jacob Farmer]]></category>
		<category><![CDATA[NEC]]></category>
		<category><![CDATA[Quantum]]></category>
		<category><![CDATA[seminar]]></category>
		<category><![CDATA[Spectra Logic]]></category>
		<category><![CDATA[VTL]]></category>
		<category><![CDATA[W. Curtis Preston]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=4842</guid>
		<description><![CDATA[Last week, after the Exec Event in Palo Alto, I joined my friend W. Curtis Preston for his first Backup Central Live! event. Curtis has spent years educating IT pros about data protection, and this was the first week of a new series of self-produced events. And let me tell you, although I've seen him present dozens of times, Curtis was really in his element here. He held the packed room enthralled, and the vendor sponsors I talked to were very pleased about the event!]]></description>
			<content:encoded><![CDATA[<div id="attachment_4844" class="wp-caption aligncenter" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><a href="http://static.fosketts.net/wp-content/uploads/2011/01/Preston-Presenting-Backup-Central-Live.jpg" ><img class="size-medium wp-image-4844" title="W. Curtis Preston presents" src="http://static.fosketts.net/wp-content/uploads/2011/01/Preston-Presenting-Backup-Central-Live-300x145.jpg" alt="" width="300" height="145" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">W. Curtis Preston launched his own series of Backup Central Live! seminars for 2011</p></div>
<p>Last week, after the Exec Event in Palo Alto, I joined my friend W. Curtis Preston for his first <a href="http://BackupCentralLive.com"  target="_blank">Backup Central Live!</a> event. Curtis has spent years educating IT pros about data protection, and this was the first week of a new series of self-produced events. And let me tell you, although I&#8217;ve seen him present dozens of times, Curtis was really in his element here. He held the packed room enthralled, and the vendor sponsors I talked to were very pleased about the event!</p>
<h3>Introducing Backup Central Live!</h3>
<p>The Backup Central Live! series consists of day-long seminars across the USA in 2011. Each event includes over 3 hours of content from &#8220;Mr. Backup&#8221;, W. Curtis Preston, as well as presentations from <a href="http://www.cambridgecomputer.com/management.cfm"  target="_blank">Jacob Farmer</a> and the sponsoring vendors. The seminars are free for qualified end-users, which includes most of the readers of this blog!</p>
<p>Curtis and company will cover the challenges of backing up and recovering data in a variety of settings:</p>
<ul>
<li>Virtualized servers (e.g. VMware, Hyper-V, Xen)</li>
<li>Very large servers and data centers</li>
<li>Remote offices and laptops</li>
<li>Data retained for multiple years</li>
</ul>
<p>The session also includes technical detail about key products and technologies:</p>
<ul>
<li>Cloud Backup Services</li>
<li>Deduplication</li>
<li>Continuous data protection (CDP) and near-CDP</li>
<li>Archive software</li>
<li>Tape and its proper role</li>
</ul>
<p>Attendees even get free breakfast and lunch, which was of good hotel-catering quality, in my opinion.</p>
<h3>Stephen&#8217;s Stance</h3>
<div id="attachment_4843" class="wp-caption aligncenter" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><a href="http://static.fosketts.net/wp-content/uploads/2011/01/Backup-Central-Live-Staff.jpg" ><img class="size-medium wp-image-4843" title="Backup Central Live staff" src="http://static.fosketts.net/wp-content/uploads/2011/01/Backup-Central-Live-Staff-300x199.jpg" alt="" width="300" height="199" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">The Backup Central Live! crew does a great job putting together a professional event</p></div>
<p>I knew Curtis could put together quality backup content, but the crew deserves credit for such a professional and successful event. They attracted some great sponsors, too, including AppAssure, Aptare, FalconStor, NEC, Quantum, Spectra Logic, and Cirtas. And Jacob Farmer&#8217;s involvement was a pleasant surprise, too: I&#8217;ve always enjoyed the deep technical conversations I&#8217;ve had with him!</p>
<p>If you enjoyed my own backup, archiving, and storage seminars in the past, I know you&#8217;ll love this event. The next Backup Central Live! cities are as follows. If you&#8217;ll be around, you really ought to attend!</p>
<ul>
<li>Orlando, FL Feb 1 <a rel="nofollow" href="http://events.constantcontact.com/register/event?llr=45qwnieab&amp;oeidk=a07e37xl0rvcce6022b" >Register here</a></li>
<li>Houston, TX Feb 8 <a rel="nofollow" href="http://events.constantcontact.com/register/event?llr=45qwnieab&amp;oeidk=a07e37xl0uq787fee2b" >Register here</a></li>
<li>Chicago, IL Feb 22 <a rel="nofollow" href="http://events.constantcontact.com/register/event?llr=45qwnieab&amp;oeidk=a07e37xl0t1c1572d01" >Register here</a></li>
</ul>
<p>My only suggestion for the crew is that they get a bigger room next time!</p>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2011. |
<a href="http://blog.fosketts.net/2011/01/31/curtis-prestons-backup-central-live/">See W. Curtis Preston&#8217;s Backup Central Live!</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/gestaltit/" title="View all posts in Gestalt IT" rel="category tag">Gestalt IT</a>, <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>, <a href="http://blog.fosketts.net/category/everything/virtualstorage/" title="View all posts in Virtual Storage" rel="category tag">Virtual Storage</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2011/01/31/curtis-prestons-backup-central-live/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

