<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:series="http://unfoldingneurons.com/"
	>

<channel>
	<title>Stephen Foskett, Pack Rat &#187; compression Archives</title>
	<atom:link href="http://blog.fosketts.net/tag/compression/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.fosketts.net</link>
	<description>Understanding the accumulation of data</description>
	<lastBuildDate>Fri, 10 Feb 2012 17:40:43 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com" />
	<atom:link rel="hub" href="http://superfeedr.com/hubbub" />
			<item>
		<title>Storage Decisions San Francisco 2011: Optimization and Virtualization</title>
		<link>http://blog.fosketts.net/2011/11/07/storage-decisions-san-francisco-2011-optimization-virtualization/</link>
		<comments>http://blog.fosketts.net/2011/11/07/storage-decisions-san-francisco-2011-optimization-virtualization/#comments</comments>
		<pubDate>Mon, 07 Nov 2011 15:52:25 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[data reduction]]></category>
		<category><![CDATA[Dennis Martin]]></category>
		<category><![CDATA[Jon Toigo]]></category>
		<category><![CDATA[Mark Staimer]]></category>
		<category><![CDATA[San Francisco]]></category>
		<category><![CDATA[Storage Decisions]]></category>
		<category><![CDATA[storage virtualization]]></category>
		<category><![CDATA[TechTarget]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=6270</guid>
		<description><![CDATA[Tomorrow, I will be in San Francisco for TechTarget's Storage Decisions conference. This show does a good job on the editorial side, suggesting timely topics and bringing in folks like Dennis Martin, Mark Staimer, and Jon Toigo. I will have two presentations on data reduction and storage virtualization in the main conference track - both are updated from my New York sessions.]]></description>
			<content:encoded><![CDATA[<div id="attachment_6156" class="wp-caption aligncenter" style="width: 410px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><img class="size-full wp-image-6156" title="Storage Decisions Chicago 2011" src="http://static.fosketts.net/wp-content/uploads/2011/09/SD-Chi-11.jpg" alt="" width="400" height="266" /><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Join me in San Francisco for Storage Decisions</p></div>
<p>Tomorrow, I will be in San Francisco for <a href="http://storagedecisions.techtarget.com/sanfran/index.html" >TechTarget&#8217;s Storage Decisions conference</a>. This show does a good job on the editorial side, suggesting timely topics and bringing in folks like Dennis Martin, Mark Staimer, and Jon Toigo. I will have two presentations on data reduction and storage virtualization in the main conference track &#8211; both are updated from my New York sessions. <a href="http://registration.techtarget.com/events/register.do?name=storagedecisionssanfran" >Registration is free</a> for qualified end-users, and I urge you to attend.</p>
<h3>Reclaim Capacity with Data Reduction for Primary Storage</h3>
<p>I have updated the session with additional information on thin provisioning and compression, and expanded the slides to reflect many of the questions and comments I received in New York. The end result remains the same: I&#8217;m not sold on data reduction for primary storage as a product, and I recommend tackling data growth itself if at all possible. If it&#8217;s completely impractical to delete data, there are a few products that work well.</p>
<blockquote><p>Depending on which industry study you read, most companies are wasting anywhere from 30% to 50% of their installed disk capacity, which translates into thousands of dollars spent with no effective return on investment. Storage vendors are beginning to provide tools that can help storage managers make the most of the disk they have installed. For example, data reduction for primary storage borrows data deduplication technology developed for backup and classic compression algorithms to help squeeze the air out of nearline and primary data and reduce its footprint. This session&#8217;s topics will include an overview of data reduction technologies and where they will have the greatest impact, what key storage vendors are offering in data reduction and an update on the major players, and the consequences of using primary data dedupe along with dedupe for backups. We&#8217;ll also look at the potential for vendor lock-in and consider why we’re reducing data in the first place.</p>
<p>Topics include:</p>
<ul>
<li>Introducing data reduction technologies
<ul>
<li>Compression: How it works and where it’s found</li>
<li>Deduplication: From single-instancing to variable block</li>
<li>Application-specific: Cracking open files</li>
</ul>
</li>
<li>Overview of data reduction products</li>
<li>Where to use them
<ul>
<li>The capacity conundrum: Store less and reduce utilization</li>
<li>Ideal applications: Justifying the cost of data reduction</li>
<li>Side effects: Considering the impact on backup, replication, I/O workload and vendor lock-in</li>
</ul>
</li>
</ul>
</blockquote>
<h3>Storage Virtualization: Who’s Doing It and Why</h3>
<p>My storage virtualization session has been massively tweaked, including updated information on standalone virtualization products as well as a more in-depth discussion of successful and failed use cases. This session presents the conundrum of why server virtualization has been so successful while storage virtualization has failed for over a decade. I believe this is due to the problems these products try to solve. Consolidation of resources and reduction of administrator effort are noble goals, but not really compelling in the long term. Unless some real value can be extracted, storage virtualization will continue to be a failed product.</p>
<blockquote><p>Storage virtualization has been around for decades and, although research indicates that 70% of companies have already virtualized at least some of their installed block or file storage, most remain unaware of this technology. Grandiose schemes for comprehensive virtual SANs have given way to more practical host- and array-based virtualization technologies, and server virtualization has created a new opportunity to create a pool of storage. This session will look at the current state of storage virtualization, how to quantify its benefits and describe which approaches are best for particular environments, and also cover how storage virtualization compares to private storage clouds.</p>
<p>Topics include:</p>
<ul>
<li>Defining storage virtualization: What it is and where to find it
<ul>
<li>Abstraction of storage resources</li>
<li>Tiered storage</li>
<li>Flexibility</li>
</ul>
</li>
<li>Popular approaches to storage virtualization
<ul>
<li>SAN controllers</li>
<li>File virtualization</li>
<li>Volume managers</li>
</ul>
</li>
<li>The pool, the hypervisor and the cloud
<ul>
<li>The impact of server virtualization</li>
<li>Is this a private cloud?</li>
</ul>
</li>
</ul>
</blockquote>
<h3>Registration</h3>
<div id="attachment_6155" class="wp-caption aligncenter" style="width: 410px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><img class="size-full wp-image-6155" title="Storage Decisions Chicago 2011" src="http://static.fosketts.net/wp-content/uploads/2011/09/SD-Chi-11-2.jpg" alt="" width="400" height="266" /><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">You can see the future from here!</p></div>
<p>To register for Storage Decisions San Francisco, just go to <a href="http://registration.techtarget.com/events/register.do?name=storagedecisionssanfran" >the TechTarget registration page</a>.</p>
<p>Disclosure: TechTarget pays my expenses to attend and present at Storage Decisions, and has for many years. But they don&#8217;t pay me to present and I own the copyright on my session content. Happily, I license it all <a href="http://creativecommons.org/licenses/by-nc-sa/3.0/" >CC-by-NC-SA</a> so I can give it out freely!</p>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2011. |
<a href="http://blog.fosketts.net/2011/11/07/storage-decisions-san-francisco-2011-optimization-virtualization/">Storage Decisions San Francisco 2011: Optimization and Virtualization</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>, <a href="http://blog.fosketts.net/category/everything/virtualstorage/" title="View all posts in Virtual Storage" rel="category tag">Virtual Storage</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2011/11/07/storage-decisions-san-francisco-2011-optimization-virtualization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Reduction: the Condensed Version</title>
		<link>http://blog.fosketts.net/2011/09/22/data-reduction-condensed-version/</link>
		<comments>http://blog.fosketts.net/2011/09/22/data-reduction-condensed-version/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 19:55:04 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[Balesio]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[data reduction]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[FILEminimizer]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[SearchStorage]]></category>
		<category><![CDATA[Storage Decisions]]></category>
		<category><![CDATA[The Storage Community]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=6208</guid>
		<description><![CDATA[Native Format Optimization (NFO) makes a lot of sense, since it addresses a common user error in a practical way, and allows capacity savings to “trickle down” to backups, e-mail systems, and archives. But wholesale compression and de-duplication of primary storage may not be worth much, especially since the cost of disk keeps dropping dramatically.]]></description>
			<content:encoded><![CDATA[<div id="attachment_6209" class="wp-caption aligncenter" style="width: 445px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><img class="size-full wp-image-6209" title="Warning Do Not Remove Shields" src="http://static.fosketts.net/wp-content/uploads/2011/09/Warning-Do-Not-Remove-Shields.jpg" alt="" width="435" height="276" /><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Data Reduction can be hazardous to your health!</p></div>
<p>I&#8217;m not a big fan of data reduction technology, yet I found myself talking compression and de-duplication all week. Between Storage Decisions and my recent posts over at <a href="http://searchstorage.techtarget.com/tip/Interest-in-data-reduction-methods-needs-to-keep-pace-with-data-growth#" >SearchStorage</a> and <a href="http://storagecommunity.org/blogs/stephenfoskett/archive/2011/09/07/has-the-time-finally-come-for-data-reduction.aspx" >The Storage Community</a>, I&#8217;ve had quite a bit to say on the subject. My objection is specific to primary storage: too often, data reduction there is more expensive and difficult than just storing raw data.</p>
<blockquote><p>You should also read <a href="http://blog.fosketts.net/2008/09/16/deduplication-primary-storage/" >Deduplication Coming to Primary Storage</a> and <a href="http://blog.fosketts.net/2009/02/05/compression-encryption-deduplication-replication/" >Compression, Encryption, Deduplication, and Replication: Strange Bedfellows</a></p></blockquote>
<h3>Storage Decisions</h3>
<p><a href="http://blog.fosketts.net/2011/09/02/storage-decisions-york-capacity-optimization/" >My Storage Decisions presentation</a> on data reduction was hilarious, if I do say so myself, even though turnout was poor at 8:30 on Tuesday morning. Maybe it was this “intimate” group, but I found myself really getting into the discussion. And the nods and hollers from the audience helped, too!</p>
<p>My basic thesis at Storage Decisions was the same as always: <strong>Don&#8217;t throw good money at technology that will have little ROI</strong>. Considering that disk capacity is incredibly cheap, and dropping all the time, data reduction doesn&#8217;t look like a great fit except in certain situations. Why spend money to reduce utilization? Why put in the effort when most primary storage data reduction technologies don&#8217;t do anything to address the “multiplier effect” of archiving, DR, and backup storage?</p>
<p>This is not to say that all data reduction technology is worthless. In fact, the free compression and de-duplication built into many SSDs and even some enterprise storage devices make perfect sense. I just don&#8217;t understand spending a bunch of money to address storage capacity when most applications are starved for storage performance.</p>
<blockquote><p>You might like reading my two other posts on the subject from last week:</p>
<ul>
<li><a href="http://searchstorage.techtarget.com/tip/Interest-in-data-reduction-methods-needs-to-keep-pace-with-data-growth#" >Interest in data reduction methods needs to keep pace with data growth</a> (SearchStorage.com)</li>
<li><a href="http://storagecommunity.org/blogs/stephenfoskett/archive/2011/09/07/has-the-time-finally-come-for-data-reduction.aspx" >Has the Time Finally Come for Data Reduction?</a> (The Storage Community, sponsored by IBM)</li>
</ul>
</blockquote>
<h3>You&#8217;re Losing Me</h3>
<p>On the other hand, I do see quite a bit of value in something many people would overlook out of hand: Lossy compression of office files. Every systems administrator knows that end-users do “stupid stuff” like embedding massive photos and videos in PowerPoint presentations and Word documents. But not everyone knows that there are technological means to address this “<a href="http://www.thinkgeek.com/tshirts-apparel/unisex/itdepartment/6692/" >PEBKAC</a>” issue.</p>
<p>Some office applications already automatically reduce the size of embedded content, and operating systems can do the same. One of my more popular blog posts, in fact, is <a href="http://blog.fosketts.net/2008/10/23/reduce-file-size-pdf-mac/" >a technique to create a filter to reduce the size of PDF files in Mac OS X Preview</a>. And the Microsoft “X” Office file formats include lossless compression as well.</p>
<p>An application that recently caught my eye is the <a href="http://balesio.com/fileminimizersuite/eng/index.php" >FILEminimizer Suite</a> by Balesio. This inexpensive application reduces the size of Office and media files while leaving them in their native format. It re-compresses image files, reducing them to optimum size for use in presentations, documents, or printouts. A companion product, <a href="http://balesio.com/fileminimizerserver/eng/index.php" >FILEminimizer Server</a>, can be used on enterprise file servers to perform the same magic across the whole range of users.</p>
<h3>Stephen&#8217;s Stance</h3>
<p>Native Format Optimization (NFO) makes a lot of sense, since it addresses a common user error in a practical way, and allows capacity savings to “trickle down” to backups, e-mail systems, and archives. But wholesale compression and de-duplication of primary storage may not be worth much, especially since the cost of disk keeps dropping dramatically.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2011/09/22/data-reduction-condensed-version/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Storage Decisions New York: Capacity Optimization</title>
		<link>http://blog.fosketts.net/2011/09/02/storage-decisions-york-capacity-optimization/</link>
		<comments>http://blog.fosketts.net/2011/09/02/storage-decisions-york-capacity-optimization/#comments</comments>
		<pubDate>Fri, 02 Sep 2011 18:45:05 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[data management]]></category>
		<category><![CDATA[data reduction]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[Storage Decisions]]></category>
		<category><![CDATA[storage virtualization]]></category>
		<category><![CDATA[TechTarget]]></category>
		<category><![CDATA[tiered storage]]></category>
		<category><![CDATA[volume management]]></category>
		<category><![CDATA[volume manager]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=6153</guid>
		<description><![CDATA[Later this month, I will be heading to New York for TechTarget's Storage Decisions conference. I will have two presentations on data reduction and storage virtualization in the main conference track. Registration is free for qualified end-users, and I urge you to attend on September 19 and 20, 2011.]]></description>
			<content:encoded><![CDATA[<div id="attachment_6156" class="wp-caption aligncenter" style="width: 410px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><img class="size-full wp-image-6156" title="Storage Decisions Chicago 2011" src="http://static.fosketts.net/wp-content/uploads/2011/09/SD-Chi-11.jpg" alt="" width="400" height="266" /><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Join me in New York for Storage Decisions, September 19 &amp; 20</p></div>
<p>Later this month, I will be heading to New York for <a href="http://storagedecisions.techtarget.com/newyork/index.html" >TechTarget&#8217;s Storage Decisions conference</a>. This show does a good job on the editorial side, suggesting timely topics and bringing in independent voices like Howard Marks. I will have two presentations on data reduction and storage virtualization in the main conference track. <a href="http://registration.techtarget.com/events/register.do?name=storagedecisionsnewyork" >Registration is free</a> for qualified end-users, and I urge you to attend on September 19 and 20, 2011.</p>
<h3>Reclaim Capacity with Data Reduction for Primary Storage</h3>
<blockquote><p>Depending on which industry study you read, most companies are wasting anywhere from 30% to 50% of their installed disk capacity, which translates into thousands of dollars spent with no effective return on investment. Storage vendors are beginning to provide tools that can help storage managers make the most of the disk they have installed. For example, data reduction for primary storage borrows data deduplication technology developed for backup and classic compression algorithms to help squeeze the air out of nearline and primary data and reduce its footprint. This session&#8217;s topics will include an overview of data reduction technologies and where they will have the greatest impact, what key storage vendors are offering in data reduction and an update on the major players, and the consequences of using primary data dedupe along with dedupe for backups. We&#8217;ll also look at the potential for vendor lock-in and consider why we’re reducing data in the first place.</p>
<p>Topics include:</p>
<ul>
<li>Introducing data reduction technologies
<ul>
<li>Compression: How it works and where it’s found</li>
<li>Deduplication: From single-instancing to variable block</li>
<li>Application-specific: Cracking open files</li>
</ul>
</li>
<li>Overview of data reduction products</li>
<li>Where to use them
<ul>
<li>The capacity conundrum: Store less and reduce utilization</li>
<li>Ideal applications: Justifying the cost of data reduction</li>
<li>Side effects: Considering the impact on backup, replication, I/O workload and vendor lock-in</li>
</ul>
</li>
</ul>
</blockquote>
<h3>Storage Virtualization: Who’s Doing It and Why</h3>
<blockquote><p>Storage virtualization has been around for decades and, although research indicates that 70% of companies have already virtualized at least some of their installed block or file storage, most remain unaware of this technology. Grandiose schemes for comprehensive virtual SANs have given way to more practical host- and array-based virtualization technologies, and server virtualization has created a new opportunity to create a pool of storage. This session will look at the current state of storage virtualization, how to quantify its benefits and describe which approaches are best for particular environments, and also cover how storage virtualization compares to private storage clouds.</p>
<p>Topics include:</p>
<ul>
<li>Defining storage virtualization: What it is and where to find it
<ul>
<li>Abstraction of storage resources</li>
<li>Tiered storage</li>
<li>Flexibility</li>
</ul>
</li>
<li>Popular approaches to storage virtualization
<ul>
<li>SAN controllers</li>
<li>File virtualization</li>
<li>Volume managers</li>
</ul>
</li>
<li>The pool, the hypervisor and the cloud
<ul>
<li>The impact of server virtualization</li>
<li>Is this a private cloud?</li>
</ul>
</li>
</ul>
</blockquote>
<h3>Registration</h3>
<div id="attachment_6155" class="wp-caption aligncenter" style="width: 410px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><img class="size-full wp-image-6155" title="Storage Decisions Chicago 2011" src="http://static.fosketts.net/wp-content/uploads/2011/09/SD-Chi-11-2.jpg" alt="" width="400" height="266" /><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">You can see the future from here!</p></div>
<p>To register for Storage Decisions New York, just go to <a href="http://registration.techtarget.com/events/register.do?name=storagedecisionsnewyork" >the TechTarget registration page</a>.</p>
<p>Disclosure: TechTarget pays my expenses to attend and present at Storage Decisions, and has for many years. But they don&#8217;t pay me to present and I own the copyright on my session content. Happily, I license it all <a href="http://creativecommons.org/licenses/by-nc-sa/3.0/" >CC-by-NC-SA</a> so I can give it out freely!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2011/09/02/storage-decisions-york-capacity-optimization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Storage Decisions Chicago: All About Capacity Optimization</title>
		<link>http://blog.fosketts.net/2011/05/27/storage-decisions-chicago/</link>
		<comments>http://blog.fosketts.net/2011/05/27/storage-decisions-chicago/#comments</comments>
		<pubDate>Fri, 27 May 2011 19:05:54 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[data management]]></category>
		<category><![CDATA[data reduction]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[Dell]]></category>
		<category><![CDATA[Storage Decisions]]></category>
		<category><![CDATA[storage virtualization]]></category>
		<category><![CDATA[TechTarget]]></category>
		<category><![CDATA[tiered storage]]></category>
		<category><![CDATA[volume management]]></category>
		<category><![CDATA[volume manager]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=5548</guid>
		<description><![CDATA[Next month, I will be heading to Chicago for TechTarget's Storage Decisions conference. This show does a good job on the editorial side, suggesting timely topics and bringing in independent voices like Howard Marks. I will have three presentations to give: Sessions on data reduction and storage virtualization in the main conference track, as well as a dinner discussion focusing on controlling the growth of data. Registration is free for qualified end-users, and I urge you to attend.]]></description>
			<content:encoded><![CDATA[<div id="attachment_1093" class="wp-caption aligncenter" style="width: 236px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><a href="http://blog.fosketts.net/wp-content/uploads/2008/11/img_0028.jpg" ><img class="size-medium wp-image-1093" title="Storage Decisions" src="http://blog.fosketts.net/wp-content/uploads/2008/11/img_0028-226x300.jpg" alt="" width="226" height="300" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Join me in Chicago for Storage Decisions, June 21</p></div>
<p>Next month, I will be heading to Chicago for <a href="http://storagedecisions.techtarget.com/chicago/index.html?Offer=Foskett" >TechTarget&#8217;s Storage Decisions conference</a>. This show does a good job on the editorial side, suggesting timely topics and bringing in independent voices like Howard Marks. I will have three presentations to give: Sessions on data reduction and storage virtualization in the main conference track, as well as a dinner discussion focusing on controlling the growth of data. <a href="http://registration.techtarget.com/events/register.do?name=storagedecisionschicago&amp;offer=Foskett" >Registration is free</a> for qualified end-users, and I urge you to attend on June 21, 2011.</p>
<h3>Reclaim Capacity with Data Reduction for Primary Storage</h3>
<blockquote><p>Depending on which industry study you read, most companies are wasting anywhere from 30% to 50% of their installed disk capacity, which translates into thousands of dollars spent with no effective return on investment. Storage vendors are beginning to provide tools that can help storage managers make the most of the disk they have installed. For example, data reduction for primary storage borrows data deduplication technology developed for backup and classic compression algorithms to help squeeze the air out of nearline and primary data and reduce its footprint. This session&#8217;s topics will include an overview of data reduction technologies and where they will have the greatest impact, what key storage vendors are offering in data reduction and an update on the major players, and the consequences of using primary data dedupe along with dedupe for backups. We&#8217;ll also look at the potential for vendor lock-in and consider why we’re reducing data in the first place.</p>
<p>Topics include:</p>
<ul>
<li>Introducing data reduction technologies
<ul>
<li>Compression: How it works and where it’s found</li>
<li>Deduplication: From single-instancing to variable block</li>
<li>Application-specific: Cracking open files</li>
</ul>
</li>
<li>Overview of data reduction products</li>
<li>Where to use them
<ul>
<li>The capacity conundrum: Store less and reduce utilization</li>
<li>Ideal applications: Justifying the cost of data reduction</li>
<li>Side effects: Considering the impact on backup, replication, I/O workload and vendor lock-in</li>
</ul>
</li>
</ul>
</blockquote>
<h3>Storage Virtualization: Who’s Doing It and Why</h3>
<blockquote><p>Storage virtualization has been around for decades and, although research indicates that 70% of companies have already virtualized at least some of their installed block or file storage, most remain unaware of this technology. Grandiose schemes for comprehensive virtual SANs have given way to more practical host- and array-based virtualization technologies, and server virtualization has created a new opportunity to create a pool of storage. This session will look at the current state of storage virtualization, how to quantify its benefits and describe which approaches are best for particular environments, and also cover how storage virtualization compares to private storage clouds.</p>
<p>Topics include:</p>
<ul>
<li>Defining storage virtualization: What it is and where to find it
<ul>
<li>Abstraction of storage resources</li>
<li>Tiered storage</li>
<li>Flexibility</li>
</ul>
</li>
<li>Popular approaches to storage virtualization
<ul>
<li>SAN controllers</li>
<li>File virtualization</li>
<li>Volume managers</li>
</ul>
</li>
<li>The pool, the hypervisor and the cloud
<ul>
<li>The impact of server virtualization</li>
<li>Is this a private cloud?</li>
</ul>
</li>
</ul>
</blockquote>
<h3>Cutting Off Data Growth at the Disk</h3>
<blockquote><p>In this special dinner presentation, Stephen Foskett will discuss how to apply key data management technologies to arrest the growth of data. You’ll learn how capacity optimization technologies such as data deduplication and compression can reduce the trajectory of data growth as well as how tiering can reduce the cost of storage. Finally, Stephen will explore why the time may have finally come for active archiving and will leave you with practical ways to help your corporation better manage its data.</p></blockquote>
<p>Note that space is limited for the dinner, which is sponsored by my friends at Dell.</p>
<h3>Registration</h3>
<p>To register for Storage Decisions Chicago, just go to <a href="http://registration.techtarget.com/events/register.do?name=storagedecisionschicago&amp;offer=Foskett" >the TechTarget registration page</a>. Dinner guests will apparently be selected from that same pool of attendees.</p>
<blockquote><p>Disclosure: TechTarget pays my expenses to attend and present at Storage Decisions, and has for many years. I also get a speaker fee for the dinner session.</p></blockquote>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2011. |
<a href="http://blog.fosketts.net/2011/05/27/storage-decisions-chicago/">Storage Decisions Chicago: All About Capacity Optimization</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>, <a href="http://blog.fosketts.net/category/everything/virtualstorage/" title="View all posts in Virtual Storage" rel="category tag">Virtual Storage</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2011/05/27/storage-decisions-chicago/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>IBM&#8217;s Storwize V7000: 100% SVC; 0% Storwize</title>
		<link>http://blog.fosketts.net/2010/10/07/ibm-storwize-v7000-svc/</link>
		<comments>http://blog.fosketts.net/2010/10/07/ibm-storwize-v7000-svc/#comments</comments>
		<pubDate>Thu, 07 Oct 2010 21:16:34 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Everything]]></category>
		<category><![CDATA[Gestalt IT]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[data deduplication]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[scale-out]]></category>
		<category><![CDATA[Storwize]]></category>
		<category><![CDATA[SVC]]></category>
		<category><![CDATA[Tony Pearson]]></category>
		<category><![CDATA[V7000]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=3852</guid>
		<description><![CDATA[Today, IBM alerted the world that they had not fallen asleep at the wheel by kicking out an awfully impressive midrange storage array, the Storwize V7000. This seems like an excellent device, filled with proven engineering borrowed from the successful SAN Volume Controller (SVC) line of storage virtualization products. But closer examination (and IBM's own Tony Pearson) reveals that it contains exactly nothing from their Storwize acquisition apart from the name.]]></description>
			<content:encoded><![CDATA[<p>Today, IBM alerted the world that they had not fallen asleep at the wheel by kicking out an awfully impressive midrange storage array, the Storwize V7000. This seems like an excellent device, filled with proven engineering borrowed from the successful SAN Volume Controller (SVC) line of storage virtualization products. But closer examination (and IBM&#8217;s own <a href="http://twitter.com/az990tony/status/26653205787"  target="_blank">Tony Pearson</a>) reveals that it contains exactly nothing from their Storwize acquisition apart from the name.</p>
<h3>SVC 6.1 + Disk Hardware = V7000</h3>
<p>Let&#8217;s get one thing out of the way immediately: As I&#8217;ve said many times (including on stage at Storage Decisions last month), SVC is about the only IBM storage product I genuinely like. It&#8217;s well engineered, useful, and performs well. It&#8217;s just too bad its native habitat is a jungle of weird and expensive IBM gear.</p>
<p>SVC is really an enterprise storage array without any disks, just as HDS&#8217; USP VSP is a storage virtualization engine with disks. It does all sorts of great things, from thin provisioning to replication to automatic tiered storage to painless migration (once you&#8217;re migrated to it, at least). Fibre Channel comes in, magic happens, and Fibre Channel comes out. And it runs on commodity servers, which surely gives IBM a healthy profit margin but doesn&#8217;t seem to translate into lower cost for customers.</p>
<div id="attachment_3855" class="wp-caption aligncenter" style="width: 236px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><a href="http://static.fosketts.net/wp-content/uploads/2010/10/v700-parentage4.png" ><img class="size-medium wp-image-3855" title="v700-parentage4" src="http://static.fosketts.net/wp-content/uploads/2010/10/v700-parentage4-226x300.png" alt="" width="226" height="300" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Green = SVC 5; Pink = SVC 6.1. No Storwize.</p></div>
<p>The new Storwize V7000 is <a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/storagevirtualization/?lang=en"  target="_blank">essentially</a> the SVC software running on <a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/InsideSystemStorage/?lang=en"  target="_blank">server hardware</a> that includes both dual controllers and a bunch of internal hard disk drives. This can connect to up to nine &#8220;dumb&#8221; expansion storage enclosures. Hardware-wise, it&#8217;s very like the typical midrange <a href="http://www.thestoragearchitect.com/2010/08/24/choosing-between-monolithic-and-modular-architectures-part-i/"  target="_blank">modular</a> storage systems sold by EMC (CLARiiON), HP (EVA), HDS (AMS), and NetApp.</p>
<p>Software-wise <a rel="nofollow" href="http://storagebuddhist.wordpress.com/2010/10/07/ibms-new-midrange-v7000-with-easy-tier-external-virtualization/"  target="_blank">the V7000 is all SVC</a>. Much of the software is directly derived from SVC 5.1 (green stuff in IBM&#8217;s diagram), while some new tech is mixed in, too. But pretty much everything (green, blue, pink) is shared with SVC 6.1 other than the hardware. It&#8217;s just incredible what advanced software running on commodity hardware can do, and IBM is right up there with folks like HP and EMC who are adopting this engineering model.</p>
<h3>Where&#8217;s the Storwize?</h3>
<p>Then there&#8217;s that name. This isn&#8217;t just the V7000, it&#8217;s the <a href="http://www-03.ibm.com/systems/storage/disk/storwize_v7000/index.html"  target="_blank">Storwize V7000</a>. When I heard the name, I was expecting that it would include some data reduction/optimization/compression/whatever technology from Storwize, the company IBM <a href="http://www.networkcomputing.com/deduplication/ibm-acquires-storwize.php"  target="_blank">acquired</a> in July. This would match EMC&#8217;s acquisition of Data Domain, Dell&#8217;s buy of Ocarina, and HP&#8217;s rollout of their cool StoreOnce software.</p>
<p>But there&#8217;s no Storwize in the V7000 apart from the name. This is a straight-ahead midrange storage system with no special bit-crunching powers apart from the thin provisioning already offered by SVC. I asked the IBM folks about this, and they confirmed that they needed a name and thought Storwize was fitting.</p>
<div id="attachment_3856" class="wp-caption aligncenter" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><a href="http://static.fosketts.net/wp-content/uploads/2010/10/Screen-shot-2010-10-07-at-4.35.48-PM.png" ><img class="size-medium wp-image-3856" title="Screen shot 2010-10-07 at 4.35.48 PM" src="http://static.fosketts.net/wp-content/uploads/2010/10/Screen-shot-2010-10-07-at-4.35.48-PM-300x144.png" alt="" width="300" height="144" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Right from the horse&#39;s mouth. No Storwize software here (yet).</p></div>
<p><strong>Stephen&#8217;s Stance</strong></p>
<p>With everyone and their brother (well, EMC, HP, Dell, and NetApp) rolling out primary storage deduplication, I expect this situation will change. Perhaps &#8220;Storwize&#8221; will become the IBM equivalent of &#8220;StorageWorks&#8221; &#8211; sprayed across every product. Or maybe it will become IBM&#8217;s midrange brand. But sooner or later I expect IBM will include their compression technology, too (I dare not call it &#8220;data reduction&#8221; or face <a href="http://twitter.com/az990tony/status/26653737309"  target="_blank">The Wrath of Tony</a>).</p>
<p>So the Storwize V7000 is a really nice midrange product built on proven software and ought to compete nicely with EMC, HP, and HDS. It&#8217;s maybe even a little better than the competing modular storage products. My interest would be piqued, however, by news of a larger scale-out cluster of V7000 systems. The SVC can already scale out like this, with up to four I/O groups, each a pair of nodes.</p>
<p>But even without compression and scale-out, I could see myself recommending the V7000 to midrange storage buyers. Good work, IBM! Now, let&#8217;s talk about the rest of your storage products&#8230;</p>
<p><em>V7000 Diagram courtesy of IBM</em></p>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2010. |
<a href="http://blog.fosketts.net/2010/10/07/ibm-storwize-v7000-svc/">IBM&#8217;s Storwize V7000: 100% SVC; 0% Storwize</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/" title="View all posts in Everything" rel="category tag">Everything</a>, <a href="http://blog.fosketts.net/category/gestaltit/" title="View all posts in Gestalt IT" rel="category tag">Gestalt IT</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2010/10/07/ibm-storwize-v7000-svc/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Bizarre HFS+ Tricks in Mac OS X 10.6 Snow Leopard</title>
		<link>http://blog.fosketts.net/2009/09/11/bizarre-hfs-tricks-in-mac-os-x-10-6-snow-leopard/</link>
		<comments>http://blog.fosketts.net/2009/09/11/bizarre-hfs-tricks-in-mac-os-x-10-6-snow-leopard/#comments</comments>
		<pubDate>Fri, 11 Sep 2009 15:27:24 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Apple]]></category>
		<category><![CDATA[Computer History]]></category>
		<category><![CDATA[Ars Technica]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[filesystem]]></category>
		<category><![CDATA[HFS]]></category>
		<category><![CDATA[John Siracusa]]></category>
		<category><![CDATA[Mac OS X]]></category>
		<category><![CDATA[Snow Leopard]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=2302</guid>
		<description><![CDATA[I don&#8217;t usually excerpt large amounts of text from other blogs. But this is just too cool. UNIX nerds and Mac OS X weenies alike will either shake their heads and jump out a window or laugh out loud at one of the under-reported changes in Snow Leopard. See, Snow Leopard&#8217;s version of HFS+ allows [...]]]></description>
			<content:encoded><![CDATA[<p>I don&#8217;t usually excerpt large amounts of text from other blogs. But this is just too cool. UNIX nerds and Mac OS X weenies alike will either shake their heads and jump out a window or laugh out loud at one of the under-reported changes in Snow Leopard.</p>
<p>See, Snow Leopard&#8217;s version of HFS+ allows per-file compression using three very creative filesystem hacks. <span id="more-2302"></span>I&#8217;ll let <a href="http://arstechnica.com/apple/reviews/2009/08/mac-os-x-10-6.ars/3#"  target="_blank">John Siracusa from Ars Technica</a> take the story from here, and I urge you to read John&#8217;s complete (and very, very long) <a href="http://arstechnica.com/apple/reviews/2009/08/mac-os-x-10-6.ars/"  target="_blank">Snow Leopard review</a>!</p>
<blockquote><p>In Snow Leopard, other kinds of files climb on board the compression bandwagon. To give just one example, ninety-seven percent of the executable files in Snow Leopard are compressed. How compressed? Let&#8217;s look:</p>
<p>% cd Applications/Mail.app/Contents/MacOS</p>
<p>% ls -l Mail</p>
<p>-rwxr-xr-x@ 1 root  wheel  0 Jun 18 19:35 Mail</p>
<p>Boy, that&#8217;s, uh, pretty small, huh? Is this really an executable or what? Let&#8217;s check our assumptions.</p>
<p>% file Applications/Mail.app/Contents/MacOS/Mail</p>
<p>Applications/Mail.app/Contents/MacOS/Mail: empty</p>
<p>Yikes! What&#8217;s going on here? Well, what I didn&#8217;t tell you is that the commands shown above were run from a Leopard system looking at a Snow Leopard disk. In fact, all compressed Snow Leopard files appear to contain zero bytes when viewed from a pre-Snow Leopard version of Mac OS X. (They look and act perfectly normal when booted into Snow Leopard, of course.)</p>
<p>So, where&#8217;s the data? The little &#8220;@&#8221; at the end of the permissions string in the ls output above (a feature introduced in Leopard) provides a clue. Though the Mail executable has a zero file size, it does have some extended attributes:</p>
<p>% xattr -l Applications/Mail.app/Contents/MacOS/Mail</p>
<p>com.apple.ResourceFork:</p>
<p>0000     00 00 01 00 00 2C F5 F2 00 2C F4 F2 00 00 00 32    .....,...,.....2</p>
<p>0010     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................</p>
<p>(184,159 lines snipped)</p>
<p>2CF610   63 6D 70 66 00 00 00 0A 00 01 FF FF 00 00 00 00    cmpf............</p>
<p>2CF620   00 00 00 00                                        ....</p>
<p>com.apple.decmpfs:</p>
<p>0000   66 70 6D 63 04 00 00 00 A0 82 72 00 00 00 00 00    fpmc......r.....</p>
<p>Ah, there&#8217;s all the data. But wait, it&#8217;s in the resource fork? Weren&#8217;t those deprecated about eight years ago? Indeed they were. What you&#8217;re witnessing here is yet another addition to Apple&#8217;s favorite file system hobbyhorse, HFS+.</p>
<p>At the dawn of Mac OS X, Apple added journaling, symbolic links, and hard links. In Tiger, extended attributes and access control lists were incorporated. In Leopard, HFS+ gained support for hard links to directories. In Snow Leopard, HFS+ learns another new trick: per-file compression.</p>
<p>The presence of the com.apple.decmpfs attribute is the first hint that this file is compressed. This attribute is actually hidden from the xattr command when booted into Snow Leopard. But from a Leopard system, which has no knowledge of its special significance, it shows up as plain as day.</p>
<p>Even more information is revealed with the help of Mac OS X Internals guru Amit Singh&#8217;s hfsdebug program, which has quietly been updated for Snow Leopard.</p>
<p>% hfsdebug /Applications/Mail.app/Contents/MacOS/Mail</p>
<p>&#8230;</p>
<p>compression magic    = cmpf</p>
<p>compression type     = 4 (resource fork has compressed data)</p>
<p>uncompressed size    = 7500336 bytes</p>
<p>And sure enough, as we saw, the resource fork does indeed contain the compressed data. Still, why the resource fork? It&#8217;s all part of Apple&#8217;s usual, clever backward-compatibility gymnastics. A recent example is the way that hard links to directories show up—and function—as aliases when viewed from a pre-Leopard version of Mac OS X.</p>
<p>In the case of HFS+ compression, Apple was (understandably) unable to make pre-Snow Leopard systems read and interpret the compressed data, which is stored in ways that did not exist at the time those earlier operating systems were written. But rather than letting applications (and users) running on pre-10.6 systems choke on—or worse, corrupt through modification—the unexpectedly compressed file contents, Apple has chosen to hide the compressed data instead.</p>
<p>And where can the complete contents of a potentially large file be hidden in such a way that pre-Snow Leopard systems can still copy that file without the loss of data? Why, in the resource fork, of course. The Finder has always correctly preserved Mac-specific metadata and both the resource and data forks when moving or duplicating files. In Leopard, even the lowly cp and rsync commands will do the same. So while it may be a little bit spooky to see all those &#8220;empty&#8221; 0 KB files when looking at a Snow Leopard disk from a pre-Snow Leopard OS, the chance of data loss is small, even if you move or copy one of the files.</p>
<p>The resource fork isn&#8217;t the only place where Apple has decided to smuggle compressed data. For smaller files, hfsdebug shows the following:</p>
<p>% hfsdebug /etc/asl.conf</p>
<p>&#8230;</p>
<p>compression magic    = cmpf</p>
<p>compression type     = 3 (xattr has compressed data)</p>
<p>uncompressed size    = 860 bytes</p>
<p>Here, the data is small enough to be stored entirely within an extended attribute, albeit in compressed form. And then, the final frontier:</p>
<p>% hfsdebug "/Volumes/Snow Time/Applications/Mail.app/Contents/PkgInfo"</p>
<p>&#8230;</p>
<p>compression magic    = cmpf</p>
<p>compression type     = 3 (xattr has inline data)</p>
<p>uncompressed size    = 8 bytes</p>
<p>That&#8217;s right, an entire file&#8217;s contents stored uncompressed in an extended attribute. In the case of a standard PkgInfo file like this one, those contents are the four-byte classic Mac OS type and creator codes.</p>
<p>% xattr -l Applications/Mail.app/Contents/PkgInfo</p>
<p>com.apple.decmpfs:</p>
<p>0000   66 70 6D 63 03 00 00 00 08 00 00 00 00 00 00 00    fpmc............</p>
<p>0010   FF 41 50 50 4C 65 6D 61 6C                         .APPLemal</p>
<p>There&#8217;s still the same &#8220;fpmc&#8230;&#8221; preamble seen in all the earlier examples of the com.apple.decmpfs attribute, but at the end of the value, the expected data appears as plain as day: type code &#8220;APPL&#8221; (application) and creator code &#8220;emal&#8221; (for the Mail application—cute, as per classic Mac OS tradition).</p>
<p>You may be wondering, if this is all about data compression, how does storing eight uncompressed bytes plus a 17-byte preamble in an extended attribute save any disk space? The answer to that lies in how HFS+ allocates disk space. When storing information in a data or resource fork, HFS+ allocates space in multiples of the file system&#8217;s allocation block size (4 KB, by default). So those eight bytes will take up a minimum of 4,096 bytes if stored in the traditional way. When allocating disk space for extended attributes, however, the allocation block size is not a factor; the data is packed in much more tightly. In the end, the actual space saved by storing those 25 bytes of data in an extended attribute is over 4,000 bytes.</p>
<p>But compression isn&#8217;t just about saving disk space. It&#8217;s also a classic example of trading CPU cycles for decreased I/O latency and bandwidth. Over the past few decades, CPU performance has gotten better (and computing resources more plentiful—more on that later) at a much faster rate than disk performance has increased. Modern hard disk seek times and rotational delays are still measured in milliseconds. In one millisecond, a 2 GHz CPU goes through two million cycles. And then, of course, there&#8217;s still the actual data transfer time to consider.</p>
<p>Granted, several levels of caching throughout the OS and hardware work mightily to hide these delays. But those bits have to come off the disk at some point to fill those caches. Compression means that fewer bits have to be transferred. Given the almost comical glut of CPU resources on a modern multi-core Mac under normal use, the total time needed to transfer a compressed payload from the disk and use the CPU to decompress its contents into memory will still usually be far less than the time it&#8217;d take to transfer the data in uncompressed form.</p>
<p>That explains the potential performance benefits of transferring less data, but the use of extended attributes to store file contents can actually make things faster, as well. It all has to do with data locality.</p>
<p>If there&#8217;s one thing that slows down a hard disk more than transferring a large amount of data, it&#8217;s moving its heads from one part of the disk to another. Every move means time for the head to start moving, then stop, then ensure that it&#8217;s correctly positioned over the desired location, then wait for the spinning disk to put the desired bits beneath it. These are all real, physical, moving parts, and it&#8217;s amazing that they do their dance as quickly and efficiently as they do, but physics has its limits. These motions are the real performance killers for rotational storage like hard disks.</p>
<p>The HFS+ volume format stores all its information about files—metadata—in two primary locations on disk: the Catalog File, which stores file dates, permissions, ownership, and a host of other things, and the Attributes File, which stores &#8220;named forks.&#8221;</p>
<p>Extended attributes in HFS+ are implemented as named forks in the Attributes File. But unlike resource forks, which can be very large (up to the maximum file size supported by the file system), extended attributes in HFS+ are stored &#8220;inline&#8221; in the Attributes File. In practice, this means a limit of about 128 bytes per attribute. But it also means that the disk head doesn&#8217;t need to take a trip to another part of the disk to get the actual data.</p>
<p>As you can imagine, the disk blocks that make up the Catalog and Attributes files are frequently accessed, and therefore more likely than most to be in a cache somewhere. All of this conspires to make the complete storage of a file, including both its metadata and its data, within the B-tree-structured Catalog and Attributes files an overall performance win. Even an eight-byte payload that balloons to 25 bytes is not a concern, as long as it&#8217;s still less than the allocation block size for normal data storage, and as long as it all fits within a B-tree node in the Attributes File that the OS has to read in its entirety anyway.</p>
<p>There are other significant contributions to Snow Leopard&#8217;s reduced disk footprint (e.g., the removal of unnecessary localizations and &#8220;designable.nib&#8221; files) but HFS+ compression is by far the most technically interesting.</p>
<p>via <a href="http://arstechnica.com/apple/reviews/2009/08/mac-os-x-10-6.ars/3#" >Mac OS X 10.6 Snow Leopard: the Ars Technica review &#8211; Ars Technica</a>.</p>
</blockquote>
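<p>If you want to poke at this yourself, here is a minimal Python sketch that decodes the 25-byte com.apple.decmpfs value John dumped from PkgInfo. The field layout (a byte-reversed four-byte magic, a four-byte little-endian type, and an eight-byte little-endian size) and the reading of the leading FF byte as a marker for uncompressed data are my assumptions, inferred from the dump rather than taken from any Apple documentation. The function name is made up for illustration.</p>

```python
# Bytes copied from the PkgInfo xattr dump quoted above.
RAW = bytes.fromhex(
    "66 70 6D 63 03 00 00 00 08 00 00 00 00 00 00 00"
    "FF 41 50 50 4C 65 6D 61 6C"
)

HEADER_LEN = 16          # assumed layout: 4-byte magic, 4-byte type, 8-byte size
ALLOCATION_BLOCK = 4096  # default HFS+ allocation block size, per the excerpt


def parse_decmpfs(raw):
    """Split a decmpfs value into (magic, type, size, inline_data)."""
    magic = raw[0:4][::-1].decode("ascii")      # "fpmc" on disk reads as "cmpf"
    ctype = int.from_bytes(raw[4:8], "little")
    size = int.from_bytes(raw[8:16], "little")
    payload = raw[HEADER_LEN:]
    # Assumption: a leading 0xFF means the rest is stored uncompressed.
    inline = payload[1:] if payload[:1] == b"\xff" else None
    return magic, ctype, size, inline


magic, ctype, size, inline = parse_decmpfs(RAW)
print(magic, ctype, size, inline)       # cmpf 3 8 b'APPLemal'
print(ALLOCATION_BLOCK - len(RAW))      # 4071 bytes saved versus one 4 KB block
```

<p>The last line is just the arithmetic behind the &#8220;over 4,000 bytes&#8221; figure: 25 bytes in an extended attribute instead of a full 4,096-byte allocation block.</p>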
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2009. |
<a href="http://blog.fosketts.net/2009/09/11/bizarre-hfs-tricks-in-mac-os-x-10-6-snow-leopard/">Bizarre HFS+ Tricks in Mac OS X 10.6 Snow Leopard</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/apple/" title="View all posts in Apple" rel="category tag">Apple</a>, <a href="http://blog.fosketts.net/category/everything/computerhistory/" title="View all posts in Computer History" rel="category tag">Computer History</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2009/09/11/bizarre-hfs-tricks-in-mac-os-x-10-6-snow-leopard/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Compression, Encryption, Deduplication, and Replication: Strange Bedfellows</title>
		<link>http://blog.fosketts.net/2009/02/05/compression-encryption-deduplication-replication/</link>
		<comments>http://blog.fosketts.net/2009/02/05/compression-encryption-deduplication-replication/#comments</comments>
		<pubDate>Fri, 06 Feb 2009 00:11:30 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[.Mac]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[data backup]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[encryption]]></category>
		<category><![CDATA[gzip]]></category>
		<category><![CDATA[Macports]]></category>
		<category><![CDATA[Ocarina]]></category>
		<category><![CDATA[rsync]]></category>
		<category><![CDATA[rsyncrypto]]></category>
		<category><![CDATA[Samba]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=1396</guid>
		<description><![CDATA[One of the great ironies of storage technology is the inverse relationship between efficiency and security: Adding performance or reducing storage requirements almost always results in reducing the confidentiality, integrity, or availability of a system. Many of the advances in capacity utilization put into production over the last few years rely on deduplication of data. [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_1397" class="wp-caption alignright" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><a href="http://blog.fosketts.net/wp-content/uploads/2009/02/compact.jpg" ><img class="size-medium wp-image-1397" title="compact" src="http://blog.fosketts.net/wp-content/uploads/2009/02/compact-300x65.jpg" alt="Does data encryption throw efficiency out the window? Not always!" width="300" height="65" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Does data encryption throw storage efficiency out the window? Not always!</p></div>
<p>One of the great ironies of storage technology is <strong>the inverse relationship between efficiency and security</strong>: Adding performance or reducing storage requirements almost always results in reducing the confidentiality, integrity, or availability of a system.</p>
<p>Many of the advances in capacity utilization put into production over the last few years rely on deduplication of data. This key technology has moved from basic compression tools to take on challenges in the fields of replication and archiving, and is even moving into primary storage. At the same time, interconnectedness and the digital revolution have made security a greater challenge, with attention turning to encryption and authentication to prevent identity theft or worse crimes. The only problem is, <strong>most encryption schemes are incompatible with compression or deduplication of data</strong>!<span id="more-1396"></span></p>
<h3 class="post-subhead">Incompatibility of Encryption and Compression</h3>
<p>Consider a basic lossless compression algorithm: We take an input file consisting of binary data and replace repeating patterns with shorter codes. If a file contained the sequence &#8220;101110&#8221; eight hundred times in a row, we could replace the whole 4800-bit sequence with a much smaller sequence that says &#8220;repeat this eight hundred times&#8221;. In fact, this is exactly what I did (using English) in the previous sentence! This basic concept, a close relative of <a rel="nofollow" href="http://en.wikipedia.org/wiki/Run-length_encoding"  target="_blank">run-length encoding</a>, illustrates the principle behind most modern compression technology: find redundancy and replace it with a shorter reference.</p>
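<p>To make the idea concrete, here is a minimal Python sketch. It is only an illustration, not how any real compressor is built: <code>run_length_encode</code> handles classic runs of a single symbol, while <code>encode_repeats</code> is a hypothetical helper that assumes the caller already knows which pattern repeats.</p>

```python
from itertools import groupby


def run_length_encode(text):
    """Collapse each run of identical symbols into a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]


def run_length_decode(pairs):
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)


def encode_repeats(bits, pattern):
    """Describe `bits` as (pattern, count) if it is nothing but copies of `pattern`."""
    if not pattern or len(bits) % len(pattern) != 0:
        raise ValueError("input is not a whole number of pattern copies")
    count = len(bits) // len(pattern)
    if pattern * count != bits:
        raise ValueError("input is not a pure repetition of the pattern")
    return pattern, count


original = "101110" * 800                  # the 4800-bit example from the text
encoded = encode_repeats(original, "101110")
print(len(original), encoded)              # 4800 ('101110', 800)
print(run_length_encode("1111000011"))     # [('1', 4), ('0', 4), ('1', 2)]
```

<p>The pair <code>('101110', 800)</code> is the machine version of &#8220;repeat this eight hundred times&#8221;.</p>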
<p>Replace the sequence of identical bits with a larger block of data or an entire file and you have <strong>deduplication and single-instance storage</strong>! In fact, as the compression technology gains access to the underlying data, it can become more and more efficient. The software from <a href="http://ocarinatech.com"  target="_blank">Ocarina</a>, for example, actually <em>decompresses</em> jpg and pdf files before recompressing them, resulting in astonishing capacity gains!</p>
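<p>A rough sketch of block-level deduplication, assuming fixed 4096-byte blocks and SHA-256 fingerprints (both are my choices for illustration, and <code>dedup_store</code> and <code>rebuild</code> are made-up names, not any vendor&#8217;s design):</p>

```python
import hashlib


def dedup_store(data, block_size=4096):
    """Split `data` into blocks and keep only one copy of each distinct block."""
    store = {}    # fingerprint -> block contents, stored once
    recipe = []   # ordered fingerprints needed to rebuild the input
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)
        recipe.append(fingerprint)
    return store, recipe


def rebuild(store, recipe):
    """Reassemble the original bytes from the stored blocks."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


data = b"A" * 4096 * 3 + b"B" * 4096 + b"A" * 4096
store, recipe = dedup_store(data)
print(len(recipe), "blocks written,", len(store), "blocks stored")
```

<p>Five blocks go in, but only two distinct blocks need to be kept. Fingerprinting whole files instead of blocks gives you single-instance storage.</p>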
<p>Now let&#8217;s look at compression&#8217;s secretive cousin, encryption. It&#8217;s only a small intellectual leap to use similar ideas to hide the contents of a file, rather than just squashing it. But encryption algorithms are constantly under attack, so some very smart minds have come up with some incredibly clever methods to hide data. One of the most important advances was <a rel="nofollow" href="http://en.wikipedia.org/wiki/Public_key_encryption"  target="_blank">public-key cryptography</a>, where two different keys are used: a public key for encrypting (writing) data, and a private key for decrypting (reading) it. This same technique can be used to authenticate identity, since only the designated key holder would (in theory) have the private key required.</p>
<p>Cryptography has become exceedingly complicated lately in response to repeated attacks. Most compression algorithms, and the simplest encryption schemes, are <a rel="nofollow" href="http://en.wikipedia.org/wiki/Deterministic_algorithm"  target="_blank">deterministic</a>, meaning that identical input always yields the same output. This is unacceptable for strong encryption: if the scheme is deterministic, anyone holding the public key can encrypt guessed plaintexts and compare the results, an approach closely related to the <a rel="nofollow" href="http://en.wikipedia.org/wiki/Known-plaintext_attack"  target="_blank">known plaintext attack</a>. Much work has focused on eliminating residues of the original data from the encrypted version, as <a rel="nofollow" href="http://en.wikipedia.org/wiki/Cipher_block_chaining#Electronic_codebook_.28ECB.29" >illustrated brilliantly</a> on Wikipedia with the classic Linux &#8220;tux&#8221; image. <strong>The goal is to make the encrypted data indistinguishable from random &#8220;noise&#8221;</strong>.</p>
<p>What happens when we mix these powerful technologies? <strong>Deduplication and encryption defeat each other</strong>! Deduplication <em>must</em> have access to repeating, deterministic data, and encryption <em>must not allow</em> this to happen. The most common solution (apart from skipping the encryption) is to place the deduplication technology first, allowing it access to the raw data before sending it on to be encrypted. But this leaves the data unprotected longer, and limits the possible locations where encryption technology can be applied. For example, an archive platform would have to encrypt data internally, since many now include deduplication as an integral component.</p>
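<p>You can see the effect with nothing more than Python&#8217;s standard library. This is a sketch under one loud assumption: I use <code>os.urandom</code> as a stand-in for well-encrypted data, since good ciphertext should look like random noise, rather than running a real cipher.</p>

```python
import os
import zlib

repetitive = b"101110" * 20000           # 120,000 bytes of highly redundant data
noise = os.urandom(len(repetitive))      # stand-in for ciphertext (assumption)

repetitive_size = len(zlib.compress(repetitive))
noise_size = len(zlib.compress(noise))

print("original size:      ", len(repetitive))
print("redundant data, zlib:", repetitive_size)
print("random noise, zlib:  ", noise_size)
```

<p>The redundant input shrinks to a tiny fraction of its size, while the noise does not shrink at all. Compress (or deduplicate) first and encrypt second, and you keep the savings. Reverse the order and there is nothing left to find.</p>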
<p>Why do we prefer compression to encryption? Simply because that&#8217;s where the money is! <strong>If we can cut down on storage space or WAN bandwidth, we see cost avoidance or even real cost savings</strong>! But if we &#8220;waste&#8221; space by encrypting data, we only save money in the case of a security breach.</p>
<h3 class="post-subhead">A Glimmer of Hope</h3>
<p>I had long thought this was an intractable problem, but a glimmer of hope recently presented itself. My hosting provider allows users to back up their files to a special repository using the rsync protocol. This is pretty handy, as you can imagine, but I was concerned about the security of this service. What happens if someone gains access to all of my data by hacking their servers?</p>
<p>At first, I only stored non-sensitive data on the backup site, but this limited its appeal. So I went looking for something that would allow me to encrypt my data before uploading it, and I discovered two interesting concepts: <strong>rsyncrypto</strong> and <strong>gzip-rsyncable</strong>.</p>
<p><a href="http://samba.anu.edu.au/rsync/"  target="_blank">rsync</a> is a solid protocol, reducing network demands by only sending the changed blocks of a file. But compression and encryption tools can change nearly the whole output file even if only a tiny bit of the input has been altered, leaving rsync with nothing to skip. A few years back, the folks behind rsync (who also happen to be the minds behind the Samba CIFS server) developed a patch for gzip which causes it to compress files in chunks rather than in their entirety. This patch, called gzip-rsyncable, hasn&#8217;t been added to the main source even after a dozen years, but yields amazing results in accelerating rsync performance.</p>
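<p>Here is a deliberately simplified sketch of the &#8220;send only the changed blocks&#8221; idea. It is not the real rsync algorithm, which uses a rolling checksum to find matching blocks at any offset. This version only compares blocks at the same position, and the block size and function names are my own assumptions.</p>

```python
import hashlib

BLOCK = 1024  # illustrative block size, not rsync's actual default


def signatures(data, block=BLOCK):
    """Hash every aligned block so two copies can be compared cheaply."""
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]


def changed_blocks(old, new, block=BLOCK):
    """Return indexes of blocks in `new` that differ from the same block in `old`."""
    old_sigs = signatures(old, block)
    new_sigs = signatures(new, block)
    return [i for i, sig in enumerate(new_sigs)
            if i >= len(old_sigs) or sig != old_sigs[i]]


old = bytes(range(256)) * 64          # 16,384 bytes, 16 blocks
new = bytearray(old)
new[5000] ^= 0xFF                     # flip one byte in place
to_send = changed_blocks(old, bytes(new))
print(to_send)                        # [4]
```

<p>One flipped byte means one block out of sixteen has to cross the wire. If the file had been compressed or encrypted as a whole first, far more of the signatures would typically differ.</p>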
<p>The same technique was then applied to RSA and AES cryptography to create <a href="http://rsyncrypto.lingnu.com/index.php/Home_Page"  target="_blank">rsyncrypto</a>. This open-source encryption tool makes a simple tweak to the standard CBC encryption scheme (reusing the initialization vector) to allow encrypted files to be sent more efficiently over rsync. In fact, it relies on gzip-rsyncable to work its magic. Of course, the resulting file is somewhat less secure, but it is probably more than enough to keep a casual snooper at bay.</p>
<p><strong>Both of these tools are similar to modern deduplication techniques</strong> in that they chop files up into smaller, variable-sized blocks before processing them. And the result is awesome: I modified a single word in a large Word document that I had previously encrypted and stored at the backup site, and was able to transfer just a single block of the new file in an instant rather than a few minutes. My only real issue is the lack of integration of all of these tools: I had to write a bash script to encrypt my files to a temporary directory before rsyncing them. I wish they could be integrated with the main gzip and rsync sources!</p>
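<p>The variable-sized block trick is easier to see in code. The sketch below is my own toy version of content-defined chunking, not the gzip-rsyncable or rsyncrypto implementation: it ends a chunk wherever a checksum of the last few bytes hits a chosen value, so chunk boundaries depend on the data itself rather than on byte offsets. The window size, divisor, and test data are all assumptions for illustration, and real tools use a rolling checksum instead of recomputing one at every position.</p>

```python
import hashlib
import random
import zlib

WINDOW = 16    # bytes examined when deciding on a boundary (illustrative)
DIVISOR = 64   # boundaries occur roughly once every 64 bytes on random data


def chunks(data):
    """Cut `data` wherever the checksum of the preceding WINDOW bytes is a multiple of DIVISOR."""
    out, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if zlib.adler32(data[end - WINDOW:end]) % DIVISOR == 0:
            out.append(data[start:end])
            start = end
    if start != len(data):
        out.append(data[start:])   # whatever is left after the last boundary
    return out


rng = random.Random(0)
old = bytes(rng.randrange(256) for _ in range(20000))
new = old[:10000] + b"EDIT!" + old[10000:]     # insert five bytes mid-file

known = {hashlib.sha256(c).digest() for c in chunks(old)}
new_chunks = chunks(new)
new_only = [c for c in new_chunks if hashlib.sha256(c).digest() not in known]
print(len(new_chunks), "chunks in the new file,", len(new_only), "not seen before")
```

<p>Because every boundary is decided by nearby content, the chunks after the insertion line up again a few bytes later, and only the handful touching the edit are new. With fixed-offset blocks, a five-byte insertion would have shifted every block that follows.</p>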
<p>If you are interested in trying out these tools for yourself, and if you use a Mac, you are in luck: MacPorts offers both tools as simple downloads! Just <a href="http://macports.org"  target="_blank">install MacPorts</a>, type &#8220;sudo port install gzip +rsyncable&#8221; to install gzip with the --rsyncable flag, then type &#8220;sudo port install rsyncrypto&#8221; and you&#8217;re done! I&#8217;ll post more details here if there is interest.</p>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2009. |
<a href="http://blog.fosketts.net/2009/02/05/compression-encryption-deduplication-replication/">Compression, Encryption, Deduplication, and Replication: Strange Bedfellows</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2009/02/05/compression-encryption-deduplication-replication/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>EMC Atmos Versus VMware VDC-OS: Will The Real Cloud Strategy Please Stand Up?</title>
		<link>http://blog.fosketts.net/2008/11/10/emc-atmos-vmware-vdc-os-cloud-strategy/</link>
		<comments>http://blog.fosketts.net/2008/11/10/emc-atmos-vmware-vdc-os-cloud-strategy/#comments</comments>
		<pubDate>Mon, 10 Nov 2008 16:03:42 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[Atmos]]></category>
		<category><![CDATA[Azure]]></category>
		<category><![CDATA[CAS]]></category>
		<category><![CDATA[Centera]]></category>
		<category><![CDATA[Chuck Hollis]]></category>
		<category><![CDATA[CIFS]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[cloud storage]]></category>
		<category><![CDATA[Cloud vServices]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[COS]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[HCAP]]></category>
		<category><![CDATA[Hitachi]]></category>
		<category><![CDATA[Maui]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[NAS]]></category>
		<category><![CDATA[nas storage]]></category>
		<category><![CDATA[network attached storage]]></category>
		<category><![CDATA[network storage]]></category>
		<category><![CDATA[NFS]]></category>
		<category><![CDATA[Nirvanix]]></category>
		<category><![CDATA[replication]]></category>
		<category><![CDATA[REST]]></category>
		<category><![CDATA[SOAP]]></category>
		<category><![CDATA[Steve Todd]]></category>
		<category><![CDATA[VDC-OS]]></category>
		<category><![CDATA[VMware]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=1075</guid>
		<description><![CDATA[As I guessed on Friday, EMC has officially announced their Maui Atmos software layer today, calling it the &#8220;industry&#8217;s first COS (cloud-optimized storage) offering&#8221;, &#8220;a new era for IT&#8221;, and &#8220;a new category of storage.&#8221; So the new era for IT is a cloud with globally-distributed object stores with policy management? Great! But I thought [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.fosketts.net/2008/11/07/emc-maui/"  target="_blank">As I guessed on Friday</a>, EMC has officially announced their <span style="text-decoration: line-through;">Maui</span> Atmos software layer today, <a href="http://www.emc.com/products/category/subcategory/cloud-optimized-storage.htm?CMP=ILC-carHP&amp;panel=harnessing+cloud+computin"  target="_blank">calling</a> it the &#8220;industry&#8217;s first COS (cloud-optimized storage) offering&#8221;, &#8220;a new era for IT&#8221;, and &#8220;a new category of storage.&#8221; So the new era for IT is a cloud with globally-distributed object stores with policy management?</p>
<p>Great! But I thought the new era for IT was a cloud with choice, mobility, and application support, as <a href="http://www.vmware.com/technology/virtual-datacenter-os/cloud-vservices/"  target="_blank">trumpeted</a> by EMC&#8217;s VMware subsidiary! Wasn&#8217;t Cloud vServices from VDC-OS supposed to be the <a href="http://blog.fosketts.net/2008/09/16/vmware-virtual-datacenter-operating-system-vdc-os/"  target="_blank">prototype cloud strategy</a> for the datacenter?</p>
<p>What we have here is <strong>a simple clash of marketing</strong> amusingly taking place at (nearly) the same company. VMware figured out how to extend their server virtualization products outside the confines of the data center, and laid that technology out as a strategy with the trendy &#8220;cloud&#8221; name. Meanwhile, mother EMC is working on next-generation content storage software and decides to roll that out as a strategy and also jumps on the &#8220;cloud&#8221; meme. What&#8217;s an IT manager to do?<span id="more-1075"></span></p>
<h3 class="post-subhead">Defining Atmos</h3>
<p>As predicted, EMC&#8217;s Atmos (code-named Maui) is a <a href="http://www.theregister.co.uk/2008/11/10/emc_launches_maui_as_atmos/"  target="_blank">distributed software layer</a> to handle the storage and management of data objects across geographically-dispersed storage devices. EMC&#8217;s Chuck Hollis <a href="http://chucksblog.emc.com/chucks_blog/2008/11/emc-atmos-maui-is-here.html"  target="_blank">demonstrates Atmos</a> with a simple, practical example, perhaps making it sound too much like Akamai but generally getting the point across. You take a data object, write it to Atmos through REST/SOAP or CIFS/NFS, assign some metadata, and the software takes care of data placement for you. It&#8217;ll add local copies, replicate for availability and performance, compress or deduplicate, manage versions, and provide all sorts of other goodies (if you ask it to).</p>
<p>But EMC already has a capable object storage platform, the Centera. We&#8217;ve just got used to the content-addressable storage (CAS) label for object storage (even though this name misses the point of object storage, in my opinion) and now EMC wants us to learn a new label for a somewhat-similar device? Steve Todd, EMC&#8217;s object guy extraordinaire, <a rel="nofollow" href="http://stevetodd.typepad.com/my_weblog/2008/11/atmos-cloud-optimized-storage.html"  target="_blank">lays it out</a>:</p>
<blockquote><p>SAN Value = Centralized, secure multi-tenancy for blocks.</p>
<p>NAS Value = Centralized, secure multi-tenancy for files.</p>
<p>CAS Value = Centralized, secure multi-tenancy for objects (content + metadata).</p>
<p>COS Value = <em>Globalized</em>, secure multi-tenancy for content with <em>rich policies</em>.</p>
</blockquote>
<p>Ok, so <strong>the defining capabilities of Atmos are its global scale and rich policies</strong>. And the fact that &#8220;objects&#8221; has become &#8220;content&#8221;, presumably since Atmos can handle traditional NAS (CIFS/NFS) chores as well.</p>
<h3 class="post-subhead">Prayers Answered?</h3>
<p>It sounds like EMC is answering <a href="http://blog.fosketts.net/2008/09/28/we-need-storage-revolution/"  target="_blank">my prayers for a storage revolution</a>, delivering a highly-capable object storage platform that transcends the old limits of blocks, directories, and files. Steve Todd points out that Atmos handles five policy categories out of the box:</p>
<ul>
<li>Replication</li>
<li>Compression</li>
<li>Spin-down</li>
<li>Object de-dup</li>
<li>Versioning</li>
</ul>
<p>So we write some data to Atmos, using either traditional NAS or <a rel="nofollow" href="http://en.wikipedia.org/wiki/Web_2.0"  target="_blank">webby dubby</a> protocols like <a rel="nofollow" href="http://en.wikipedia.org/wiki/SOAP_(protocol)"  target="_blank">SOAP</a>, and can then apply policies in any of these five categories to that data. One can also extend Atmos to accept other policies, but the absence (out of the box) of concepts like encryption, secure deletion, retention, and access control is surprising.</p>
<p>I am quite puzzled about how practical these policy capabilities will be in the real world. How exactly would an application say &#8220;I want you to compress that file I wrote over NFS just now?&#8221; Hitachi&#8217;s HCAP platform, for example, also has policy capabilities and a NAS front end, and although archiving applications can communicate their policy needs, <strong>I don&#8217;t see lots of current general-purpose applications using it</strong>.</p>
<h3 class="post-subhead">Strategic Storage?</h3>
<p>This brings me to my puzzlement: The default Atmos policies are all general-purpose, production computing ideas, not the special-purpose, archiving and retention needs served by Centera, HCAP, and the rest. So <strong>the Atmos is clearly intended to be a production data storage system</strong>, not an archiving system to compete with Centera.</p>
<p>Since mainstream business applications currently don&#8217;t have any capability to specify policies like these when writing files, and since NAS protocols lack any means to communicate them even if the apps want to, we can conclude that <strong>EMC expects that Atmos users will write special applications to take advantage of it</strong>.</p>
<p>EMC certainly doesn&#8217;t expect that the NAS-capable Atmos will simply replace today&#8217;s distributed NAS solutions. <strong>NAS is a sideshow for Atmos</strong>. The real action will be in the REST/SOAP webby dubby applications that will be written with the platform in mind and will take full advantage of these capabilities.</p>
<p>If this is true, and I <a rel="nofollow" href="http://storagebod.typepad.com/storagebods_blog/2008/11/i-like-a-party-with-a-atmosphere.html"  target="_blank">and others</a> suspect that it is, then <strong>Atmos really isn&#8217;t a game-changing platform unless you change your game</strong>. If you write new applications to store data with SOAP, Atmos is a nice in-house alternative to Amazon S3 or Nirvanix, and offers a very compelling set of data management capabilities. And if you want to set up shop to compete with those service providers, Atmos is a dream come true with <a rel="nofollow" href="http://storagezilla.typepad.com/storagezilla/2008/11/building-emc-atmos.html"  target="_blank">built-in multi-tenancy</a>.</p>
<h3 class="post-subhead">Datacenter Strategy</h3>
<p>So EMC alone has two seemingly competitive datacenter strategies. And then there&#8217;s Microsoft, which announced its <a href="http://dcsblog.burtongroup.com/data_center_strategies/2008/10/waiting-for-the-other-shoe-to-drop.html"  target="_blank">Azure cloud platform</a> recently, and Amazon and the other cloud providers.</p>
<p>So let&#8217;s say you&#8217;re a CIO for a large corporation. Which of the following strategies is more compelling:</p>
<ol>
<li>Use <strong>VMware VDC-OS</strong> to add capabilities and <strong>Cloud vServices</strong> to extend your current virtual infrastructure geographically</li>
<li>Recompile and tweak your Windows applications to leverage <strong>Microsoft Azure</strong></li>
<li>Develop new applications to take advantage of the impressive storage capabilities of an in-house <strong>EMC Atmos</strong> system</li>
<li>Point your new applications at a third-party cloud provider like Amazon or Nirvanix</li>
</ol>
<p>IT people are practical. Although we love new technology, we tend to be cautious. We also hate massive software development efforts, and only sanction them when they&#8217;re absolutely necessary. Given these personality traits, I&#8217;d say VDC-OS and perhaps Cloud vServices still stand out as the most likely and practical scenario for the majority of applications and businesses.</p>
<p>This is not to say that EMC Atmos will be a flop. I&#8217;m impressed by the technology, and expect that Atmos will find buyers, just as Centera did. And Atmos might even replace Centera once EMC adds retention policies to it and scales it down as well as up and out. But Atmos will not redefine the datacenter. We&#8217;re stuck with blocks and files, and VMware&#8217;s practical strategy is a winner in that world.</p>
<p><strong>Update:</strong> <a href="http://www.storagerap.com/2008/11/atmos-dead-or-not-dead-innovative-or-repetitive.html"  target="_blank">Marc Farley compares Atmos to WAFS</a>, with ominous implications, and echoes my recent question on what is and is not innovative.</p>
<p><strong>Update 2:</strong> Chuck Hollis, Storagezilla, and <a rel="nofollow" href="http://lensblog.typepad.com/ebiz/2008/11/emc-announces-atmos.html"  target="_blank">Len Devanna</a> have all come right out and said that this is only intended for certain customers with massive distributed storage needs, and is not intended as a new datacenter strategy. Even the &#8220;cloudfella&#8221; says &#8220;ciao&#8221;:</p>
<p>
<object width="425" height="344" data="http://www.youtube.com/v/eaqklyv3yrg&amp;hl=en&amp;fs=1" type="application/x-shockwave-flash"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/eaqklyv3yrg&amp;hl=en&amp;fs=1" /><param name="allowfullscreen" value="true" /></object>
</p>
<p><strong>Update 3:</strong> More great information, including <a rel="nofollow" href="http://virtualgeek.typepad.com/virtual_geek/2008/11/whats-the-relat.html"  target="_blank">a reply regarding VDC-OS and Atmos</a> from the one and only Chad Sakac, more great detail about <a rel="nofollow" href="http://stevetodd.typepad.com/my_weblog/2008/11/atmos-policy-under-the-hood.html"  target="_blank">the inner workings of Atmos</a> from Steve Todd, and <a href="http://flickerdown.com/?p=268"  target="_blank">even more info</a> from Dave Graham. Finally, although I think that Cloudfellas video is cute, I wouldn&#8217;t categorize it as viral. But <a rel="nofollow" href="http://lensblog.typepad.com/ebiz/2008/11/beware-flaming-appliances-from-the-sky.html"  target="_blank">those Mozy ads</a> are awesome!</p>
<blockquote><p>See my posts on <a href="http://gestaltit.com/author/stephen/"  target="_blank">Gestalt IT</a> for similar <a href="http://gestaltit.com"  target="_blank">enterprise IT infrastructure commentary</a></p>
</blockquote>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2008. |
<a href="http://blog.fosketts.net/2008/11/10/emc-atmos-vmware-vdc-os-cloud-strategy/">EMC Atmos Versus VMware VDC-OS: Will The Real Cloud Strategy Please Stand Up?</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/virtualstorage/" title="View all posts in Virtual Storage" rel="category tag">Virtual Storage</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2008/11/10/emc-atmos-vmware-vdc-os-cloud-strategy/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Deduplication Coming to Primary Storage</title>
		<link>http://blog.fosketts.net/2008/09/16/deduplication-primary-storage/</link>
		<comments>http://blog.fosketts.net/2008/09/16/deduplication-primary-storage/#comments</comments>
		<pubDate>Tue, 16 Sep 2008 19:28:37 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Computer History]]></category>
		<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Features]]></category>
		<category><![CDATA[Virtual Storage]]></category>
		<category><![CDATA[Atari]]></category>
		<category><![CDATA[Byte]]></category>
		<category><![CDATA[capacity optimization]]></category>
		<category><![CDATA[CAS]]></category>
		<category><![CDATA[Centera]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[data deduplication]]></category>
		<category><![CDATA[Data Domain]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[DR-DOS]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[FilePool]]></category>
		<category><![CDATA[greenBytes]]></category>
		<category><![CDATA[Huffman coding]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[NetApp]]></category>
		<category><![CDATA[Riverbed]]></category>
		<category><![CDATA[single-instance storage]]></category>
		<category><![CDATA[Stacker]]></category>
		<category><![CDATA[VMware]]></category>
		<category><![CDATA[VTL]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=626</guid>
		<description><![CDATA[Although deduplication of storage is nothing new, with Data Domain and others making hay with the technique for years, it has never been ready for prime time - reduction of active primary storage applications like email and databases. Instead, deduplication has been relegated to second- or third-tier status, deduplicating archives and backup data. But change is in the air, and deduplication vendors are starting to bustle towards the bright lights of primary storage.]]></description>
			<content:encoded><![CDATA[<p style="padding-left: 30px;"><em>This is a follow-up to my story, <a href="http://blog.fosketts.net/2008/03/12/de-duplication-goes-mainstream/"  target="_self">De-Duplication Goes Mainstream</a></em></p>
<p>Although deduplication of storage is nothing new, with Data Domain and others making hay with the technique for years, it has never been ready for prime time &#8211; reduction of active primary storage applications like email and databases. Instead, deduplication has been relegated to second- or third-tier status, deduplicating archives and backup data. But change is in the air, and deduplication vendors are starting to bustle towards the bright lights of primary storage.</p>
<h3>Stone Knives and Bear Skins</h3>
<p>We have all been here before, of course. Back at the dawn of the personal computer era, data compression was a hot topic of conversation. I recall being so impressed by an article in <a rel="nofollow" href="http://en.wikipedia.org/wiki/Byte_(magazine)"  target="_blank">Byte</a> (1986:5, p99) outlining <a rel="nofollow" href="http://en.wikipedia.org/wiki/Huffman_coding"  target="_blank">Huffman coding</a> that I tried cooking up an implementation in Atari BASIC. Lossless compression has a magical pull to the geek in many of us &#8211; redundant data just <em>wants</em> to be eliminated!</p>
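<p>For the curious, here is roughly what a Huffman coder looks like in Python rather than Atari BASIC. This is a minimal sketch of the textbook algorithm, not a reconstruction of the Byte article or of my old program: frequent symbols get short bit strings, rare symbols get long ones, and no code is a prefix of another.</p>

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a dict mapping each symbol in `text` to its Huffman bit string."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:                       # only one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far})
    heap = [(weight, i, {symbol: ""})
            for i, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)      # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_decode(bits, codes):
    """Walk the bit string, emitting a symbol each time a complete code is seen."""
    reverse = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


text = "abracadabra"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(len(text) * 8, "bits as plain bytes,", len(encoded), "bits Huffman-coded")
```

<p>For &#8220;abracadabra&#8221; that works out to 23 bits instead of 88, not counting the space needed to store the code table itself.</p>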
<div id="attachment_630" class="wp-caption alignright" style="width: 254px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><a href="http://blog.fosketts.net/wp-content/uploads/2008/09/sc0003b3d4.png" ><img class="size-full wp-image-630 " title="Stacker" src="http://blog.fosketts.net/wp-content/uploads/2008/09/sc0003b3d4.png" alt="Stacker dominated the disk compression world - until Microsoft introduced DOS 6.0" width="244" height="254" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Stacker dominated the disk compression world - until Microsoft introduced DOS 6.0</p></div>
<p>Companies soon applied <a href="http://www.zisman.ca/Articles/1993/DOS6.html"  target="_blank">compression to primary storage</a>, especially the limited storage in personal computers. <a rel="nofollow" href="http://en.wikipedia.org/wiki/Stac_Electronics#Microsoft_lawsuit"  target="_blank">Stacker</a> was a hit after 1990, until Microsoft built a workalike, called DoubleSpace, into DOS 6.0 in 1993, leading to a historic lawsuit. I personally used the ADDSTOR disk compression built into DR-DOS 6.0 to stretch two more years out of the 20 MB MFM hard drive in my AT&amp;T PC6300 at <a href="http://wpi.edu"  target="_blank">WPI</a>.</p>
<p>But something funny happened in the late 1990s: Compression began to lose its luster. Compressing data always takes quite a bit of CPU power, but this was offset somewhat by the truncated data transfers and more-efficient file system layout afforded in early PCs. But as disks got larger and faster, using precious CPU time to save space seemed less and less compelling. Today, although nearly every operating system includes built-in compression of files, folders, or perhaps disks, these features are rarely used. And compression was never popular in the performance-sensitive enterprise space.</p>
<h3><strong>Deduplication Has a Nice Ring</strong></h3>
<p>Although traditional fine-grained compression has not been very successful in the enterprise, its lanky cousin, single-instance storage, has long found niche jobs. Applications from databases to email systems to file servers have long had the ability to recognize requests to store the exact same file or record, and to store just a single instance in this case. Even file systems have the ability to do single-instance storage through the use of links, though this is initiated by the user rather than in an automated fashion.</p>
<p>In the late 1990s, FilePool began developing a <a rel="nofollow" href="http://en.wikipedia.org/wiki/Content-addressable_storage"  target="_blank">content-addressable storage</a> device, which was acquired by EMC in 2001. This device, later known as the Centera, was one of a number of storage platforms targeted at the archiving market introduced this decade. At the same time, <a rel="nofollow" href="http://en.wikipedia.org/wiki/Virtual_tape_library"  target="_blank">virtual tape libraries</a> made the jump from the mainframe to open systems. Both devices, being outside the critical path of performance but offering massive capacity, were well-suited to implement advanced <a rel="nofollow" href="http://en.wikipedia.org/wiki/Capacity_optimization"  target="_blank">capacity optimization</a> technologies that combined the concepts of compression with single-instance storage. Thus was created the modern world of data deduplication.</p>
<p>What we think of as deduplication is neither fish nor fowl: It assesses larger &#8220;chunks&#8221; of data than compression technologies, delivering greater capacity savings and potentially reducing performance impact, but is more flexible than single-instancing, recognizing the similarities within files or objects.</p>
<p>But it is still maddeningly difficult to scale deduplication while maintaining performance. Rather than fight to maintain reasonable write throughput, most deduplication products have switched to post-processing, deferring their work to quieter times.</p>
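<p>The difference between single-instancing and chunk-level deduplication is easier to see with a toy example. The sketch below is only an illustration, not how any particular product works: the function names, the 8-byte chunk size, the fixed-size chunking, and the use of SHA-256 as the fingerprint are all assumptions made for readability.</p>

```python
import hashlib

# Hypothetical chunk size, chosen tiny so the sample data is easy to read.
CHUNK_SIZE = 8


def fingerprint(data):
    """Return a SHA-256 hex digest used here as the content address."""
    return hashlib.sha256(data).hexdigest()


def stored_bytes_single_instance(files):
    """Whole-file single-instancing: keep one copy per identical file."""
    store = {}
    for data in files:
        store[fingerprint(data)] = data
    return sum(len(d) for d in store.values())


def stored_bytes_chunk_dedup(files, chunk_size=CHUNK_SIZE):
    """Fixed-size chunk deduplication: keep one copy per identical chunk."""
    store = {}
    for data in files:
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            store[fingerprint(chunk)] = chunk
    return sum(len(c) for c in store.values())


files = [
    b"AAAAAAAABBBBBBBBCCCCCCCC",  # original file
    b"AAAAAAAABBBBBBBBCCCCCCCC",  # exact copy of the original
    b"AAAAAAAABBBBBBBBDDDDDDDD",  # same as the original except the last chunk
]

print(sum(len(f) for f in files))           # raw bytes written
print(stored_bytes_single_instance(files))  # bytes kept, whole-file matching
print(stored_bytes_chunk_dedup(files))      # bytes kept, chunk matching
```

<p>Single-instancing only catches the exact copy, so the third file is stored in full even though two-thirds of it is already on disk. Chunk-level matching stores just the one chunk that differs. Every chunk written needs a fingerprint lookup, though, and that lookup is where the performance cost comes from.</p>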
<h3><strong>It&#8217;s Not Just for Breakfast</strong></h3>
<p>Regardless of their methods or underlying technology, no deduplication vendor has stood up to support challenging low-latency or high-throughput production applications. <a href="http://blog.fosketts.net/2008/03/12/de-duplication-goes-mainstream/"  target="_self">NetApp was the first to raise the issue of support for production applications</a>, but although they tout the technology for VMware, they haven&#8217;t exactly been shouting from the rooftops to get their A-SIS deduplication technology deployed in other high-I/O applications. And I haven&#8217;t seen Hifn&#8217;s card yet.</p>
<p>Yesterday, I mentioned that greenBytes was adding deduplication to their ZFS-based storage array for primary data. And now <a href="http://www.theregister.co.uk/2008/09/16/deduplicating_primary_storage/"  target="_blank">Riverbed has fired another shot</a> over the bow, repurposing their (deduplicating) WAN accelerator product for primary (file) storage. They might be able to pull it off, too, since they have a long list of customers who are already enjoying the technology in production. It&#8217;s not a stretch to suggest that Riverbed&#8217;s appliances can scale to handle production data loads. Although it&#8217;s file-only, I can imagine quite a few scenarios where this tech could really yield benefits. Could we come full circle, with deduplication finally reaching the enterprise storage world?</p>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2008. |
<a href="http://blog.fosketts.net/2008/09/16/deduplication-primary-storage/">Deduplication Coming to Primary Storage</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/computerhistory/" title="View all posts in Computer History" rel="category tag">Computer History</a>, <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/features/" title="View all posts in Features" rel="category tag">Features</a>, <a href="http://blog.fosketts.net/category/everything/virtualstorage/" title="View all posts in Virtual Storage" rel="category tag">Virtual Storage</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2008/09/16/deduplication-primary-storage/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

