I found myself talking compression and de-duplication all week. Between Storage Decisions and my recent posts over at SearchStorage and The Storage Community, I’ve had quite a bit to say on the subject. Funny enough, I’m not really a fan of data reduction technology for primary storage: too often, data reduction is more expensive and more difficult than simply storing the raw data.
You should also read:
- Deduplication Coming to Primary Storage
- Compression, Encryption, Deduplication, and Replication: Strange Bedfellows
Storage Decisions
My Storage Decisions presentation on data reduction was hilarious, if I do say so myself, even though turnout was poor at 8:30 on a Tuesday morning. Maybe it was the “intimate” group, but I found myself really getting into the discussion. The nods and hollers from the audience helped, too!
My basic thesis at Storage Decisions was the same as always: Don’t throw good money at technology that will have little ROI. Considering that disk capacity is incredibly cheap, and dropping all the time, data reduction doesn’t look like a great fit except in certain situations. Why spend money to reduce utilization? Why put in the effort when most primary storage data reduction technologies don’t do anything to address the “multiplier effect” of archiving, DR, and backup storage?
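To put a number on the ROI argument, here’s a back-of-the-envelope comparison. Every figure in it is a hypothetical assumption for illustration (the per-TB price, license cost, and reduction ratio are mine, not vendor quotes):

```python
# Back-of-the-envelope ROI check for primary-storage data reduction.
# All figures are hypothetical assumptions, chosen only to illustrate
# the shape of the trade-off.

raw_disk_cost_per_tb = 100.0       # assumed street price of raw capacity, $/TB
reduction_license_cost = 20_000.0  # assumed cost of a data reduction license
dedup_ratio = 2.0                  # assumed 2:1 reduction on primary data
primary_tb = 100                   # size of the primary data set, TB

cost_without = primary_tb * raw_disk_cost_per_tb
cost_with = (primary_tb / dedup_ratio) * raw_disk_cost_per_tb + reduction_license_cost

print(f"Raw disk only:       ${cost_without:,.0f}")
print(f"With data reduction: ${cost_with:,.0f}")
# With these assumptions, reduction costs more ($25,000 vs. $10,000);
# the license only pays for itself once the data set grows well past
# the break-even point (here, 400 TB).
```

Change the assumed numbers and the answer changes, of course, but the exercise shows why cheap raw disk keeps raising the bar that a paid reduction product has to clear.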
This is not to say that all data reduction technology is worthless. In fact, the free compression and de-duplication built into many SSDs and even some enterprise storage devices make perfect sense. I just don’t understand spending a bunch of money to address storage capacity when most applications are starved for storage performance.
You might like reading my two other posts on the subject from last week:
- Interest in data reduction methods needs to keep pace with data growth (SearchStorage.com)
- Has the Time Finally Come for Data Reduction? (The Storage Community, sponsored by IBM)
You’re Losing Me
On the other hand, I do see quite a bit of value in something many people would dismiss out of hand: Lossy compression of office files. Every systems administrator knows that end-users do “stupid stuff” like embedding massive photos and videos in PowerPoint presentations and Word documents. But not everyone knows that there are technological means to address this “PEBKAC” issue.
Some office applications already automatically reduce the size of embedded content, and operating systems can do the same. One of my more popular blog posts, in fact, is a technique to create a filter to reduce the size of PDF files in Mac OS X Preview. And the Microsoft “X” Office file formats include lossless compression as well.
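The lossless compression in the “X” formats is easy to see for yourself, because .docx, .pptx, and .xlsx files are ordinary ZIP containers. The sketch below builds a stand-in document in memory rather than opening a real file, but the part name and the Deflate compression are exactly what you’d find inside an actual Word file:

```python
# The Office "X" formats are ZIP containers, so their XML parts get
# Deflate compression for free. Demonstrate with an in-memory stand-in;
# inspecting a real .docx with zipfile works the same way.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    # word/document.xml is the actual main part of a Word file.
    zf.writestr("word/document.xml", "<w:t>hello</w:t>" * 1000)

with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("word/document.xml")
    print(info.filename, info.file_size, "->", info.compress_size)
```

Highly repetitive XML like this shrinks dramatically, which is why the “X” formats are so much smaller than the old binary ones even before anyone touches the embedded images.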
An application that recently caught my eye is the FILEminimizer Suite by Balesio. This inexpensive application reduces the size of Office and media files while leaving them in their native format. It re-compresses image files, reducing them to optimum size for use in presentations, documents, or printouts. A companion product, FILEminimizer Server, can be used on enterprise file servers to perform the same magic across the whole range of users.
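To show the idea behind that kind of re-compression (not Balesio’s actual implementation, which isn’t public), here is a minimal sketch using the Pillow imaging library; the 1024-pixel target and JPEG quality of 75 are my own assumed settings, roughly “good enough for on-screen documents”:

```python
# A minimal sketch of native-format image shrinking, in the spirit of
# FILEminimizer-style tools. Pillow, the max dimension, and the JPEG
# quality are illustrative assumptions, not the product's internals.
from io import BytesIO

from PIL import Image

def shrink_image(image_bytes: bytes, max_px: int = 1024, quality: int = 75) -> bytes:
    """Downscale an image to at most max_px on its long edge and
    re-encode as JPEG -- lossy, but plenty for presentations."""
    img = Image.open(BytesIO(image_bytes))
    img.thumbnail((max_px, max_px))  # in-place resize, preserves aspect ratio
    out = BytesIO()
    img.convert("RGB").save(out, "JPEG", quality=quality)
    return out.getvalue()

# Demo: a synthetic 4000x3000 "camera photo" shrinks to 1024x768.
big = Image.new("RGB", (4000, 3000), "white")
src = BytesIO()
big.save(src, "JPEG", quality=95)
small = shrink_image(src.getvalue())
print(len(src.getvalue()), "->", len(small), "bytes")
```

A server-side tool doing this across a file share gets the same savings for every copy, backup, and e-mail attachment made from those files afterward.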
Stephen’s Stance
Native Format Optimization (NFO) makes a lot of sense, since it addresses a common user error in a practical way and allows capacity savings to “trickle down” to backups, e-mail systems, and archives. But wholesale compression and de-duplication of primary storage may not be worth much, especially since the cost of disk keeps dropping dramatically.
Andrew vonNagy says
Stephen,
You should also consider the indirect costs of storage growth, like storage processors, enclosures, SAN ports, data center rack space, power, cooling, etc.
How does this alter the analysis?
Thanks,
Andrew vonNagy