Without getting into the debate on blogketing (I’ll save that for another post), I was pretty impressed by Chuck Hollis’ recent post on ILM. He makes a good case for the whys and wherefores of ILM, and perhaps counteracts a bit of the prevailing anti-ILM argument.
I’ve been in the trenches on storage content (aka data) for a long time. I, too, have trotted out the old “gigs of MP3s and porn” argument from time to time. But I’ve done enough filesystem assessments at real companies to realize that that’s not really the norm. In fact, I’ve rarely found much porn, music, video, or jokes on full-up corporate file servers. And I’ve analyzed enough storage environments to know that, while file servers are big, they’re not normally the majority user of storage in large data centers.
On the contrary, most enterprise storage is taken up by business applications, though not necessarily critical data. Email, backup, and certainly user file servers are big space users. But give me a few Oracle instances, source code repositories, or image processing servers, and watch those other space users shrink in relative significance.
No matter what the application, though, the real issue with storage growth (and ILM) is the (in)ability of IT managers to do anything about it. Let’s say we had permission to delete really inappropriate data, which is not a sure thing. Would we IT folks even be able to recognize it? How would we locate it? Can we even view user files without violating user trust, company privacy policies, or even laws? Many countries (yes, not all data is in the USA) regulate access to data even inside a company.
Now let’s move into grayer areas of “unnecessary” corporate data. Many storage administrators can’t even name the applications that take up all that space, let alone understand the intricacies of the data under management. To make a timely (and tired) Harry Potter analogy, IT folks are the house-elves of the business – powerful but subservient, with little input into what happens above and around them. I’ve talked to business people who don’t want IT to have any input at all, relegating them to order takers and laborers.
This is a dangerous slide, however. Lots of people have the capability to take IT orders and keep the lights on, a realization that leads to outsourcing. IT pros must prove their worth to the business in order to remain relevant and irreplaceable!
ILM is one way to do that. To get back to Chuck’s post, we need to take the reins and try to understand data better. We need to pick certain applications that lend themselves to automated data classification and tiered storage and try to get them under control. Email is a great candidate, and that’s why email archiving applications have taken off recently. File servers are coming along, too, especially with file virtualization in the ascendancy.
I’m particularly excited about what a smart IT manager I know called the “second wave” of SRM tools. Rather than just collecting stock metadata (age, name, owner, etc), the latest filesystem scanning tools look inside a file, trying to better classify them. Let’s say 1/4 of your file server is made up of Microsoft Word, Excel, and PowerPoint documents. What can you do about that unless you can identify which are critical and which are not? Each business will have its own criteria, and you need a flexible tool to scan them all and report back to you before you can “ILM” them. That’s what lots of software vendors are currently working on, and though we’re at an early stage still, the results are promising.
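To make the idea concrete, here is a minimal sketch of what such a “second wave” scan might look like. It is not any vendor’s actual tool: the keyword list and the raw-byte peek are illustrative assumptions (a real product would parse Office file formats properly and apply each business’s own classification criteria).

```python
import time
from pathlib import Path

# Hypothetical keyword rules -- each business defines its own criteria.
CRITICAL_KEYWORDS = [b"confidential", b"contract", b"invoice"]

def stock_metadata(path: Path) -> dict:
    """First-wave SRM view: name, size, age -- filesystem metadata only."""
    st = path.stat()
    return {
        "name": path.name,
        "size_bytes": st.st_size,
        "age_days": (time.time() - st.st_mtime) / 86400,
    }

def classify(path: Path) -> str:
    """Second-wave view: look inside the file for business-defined keywords.
    A real tool would parse Word/Excel/PowerPoint formats; this byte scan
    is only a sketch of the concept."""
    try:
        head = path.read_bytes()[:64 * 1024]  # sample the first 64 KB
    except OSError:
        return "unreadable"
    if any(kw in head.lower() for kw in CRITICAL_KEYWORDS):
        return "critical"
    return "non-critical"

def scan(root: str):
    """Walk a tree and emit one record per file: metadata plus a class."""
    for path in Path(root).rglob("*"):
        if path.is_file():
            yield {**stock_metadata(path), "class": classify(path)}
```

The point of the exercise is the combination: stock metadata tells you how big and how old a file is, but only the content peek tells you whether you can afford to tier it down or let it go.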
Sadly, though, we in IT may soon find that we just can’t delete anything. Even totally banned content like porn could be critical to a legal case against an employee, and it won’t be long before we are expected to keep everything that shows up on our servers for a very long time. Most companies have policies for hardcopy document retention, and many are currently diving into the world of data policy as well. The default policy may be “keep until we decide what to do with it”, and this could cause the current trend of storage growth to accelerate!
If we can’t delete data, we will be forced to sail the Titanic rather than sink it. Small companies can benefit most from the falling price of storage, since the entire storage footprint for a little shop is often under a terabyte. But larger organizations will find that they need to start tiering their storage, and quickly, in order to keep costs under control.
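The economics of tiering can be sketched in a few lines. The age thresholds and per-GB monthly costs below are illustrative assumptions, not vendor figures; the point is simply that moving cold data off the top tier is what bends the cost curve when you can’t delete anything.

```python
# Hypothetical three-tier policy: (tier name, max age in days, $/GB/month).
# All numbers are made up for illustration.
TIERS = [
    ("tier1-fc",   30,   5.00),  # touched within 30 days -> fast disk
    ("tier2-sata", 365,  1.50),  # within a year -> capacity disk
    ("archive",    None, 0.25),  # older -> archive/tape
]

def tier_for(age_days: float) -> str:
    """Assign a file to the first tier whose age limit it satisfies."""
    for name, max_age, _cost in TIERS:
        if max_age is None or age_days <= max_age:
            return name
    return TIERS[-1][0]

def monthly_cost(gb_by_tier: dict) -> float:
    """Total monthly storage cost given GB placed on each tier."""
    costs = {name: cost for name, _age, cost in TIERS}
    return sum(gb * costs[tier] for tier, gb in gb_by_tier.items())
```

With these (invented) prices, a terabyte left entirely on tier 1 costs twenty times what the same terabyte costs in the archive tier, which is the whole argument for tiering when deletion is off the table.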
And then there’s green storage. Again, Mr. Toigo makes the very valid point that the problem is in the business, not in the hardware we use. But if we can’t do anything about data growth for the time being, we had better start tackling the technical challenges we face. I’ve talked to many IT folks who are very worried about data center space, as well as the terrifying trio of heat, power, and cooling. For them, green technologies are no laughing matter! If you can’t get any more power, you have to lower your per-GB requirement, and quickly.
It’s easy to say “understand your data and delete some”, but hard for IT pros to actually do it. Until we can tackle the strategic issue of data growth, we’ll have to continue fighting the tactical problems of storage.