When was the last time you deleted data? Even at home, where we have autonomy and authority over our own data, many of us are digital pack rats. But at work? Never! No one ever deletes anything! Let’s talk about why this is.
Retention vs. Deletion
Just about everything we do in IT infrastructure is focused on retention. We back up our data and implement other data protection tools like snapshots and mirrors. We might also archive data so that the General Counsel can place a legal hold on it, as well as perform data discovery during litigation. And then there’s the whole field of data security, focused on locking people out of data, keeping it intact and un-viewed.
But what about deletion? Almost no effort is put towards removing data, though the rapid growth of storage might lead one to think this is a key area for IT. We certainly could put some effort into revision control, and especially into deleting drafts and outdated data. We could easily expire content that was no longer needed, if only we had some way to know that. And we’ve talked a lot about secure deletion, even though we hardly ever actually perform that task except when moving to new physical storage hardware.
The greatest challenge for deletion is a simple question: What should we delete and when?
IT cannot answer these questions. They must be put to the business people who really own the data. Without permission and buy-in, IT is in serious legal peril when it comes to deleting data: Any deletion must be in accordance with policy and must be legal, meaning that the data is under no legal or regulatory hold. And there is no way most IT staff feel empowered to make that call!
Some Data Should Be Deleted
Certainly, not all data should be saved. There is “low-hanging fruit” in every storage estate that can and should be deleted:
- Ephemeral copies – Drafts, temporary data, working copies
- Time-limited projects – Third-party or client data, test and development
- Expired data – Data whose retention period has expired and on which no legal hold remains
- Legally required – Data that isn’t yours, or that legal demands be deleted
These data sets are much easier to tackle than primary data stores, since they don’t require as much sifting and sorting: They can often be identified programmatically! If you have data sets like these, this is the ideal place to start a deletion effort.
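As a rough illustration of what identifying these data sets programmatically might look like, the sketch below filters a file inventory down to deletion candidates. The `FileRecord` fields, the category names, and the `deletion_candidates` function are all hypothetical, invented for this example; a real system would draw them from whatever metadata the storage platform and records policy actually provide. Nothing is deleted here: the output is only a list for the business and legal to approve.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical categories mirroring the list above; a real inventory defines its own.
DELETABLE_CATEGORIES = {"ephemeral", "time_limited", "expired", "legally_required"}


@dataclass
class FileRecord:
    path: str
    category: str                  # e.g. "ephemeral" or "primary"
    retain_until: Optional[date]   # None means no retention period was set
    legal_hold: bool = False


def deletion_candidates(records, today):
    """Return records that look safe to propose for deletion.

    This only builds a review list; approval still belongs to the
    business owners and legal.
    """
    candidates = []
    for rec in records:
        if rec.legal_hold:
            continue  # never propose anything under a legal or regulatory hold
        if rec.category not in DELETABLE_CATEGORIES:
            continue  # primary data needs human sifting, so skip it here
        if rec.retain_until is not None and rec.retain_until > today:
            continue  # retention period has not run out yet
        candidates.append(rec)
    return candidates


# Made-up inventory, purely for illustration.
inventory = [
    FileRecord("/projects/demo/draft_v1.doc", "ephemeral", None),
    FileRecord("/clients/acme/export.csv", "time_limited", date(2019, 6, 30)),
    FileRecord("/finance/ledger.xls", "primary", date(2015, 1, 1)),
    FileRecord("/hr/case_42.pdf", "expired", date(2018, 1, 1), legal_hold=True),
    FileRecord("/dev/test_db.bak", "time_limited", date(2021, 1, 1)),
]

for rec in deletion_candidates(inventory, today=date(2020, 1, 1)):
    print(rec.path)
```

The legal hold check comes first on purpose, so that no later rule can accidentally propose held data.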
Delete on Demand
Regardless of the type, however, IT should not delete data without direction. It is perilous in today’s legal environment to destroy data without a policy directing that action. So we should continue to focus on retention for most data, while we work with legal to determine which data can be deleted and come up with a process for approval.
But it’s important to start offering a deletion-friendly environment for certain data types. Such a storage system would reduce the difficulties associated with data deletion. Really, only an integrated solution can truly delete data:
- It must maintain custody of data from start to end and not allow it to leak all over the organization
- It must be accessible since any restrictions tempt users to create “working copies”, thus thwarting deletion
- It must be secure — Data must always be encrypted to avoid remnants on media
- It must be protected so data will not spread to external systems and sites
Data deletion is a real problem for most IT shops. I’m just getting my head around the ramifications, and continue to look for an ideal deletion-friendly storage solution.
If you’re interested in the topic of data deletion, I recommend joining me for a webinar on the topic on Wednesday, April 13. In this Nasuni-sponsored session, I will discuss the dilemma of deletion, and CEO Andres Rodriguez will weigh in on the capabilities of his cloud storage solution. Register now!
Note: Nasuni is sponsoring this webinar, but the content was created by me. This blog post is intended to engage my audience in discussion of the subject, and is not a paid promotion or advertisement.
Image credit: “Delete” by blmurch
Ernst Lopes Cardozo says
I would love to see a business case for data deletion. My attempts showed that deleting data is 100 to 1000 times more expensive than not deleting it. The only data that can be deleted economically is stuff that falls beyond the 10-year horizon and can be deleted without looking. But since we are on an exponential growth curve, 100% of the 10-year-old data is only a few percent of what we have today.
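That last point can be checked with a little arithmetic. A rough sketch, assuming storage grows at a constant annual rate and nothing is ever deleted (the growth rates below are illustrative assumptions, not figures from the comment): if the total grows by a factor of (1 + g) each year, everything that existed ten years ago amounts to 1 / (1 + g)^10 of today's total.

```python
def share_older_than(years, annual_growth):
    """Fraction of today's data that already existed `years` ago,
    assuming the total grows by `annual_growth` (0.4 means 40%) every
    year and nothing is ever deleted."""
    return 1.0 / (1.0 + annual_growth) ** years


# Illustrative growth rates only.
for growth in (0.3, 0.4, 0.5):
    share = share_older_than(10, growth)
    print(f"{growth:.0%} annual growth: {share:.1%} of data is 10+ years old")
```

At 40% annual growth the share works out to roughly 3.5%, which is consistent with the "few percent" estimate.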
Jason Boche says
Ernst makes a good point on exponential data growth and the diminishing returns on analyzing-to-death prior years’ data. I also think data has strength in numbers. My perception is that the larger data gets, the more it carries with it perhaps a false sense of importance because it represents irreplaceable tangibles such as time. Microsoft Outlook personal folders (.PST files) are a great example of data which grows in a hurry due to the unintended use of a mailbox as a file server. The density of data should not be the sole weighted measurement of its value. That said, the push to archive or delete data scares the hell out of business units and is an uphill battle all the way. When I was in the thick of it, the BUs were usually inclined to throw hardware at the problem, meaning more storage, because in the grand scheme of things, the perception is storage is cheaper. That will scale for quite a while on the front end, but the back end, where we need windows of time to back up that same volume of data, gets tricky without more scalable solutions. Data growth is out of control in my opinion. I don’t think deletion is the answer. I honestly don’t know what the answer is other than in 60 years or less I won’t have to deal with it.
Bill Hill says
I agree with both Jason and Ernst. I believe the part of the data archival and deletion processes that scares the hell out of the business units may be that data is so old that no one feels that they can adequately comment/authorize on deletion. The data is incorrect or irrelevant, but a CYA mentality does not allow them to bite the bullet.
Your comment on legal issues totally hits the mark and was my first focus when I started reading the post. Posted data retention policies and business units understanding legal requirements are a must for determining what to delete. IT departments can only really operate on factual information… date range, file size, file format, specific locations, etc… Unless IT is truly integrated into the business units (which would be hard to believe unless a very small business), IT cannot and should not be placed into a position to make those decisions. Legal ramifications and costs imposed by discovery, research, and penalties definitely outweigh the effort to delete the data.
End user education and empowerment would go a long way and is something we are working on internally. Drafts, temporary files, personal files on enterprise stores, and transitory documentation really do not need to exist beyond a fairly logical timeframe. Informing users of the need to clean up after they have finished would help the cause… especially when that user has left the company.
Working on identification of data owners is a great step in determining what can be deleted. Empowerment and ownership go a long way to making the change towards deletion.
Hopefully, I will be able to attend the webinar on Wednesday. Nice post!
Sean Regan says
You nailed it. Deletion needs to start happening. This is something we have been talking to customers about for the past year or so. They all believe in it. Few do it. The ones that do are seeing some incredible benefits on storage, backup and E-Discovery costs. One company in Arizona cut their storage 70%. Another in Manhattan recently deleted over 500m files and redeployed a massive amount of storage even while dealing with all sorts of regulations and discovery events. How? They delete confidently, they delete by default, not randomly.
Delete confidently is the call to action. Why? Because deletion is not happening and when it does, it feels like risky business. Information is simply kept with no real consideration of how long it is to be kept or why. This results in infinite retention. Infinite retention is INFINITE WASTE even if you throw dedupe, compression and tiered storage at it.
Everyone needs bigger inboxes and larger archives. But the problem is that their IT departments haven’t got the money to buy more storage. It reminds me of the Miller Lite commercials from the 1980’s. Less Filling! Tastes Great! But, instead of beer the argument goes something like this.
Less Storage! Less Backup!
The financial crisis and tighter IT budgets didn’t help settle this dispute. Between 2007 and 2009, the share of the IT budget spent on storage jumped from 7 percent to 17 percent. Blah, blah, blah, we have heard it all before. But, our behavior as IT and end users still doesn’t change. So, at a time when IT budgets are flat or negative, and the storage budget as a component of the overall IT budget keeps burning a bigger hole, IT departments have to live the cliché of our times: do more with less.
Guess that I-Pad deployment can wait… IT is blowing IT’s budget on more storage.
If the definition of insanity is doing the same thing over and over and expecting different results then the datacenter is sometimes the home court of insanity.
What if we tried this: Delete and manage
Delete your backup tapes after 30 days. Deploy archiving technology and put a real retention and deletion policy in play. Without this, you are keeping everything in a PST or on the server or on the tapes, forever. In that situation you are building costs and frustration into your backup policies. And, you are piling up data that may eventually cost 1,500-3,000 times more money to review in E-Discovery than it did to store it.
The other option is to put restrictions on how much storage your users can have. That’s not a very good option. Start archiving, then managing and as a default policy, deleting your content. Your retention could be for 3 years, 5 years, 25 years, etc. But, please don’t make it infinity. Your IT budget can’t afford it.
Why is it that when it comes to paper, we are concerned about Green IT, but when it comes to storage and the datacenter, it seems organizations just don’t care?
It is not the case that enterprises or their employees don’t want to improve the situation. We ran a survey of 1,700 IT pros and almost 90% of respondents said that they wanted to be allowed to delete email but three quarters of them were not actually doing it (as part of their backup plan).
Companies save information indefinitely because they fear deleting information that may be important to the business or may be required as part of a future e-discovery request. As this data piles up information becomes harder to find and the costs of storing and searching for that information rise.
The result is pain: in terms of discovery (where’s that email I have been looking for?) and bumped up storage costs.
The world has completely misused backup. We have broken the backup window by trying to back up everything we have ever created. I have been to companies that keep over 800,000 tapes on site. Thirty days of backup should be enough. 800,000 is crazy. Dedupe can help. But, dedupe is just a short-term fix when the underlying problem, over-retention, continues.
Discovery is also part of what broke backup and put the fear of deletion in the datacenter. But, with a good information management strategy, the law will set you free and keep you out of deposition chairs. Protect the data that is relevant or might be relevant to a case and the rest is up to you. For those in regulated industries it is even easier because you are told exactly what you need to keep.
The cloud is often seen as the great hope, the great reset of the datacenter. It is the place where we can start all over and do it right. Yes. But……
Please don’t smog the cloud! As the world starts to leverage cloud storage and services, let’s also get religion around information management. Doing so will help prevent the pain of infinite retention, and it might actually help IT free up the money they need to roll out the I-Pads the execs want or the videoconferencing project that has been on hold due to storage spend.
Sean Regan says
Let’s flip the math for the fun of statistics….
1 GB can hold 100k unique emails
Your average attorney on coffee and redbull can review 250? 500? 1000? items an hour.
If they charge $100, $300, $500 an hour how long does it take them to review your PST in the event of a case?
I think Amazon will store a GB of email for around 17 cents a month…
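To put rough numbers on that comparison, here is a small sketch. It uses only the figures quoted above (100,000 emails per GB, 250 to 1,000 items reviewed per hour, $100 to $500 per hour, and about 17 cents per GB per month); the function name and the choice to price a single gigabyte over one year are assumptions made for illustration.

```python
EMAILS_PER_GB = 100_000
STORAGE_COST_PER_GB_MONTH = 0.17  # dollars, the approximate figure quoted above


def review_cost(emails, items_per_hour, rate_per_hour):
    """Cost in dollars for an attorney to review `emails` items."""
    hours = emails / items_per_hour
    return hours * rate_per_hour


# Best case: the fastest reviewer at the lowest rate. Worst case: the opposite.
best = review_cost(EMAILS_PER_GB, items_per_hour=1000, rate_per_hour=100)
worst = review_cost(EMAILS_PER_GB, items_per_hour=250, rate_per_hour=500)
storage_per_year = STORAGE_COST_PER_GB_MONTH * 12

print(f"Review of 1 GB of email: ${best:,.0f} to ${worst:,.0f}")
print(f"Storage of 1 GB for a year: ${storage_per_year:.2f}")
```

Even in the best case, reviewing that gigabyte costs thousands of times more than storing it for a year.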
Discovery doesn’t happen to everyone, everyday and manual deletion is a painfully slow process, especially for legacy data. But, building retention and deletion into your backup infrastructure can help immediately and with no cost to acquire the tools to do it. Then look at archiving to do the same with unstructured content.
Mark Olsen says
I recently worked for a company that allowed employees to keep everything forever. That changed when they lost two lawsuits totalling $20 Million in legal fees, penalties, and brand damage, all because of 2 E-mails that had been kept long past their business value. Expiring non-business records after their useful life makes complete business sense.
The default IT stance seems to be “eternal retention” while the default for legal is “delete quickly”. We need to get these two together to come up with a workable balance between retention and deletion.
Thanks for the comment!
Howard Marks says
The problem is there are three groups with three different views of the problem and incentives:
IT knows nothing about data value, has no authority to delete, and will be blamed if something’s missing. Keep forever means bigger budgets but no blame.
Legal believes in reducing liability but doesn’t understand the business cost thereof. Delete immediately eliminates liability, but also eliminates evidence needed in defense. Legal would rather nothing was ever written down that they didn’t review first.
Biz balances risk/reward. Legal just looks at risk regardless of reward. Biz has to figure out retention but doesn’t wanna, deleting email takes time you know, after all the cost is IT’s budget not the biz unit.
Now I’m going to have to write my own post. Darn you Foskett.
This comment from @SeanJRegan is a blog post in disguise! Thanks so much for weighing in! Now I have to consider your points… 🙂