DataGravity’s embargo just lifted, and my little techie corner of the Internet is on fire. There’s a very good reason for that, but it might not be obvious at a glance. Read on to learn why DataGravity is a Big Deal even though it might not work out.
Let’s start with the hardware: DataGravity just introduced a pretty ordinary storage array aimed at the fat middle of the datacenter market. Sure, they call it “state-of-the-art primary storage” and it ticks the current checkboxes (unified access, flash-optimized, hybrid architecture, in-line data reduction), but that’s nothing you couldn’t get from lots of other companies. And those other companies are already established, with proven code and sales/support teams on the ground.
The next part of the DataGravity pitch is continuous data protection thanks to integrated snapshots. The array tracks changes and can roll back to a previous point in time, something like Apple’s Time Machine. This aspect reminds me of the old “no more backups” pitch I rejected from companies like NetApp and Nimble Storage. Like Actifio, another New England company, DataGravity is pitching their change tracking for disaster recovery and even eDiscovery activities. Yet the DataGravity solution is integrated with their own array rather than being a separate component like Actifio’s.
So why is DataGravity such a big deal?
Put simply, DataGravity realized that storage arrays store data, data can yield information, and information is incredibly valuable!
Storage systems (and storage admins) have always avoided getting their hands dirty with content. “We just store the data,” they will say. “It’s the application owner’s responsibility to make use of it.” This politically expedient answer was helped by the fact that block storage protocols limited the visibility of the data being stored. Rather than storing files and folders (which is what users and operating systems see), SCSI stores uniform blocks.
This is what makes DataGravity something special. It’s also their biggest go-to-market risk. DataGravity is violating the traditional enterprise storage firewall and actually looking into the data being stored. The array actively monitors reads and writes, maintaining continual snapshots of the system over time. The company cleverly leverages the fact that the dual-active array has CPU and I/O power to spare, spending these resources analyzing data rather than just storing it.
The DataGravity system will identify LUNs, crack open logical volumes, access filesystems, and even read file content. This is technically very challenging, given the vast number of combinations of partition formats, filesystems, and the like in the wild. So the company will only support certain popular combinations initially and will have to work to maintain compatibility as operating systems evolve. They will likely never support some of the more challenging formats.
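To give a sense of what that entails, here is a minimal sketch, entirely my own illustration rather than anything DataGravity has shown, of just the first step: reading raw blocks and matching well-known partition and filesystem signatures. Everything past this point (partition tables, logical volume managers, the filesystems themselves) multiplies the compatibility matrix.

```python
# A minimal sketch (my illustration, not DataGravity's code) of the first
# step in "cracking open" a raw volume: read fixed offsets and match
# well-known on-disk signatures. A real array must also walk the MBR/GPT
# partition tables, LVM or dynamic-disk metadata, and every filesystem
# version it claims to support -- hence the compatibility burden.

def identify_raw_volume(path):
    """Guess partition scheme and filesystem from raw block signatures."""
    with open(path, "rb") as dev:
        head = dev.read(2048)                 # first four 512-byte sectors

        # Partition scheme: MBR boot signature, then GPT header at LBA 1
        if head[510:512] == b"\x55\xaa":
            scheme = "GPT" if head[512:520] == b"EFI PART" else "MBR"
        else:
            scheme = "none detected"

        # Filesystem magic numbers at well-known offsets (checked against
        # the start of the image here; real code would repeat the check at
        # each partition's starting offset instead)
        dev.seek(1080)                        # ext2/3/4 superblock magic
        is_ext = dev.read(2) == b"\x53\xef"
        is_ntfs = head[3:11] == b"NTFS    "   # NTFS boot-sector OEM ID
        is_xfs = head[0:4] == b"XFSB"         # XFS superblock magic

        fs = ("ext2/3/4" if is_ext else "NTFS" if is_ntfs
              else "XFS" if is_xfs else "unrecognized")
        return scheme, fs

# Hypothetical disk image path, purely for illustration
print(identify_raw_volume("/tmp/guest-disk.img"))
```

Even this toy version shows why the supported-configuration list matters: every new filesystem or volume manager is another set of signatures, layouts, and versions to track.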
For what it’s worth, Drobo is the only other storage system I know of that actively peeks into volumes and filesystems, but they don’t offer anything even resembling the data services DataGravity is promising. Yet the Drobo experience is telling: They don’t support resizing volumes, for example, or the ZFS or ReFS filesystems. Of course, DataGravity and Drobo are in vastly different markets, and since this function is the critical differentiator for DataGravity’s product, it demands broader support for enterprise systems. But the comparison shows how difficult and unusual the DataGravity approach is.
So DataGravity quite literally sees the data being stored, and will make use of this access in increasingly aggressive and creative ways. Initially, DataGravity is maintaining a searchable index of array content, including both metadata (file name, ownership, size, and modification time) and full file content. That’s right: Right from their user interface, you can search on not just the filesystem parameters but also the text contained inside a file!
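Conceptually, that is a storage array doing what a toy full-text indexer does. Here is a minimal sketch, again my own illustration with a hypothetical share path rather than anything from DataGravity, of building an index over both file metadata and tokenized content and then searching it:

```python
# A toy sketch of the idea (not DataGravity's engine): walk a directory
# tree, record per-file metadata, build an inverted index over tokenized
# content, then answer searches against either. Assumes plain-text files
# under a hypothetical share path.
import os
import re
import time
from collections import defaultdict

def build_index(root):
    meta, inverted = {}, defaultdict(set)
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
                meta[path] = {"name": name, "size": st.st_size,
                              "owner": st.st_uid,
                              "modified": time.ctime(st.st_mtime)}
                with open(path, errors="ignore") as f:
                    for token in re.findall(r"[a-z0-9]+", f.read().lower()):
                        inverted[token].add(path)
            except OSError:
                continue                      # skip unreadable entries
    return meta, inverted

def search(meta, inverted, term):
    """Return metadata for every file whose content contains `term`."""
    return [meta[p] for p in inverted.get(term.lower(), ())]

# Hypothetical usage: find every file that mentions "confidential"
meta, idx = build_index("/srv/share")
for hit in search(meta, idx, "confidential"):
    print(hit["name"], hit["size"], hit["modified"])
```

The difference, of course, is that DataGravity does this inside the array, against live data, without anyone exporting or copying anything first.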
It’s like someone mashed up a storage array, a CDP appliance, an SRM application, and an eDiscovery tool all in one. Using their bright and friendly interface, an administrator can show filesystem changes, search for inappropriate content, roll back a file version, or preserve a point-in-time snapshot. All of this is granular down to the file level, even if those files happen to be stored inside a VM guest’s virtual disk.
This product shows a lot of promise, to say the least. It could revolutionize what we think of as “storage”, bringing a new era of data management and a new relationship between IT and the business. But this is also the biggest risk for DataGravity.
IT is siloed for technical and practical reasons, of course, but there are also political considerations. DataGravity isn’t the first company to try to bring full-text search to the IT market, but all those other products failed. We’ve seen storage companies like EMC, IBM, and HDS stumble in this space. One reason was finding the money – how does IT justify spending big bucks on storage resource management (SRM) tools without support from corporate general counsel or HR? Another was a belief among IT people that when it comes to being on the hook for what users are doing, sometimes it’s better not to know.
Of course, products like this already exist outside the storage sector. A thriving industry exists in eDiscovery solutions, and these companies already have mind-bending search capabilities. From their perspective, the initial DataGravity solution is pretty pathetic when compared to eDiscovery powerhouses like HP Autonomy, Kroll Ontrack, Symantec, Guidance, and AccessData. And those companies all have a long history of selling to legal and HR rather than IT.
Still, the DataGravity solution is an impressive achievement in a storage array. The company has a great team working on the product, and it’s well-differentiated in the crowded market of enterprise storage challengers. I wouldn’t be at all surprised to see DataGravity rapidly grow and take a seat alongside successful EMC/NetApp alternatives like Nimble Storage, Tintri, and Pure Storage in the next year or two.
Impressively, DataGravity promises that all functions are included at one price. Many on the DataGravity team come from EqualLogic, a company for which the same all-inclusive strategy became a marvelous differentiator. And it’s priced competitively, at about $2,500 per terabyte.
Stephen’s Stance
DataGravity is coming to market with a mainstream product differentiated by unique features at a reasonable price. Although similar data management technology has existed for a long time, DataGravity is bringing it to the IT infrastructure market at no additional cost. The questions are simple: Will IT want a new array with these capabilities? And will DataGravity have the resources to mature their initial product to compete with “real” eDiscovery solutions?
To learn more about DataGravity, tune in to their presentation at Tech Field Day Extra at VMworld. It will be broadcast live Monday, August 25 at 1 PM Pacific time!
Disclaimer: DataGravity is a Tech Field Day sponsor and I worked with them to organize a presentation at VMworld and pre-briefing for some of the delegate panel, including myself.
David Siles says
Stephen, thanks for the coverage, and we look forward to seeing you at Tech Field Day.
Mike Riley says
Simple question: Will IT want a new array with these capabilities? Simple answer: no. It’s not differentiated. As you point out, if customers want it, they already have options that provide superior functionality and don’t require array qualification (security scans, benchmarking, full function testing); application integration; backup integration; or a data migration. All of that amounts to risk for zero differentiation (unlike Nimble, Tintri, Pure).
JohnFul says
Hmm. There may be a problem or two here with the entire concept. If I use metadata about data to create maps of relationships and then reward certain users based on those relationships, that might work in what we would term a “functional” organization. What about a dysfunctional organization a la Enron? So the few bad apples would quickly lead the lemmings over the cliff. The relationship mapping doesn’t distinguish between positive or lawful behaviors and those that are negative or unlawful. What’s missing here is a “lemming switch”.
Then again, in a post-Snowden world, do organizations really want to keep all that metadata around, subject to snooping and/or interception by the NSA?
I think this is an idea that, given the current state of affairs, will only have a negative impact if any. Lots of liability, no upside.
J
Mark May says
While Legal and HR teams have been doing this type of discovery for longer than I can remember, IT usually avoids things like this. However, imagine giving your Hadoop platform access to all this data and metadata without having to copy it from the server to the Hadoop environment. That could provide some excellent value to the data analysts in IT organizations.
Greg W. Stuart says
As always Stephen, good write up, keep at it!