DataGravity just lifted their embargo and my little techie corner of the Internet is on fire. There’s a very good reason for that, but it might not be obvious at a glance. Read on to learn why DataGravity is a Big Deal even though it might not work out.
Let’s start with the hardware: DataGravity just introduced a pretty ordinary storage array aimed at the fat middle of the datacenter market. Sure, they call it “state-of-the-art primary storage” and it ticks the current checkboxes (unified access, flash-optimized, hybrid architecture, in-line data reduction), but that’s nothing you couldn’t get from lots of other companies. And those other companies are already established, with proven code and sales/support teams on the ground.
The next part of the DataGravity pitch is continuous data protection thanks to integrated snapshots. The array tracks changes and can roll back to a previous point in time, something like Apple’s Time Machine. This aspect reminds me of the old “no more backups” pitch I rejected from companies like NetApp and Nimble Storage. Like Actifio, another New England company, DataGravity is pitching their change tracking for disaster recovery and even eDiscovery activities. Yet the DataGravity solution is integrated with their own array rather than being a separate component like Actifio.
So why is DataGravity such a big deal?
Put simply, DataGravity realized that storage arrays store data, data can yield information, and information is incredibly valuable!
Storage systems (and storage admins) have always avoided getting their hands dirty with content. “We just store the data,” they will say. “It’s the application owner’s responsibility to make use of it.” This politically expedient answer was helped by the fact that block storage protocols limit the array’s visibility into the data being stored. Rather than files and folders (which is what users and operating systems see), a SCSI target sees only uniform, numbered blocks.
This is what makes DataGravity something special. It’s also their biggest go-to-market risk. DataGravity is violating the traditional enterprise storage firewall and actually looking into the data being stored. The array actively monitors reads and writes, maintaining continual snapshots of the system over time. The company cleverly leverages the fact that the dual-active array has CPU and I/O power to spare, spending these resources analyzing data rather than just storing it.
The DataGravity system will identify LUNs, crack open logical volumes, access filesystems, and even read file content. This is technically very challenging, given the vast combination of partition formats, filesystems, and such in the wild. So the company will only support certain popular combinations initially and will have to work to maintain compatibility as operating systems evolve. Likely they’ll never support some more challenging formats.
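To give a feel for what “cracking open” a LUN involves, here is a minimal Python sketch of just the first step: guessing the partition scheme from the raw bytes at the start of a volume. This is my own illustration, not DataGravity’s code. The function names and the 512-byte sector size are assumptions; the on-disk offsets (the 0x55AA boot signature, the “EFI PART” header at LBA 1, and the 16-byte MBR partition entries starting at offset 446) are the standard MBR and GPT layouts.

```python
import struct

SECTOR = 512  # assumed logical sector size; 4K-native disks would differ


def detect_partition_scheme(image: bytes) -> str:
    """Guess the partition table type from the start of a raw LUN image."""
    # Check GPT first: GPT disks also carry a "protective" MBR in sector 0,
    # so the MBR signature alone cannot tell the two apart.
    if image[SECTOR:SECTOR + 8] == b"EFI PART":
        return "gpt"
    # A classic MBR ends sector 0 with the bytes 0x55 0xAA at offset 510.
    if image[510:512] == b"\x55\xaa":
        return "mbr"
    return "unknown"


def mbr_partitions(image: bytes):
    """Yield (type_byte, start_lba, sector_count) for each used MBR entry."""
    for i in range(4):  # an MBR holds four primary partition entries
        entry = image[446 + 16 * i:446 + 16 * (i + 1)]
        if len(entry) < 16:
            return
        ptype = entry[4]  # partition type byte; 0x00 means unused
        start_lba, sectors = struct.unpack_from("<II", entry, 8)
        if ptype != 0:
            yield ptype, start_lba, sectors
```

And that only gets you to the partition table. Each partition type then needs its own filesystem parser (NTFS, ext4, VMFS, and so on), plus a volume manager layer in many cases, which is exactly why initial support will be limited to popular combinations.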
For what it’s worth, Drobo is the only other storage system I know of that actively peeks into volumes and filesystems, but they don’t offer anything even resembling the data services DataGravity is promising. Yet the Drobo experience is telling: They don’t support resizing volumes, for example, or the ZFS or ReFS filesystems. Of course, DataGravity and Drobo are in vastly different markets, and because this function is the critical differentiator for DataGravity’s product, it will demand much broader support for enterprise systems. But the comparison shows how difficult and unusual the DataGravity approach is.
So DataGravity quite literally sees the data being stored, and will make use of this access in increasingly aggressive and creative ways. Initially, DataGravity is maintaining a searchable index of array content, including both metadata (file name, ownership, size, and modification time) and full file content. That’s right: Right from their user interface, you can search on not just the filesystem parameters but also the text contained inside a file!
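Conceptually, this kind of search boils down to an inverted index of file content sitting next to a table of file metadata. The toy Python sketch below shows the idea under my own assumptions; `FileRecord`, `ContentIndex`, and the tokenizer are hypothetical names for illustration, and nothing here reflects how DataGravity actually builds or stores its index.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Filesystem metadata kept alongside the content index."""
    path: str
    owner: str
    size: int
    mtime: float


def _tokenize(text: str) -> list:
    # Lowercased alphanumeric runs; a real indexer would be far smarter.
    return re.findall(r"[a-z0-9]+", text.lower())


class ContentIndex:
    def __init__(self):
        self._records = {}                 # path -> FileRecord
        self._postings = defaultdict(set)  # token -> set of paths

    def add(self, record: FileRecord, text: str) -> None:
        """Index one file's metadata and the words found in its content."""
        self._records[record.path] = record
        for token in set(_tokenize(text)):
            self._postings[token].add(record.path)

    def search(self, query: str, owner: str = None) -> list:
        """Return files containing every query word, optionally by owner."""
        tokens = _tokenize(query)
        if not tokens:
            return []
        paths = set.intersection(
            *(self._postings.get(t, set()) for t in tokens)
        )
        hits = [self._records[p] for p in paths]
        if owner is not None:
            hits = [r for r in hits if r.owner == owner]
        return sorted(hits, key=lambda r: r.path)
```

A query like `index.search("salary spreadsheet", owner="alice")` combines the two halves: the content tokens narrow the set of files, and the metadata filter trims it further. The hard part for an array is not this lookup but keeping such an index current as blocks change underneath it.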
It’s like someone mashed up a storage array, a CDP appliance, an SRM application, and an eDiscovery tool all in one. Using their bright and friendly interface, an administrator can show filesystem changes, search for inappropriate content, roll back a file version, or preserve a point-in-time snapshot. All of this is granular down to the file level, even if those files happen to be stored inside a VM guest’s virtual disk.
This product shows a lot of promise, to say the least. It could literally revolutionize what we think of as “storage”, bringing a new era of data management and a new relationship between IT and the business. But this is also the biggest risk for DataGravity.
IT is siloed for technical and practical reasons, of course, but there are also political considerations. DataGravity isn’t the first company to try to bring full-text search to the IT market, but all those other products failed. We’ve seen storage companies like EMC, IBM, and HDS stumble in this space. One reason was finding the money – how does IT justify spending big bucks on storage resource management (SRM) tools without support from corporate general counsel or HR? Another was a belief among IT people that when it comes to being on the hook for what users are doing, sometimes it’s better not to know.
Of course, products like this already exist outside the storage sector. A thriving industry exists in eDiscovery solutions, and these companies already have mind-bending search capabilities. From their perspective, the initial DataGravity solution is pretty pathetic when compared to eDiscovery powerhouses like HP Autonomy, Kroll Ontrack, Symantec, Guidance, and AccessData. And those companies all have a long history of selling to legal and HR rather than IT.
Still, the DataGravity solution is an impressive achievement in a storage array. The company has a great team working on the product, and it’s well-differentiated in the crowded market of enterprise storage challengers. I wouldn’t be at all surprised to see DataGravity rapidly grow and take a seat alongside successful EMC/NetApp alternatives like Nimble Storage, Tintri, and Pure Storage in the next year or two.
Impressively, DataGravity promises that all functions are included at one price. Many on the DataGravity team come from EqualLogic, a company for which the same all-inclusive strategy became a marvelous differentiator. And it’s priced competitively, at about $2,500 per terabyte.
DataGravity is coming to market with a mainstream product differentiated by unique features at a reasonable price. Although similar data management technology has existed for a long time, DataGravity is bringing it to the IT infrastructure market at no additional cost. The questions are simple: Will IT want a new array with these capabilities? And will DataGravity have the resources to mature their initial product to compete with “real” eDiscovery solutions?
To learn more about DataGravity, tune in to their presentation at Tech Field Day Extra at VMworld. It will be broadcast live Monday, August 25 at 1 PM Pacific time!
Disclaimer: DataGravity is a Tech Field Day sponsor and I worked with them to organize a presentation at VMworld and pre-briefing for some of the delegate panel, including myself.