“There are two kinds of people in this world: those who believe there are two kinds of people in this world, and those who don’t.”
This week, EMC’s Chief Development Officer, Mark Lewis, posted a thoughtful blog “episode” supposing that there are two kinds of data in this world: OLTP and all of the rest. It’s an interesting idea, so let’s bat it around a little.
Mark claims that, as technology allows data to be better structured and classified, the old distinction between structured and unstructured data will disappear. Certainly, many types of unstructured data are gaining tagging and searchability as they move into a networked world. E-mail is clearly at least semi-structured, and online document repositories like SharePoint and Google Docs have followed in the footsteps of Lotus Notes and others to bring structure to the file server. I also know of many businesses whose legacy applications are moving to structured, tiered, and archivable formats. So we’re certainly moving in that direction, but I think it’s too soon to declare the end of unstructured data.
I doubt that Mr. Lewis believes that all data is structured, either. His job is to make sure that EMC does not become irrelevant like so many other big Massachusetts technology companies, so he’s naturally trying to get in front of the market. As data gains structure, storage products that exploit that structure will undoubtedly be in demand, so EMC is wise to work on XAM and to acquire X-Hive. But I doubt that the bulk of its revenue will come from storage systems integrated with data structures anytime soon.
Mr. Lewis’s discussion centers on classifying data by performance, and by latency in particular: his OLTP type needs each transaction to complete quickly, while his “web” type is characterized by other attributes. This classification metadata would be communicated to the storage system through some structured mechanism such as XML.
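To make the two-way split concrete, here is a minimal Python sketch, assuming a single latency threshold decides between the OLTP and “web” classes and that the result is handed to storage as a tiny XML document. The element names (`dataHint`, `class`, `maxLatencyMs`) and the 10 ms cutoff are hypothetical, invented for illustration; they are not XAM, an EMC format, or anything Mr. Lewis specified.

```python
import xml.etree.ElementTree as ET

# Hypothetical cutoff: anything needing a response within 10 ms counts as OLTP.
OLTP_MAX_LATENCY_MS = 10


def classify(max_latency_ms: float) -> str:
    """Two-way split on a single attribute: 'oltp' or 'web'."""
    return "oltp" if max_latency_ms <= OLTP_MAX_LATENCY_MS else "web"


def to_storage_hint(dataset: str, max_latency_ms: float) -> str:
    """Serialize the classification as a small XML hint (hypothetical schema)."""
    root = ET.Element("dataHint", {"dataset": dataset})
    ET.SubElement(root, "class").text = classify(max_latency_ms)
    ET.SubElement(root, "maxLatencyMs").text = str(max_latency_ms)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_storage_hint("orders", 5))      # an order-entry table
    print(to_storage_hint("web-logs", 500))  # click-stream logs
```

In this sketch, one comparison on one attribute decides the class, so everything else about a data set (retention, access pattern, compliance needs) never reaches the storage system.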
Undoubtedly, latency is one way to divide up the world of data. But is this single attribute, the low-latency requirement of OLTP applications, really enough to characterize the whole wide world of data? It seems to me an excellent way to isolate one data type, but I believe there are many others that also need isolation, examination, and tuned storage services. I understand the argument that OLTP systems need exceptional storage, but I don’t think the distinction Mr. Lewis suggests is the right way to divide enterprise data.
Me? I believe there are lots more than two kinds of people in this world.