Many storage challenges focus on correlating high-level uses of data (such as applications) with the nuts and bolts of storage infrastructure. These discussions often revolve around the conflict between data management, which demands an ever-smaller unit of management, and storage management, which benefits most from consolidation. Developing data management capability that is both granular enough for applications and scalable enough for storage is one key to the future of storage.
Storage Management: Scaling Up
As I discussed in a previous Sunday Series piece, Turning the Page on RAID, the data storage industry has traditionally focused on reducing granularity. Disk capacity has expanded, and RAID technology has compounded this trend by combining multiple physical drive mechanisms into a single virtual one. Storage virtualization technologies, from the SAN to the server, have also often been touted primarily as a means of reducing heterogeneity. From a technical perspective, therefore, granularity has been treated as an obstacle to overcome.
The core organizational best practice for storage management is to reduce complexity and enforce standardization. Consolidating storage arrays and file servers is a common goal, as IT seeks to benefit from economies of scale. Standardization and consolidation share the same end: a storage utility or managed storage service. This mirrors efforts on the server and network sides to consolidate and virtualize hardware.
Although both technological and organizational factors have traditionally driven granularity out of storage, this does not have to be the case. Virtual pools of storage are ideal for providing storage on demand, as disk-focused RAID groups give way to more flexible sub-disk storage arrangements. And an operational focus on standardized storage service offerings has the potential to enable scalable management of these smaller units.
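To make "sub-disk" allocation concrete, here is a toy model in Python, assuming a pool that carves every disk into fixed-size extents and hands them to volumes on demand. The class `ExtentPool`, the 1 GiB default extent size, and the "most free space first" placement rule are illustrative assumptions rather than how any particular array works, and RAID protection is left out entirely.

```python
import math


class ExtentPool:
    """Hypothetical toy storage pool: each disk is carved into fixed-size
    extents, and volumes draw extents on demand instead of owning whole
    disks or RAID groups. Redundancy is deliberately ignored."""

    def __init__(self, disk_sizes_gib: dict[str, int], extent_gib: int = 1) -> None:
        self.extent_gib = extent_gib
        # Free extents remaining on each disk.
        self.free = {disk: size // extent_gib for disk, size in disk_sizes_gib.items()}
        # For each volume, the disk backing each of its extents, in order.
        self.volumes: dict[str, list[str]] = {}

    def grow(self, volume: str, gib: int) -> list[str]:
        """Allocate enough extents for `gib` more GiB and return the volume's extent map."""
        needed = math.ceil(gib / self.extent_gib)
        if needed > sum(self.free.values()):
            raise ValueError("pool exhausted")
        extents = self.volumes.setdefault(volume, [])
        for _ in range(needed):
            # Take the next extent from whichever disk has the most free space,
            # so a single volume ends up spread across many disks.
            disk = max(self.free, key=self.free.__getitem__)
            self.free[disk] -= 1
            extents.append(disk)
        return extents
```

Because a volume is just a list of extents, it can grow in small steps and span every disk instead of being tied to one RAID group, which is the kind of flexibility that makes storage on demand practical.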
File-based protocols would seem to have more potential for granular storage management, but they have been undermined by the hierarchical nature of modern file storage. Whether the connection to a file server uses NFS, CIFS, or AFP, the key unit of management is actually the shared directory, not the file. All files in the share \\firefly\backups would be located on the same server and would be managed as a unit.
NAS virtualization can change this somewhat, as can more specialized NAS servers. Although Microsoft DFS enables consolidation and virtualization of NAS shares, it does not allow subdivision of shares below the directory level: all files in a directory must be placed on the same server. Tricks like stubbing and links allow for some movement, but these do not solve the core issue. Specialized virtual NAS devices from F5 (the ARX, née Acopia), NetApp, BlueArc, Symantec, and others can move files individually, providing a storage environment as thoroughly virtualized as any block-focused enterprise array. Avere is also beginning to talk about granular file management.
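The difference between share-level and file-level placement can be sketched as a pair of lookup tables. This is a simplified model, not the DFS API or any vendor's product: `ShareLevelNamespace` resolves every path through its share, the way the \\firefly\backups example behaves, while the hypothetical `FileLevelNamespace` adds per-file overrides, roughly what file-level NAS virtualization provides. Any server names used with it are placeholders.

```python
from pathlib import PureWindowsPath


class ShareLevelNamespace:
    """Hypothetical model: placement is decided per share, so every file
    under a share resolves to the one server hosting that share."""

    def __init__(self) -> None:
        self.share_to_server: dict[str, str] = {}

    def place_share(self, share: str, server: str) -> None:
        self.share_to_server[share.lower()] = server

    def locate(self, path: str) -> str:
        # For a UNC path, PureWindowsPath(...).drive is the \\host\share prefix.
        share = PureWindowsPath(path).drive.lower()
        return self.share_to_server[share]


class FileLevelNamespace(ShareLevelNamespace):
    """Hypothetical model: adds per-file overrides, so a single file can be
    moved to another server without moving the rest of its share."""

    def __init__(self) -> None:
        super().__init__()
        self.file_to_server: dict[str, str] = {}

    def move_file(self, path: str, server: str) -> None:
        self.file_to_server[path.lower()] = server

    def locate(self, path: str) -> str:
        # A per-file placement wins; otherwise fall back to the share's server.
        override = self.file_to_server.get(path.lower())
        return override if override is not None else super().locate(path)
```

With share-level placement, relocating one file means relocating the whole share; the per-file override table is what lets a single file move while its neighbors stay put.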
But even an ideal virtualized file server lacks the kind of granularity users demand. They care about data, not files, and most applications consolidate their data into a few files. Consider a database, for example: users want each record managed individually, but storage devices see just a few much larger files. Meeting that demand directly would take a storage revolution: an ideal storage platform in which each individual record or object carries custom metadata and is managed independently. This would truly be a massive change, however, and it is not clear that all applications will follow the object storage model of Google and Amazon.
Small is Beautiful
Barring a revolution in data management, our best hope is to allow greater granularity in storage management. As mentioned above, virtualization technology has the potential to enable management and protection of any unit of storage, right down to the individual block or record. But the reality of storage virtualization has not matched its promise.
What is needed is greater integration. Each layer of virtualization (file system, volume manager, hypervisor, network, array, and RAID) hides necessary details from the layers below it. Consider the case of a virtual server snapshot: the application and file system must be quiesced before a snapshot can be taken at the storage level, but the storage array has no intrinsic information about how its capacity is used. A given LUN might hold dozens of virtual servers on a shared VMFS volume, so all of them must be snapped together.
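The LUN problem reduces to a simple mapping. In this sketch the `VM_TO_LUN` placement and both function names are hypothetical; the point is only that an array which understands nothing but LUNs turns a one-VM request into a snapshot of every VM on that LUN.

```python
# Hypothetical placement: the VMFS datastore LUN that holds each VM's disks.
VM_TO_LUN = {
    "web01": "lun-7",
    "web02": "lun-7",
    "db01": "lun-7",
    "mail01": "lun-9",
}


def luns_to_snapshot(requested: set[str], placement: dict[str, str]) -> set[str]:
    """The array only understands LUNs, so a VM-level request becomes a LUN-level one."""
    return {placement[vm] for vm in requested}


def vms_captured(requested: set[str], placement: dict[str, str]) -> set[str]:
    """Every VM on an affected LUN ends up in the snapshot, and each one needs
    quiescing for the snapshot to be consistent, whether requested or not."""
    luns = luns_to_snapshot(requested, placement)
    return {vm for vm, lun in placement.items() if lun in luns}
```

Asking for a snapshot of `db01` alone still captures `web01` and `web02`, because the array has no way to see where one VM's data ends and the next begins.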
Integration can be enabled by sharing more information through APIs. VMware leverages Microsoft's Volume Shadow Copy Service (VSS) so that a snapshot of a virtual machine on shared VMFS storage can call into the guest operating system, and even applications (Windows Server 2003 only, for now), to prepare the data. Similarly, VSS can communicate directly with supported iSCSI and Fibre Channel arrays, triggering a hardware snapshot at the right moment. And Microsoft is, no doubt, enhancing VSS as we speak.
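The ordering this kind of integration enforces (quiesce the data, take the snapshot, release the data) can be sketched as follows. It is loosely modeled on the requestor, writer, and provider roles in VSS but is not the VSS API: `Writer`, `LoggingWriter`, and `coordinated_snapshot` are hypothetical names, and `take_snapshot` stands in for whatever provider actually triggers the array snapshot.

```python
from collections.abc import Callable, Sequence
from typing import Protocol


class Writer(Protocol):
    """Hypothetical interface: anything that can quiesce its data on request,
    such as an application or a file system."""

    def freeze(self) -> None: ...
    def thaw(self) -> None: ...


class LoggingWriter:
    """Hypothetical stand-in writer that records what it was asked to do."""

    def __init__(self, name: str, log: list[str]) -> None:
        self.name = name
        self.log = log

    def freeze(self) -> None:
        self.log.append(f"freeze {self.name}")

    def thaw(self) -> None:
        self.log.append(f"thaw {self.name}")


def coordinated_snapshot(writers: Sequence[Writer], take_snapshot: Callable[[], str]) -> str:
    """Freeze every writer, take the storage-level snapshot, then thaw in
    reverse order. Writers that were frozen are always thawed, even if a
    later freeze or the snapshot itself fails."""
    frozen: list[Writer] = []
    try:
        for writer in writers:
            writer.freeze()
            frozen.append(writer)
        return take_snapshot()
    finally:
        for writer in reversed(frozen):
            writer.thaw()
```

The `finally` clause ensures no application is left frozen if the array call fails. Real VSS also imposes timeouts on how long writers may stay frozen, which this sketch omits.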
As virtualization technology matures, expect this type of integration to improve. We hope to see VMware and Microsoft expose more APIs, allowing communication up and down the stack to break through the information barrier. Imagine a future where a standard API like VSS can pass a message through a hypervisor, whether VMware, Xen, or Hyper-V, to the underlying storage array to initiate a snap. I predict that this kind of integration-enabled granularity is not too far off.