September 23, 2014

Storage Changes in VMware vSphere 5.1

VMware storage features Series

As I have done since version 3.5, I’m charting the storage changes in VMware’s latest release of vSphere, 5.1. Detailed at VMworld but not released as of this writing, vSphere 5.1 is lauded for eliminating the “vTax” and bringing features like replication, a revised backup application, and shared-nothing vMotion. Unlike version 5, which included many new technical storage features, 5.1 mainly tweaks existing features and adds these new elements to the mix.

For more information on earlier updates, see my articles:

Although it’s tempting to just refer to VMware’s whitepaper, What’s New in VMware vSphere 5.1 – Storage, that doesn’t tell the whole story. I think of storage in broader terms than VMware, apparently. Also see Cormac Hogan’s post, vSphere 5.1 New Storage Features.

VMware has packed vSphere 5.1 with features, but I’m here to talk storage!

Advanced Features for Standard Users

One of the most obvious changes in vSphere 5.1 will be the elimination of the hated “vTax” licensing model. vSphere reverts to per-socket licensing rather than vRAM entitlement tiers, though the free version still has a 32 GB limit. That’s not a storage change, but it’s one less controversy to cover at my seminars!

But there’s a real story in the licensing of vSphere 5.1, and that’s all about improved availability of storage features! VMware is increasing quality and availability of features at lower licensing levels, and I heartily approve.

  • VMware vSphere Replication is an all-new any-to-any software replication feature, included in all editions from Essentials Plus up, though larger sites might find little use for it
  • Storage vMotion has been moved from Enterprise to Standard
  • The new “Enhanced vMotion” (moving both storage and system state at once) is also a Standard feature
  • The VMware vSphere Storage Appliance (VSA) remains licensed as before (Essentials Plus and above), but new capabilities make it more interesting in larger environments
  • VMware replaced the unloved VMware Data Recovery (VDR) with a new Avamar-based vSphere Data Protection (VDP) product, included in all editions from Essentials Plus up

In short, Standard gets a bit more compelling with Storage vMotion and Enhanced vMotion, and Essentials Plus is a lot better thanks to bundled replication and a better data protection offering. Many advanced features remain limited to Enterprise or Enterprise Plus, however. VAAI, Multipathing, and DRS are Enterprise-level, while Storage DRS, profile-driven storage, and Storage and Network I/O Control are Enterprise Plus only.

vMotion Here, There, and Everywhere!

Perhaps the best new storage-related capability in vSphere 5.1 is a raft of adjustments to vMotion, which I’ve long said is the killer feature for vSphere in the enterprise. Simply put, vMotion enables just about everything cool in vSphere, and now it’s enhanced to be even more accessible.

Here’s a quick rundown of the vMotion-y awesomeness in vSphere 5.1:

  1. Storage vMotion is now included in the Standard edition of vSphere, making it much more affordable
  2. Enhanced vMotion enables movement of all VM guest components at once, thus eliminating the requirement of shared storage for vMotion
  3. Storage vMotions can run in parallel, with up to 4 movements at once

Let’s take it from the top. Although “server” vMotion has long been a Standard (and even Essentials Plus) feature, Storage vMotion was limited to Enterprise licenses. This made some sense considering that only large shops would have a sufficiently diverse storage environment to warrant the feature. But even the little guys are getting more and more complex, so it makes sense to expand availability of Storage vMotion as well. But I suppose it’s too much to ask to bring it to Essentials Plus!

The big news is Enhanced vMotion, which live-moves both storage and system state (memory and CPU) in one operation. Enhanced vMotion opens a world of flexibility, allowing running virtual machines to be moved about at will regardless of storage configuration. And, although it’s only mentioned in passing in VMware’s documentation, this eliminates the requirement of shared storage (SAN or NAS) for vMotion!

Then there’s parallel Storage vMotion. vSphere 5.1 allows up to four simultaneous Storage vMotions, as long as the source and target datastores are unique. But limits remain, including just 2 Storage vMotions and 8 active vMotions per host. And these vMotion limits affect Enhanced vMotion, too, so one can imagine running afoul of one or the other in an active environment.
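To make the interplay of these limits concrete, here is a minimal Python sketch of an admission check. It is a simplified model, not VMware’s scheduler: the limit values come from the paragraph above, and the names (`HostLoad`, `can_start`, `parallel_copy_plan`) are hypothetical. It assumes an Enhanced vMotion counts against both the Storage vMotion and the vMotion limits, and it reads “unique datastores” as meaning that no datastore appears in more than one concurrent disk copy.

```python
from dataclasses import dataclass

# Limits quoted in the article; illustrative values for this sketch only.
MAX_STORAGE_VMOTIONS_PER_HOST = 2
MAX_VMOTIONS_PER_HOST = 8
MAX_PARALLEL_DISK_COPIES = 4


@dataclass
class HostLoad:
    """Hypothetical counters of migrations currently active on one host."""
    storage_vmotions: int = 0
    vmotions: int = 0


def can_start(host: HostLoad, kind: str) -> bool:
    """Return True if a new migration of `kind` fits under the per-host limits.

    ASSUMPTION: an Enhanced vMotion moves storage and memory state together,
    so it counts against both the Storage vMotion and vMotion limits.
    """
    if kind not in ("vmotion", "storage_vmotion", "enhanced_vmotion"):
        raise ValueError(f"unknown migration kind {kind!r}")
    moves_storage = kind in ("storage_vmotion", "enhanced_vmotion")
    moves_state = kind in ("vmotion", "enhanced_vmotion")
    if moves_storage and host.storage_vmotions >= MAX_STORAGE_VMOTIONS_PER_HOST:
        return False
    if moves_state and host.vmotions >= MAX_VMOTIONS_PER_HOST:
        return False
    return True


def parallel_copy_plan(disk_moves: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Choose up to four (source, target) disk copies to run at once.

    ASSUMPTION: "unique datastores" means no datastore may appear in more
    than one concurrent copy; skipped copies would wait for a later round.
    """
    busy: set[str] = set()
    plan: list[tuple[str, str]] = []
    for src, dst in disk_moves:
        if len(plan) == MAX_PARALLEL_DISK_COPIES:
            break
        if src == dst or src in busy or dst in busy:
            continue
        busy.update((src, dst))
        plan.append((src, dst))
    return plan
```

In this model a host already running two Storage vMotions can still accept a plain vMotion but refuses an Enhanced vMotion, which is exactly the “running afoul of one or the other” situation.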

Enhanced vMotion and parallel operation bring VMware up to par with shared-nothing Live Migration in Microsoft Hyper-V 3.0. So that’s good. But Hyper-V supports 8 parallel migrations to VMware’s 4, and it supports DAS-to-DAS Live Migration too. I don’t think this makes much difference “on the ground,” but Microsoft still has slight bragging rights.

Any-to-Any vSphere Replication and Improved SRM

VMware doesn’t include SRM and Replication in their “storage features” whitepaper, but this is my blog so I’ll add it here!

In vSphere 5.1, anyone with Essentials Plus or above can replicate a virtual machine to another location, whether inside a cluster or between multiple clusters. This is the same technology found in Site Recovery Manager (SRM) 5.0 and above, but it’s now open for all. VMware is the only virtualization platform to include replication at the hypervisor level. So far so good!

vSphere Replication uses a Changed-Block Tracking (CBT) mechanism to set up and track replications. This is similar to the technology used for data protection since vSphere 4. Although many people soured on Storage vMotion in vSphere 4 because of performance impacts and never-ending vMotions, I have been assured this will work better. We shall see!
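Changed-block tracking in general works like the sketch below: every guest write marks the fixed-size blocks it touches, and each replication cycle ships only those blocks and then resets the map. This is a hypothetical, generic illustration of the technique (the class name and the 64 KB block size are assumptions), not VMware’s implementation.

```python
class ChangedBlockTracker:
    """Generic changed-block tracking (hypothetical sketch, not VMware code).

    Remembers which fixed-size blocks were written since the last replication
    cycle, so only those blocks need to be sent to the replica.
    """

    def __init__(self, disk_size: int, block_size: int = 64 * 1024):
        self.block_size = block_size  # ASSUMPTION: 64 KB tracking granularity
        self.num_blocks = (disk_size + block_size - 1) // block_size
        self.dirty: set[int] = set()

    def record_write(self, offset: int, length: int) -> None:
        """Mark every block touched by a guest write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        # Clamp to the end of the disk so stray writes never mark phantom blocks.
        self.dirty.update(range(first, min(last, self.num_blocks - 1) + 1))

    def take_delta(self) -> list[int]:
        """Return this cycle's dirty block numbers and start a fresh cycle."""
        delta = sorted(self.dirty)
        self.dirty.clear()
        return delta
```

The payoff is that a cycle’s transfer size depends on how much data changed, not on the size of the disk.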

SRM 5.0 users complained about many limitations that might also affect Replication in other use cases. In 5.0, it was a one-way street, moving data from here to there but not offering any other options. But now there’s an automatic re-protect/failback option in SRM!

The minimum RPO is 15 minutes, so vSphere Replication is not a “zero data loss” solution. And the maximum RPO is 24 hours and there is no snapshot support, so you’re not going to use this as a data protection solution. The Replication engine meets the set RPO but does not work on a strict schedule. And vSphere Replication only works with powered-on VMs and not with FT, linked clones, or templates. There’s also no encryption or traffic shaping, though of course there are lots of great third-party WAN tools.
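The “meets the RPO but not a strict schedule” behavior falls out of simple deadline math. In this hypothetical sketch (function names are assumptions; only the 15-minute and 24-hour bounds come from the text), the replica taken at the last consistent point goes stale once the RPO elapses, so the next cycle merely has to start early enough to finish transferring before then.

```python
from datetime import datetime, timedelta

# Bounds quoted in the article; everything else here is a hypothetical model.
MIN_RPO = timedelta(minutes=15)
MAX_RPO = timedelta(hours=24)


def validate_rpo(rpo: timedelta) -> timedelta:
    """Reject RPOs outside the supported 15-minute to 24-hour window."""
    if not MIN_RPO <= rpo <= MAX_RPO:
        raise ValueError(f"RPO {rpo} is outside {MIN_RPO} .. {MAX_RPO}")
    return rpo


def latest_sync_start(last_consistent_point: datetime, rpo: timedelta,
                      estimated_transfer: timedelta) -> datetime:
    """Latest time the next replication cycle can start and still meet the RPO.

    The replica captured at `last_consistent_point` becomes too old at
    last_consistent_point + rpo, so a fresh one must finish transferring by
    then. Any start before this deadline works, which is why an RPO-driven
    engine need not run on a fixed clock schedule.
    """
    validate_rpo(rpo)
    return last_consistent_point + rpo - estimated_transfer


# Example: 1-hour RPO, last consistent copy at 12:00, transfers take ~10 minutes.
print(latest_sync_start(datetime(2012, 9, 10, 12, 0), timedelta(hours=1),
                        timedelta(minutes=10)))  # prints 2012-09-10 12:50:00
```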

One improvement is Microsoft VSS integration in vSphere 5.1. Users of vSphere Replication in 5.1 can quiesce any Windows applications that “understand” VSS rather than just slurping the data over in the background. This extends to SRM as well! And you can now force recovery even if there are timeouts and errors.

It’s important to know that vSphere Replication is a technology, not really a solution. It doesn’t include all the extra elements of popular third-party offerings from EMC, HDS, NetApp, Zerto, and others. SRM is the full replication solution from VMware, but it requires a separate license.

Bigger VMware vSphere Storage Appliance (VSA)

VMware’s vSphere Storage Appliance (VSA) is a great solution for smaller shops that need shared-storage features but don’t want to buy a shared storage device. It works, it’s supported, and it’s reasonably priced for Essentials Plus users. But the VMware VSA was definitely not a compelling alternative to a real SAN or NAS device, or even a Virtual Storage Appliance (the thing people called “VSA” before VMware’s product) from some other storage vendor.

VMware’s VSA has been enhanced in vSphere 5.1 to make it bigger and tougher, but it’s still no match for a real storage array. Although RAID 5/6 was added in January, I think most VMpeople overlooked this change. I know I did!

Here’s what’s new in VMware VSA 5.1:

  • Support for up to 12 internal disk drives plus up to 16 external drives per host (still limited to 8 if they’re 3 TB drives) for a theoretical total of 28 drives, up from 8
  • Users can dynamically allocate local storage to the VSA, allowing it to be added to existing hosts with running machines
  • A single vCenter Server instance can manage multiple VSA instances
  • The VSA and vCenter Server can be on multiple IP subnets, ideal for remote office/branch office (ROBO) situations
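For a rough sense of what those drive counts buy you, here is a capacity sketch. It rests on assumptions worth stating loudly: standard RAID 5/6/10 overhead on each host, two- or three-node clusters, and each node’s datastore mirrored to another node so the cluster exposes only half of its post-RAID capacity. The function names are hypothetical.

```python
MAX_DRIVES_PER_HOST = 12 + 16  # internal + external drives, per the list above


def raid_usable_drives(level: str, n: int) -> int:
    """Drives' worth of usable space in one local RAID group of n equal drives."""
    if level == "raid10":
        if n < 2 or n % 2:
            raise ValueError("RAID 10 needs an even number of drives >= 2")
        return n // 2
    if level == "raid5":
        if n < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return n - 1
    if level == "raid6":
        if n < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return n - 2
    raise ValueError(f"unknown RAID level {level!r}")


def vsa_cluster_usable_tb(nodes: int, drives_per_host: int, drive_tb: float,
                          level: str) -> float:
    """Rough usable capacity of a VSA cluster (hypothetical model).

    ASSUMPTION: every node's datastore is mirrored to another node, so the
    cluster exposes half of its total post-RAID capacity.
    """
    if nodes not in (2, 3):
        raise ValueError("this sketch models two- or three-node VSA clusters")
    if not 1 <= drives_per_host <= MAX_DRIVES_PER_HOST:
        raise ValueError(f"at most {MAX_DRIVES_PER_HOST} drives per host")
    per_node_tb = raid_usable_drives(level, drives_per_host) * drive_tb
    return nodes * per_node_tb / 2
```

Under these assumptions, three hosts with eight 2 TB drives each in RAID 6 yield about 18 TB of shared datastore space, well under half of the 48 TB of raw disk.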

I like that the VMware VSA exists, and I like the direction it’s headed. I even like the core idea of it – it could have been just a cheesy Linux instance, but it’s got interesting clustering technology there. So it’s not a bad product, just not a competitor for a real storage array. You know, the kind you’d buy from VMware parent, EMC!

Enhanced Storage I/O Device Management

Now we get into some esoteric stuff, but please bear with me.

Coolest to a storage geek like me is the new “I/O Injector” built into SIOC. It generates synthetic I/O load to detect elements of the storage stack and tune vSphere. There are two benefits of the Injector:

  1. Datastore Correlation for Storage DRS
  2. Automated tuning of Storage I/O Control

Since VASA is a total flop at reporting this kind of back-end detail, the Injector fills the gap: it can detect when two LUNs share the same spindles and direct Storage DRS to avoid placing conflicting heavy loads on them. This “Datastore Correlation” technology is clever, but might get confused by auto-tiering storage arrays, caching solutions, and the like. Still, it’s better than nothing.
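The correlation idea can be illustrated with a simple experiment: measure latency on one datastore at rest, then again while synthetic load hits another. This is a hypothetical sketch of the concept, not the Injector’s actual algorithm; the callbacks, the sample count, and the 1.5x rise threshold are all assumptions.

```python
from statistics import mean
from typing import Callable


def correlated(measure_latency_ms: Callable[[], float],
               inject_load: Callable[[bool], None],
               samples: int = 5,
               rise_ratio: float = 1.5) -> bool:
    """Guess whether datastore B shares back-end spindles with datastore A.

    measure_latency_ms() samples latency on B; inject_load(True/False) turns
    synthetic I/O against A on or off. ASSUMPTION: a rise of `rise_ratio`
    or more under load means the two share disks.
    """
    baseline = mean(measure_latency_ms() for _ in range(samples))
    inject_load(True)
    try:
        loaded = mean(measure_latency_ms() for _ in range(samples))
    finally:
        inject_load(False)  # always stop the synthetic load
    return loaded >= baseline * rise_ratio
```

This also shows why tiering and caching can fool the test: if a cache absorbs the injected load, B’s latency never rises even when the spindles are shared.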

One big gripe about Storage I/O Control (SIOC) was the default latency threshold of 30 ms. This was fine for conventional disk-based storage arrays, but totally inappropriate for modern solid state-backed systems. The Injector will hammer on these systems a bit to find the latency at which throughput reaches 90% of its peak, and use that as the latency threshold instead. Or, as usual, the administrator can enter a value manually.
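Picking the threshold from a ramped latency/throughput curve looks roughly like this. It’s a minimal sketch assuming the curve is a list of measured `(latency_ms, throughput_iops)` samples; the function name is hypothetical, and real SIOC may interpolate or smooth the curve differently.

```python
def latency_threshold_ms(curve: list[tuple[float, float]],
                         fraction: float = 0.9) -> float:
    """Lowest measured latency at which throughput reaches `fraction` of peak.

    `curve` holds (latency_ms, throughput_iops) samples collected while the
    synthetic load was ramped up. Taking the minimum over all qualifying
    samples tolerates curves where throughput wobbles near the top.
    """
    if not curve:
        raise ValueError("need at least one sample")
    peak = max(tput for _, tput in curve)
    target = fraction * peak
    return min(lat for lat, tput in curve if tput >= target)
```

For a flash-backed array whose throughput is already at 90% of peak at a few milliseconds, this yields a threshold far below the old 30 ms default, which is the point of the change.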

There’s also a “stats only” mode which collects data before SIOC is enabled. This reduces the “crunch time” when you enable SIOC. Excellent!

SANs are pretty reliable if built right, but sometimes things break. And when storage devices “disappear”, the ESXi host agent (hostd) can freak out and stop responding. This is really, really bad for things beyond storage.

“All Paths Down” (APD) is just as horrible as it sounds, but what if the storage array is only mostly dead? In versions prior to 5.1, hostd would keep trying to reconnect forever, using up all its worker threads in the process. Now it retries for only 140 seconds and works to determine whether a “permanent device loss” (PDL) situation has occurred. In that case, vSphere HA can actually restart the affected virtual machines rather than leaving everything locked up. Nice!
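Here’s the decision in simplified form. The function and its return strings are hypothetical, and the PDL check is reduced to a boolean standing in for the SCSI sense data an array returns when it reports a device as permanently gone; only the 140-second window comes from the text.

```python
APD_TIMEOUT_S = 140  # retry window quoted in the article


def path_action(seconds_since_paths_lost: float, pdl_sense_received: bool) -> str:
    """Decide what to do with I/O to a device whose paths have all vanished.

    Hypothetical model:
    - PDL (array says the device is permanently gone): stop retrying and let
      vSphere HA restart the affected VMs elsewhere.
    - Otherwise treat it as All Paths Down: retry until the timeout, then fail
      I/O fast instead of tying up worker threads forever.
    """
    if pdl_sense_received:
        return "pdl: fail io, let HA restart vm"
    if seconds_since_paths_lost < APD_TIMEOUT_S:
        return "apd: retry"
    return "apd timeout: fail io fast"
```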

Some Other Stuff You Probably Don’t Care About

In the category of “things that don’t really matter to most people” are booting from software FCoE, 16 Gb FC support, iSCSI jumbo frames, and SSD monitoring.

Although there is considerable debate about the effectiveness of jumbo frames in iSCSI environments, vSphere 5.1 now supports it in all configurations. Probably the most interesting aspect of this feature is the documentation’s detour into the three types of iSCSI initiators (software, dependent hardware, and independent hardware).
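Part of the jumbo frame debate is simple arithmetic. This sketch computes the share of wire bandwidth left for TCP payload (which is where iSCSI PDUs ride) at standard and jumbo MTUs, assuming IPv4 and TCP headers without options plus standard Ethernet framing overhead; it ignores iSCSI’s own PDU headers.

```python
# Bytes on the wire per Ethernet frame beyond the MTU:
# 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap.
ETHERNET_OVERHEAD = 14 + 4 + 8 + 12
IP_TCP_HEADERS = 20 + 20  # ASSUMPTION: IPv4 and TCP without options


def payload_efficiency(mtu: int) -> float:
    """Fraction of wire bandwidth carrying TCP payload at a given MTU."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


for mtu in (1500, 9000):
    print(mtu, f"{payload_efficiency(mtu):.1%}")
```

That works out to roughly 94.9% versus 99.1%, a gain of about four percentage points of raw link efficiency.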

If you happen to have an FCoE SAN but don’t have a hardware FCoE HBA, you can boot from FCoE using the Intel-derived software driver. This is a tiny corner case today, but it’ll be a requirement sometime. Maybe.

And although 16 Gb Fibre Channel is exceedingly rare as well, vSphere 5.1 supports these HBAs just fine, and will work with 16 Gb FC SANs whenever they appear on the market. For now there’s no end-to-end 16 Gb path: your 16 Gb HBA can talk to a 16 Gb switch, which in turn reaches the array over multiple 8 Gb links.

Then there is SMART SSD monitoring. A new daemon, smartd, queries directly-connected SSDs every 30 minutes, gathering their status. This allows VMware to “know” things like media wearout/lifetime and temperature. Vendors can even plug in their own SSD querying tools! But this information isn’t exposed in the GUI; the administrator has to use esxcli or the new smartinfo.sh script to display the output. And then they will likely ask themselves, “so what?” since none of this really predicts anything about SSD longevity.
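If you do go digging, the output is easy enough to script against. This sketch assumes the SMART query prints a whitespace-aligned table with Parameter, Value, Threshold, and Worst columns; that layout, the sample values, and the function names are assumptions, so check against your own host’s output. It flags any parameter whose normalized value has dropped to its threshold.

```python
def parse_smart_table(text: str) -> dict[str, tuple[str, str, str]]:
    """Parse an ASSUMED Parameter/Value/Threshold/Worst table.

    Returns {parameter: (value, threshold, worst)}. Parameter names contain
    spaces, so the last three columns are split off from the right.
    """
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(parts) < 4 or parts[0] == "Parameter" or set(line.strip()) <= {"-", " "}:
            continue
        *name, value, threshold, worst = parts
        rows[" ".join(name)] = (value, threshold, worst)
    return rows


def at_or_below_threshold(rows: dict[str, tuple[str, str, str]]) -> list[str]:
    """Parameters whose numeric value has fallen to its threshold (lower is worse)."""
    return [name for name, (value, threshold, _worst) in rows.items()
            if value.isdigit() and threshold.isdigit() and int(value) <= int(threshold)]
```

Of course, as noted above, a flagged value tells you a drive is worn, not when it will fail.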

Space-Efficient Sparse Virtual Disks (SE Sparse)

VMware has added another virtual disk type, and one that really grabbed my attention. The new Space-Efficient Sparse Virtual Disk (SE Sparse) leverages VMware Tools to detect deleted files in the guest filesystem, pass this information to the VMkernel using the SCSI UNMAP command, and reclaim this capacity using a nifty shuffle-to-the-end-then-truncate maneuver. But this is a View-only feature currently, so let’s not get too excited.

One thing I’ve often harped on is the inability of thin storage arrays to detect filesystem-level deletes and reclaim space. This leads to a gradual un-thinning of capacity over time unless some clever mechanism is used to clear out the capacity inside the filesystem. Lots of vendors use host agents to clear out unused space (case in point: NetApp thin reclamation) and this is pretty much what VMware is doing, too. Except that VMware already has a host agent running (VMware Tools) and can just add thin reclamation to that!

What is really clever about SE Sparse is the card tricks that happen on the back end. Essentially, VMkernel intercepts these SCSI UNMAPs and shuffles the blocks to move these to the end of the virtual disk image. It can then signal the underlying storage to reclaim this space in one of two ways:

  1. Block (SCSI) devices get another UNMAP command for the blocks at the end of the VMDK
  2. File (NFS) arrays get a TRUNCATE RPC call to lop off the end of the VMDK

Either way, the space is reclaimed all the way from the guest filesystem to the underlying storage. And that’s a good thing.
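The shuffle-to-the-end-then-truncate trick is essentially two-pointer compaction. In this hypothetical sketch, the disk image is a list of grain slots (None marks a slot the guest UNMAPped); live grains from the tail are moved into holes near the front, and the freed tail is then described either as an UNMAP range or as a new file size. It ignores SE Sparse metadata, the grain table, and the mapping from file offsets to LUN blocks, and the function names are assumptions.

```python
GRAIN_BYTES = 4096  # default SE Sparse grain size mentioned in the article


def shuffle_and_truncate(slots):
    """Compact grain slots so all free space ends up at the tail (hypothetical model).

    slots[i] is the logical grain stored in physical slot i, or None if the
    guest UNMAPped it. Returns (compacted_slots, moves), where each move is a
    (from_slot, to_slot) pair. The grain table that would record these
    relocations is not shown.
    """
    slots = list(slots)
    moves = []
    lo, hi = 0, len(slots) - 1
    while True:
        while lo < hi and slots[lo] is not None:  # next hole from the front
            lo += 1
        while hi > lo and slots[hi] is None:      # next live grain from the back
            hi -= 1
        if lo >= hi:
            break
        slots[lo], slots[hi] = slots[hi], None
        moves.append((hi, lo))
    live = sum(s is not None for s in slots)
    return slots[:live], moves


def reclaim_request(backend, old_slots, new_slots, grain_bytes=GRAIN_BYTES):
    """Describe how the freed tail is handed back to the underlying storage.

    Offsets are within the disk image; translating them to LUN blocks is omitted.
    """
    start, end = new_slots * grain_bytes, old_slots * grain_bytes
    if backend == "block":
        return ("UNMAP", start, end - start)  # offset and length of the freed tail
    if backend == "nfs":
        return ("TRUNCATE", start)            # new, shorter file size
    raise ValueError(f"unknown backend {backend!r}")
```

For example, `["a", None, "b", None, "c"]` compacts to `["a", "c", "b"]` with a single move, leaving two grains at the tail to hand back.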

SE Sparse disks also get a tunable “grain size”, enabling applications to specify something other than the default 4 KB allocation unit. But this is not exposed in any known interface and will thus not be tunable by mere mortals. So this only matters if your name is VMware and you’re using SE Sparse disks for some other currently nonexistent application.
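One practical consequence of a grain size, under the assumption that only whole grains can be released: an UNMAP that covers part of a grain frees nothing for that grain. This hypothetical helper shows which grains a given byte range would actually free.

```python
def reclaimable_grains(offset: int, length: int, grain: int = 4096) -> range:
    """Grain indices fully covered by an UNMAP of `length` bytes at `offset`.

    ASSUMPTION: only whole grains can be released, so partially covered
    grains at either end of the range stay allocated.
    """
    first = -(-offset // grain)       # ceil(offset / grain)
    end = (offset + length) // grain  # floor((offset + length) / grain), exclusive
    return range(first, max(first, end))
```

Under this model, bigger grains mean less bookkeeping but more space left stranded by small deletes.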

Storage Enhancements for vCloud and View

vSphere 5.1 also includes some other enhancements that sound cool until you look deeper. Although these may become relevant in the future, they’re not server features today.

  1. There are no new VAAI primitives, but NFS Fast File Clone (aka Linked Clone Offload) now supports vCloud Director as well as View.
  2. Sharing of read-only files in VMFS is now supported up to 32 hosts, dramatically enhancing the value of VMFS (block) storage in View and vCloud Director use cases.
  3. As mentioned earlier, SE Sparse disks are new for View, and they will likely make their way to vCloud Director and beyond soon.
  4. vCloud Director now uses Storage DRS and Profile-Driven Storage when placing linked clones.

Most of this is irrelevant for server workloads, but nowadays VMware has more than server workloads in mind!

Stephen’s Stance

Where vSphere 5 went all-out with awesome new storage capabilities, vSphere 5.1 mainly tweaks existing technology and adds some big-tent features. Replication and Data Protection will get the attention, but Enhanced vMotion and the I/O Injector ought to be the marquee features for storage nerds like me. And everyone should cheer about the licensing changes. Then there’s all the View/vCloud Director stuff. All in all, a nice update.

  • http://twitter.com/millardjk Jim Millard

    I sat through the VSA deep dive session at VMworld (STO1521), and while the updates to the VSA are good and necessary, I’m still trying to figure out how this can possibly compete against something like the HP [LeftHand rebranded as] StoreVirtual VSA.

  • http://blog.fosketts.net sfoskett

    VMware VSA can compete because it’s from VMware. It’s pre-packaged and runs on anything. Non-HP shops might be reluctant to look into the StoreVirtual VSA, or might not even have heard of it. But the VMware VSA is everywhere!

    Honestly, it’s not a great virtual storage offering compared to HP, Nexenta, StorMagic, FalconStor and the rest. And VMware VSA isn’t that cheap either. But it’s great to have that capability available to customers who want it, even though most would be better off with a different solution.

  • Steven Santini

    You’re right, Stephen. Annoyingly, VMware’s VSA is everywhere, and even I can admit that they are moving in the right direction with their VSA.

    I think it’s going to do well on the SMB front, but I think that comes down to the bundling into Essentials Plus and above. Most SMBs won’t be too eager to spend more on another VSA solution, regardless of better solutions available to them.

    On the ROBO front, though, I don’t see them being able to meet the needs here: although they have made some improvements to management of multiple sites, there are still some key check boxes they’ve missed.

    Great post, by the way; better explanation and detail than the What’s New in VMware vSphere 5.1 PDF.

  • http://twitter.com/christiankelly Christian Kelly

    Great post! Here is one note that I picked up from a VMworld session.

    vSphere Replication doesn’t use CBT. It uses a new technology (not an open API, unfortunately, and the name escapes me at the moment) and, unlike CBT, doesn’t use snapshots at all. This is nice, as many replication products based on CBT hold the running VM in a snapshotted state for the full duration of the replication, which can be a long time if you have a slow link.

  • Kyle Weir

    As an FYI, 3PAR arrays do thin provision and stay thin on all disks, regardless of OS, because they don’t write empty/blank blocks and actively remove them on the SAN itself. They do have tie-ins to VMware, but these are for VAAI, 3PAR Peer Persistence, and a few other things; the thinning part isn’t one of them. I looked at NetApp as well as several other vendors, and none of the others do it on the SAN; they just reclaim what VMware tells them to.