EMC’s Chuck Hollis is one smart guy, and a very verbose blogger. As usual, he sparked a bit of a storm recently when comparing unified storage on EMC’s Celerra NX4 to NetApp’s multiprotocol FAS2020 filer. But it was one phrase in particular that got the attention of Alex McDonald and Kostadis Russos of NetApp, Martin/Storagebod, and Tony Asaro: “just because a vendor says they can emulate FC SAN behavior doesn’t mean it’s a real FC SAN.”
What was he getting at? Read the comments in Chuck’s post and you’ll understand his implication: Chuck suggests that NetApp “emulates” Fibre Channel in their FAS/OnTap devices on top of the WAFL “file system”, while EMC’s AX storage (behind the NX4) uses “real” Fibre Channel, so it’s better. He goes on to say that EMC is doing a brisk business replacing misfit NetApp FC arrays with real FC kit from EMC. But, as is so often the case, the truth is a little more complex than this: All enterprise storage arrays “emulate” Fibre Channel drives to one extent or another, and using the wrong tool for the job will always lead to trouble.
Is It Real Or Is It Virtual?
Let’s knock one thing out right away: Chuck is right, NetApp is emulating Fibre Channel drives with their FAS arrays. They really do lay out chunks of storage on something kind of like a file system with a bunch of logic mixed in and then pretend that this space is a plain-jane SCSI drive connected over Fibre Channel. And I’ll add to the “scandal” by pointing out that NetApp does exactly the same thing with their iSCSI drives!
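To see what “emulating a drive on top of a file system” means mechanically, here’s a minimal sketch in Python. Everything here is illustrative: a flat block layout on a single backing file, nothing resembling WAFL’s actual on-disk structures or NetApp’s implementation.

```python
# Minimal sketch of a file-backed LUN: the "disk" a host sees is
# really just a region of a file system, addressed by block number.
# Illustrative only -- real arrays add caching, checksums, snapshots.

BLOCK_SIZE = 4096  # 4 KB blocks, the size WAFL happens to use


class FileBackedLUN:
    """Presents a plain-jane block device on top of a backing file."""

    def __init__(self, path, size_blocks):
        self.path = path
        self.size_blocks = size_blocks
        # Pre-size the backing file (thick provisioning, for simplicity)
        with open(path, "wb") as f:
            f.truncate(size_blocks * BLOCK_SIZE)

    def write_block(self, lba, data):
        """Write one block at the given logical block address."""
        assert len(data) == BLOCK_SIZE and 0 <= lba < self.size_blocks
        with open(self.path, "r+b") as f:
            f.seek(lba * BLOCK_SIZE)
            f.write(data)

    def read_block(self, lba):
        """Read one block; untouched blocks read back as zeros."""
        assert 0 <= lba < self.size_blocks
        with open(self.path, "rb") as f:
            f.seek(lba * BLOCK_SIZE)
            return f.read(BLOCK_SIZE)
```

The host sees nothing but numbered blocks; the file system underneath is free to do whatever it likes with them, which is exactly the point.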
Now let’s move on to an even more important point: All modern storage arrays emulate SCSI drives! That’s right, every enterprise storage array is lying, pretending to serve up basic drives but really slicing and dicing them in the background for their own nefarious purposes!
Who is responsible for this deceit? I place the blame on a few: Patterson, Gibson, and Katz started the game with their so-called RAID concept, which kicked things off by allowing a few drives to pretend to be a single one. Data General implemented this with cache in their oh-so-clever HADA, further separating us from The True Disk. But the worst was EMC, with their fully-virtualized Symmetrix array, where there was no definite relationship at all between the LUNs presented to servers and the disks that do all the real work. Some folks would even go so far as to praise this type of post-RAID virtualized storage as innovative!
NetApp takes this “automated lying” to the extreme, forcing their innocent hardware to take honest, well-laid-out blocks of intelligent WAFL space and twist them into vast tracts of dumb pretend-disks. The nerve! Compellent, 3PAR, Dell/EqualLogic, and the rest are just as bad, scattering blocks of data willy-nilly across their disks in so-called “wide stripes”. But don’t let Chuck’s misdirection fool you: EMC is just as guilty with each of their different storage platforms, masquerading as disk drives or file servers and intelligently managing storage underneath! And don’t get me started on the twisted things VMware does to storage!
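For the curious, the “scattering” that wide striping does isn’t magic, just arithmetic: a logical block address maps round-robin across a set of disks. A minimal sketch, with made-up geometry that doesn’t correspond to any particular vendor’s layout:

```python
# Sketch of wide striping: a LUN's logical blocks are spread
# round-robin across many disks, so no single spindle "is" the LUN.
# Chunk size and disk count here are illustrative, not any vendor's.

STRIPE_BLOCKS = 64   # blocks per stripe chunk
NUM_DISKS = 16       # disks participating in the wide stripe


def map_lba(lba):
    """Return (disk index, block offset on that disk) for a logical
    block address, using simple round-robin chunk placement."""
    chunk, offset = divmod(lba, STRIPE_BLOCKS)
    disk = chunk % NUM_DISKS              # which spindle gets this chunk
    chunk_on_disk = chunk // NUM_DISKS    # how deep into that spindle
    return disk, chunk_on_disk * STRIPE_BLOCKS + offset
```

Sixteen consecutive chunks land on sixteen different disks, which is why sequential I/O against a widely striped LUN can engage every spindle at once.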
Modern? Feh! Let’s all hope Apple starts producing their no-feature Xserve RAID again!
Waiting On Angels
So every modern array emulates disks. What was Chuck’s point again? Oh yeah, that the AX Fibre Channel storage used by EMC’s NX4 is superior to the integrated Fibre Channel capability of the NetApp FAS2020! I’m sure he’s right for some use cases and wrong for others. FC on the FAS2020 is a perfect match for some, and the NX4/AX wins in a landslide in some circumstances.
The crux of the argument is the fact that NetApp does all sorts of stuff behind the scenes to build and support an FC LUN that the EMC AX FC array doesn’t do. So, although it wouldn’t be fair to say that one was “emulated” and another was not, Chuck would be correct in saying that an FC LUN on an AX is more “real” than one on a NetApp FAS. But arguing over technicalities like this is all angels and pins and doesn’t matter in the real world!
What does matter? In block storage, latency is king. Generally speaking, more cogs and wheels lead to more latency. This is why storage arrays rely so much on large, intelligent caches and vendors are experimenting with all sorts of cool caching technology. But, ignoring cache, high-end arrays generally have worse latency than low-end ones because they have all sorts of translation and virtualization going on in the background. In any I/O situation, increased latency hurts throughput and the perception of performance. And there comes a point when block applications give up waiting and it’s “game over, man!”
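Little’s Law makes the latency-throughput link concrete: achievable IOPS is capped by the number of outstanding I/Os divided by latency. A quick back-of-envelope calculation (the numbers are illustrative, not from any particular array):

```python
# Little's Law applied to block storage: IOPS <= outstanding I/Os / latency.
# A synchronous application with one outstanding I/O is entirely at the
# mercy of latency, no matter how fast the back-end spindles are.

def max_iops(outstanding_ios, latency_ms):
    """Upper bound on IOPS for a given queue depth and per-I/O latency."""
    return outstanding_ios / (latency_ms / 1000.0)

# One outstanding I/O at 1 ms latency caps out around 1,000 IOPS;
# let array internals push latency to 10 ms and the very same
# application is held to roughly 100 IOPS -- a 10x throughput hit
# without a single disk getting slower.
```

This is also why deep queues and big caches can hide latency for a while, and why things fall apart so suddenly when they stop being able to.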
I remember migrating from an old CLARiiON 3100 to a brand new Symmetrix 3930 and watching the Symmetrix choke on the incoming data stream. It just couldn’t write fast enough to handle full streaming reads from the (old-tech) CLARiiON. But once everything was migrated and running, the Symmetrix, with its massive (for the time) 16 GB of cache, widely-spaced data layout, and multiple internal channels, completely destroyed the CLARiiON in real-world performance. This pattern continues today, with devices like the DMX and USP offering much better real-world performance than benchmarks or theoretical techno-arguments would suggest.
So Which Is Better?
But Chuck and the rest were not talking about high-end stuff here. They are comparing the architecture of entry-level enterprise kit and drawing conclusions about which is best. I personally don’t care what the internals of the system look like. I care how well it works.
I have personally seen Microsoft Exchange running on low-end FC-connected NetApp FAS arrays, and it worked great. I also helped a customer migrate off of EMC AX that didn’t give them the performance they needed for their databases. In truth, lower-end gear is often over-sold and unable to deliver the performance, features, and reliability specified on data sheets and in vendor presentations.
That’s right, there’s more to this picture than raw performance. Consider manageability, for one. NetApp is offering a single-interface integrated system with all protocols (CIFS, NFS, iSCSI, and FC) available from one device. They also offer similar levels of integration for their (really nice) snapshot, replication, and deduplication technology. WAFL is busy doing a lot of great stuff, so I really wouldn’t be surprised if EMC’s less-integrated NX/AX offering beats them on performance at the same price point. Which is more important to you: integration, performance, or features? And I bet that, if you spent a bit more on a higher-end NetApp box, you could have it all.
On the flip side, EMC is offering a really compelling entry-enterprise combination at a nice price point. The latest NX should be on everyone’s NAS short list, and I’m sure the simple FC of the AX array would work well in a smallish Exchange, VMware, or SQL Server environment. It’s not as unified as NetApp’s offering management- or feature-wise, but it’s still pretty good.
Pick the right tool for the job, though. Neither the NX4 nor the FAS2020 is a good fit for a high-I/O application, and that’s a fact!
This post can also be found on Gestalt IT: Of Emulated Fibre Channel, Virtualization, And The Right Tool For The Job
Storagezilla says
Around NX4 integration, what exactly is needed here? Does everything need to be mashed together or can it be integrated at the management layer?
Will or should a customer care if I select “Replicate this volume” from a properties list and Celerra Replicator does it or MirrorView does it depending on if it’s an FC attach or NAS volume?
Marc Farley says
As you know, translation layers and the time it takes to do them can differ significantly and depend heavily on storage-system architecture. If you know how to read benchmarks, you can get some indication of what the virtualization latencies are. Storage performance benchmarks define 100% workload as the amount of throughput (I/Os or streaming) at some reasonable low latency level. You should look at the whole performance curve to understand the latencies given for various throughput measurements. For instance, NetApp’s SPC data tends to indicate competitive latency levels at low to mid workloads and much higher latency levels at high workloads. Industry insiders suspect EMC doesn’t publish SPC benchmarks because they would indicate a relatively sad tale of latency – despite their claims about benchmark fairness and the necessity of mixed workload benchmarks.
sfoskett says
I would bet that integrated management would be enough for most people, since that’s what they interact with. I can imagine that, if replicating NFS uses totally different protocols than replicating a block volume, they might care about the underlying technology. But if you can make it “just work”, then why would they care about it unless they were techies arguing over whose is bigger?
I tend to be a realist. Whatever works (really works, not just sounds like it will) is always the best solution. And this is the point I was trying to make here: It doesn’t really matter how unified you are or how emulated you are – it only matters if it’s the right tool to get done what you need.
sfoskett says
Yes again, Marc! What’s the takeaway message? Regardless of the rightness or wrongness of a certain approach to solve a problem, the proof is in the pudding. Does it work? Does it continue to work as you use it?
I’ve seen some really BLAZING performance from EMC’s CX series, for example. And I’ve also seen it suck. It depends on how you configure it and what you are looking for!
Chuck Hollis says
Hi Steve — great post, as usual. There’s a good reason why you’re on my blog rail!
If you don’t mind, a few clarifying statements?
You are quite correct that most FC and iSCSI arrays compose physical disks and allow users to slice them up into logical chunks. However, this can be done preserving the physical nature of the drives, if desired — for example, putting data on outer tracks to get more performance, or laying things out sequentially if needed, or building certain kinds of wide stripes.
My issue with NetApp and similar emulated approaches is that you lose this capability. The array decides how to lay out the data, and not a skilled user. In some cases, this is not a big deal — especially when the workloads are not demanding. But, when workloads are demanding, you’ve taken away an important tool from the storage admin’s bag of tricks. And this tends to show up in demanding FC environments.
Put differently, sometimes it’s OK to have a file system abstraction layer between you and your disks, and sometimes it’s not.
My second comment is that I disagree as to where this might be important. You state it’s all about overselling low-end kit. I tend to think it’s about having the right tools for the job at hand. Part of the problem is that statements like “Exchange ran OK” are meaningless without context, given the incredibly wide range of how people actually use these applications. I’ve seen people run Exchange inside a virtual machine from a desktop. It worked great — for what they were doing!
Finally, this argument extends well outside of the low-end, and into healthy-sized configurations of both NetApp’s FAS products, as well as EMC’s Celerra / CX product line, where perhaps the differences would be more noticeable.
Thanks again for a thoughtful post!
— Chuck
sfoskett says
I see your point now, Chuck – true, some arrays give lots of manual control over layout (like the EMC CX line) and others are totally automated (for example, EqualLogic). I’m not sure which is truly “better” – it depends on the aims of whoever is managing it, I suppose! This is a great topic for a future post!
I certainly agree that this is important at all levels, but the discussion I linked to was totally focused on the specific case of the NX/AX and lower-end NetApp gear, at least to my eyes. If we open up discussion to the whole of enterprise storage, I bet the discussion would be similar, though the differences might indeed be greater.
Thanks for reading and posting!
Storagezilla says
Industry Insiders need to stop whining about it and move on, it is now assured to never change. This year we saw that SPC is a marketing con job from an anti-EMC organisation. They confirmed that when they accepted bullshit numbers about EMC gear from a competitor.
Anyone else benchmarked the other guys tin yet?
I’ll make sure we never support that piece of shit or that organisation.
As for translation layers and so on, not my issue on CX. At its base the CX is cache and mathematical lookup. No file systems, no file systems on top of file systems, no mutant duck/snake/spider animals. No worrying about where the metadata is or how it’s protected or what happens if that metadata goes away. You give me the address you wrote the data to, I’ll perform a calculation, then tell you the disk and sector the data is sitting on, and point out that the data integrity bits are right next to it.
Elegant, fast and not prone to bugs.
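[Editor’s note: a toy illustration of the “cache and mathematical lookup” approach described in the comment above — the array derives a physical location from the host address by arithmetic alone, with no metadata to consult. The RAID-5-style geometry below is invented for illustration and is not CX’s actual layout.]

```python
# Sketch of pure-calculation block addressing: no metadata lookup,
# just arithmetic from a host LBA to a physical (disk, sector).
# Geometry (stripe size, 4+1 RAID group, parity rotation) is invented.

SECTORS_PER_STRIPE = 128
DISKS_IN_GROUP = 5               # e.g. a 4+1 RAID-5 group
DATA_DISKS = DISKS_IN_GROUP - 1


def locate(host_lba):
    """Map a host LBA to (disk, sector) with rotating parity."""
    stripe, offset = divmod(host_lba, SECTORS_PER_STRIPE)
    row = stripe // DATA_DISKS                # physical stripe row
    parity_disk = row % DISKS_IN_GROUP        # parity rotates per row
    data_slot = stripe % DATA_DISKS           # position among data disks
    # Skip over the parity disk when assigning the data chunk
    disk = data_slot if data_slot < parity_disk else data_slot + 1
    return disk, row * SECTORS_PER_STRIPE + offset
```

Because the mapping is a pure function of the address, there is nothing to corrupt, cache-miss on, or rebuild other than the data and parity themselves.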
Marc Farley says
Showing a little teeth? I agree that the SPC should not have published NetApp’s CLARiiON numbers. That was an unfortunate stunt.
Storagezilla says
SPC brings down the red mist. I had no problem with it until they showed they had a problem with us.
I’m Irish. I’ll hate it forever.
Michael Shea says
Nice article Stephen. Disclaimer – I am a NetApp employee and a former employee of EMC. Been there, done that.
Chuck makes an interesting point below – the ability for an admin to choose where on a set of spindles his data will reside is a nice thing to have for the most demanding workloads. But these workloads are comparatively few. In fact, my personal experience with installed DMX and CLARiiON arrays bears out this one bit of real world experience –
If you are going to need the dedicated I/Os that the outer tracks of a set of spindles will give you, then *most likely* the workload is high enough that the inner tracks will go unused because you cannot afford to throw any other application workload at the spindles.
Hence the cost is very high – but sometimes you do run into this need. But as a rule, it is not needed for the vast majority of applications… just a very few.
Any other conversation is pure FUD. Been there.
I will probably blog on the Low Art of FUD soon. Why? Because the IT community needs to know this – FUD wastes *your* time and energy – not mine. All vendors have a database of some sort full of razor-sharp answers to any and all FUD. Especially the kind that Chuck tosses around. Vendor Olympics is about the customer tiring himself out – not exercising the vendor.
We do this for a living folks!
Cheers !