I get the same questions all the time: Should I buy X or Y? Is Z better than Q? And my answer is always, “Well, this sounds like a cop-out, but that depends on what you’re doing with it…”
Now EMC’s Chuck Hollis has (bravely) stuck his neck out to try to actually compare the capacity efficiency of three storage arrays in a realistic way. Good luck, Chuck! I can hear the knives sharpening over at NetApp and HP already!
Why This is a Good Idea
In all seriousness, this is exactly the sort of analysis that customers ought to be taking on. Buying a new storage device? Spec out how you want to use it and ask vendors for proposals configured according to their recommended practices. That’s the only way to really compare two devices: real, usable configurations.
It’s not realistic to expect that an EMC array with the same capacity and number of disks as a supposedly similar device is going to give you the same usable space, performance, energy efficiency, manageability, or really anything else. Despite the basic architectural similarities of, say, a CLARiiON and an EVA, there are just too many critical differences to think of them as a pair of apples, even if you strive for the same specs.
In fact, there are no apples or oranges anymore. No one uses straight textbook RAID. No one makes a pure NAS filer or Fibre Channel array or anything. They’ve all evolved away from the basics we think we understand, adding in a little midrange, a dash of green, and a dollop of iSCSI to become a field of grapples, tangelos, limequats, and pluots.
Why EMC Shouldn’t Be Doing It
All that being said, I think it’s beyond perilous for a vendor to try to set up a standardized capacity comparison, just as it’s foolish to try to derive a meaningful performance statistic across such diverse platforms.
Even when (as appears to be the case here) a vendor tries to follow their competitors’ recommendations, they’ll likely not end up with the same configuration that an experienced Systems Engineer from that company would put together. Often these engineers know the real-world implications of the system and can assemble a configuration that better matches the requirements.
Regarding EMC’s specific comparison, I do have some questions, however:
- Does EMC really support using the five vault reserve disks on a CLARiiON to hold production data? EMC SEs have suggested to me in the past that this is a bad idea…
- Would EMC really suggest 8+1 RAID 5 for a production Exchange and SQL Server environment?
- Is one hot spare per two DAEs (30 drives) really sufficient for a whole pile of 9-disk RAID 5 sets that are maxed-out with production data? I’d feel much more comfortable with a few more spares with such large RAID 5 sets.
- There is no way 14+2 RAID DP is equivalent to 4+1 RAID 5, let alone 8+1. It’s in a different league of reliability (see the rough sketch after this list).
- Yeah, NetApp’s space reserve recommendation stinks. But you probably won’t need 100% in production – the real amount is something you’d work out during testing and piloting, and it’s probably substantially less than that.
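To put a rough number on that RAID DP point, here’s a minimal back-of-envelope sketch. The annual failure rate and rebuild window are illustrative assumptions of mine, not vendor figures, and the model ignores unrecoverable read errors, proactive sparing, and correlated failures:

```python
# Rough annual data-loss odds for a single RAID group.
# AFR and rebuild window are illustrative assumptions, not vendor data.

AFR = 0.03            # assumed annual failure rate per drive (3%)
REBUILD_HOURS = 24.0  # assumed rebuild window
HOURS_PER_YEAR = 24 * 365

def p_failure_during_rebuild(drives: int) -> float:
    """Chance that one of `drives` surviving disks fails mid-rebuild."""
    return drives * AFR * (REBUILD_HOURS / HOURS_PER_YEAR)

def raid5_loss(n: int) -> float:
    """Single parity: any failure plus one more during rebuild loses data."""
    return n * AFR * p_failure_during_rebuild(n - 1)

def dual_parity_loss(n: int) -> float:
    """Dual parity (RAID 6 / RAID DP): needs two more failures mid-rebuild."""
    return n * AFR * p_failure_during_rebuild(n - 1) * p_failure_during_rebuild(n - 2)

print(f"8+1 RAID 5 (9 disks):    {raid5_loss(9):.1e} per group-year")
print(f"14+2 RAID DP (16 disks): {dual_parity_loss(16):.1e} per group-year")
```

Even with nearly twice as many disks in the group, the dual-parity loss odds come out a couple of orders of magnitude lower, which is what I mean by a different league.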
I’m not trying to get into an argument about this, mind, just noting a few items that immediately jumped out at me. And if I could spot these five issues on a quick read, I can just imagine what HP and NetApp will see! Watch out for the knives, Chuck! I know you mean well, but exercises like this just won’t ever work.
And where are HDS, Sun, and IBM? Plus, I would love to see 3PAR, Compellent, Dell/EqualLogic, LeftHand, and the rest jump in with their numbers! Maybe I should set up a sham RFP and ask the vendors to respond with their own systems for some real comparison!
(This post was updated for clarification and to add that last suggestion…)
Chuck Hollis says
Hi Steve — I thought your questions were good, so I replayed them as a comment on my blog.
Yes, this may seem a bit foolhardy, but we are convinced that there are fundamental differences in approaches that matter in terms of overall efficiency.
Here’s what I wrote on my blog:
—–
You write: Does EMC really support using the five vault reserve disks on a CLARiiON to hold production data?
Answer: yes, we do.
You write: Would EMC really suggest 8+1 RAID 5 for a production Exchange and SQL Server environment?
Answer: yes, we do — and we’ve got the test data to back it up. The performance and availability characterizations are publicly available, I think. I’d be glad to send them to you to review, if you’d like.
You write: Is one hot spare per two DAEs (30 drives) really sufficient for a whole pile of 9-disk RAID 5 sets that are maxed-out with production data? I’d feel much more comfortable with a few more spares with such large RAID 5 sets.
Answer: the configuration has multiple global proactive hot spares, so it’s not useful to think of it that way. Also, the proactive sparing algorithm means we usually get to a drive before it totally fails. CLARiiON engineering considers this recommendation “conservative”. We stand by it.
However, please add as many drives as you need to feel comfortable. We can always make more 🙂
You write: There is no way 14+2 RAID DP is equivalent to 4+1 RAID 5, let alone 8+1. It’s in a different league of reliability.
Answer: although we’d argue that far more than your choice of RAID algorithm impacts overall availability, we just went with what each vendor recommended as best practice.
We made our recommendation, HP made theirs, NetApp made theirs.
You write: Yeah, NetApp’s space reserve recommendation stinks. But you probably won’t need 100% in production – the real amount is something you’d work out during testing and piloting, and it’s probably substantially less than that.
Answer: gee, NetApp’s pitch is all about simplicity. You mean I have to run a bunch of trials to get to the optimum setting? That wasn’t in the marketing deck I saw! 🙂
Also, let’s not forget that the penalty for getting things wrong is a catastrophic application crash, which is pretty severe.
The bottom line: we simply went with what each vendor recommended, and was willing to support, because that’s what the majority of customers will end up running, right?
There are no value judgments here on whether the recommendations made by different vendors are “right” or “wrong”. That’s an entirely different discussion, isn’t it?
I know you think I’m putting my neck out there, but I think this is a good discussion to have.
The differences in approaches are striking, wouldn’t you agree?
— Chuck
Cleanur says
There are very few similarities between the CX and EVA architectures; the only part that comes even close to being comparable is how they are physically put together, i.e. dual controllers and multiple enclosures. The similarities end there. In terms of how the system is managed and how functionality is implemented, the two are completely different animals.
Here’s some numbers for the EVA. BTW, I don’t work for HP, but I do like the EVA technology. Once you’ve used one, it’s a bit like using a good sat nav (the EVA) after a lifetime of paper map atlases (a traditional array): initially you completely mistrust where it’s taking you and try to go your own way, but over time your mistakes become more apparent and you learn to trust the technology.
Equivalent of 120x146GB = 17.5TB required total usable (incl. RAID 5 + sparing)
Single disk group (all Vraid 5, as per Chuck)
173x146GB = 25,258GB RAW
173x146GB = 24,666GB after Rightsizing
173x146GB = 23,643GB after Metadata
Set 90% occupancy alarm = 10% reserve = 21,279GB
Add distributed sparing = 21,033GB
Vraid 5 available = 17,527GB
% Available = 17,527GB / 24,666GB = 71% utilization
173 Disks = 13 Enclosures
A more realistic config for a large Exchange environment would be:
Two disk groups (all Vraid 5, as per Chuck)
Disk group 1 = 144x146GB = 21,024GB RAW
144x146GB = 20,531GB after Rightsizing
144x146GB = 19,680GB after Metadata
Set 90% occupancy alarm = 10% reserve = 17,712GB
Add distributed sparing = 17,466GB
Vraid 5 available = 14,555GB
% Available = 14,555GB / 20,531GB = 70%
Disk group 2 = 32x146GB = 4,672GB RAW
32x146GB = 4,562GB after Rightsizing
32x146GB = 4,373GB after Metadata
Set 90% occupancy alarm = 10% reserve = 3,936GB
Add distributed sparing = 3,690GB
Vraid 5 available = 3,075GB
% Available = 3,075GB / 4,562GB = 67%
(70% + 67%) / 2 disk groups = 68.5% average utilization
144 Disks + 32 Disks = 176 = 13 Enclosures
14,555GB + 3,075GB = 17,630GB
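For anyone who wants to sanity-check the arithmetic, here’s a quick sketch. The GB figures are the ones quoted above, so the only computation is the ratios:

```python
# Sanity-check of the EVA capacity waterfall quoted above.

def utilization(vraid5_gb: float, rightsized_gb: float) -> float:
    """Usable Vraid 5 capacity as a fraction of rightsized capacity."""
    return vraid5_gb / rightsized_gb

# Single 173-disk group of 146GB drives
raw        = 173 * 146       # 25,258GB
rightsized = 24_666          # after rightsizing
metadata   = 23_643          # after metadata
alarm      = metadata * 0.9  # 90% occupancy alarm -> ~21,279GB
vraid5     = 17_527          # usable after distributed sparing and Vraid 5

print(f"single group: {utilization(vraid5, rightsized):.0%}")  # 71%

# Two-group configuration: averaging the rounded per-group figures
# gives the 68.5% quoted above; the unrounded mean is closer to 69%.
group1 = utilization(14_555, 20_531)  # ~70%
group2 = utilization(3_075, 4_562)    # ~67%
print(f"two groups:   {(group1 + group2) / 2:.1%}")
```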
Note these are real-world figures without vendor spin. A key thing about the use of disk groups and vdisks is that of the 17.xTB available in each configuration, every GB displayed as usable is absolutely usable within the disk group, i.e. no stranded capacity within the stripe set, as is usually the case for a non-virtualized array. I know from experience that with a traditional array, over time you tend to create pockets of stranded space all over the array with little chance of reclamation, unless your predictions and planning are spot on. With a virtual pool you just take the capacity required at any point in time; you’ve divorced presented capacity from the underlying disks, so when you need more, just take it. Not enough space in the pool? Just add some disks and draw the capacity you need. It’s all online and totally transparent to the hosts and applications. HP has had this technology for 6+ years now but has failed to market its benefits to their full potential.
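To make the “divorced capacity” point concrete, here’s a toy model of pool-style allocation. The class and method names are hypothetical, purely to illustrate the behaviour described above, not HP’s actual API:

```python
# Toy model of a virtualized disk group: vdisks draw from one shared
# pool, so free space is never stranded inside a fixed stripe set.
# Names and figures are illustrative only.

class DiskGroupPool:
    def __init__(self, disks: int, usable_gb_per_disk: float):
        self.free_gb = disks * usable_gb_per_disk
        self.vdisks: dict[str, float] = {}

    def create_vdisk(self, name: str, size_gb: float) -> None:
        """Allocate from the shared pool; only total pool space matters."""
        if size_gb > self.free_gb:
            raise ValueError("pool exhausted: add disks online first")
        self.free_gb -= size_gb
        self.vdisks[name] = size_gb

    def add_disks(self, disks: int, usable_gb_per_disk: float) -> None:
        """Online expansion: new capacity is instantly available to any vdisk."""
        self.free_gb += disks * usable_gb_per_disk

# ~101GB of Vraid 5 capacity per disk, per the 144-disk group above
pool = DiskGroupPool(disks=144, usable_gb_per_disk=101)
pool.create_vdisk("exchange-db", 6_000)
pool.add_disks(16, 101)  # grow the pool, transparent to hosts
```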