Ever since Microsoft and Intel declared that the combination of Windows and Nehalem could deliver over a million iSCSI IOPS, I’ve been curious about just how they did it. What black magic could push that many I/Os over a single Ethernet connection? And what was on the other end? Now Intel has revealed all in a whitepaper, and the results are surprising!
What iSCSI Did
Let’s review the test for a moment. In March, Microsoft and Intel demonstrated that the combination of Windows Server 2008 R2 and the Xeon 5500 could saturate a 10 Gb Ethernet link, pushing iSCSI throughput to wire speed. That’s 1,174 MB/s, right around the theoretical maximum of a ten-gigabit link, given a tiny bit of overhead. The pair reunited in January to show that this same combination could deliver an astonishing million I/O operations per second, too.
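To see how close 1,174 MB/s really is to wire speed, here is a quick back-of-the-envelope check. It is only a sketch: I am assuming "MB" means decimal megabytes (10^6 bytes) and a nominal line rate of exactly 10 Gb/s. The function names are mine, not anything from the whitepaper.

```python
# Back-of-the-envelope check of the wire-speed claim.
# Assumptions: decimal megabytes and a nominal 10 Gb/s line rate.
LINK_BITS_PER_SECOND = 10 * 10**9
MEASURED_MB_PER_SECOND = 1174  # throughput figure quoted above


def link_ceiling_mb_per_second(bits_per_second: int) -> float:
    """Raw link capacity in MB/s before any protocol overhead."""
    return bits_per_second / 8 / 10**6


def utilization(measured_mb: float, ceiling_mb: float) -> float:
    """Fraction of the raw link capacity that the measurement represents."""
    return measured_mb / ceiling_mb


ceiling = link_ceiling_mb_per_second(LINK_BITS_PER_SECOND)
print(f"raw ceiling: {ceiling:.0f} MB/s")  # raw ceiling: 1250 MB/s
print(f"utilization: {utilization(MEASURED_MB_PER_SECOND, ceiling):.1%}")  # utilization: 93.9%
```

The remaining six percent or so is what Ethernet, IP, TCP, and iSCSI headers would be expected to eat.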
Both of these results are astonishing. Sure, many high-end Fibre Channel SANs and storage systems blast out gigabytes of data and millions of I/O operations every second, but these tests are much more focused. Benchmarks are perilous, but the folks at Microsoft and Intel devised a fairly clever and focused set. Rather than a “mine’s bigger” contest, the pair only needs to prove that iSCSI can play with the pros.
The side effect is a demonstration of what Microsoft and Intel components can do. Microsoft showed off Windows Server 2008 R2, Hyper-V, and its software iSCSI initiator, while Intel can brag about the Xeon 5500 server platform and the X520-2 10 Gb Ethernet Server Adapter with its 82599EB controller. Your mileage may vary, but it is possible to construct a true storage monster on an average server budget.
Let’s start by looking at the local end of the tested configuration. I’m a storage guy so I think of it as the initiator, but you might say it’s the server, the client, or the host. Regardless, the system under test (SUT) is what was put under the microscope. The configuration was a common one: a high-end computer packing an Intel Xeon CPU and 82599-based 10 Gb Ethernet adapter. Most data centers have a machine or two just like this one.
Looking closely, we see that the test in question relied on the following key components:
- Intel’s “Shady Cove” S5520SC workstation-class motherboard
- The Intel Xeon W5580 CPU (4 cores, 8 MB cache, 3.20 GHz)
- 24 GB of DDR3 RAM
- Intel “Niantic” 82599EB 10 Gb Ethernet controller
- Microsoft Windows Server 2008 R2 x64
This combination would set you back about $7,500: $450 for the motherboard, $1,500 for the CPU, six 2 GB DDR3 SDRAM modules at $80 each, $1,200 for the Intel X520 NIC, and $4,000 for an Enterprise copy of Windows Server 2008 R2. Not cheap, but not an exotic server either.
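For anyone who wants to check my math, the tally can be sketched like this. The prices are just the rough figures quoted above, and the dictionary keys are my own labels.

```python
# Rough street prices quoted above, in US dollars (approximate figures).
component_prices_usd = {
    "S5520SC motherboard": 450,
    "Xeon W5580 CPU": 1500,
    "DDR3 modules (6 x $80)": 6 * 80,
    "X520 NIC": 1200,
    "Windows Server 2008 R2 Enterprise": 4000,
}

total_usd = sum(component_prices_usd.values())
print(f"total: ${total_usd:,}")  # total: $7,630
```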
Initiate and Optimize
The secret to pushing the tested system to perform as it did lies in the optimizations in the server platform, the NIC, and Windows Server itself.
- The Xeon 5500 processor series includes many enhancements:
  - An integrated memory controller allows for faster RAM access
  - QuickPath Interconnect (QPI) replaces the old front-side bus and enhances I/O off the core
  - A new I/O subsystem built around the Intel 5520 chipset, which provides PCIe 2.0
  - MSI-X expands the number of interrupts a PCI device can use
  - A new instruction (CRC32, part of SSE4.2) that computes CRC-32C in hardware, speeding up iSCSI digest processing
- The 82599 Ethernet controller also includes enhanced capabilities:
  - VMDq maps I/O queues to multiple cores and virtual machines, reducing I/O bottlenecks
  - Offload of TCP segmentation and receive-side coalescing
  - Interestingly, it does not appear that VMDc/SR-IOV was employed in the test
- Microsoft Windows Server 2008 R2 and Hyper-V are ready to use all of these features and more:
  - R2 uses multi-core CPUs more effectively in general
  - Receive-side scaling (RSS) spreads the I/O workload across all four Xeon cores
  - The iSCSI initiator now allows CRC digest offload (using the new Xeon instruction set)
  - Numerous “NUMA I/O” optimizations in the initiator
  - Nagle’s algorithm can be disabled in the TCP/IP stack through the registry
  - Hyper-V VMQ allows the network packets to be copied directly into the guest virtual machine’s memory
Whew! Put all of these optimizations in a blender and Hyper-V virtual machine iSCSI access will be twice as fast as before. No kidding!
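To get a feel for what the CRC digest work actually involves, here is a minimal software sketch of CRC-32C, the checksum iSCSI uses for its header and data digests. This is a plain table-driven implementation for illustration only, not the initiator’s code; the point of the new Xeon instruction is to replace exactly this kind of per-byte loop with hardware. The function names are my own.

```python
# Table-driven CRC-32C (Castagnoli), the checksum used for iSCSI digests.
# Software illustration only; hardware CRC support replaces this inner loop.
CRC32C_POLY_REFLECTED = 0x82F63B78  # bit-reflected form of 0x1EDC6F41


def _build_table() -> list[int]:
    """Precompute the CRC contribution of every possible byte value."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data: bytes) -> int:
    """Return the CRC-32C of data as an unsigned 32-bit integer."""
    crc = 0xFFFFFFFF  # standard initial value
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF  # standard final XOR


print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard check value
```

Run that loop over every PDU at a million I/Os per second and it is easy to see why doing it in a dedicated instruction matters.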
Stay On Target
But we knew all of this back in January. We also saw that a Cisco Nexus 5020 switch was used to fan out to 10 software iSCSI targets. But until now there was no mention of what targets were used exactly.
The final footnotes in Intel’s whitepaper reveal that the storage backing the million IOPS test was none other than StarWind Software’s iSCSI SAN! It is unclear what led Microsoft and Intel to use this particular iSCSI target (the earlier throughput tests ran on NetApp filers), but it does speak to the quality of this product.
It is not clear how many disk drives were used, but I would guess that SSDs or ramdisks might have been employed to pull a million IOPS. Network optimizations are also not mentioned, though jumbo frames would not be a benefit in an IOPS test.
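Here is the rough reasoning behind my jumbo frame comment, sketched as arithmetic. I am assuming a single 10 Gb/s link carrying data in one direction and ignoring all protocol overhead; the actual block size used in the test is not something I know, so treat the numbers as an upper bound rather than a description of the test.

```python
# How much payload can each I/O carry at a million IOPS on one 10 Gb/s link?
# Assumptions: a single link, one direction, no protocol overhead.
LINK_BYTES_PER_SECOND = 10 * 10**9 // 8
TARGET_IOPS = 1_000_000
STANDARD_MTU_BYTES = 1500  # ordinary Ethernet MTU


def max_payload_per_io(link_bytes_per_second: int, iops: int) -> int:
    """Upper bound on bytes per I/O if the link is the only limit."""
    return link_bytes_per_second // iops


budget = max_payload_per_io(LINK_BYTES_PER_SECOND, TARGET_IOPS)
print(budget)                       # 1250
print(budget < STANDARD_MTU_BYTES)  # True
```

Under those assumptions each I/O has to fit in about 1,250 bytes, which is already smaller than a standard frame, so there is nothing for a jumbo frame to carry.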
StarWind’s software runs on Microsoft Windows and creates a full-featured iSCSI target, complete with data mirroring, automatic failover and failback, replication, snapshots, and thin provisioning. The company prices their iSCSI SAN at $6,000 for two nodes and competes with the likes of DataCore and Open-E. But the StarWind solution seems at a glance to be more full-featured than these other offerings.
Try It Yourself!
I imagine many folks like me might be tempted to try to reproduce these results. More valuable would be a set of best practice guidelines for the deployment of software iSCSI in Windows Server 2008 R2 and Hyper-V environments. Given the relatively modest hardware involved, there should be nothing stopping us!
These test results also prompted me to get in touch with StarWind to try their iSCSI target software. I was pleasantly surprised to learn that they are currently offering free non-production licenses to VMware vExperts, VCPs, and VCIs as well as Microsoft MVPs, MCPs, and MCT Professionals. Many of my readers fall into one (or more) of those buckets, and I applaud the company for this offer. If only more companies realized the value in giving away test licenses to influencers and thought leaders!
DataCore’s SANmelody is the only competitor, actually… Open-E is a very basic “el cheapo” solution; it fakes HA using an active-passive Linux cluster (which it’s based on). It’s definitely a different league compared to what the StarWind Software and DataCore people do.
I feel that they cheated when they ran iSCSI and SMB 3.0 side by side and got the same results. The SMB was running with RDMA, the iSCSI was not. Apples to apples, NAS can never beat SAN because you have an extra layer of a filesystem on top of the real FS. Not to mention that you are losing a lot of the features of the real FS, like caching, journaling, and fancy ZFS stuff.
NFS running with a DAS rather than a SAN back end (for example, as implemented by NetApp) is faster than iSCSI, as iSCSI needs a local file system on top of it (NTFS/CSVFS or VMFS) and NFS does not.