Greg “EtherealMind” Ferro recently “mused” that it might be a good idea to replace PCI Express (PCIe) inside servers or rack-scale infrastructure with Ethernet. But this seems to be the exact opposite of the direction the industry is headed. Rather than replacing PCIe with Ethernet, companies like Intel seem set on replacing short-range Ethernet (in rack-scale systems) with PCIe!
PCIe vs. Ethernet
Greg points out (rightly) that electrical signals over copper traces on motherboards are currently limited to 15.75 Gbps per lane in PCIe 4.0. With 16 lanes, this brings us to 252 Gbps of throughput on a PCIe 4.0 link. Greg is also correct that current Ethernet switches operating at 25 Gbps can handle this kind of throughput across 10 or so connections. QED, right?
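The arithmetic checks out, by the way. Here's a quick back-of-the-envelope sketch: PCIe 4.0 signals at 16 GT/s per lane with 128b/130b line encoding, which is where the 15.75 Gbps effective figure comes from.

```python
# Back-of-the-envelope check of the PCIe 4.0 throughput figures.
raw_rate_gtps = 16.0             # PCIe 4.0 raw transfer rate per lane (GT/s)
encoding_efficiency = 128 / 130  # 128b/130b line encoding
lanes = 16

per_lane_gbps = raw_rate_gtps * encoding_efficiency
total_gbps = per_lane_gbps * lanes

print(f"Per lane: {per_lane_gbps:.2f} Gbps")        # ~15.75 Gbps
print(f"x16 link: {total_gbps:.1f} Gbps")           # ~252.1 Gbps
print(f"25G Ethernet links to match: {total_gbps / 25:.1f}")  # ~10
```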
Read Greg’s post: Musing: Could We Replace PCIe Bus With Ethernet Switch?
Sorry, Greg! Stuffing an Ethernet switch into a server is exactly the wrong direction for many reasons.
Most pressing is the issue of latency. PCIe latency is measured in hundreds of nanoseconds, while Ethernet interconnects are measured in tens of microseconds. That might not sound like much, but it's a difference of two full orders of magnitude, and it would be a huge step back in real-world use.
Just because you can push the same amount of data across a link (throughput) doesn't mean you can do the same tasks. PCIe is like a fleet of shopping carts filling the aisles at your local Costco, while Ethernet is the stream of SUVs taking those big boxes of cereal and lightbulbs back home. Although they are theoretically carrying the same payload, Explorers and Caravans just weren't designed to navigate inside the store!
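To put some numbers on the shopping-cart analogy, here's a sketch using hypothetical round-trip figures that fall within the ranges above (500 ns for PCIe, 50 µs for Ethernet). For a small synchronous transfer, the time on the wire is trivial; latency is nearly the whole cost.

```python
# Hypothetical round-trip latencies, within the ranges stated above:
PCIE_LATENCY_NS = 500      # "hundreds of nanoseconds"
ETH_LATENCY_NS = 50_000    # "tens of microseconds"

TRANSFER_BYTES = 4096      # one small synchronous read
LINK_GBPS = 252            # x16 PCIe 4.0 effective throughput

# Wire time for the payload itself, in nanoseconds.
wire_ns = TRANSFER_BYTES * 8 / LINK_GBPS   # ~130 ns

pcie_total = PCIE_LATENCY_NS + wire_ns     # ~630 ns per operation
eth_total = ETH_LATENCY_NS + wire_ns       # ~50,130 ns per operation

print(f"PCIe: {pcie_total:.0f} ns/op -> {1e9 / pcie_total:,.0f} sync ops/s")
print(f"Eth:  {eth_total:.0f} ns/op -> {1e9 / eth_total:,.0f} sync ops/s")
print(f"Slowdown: {eth_total / pcie_total:.0f}x")
```

Roughly an 80x slowdown per synchronous operation, even though both links could carry the bytes themselves in about the same time. That's the cart-versus-SUV gap in a nutshell.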
There are many other issues to consider as well. Ethernet NICs and switches are complex, designed to handle the vagaries of topology changes, speed differences, and relatively frequent reconfiguration. An in-server Ethernet variant could be stripped down to the basics and integrated into the chipset just like PCIe, but this would obviate the external connectivity benefits suggested by Greg. So every device would have to be a full-featured Ethernet endpoint, likely with TCP/IP besides!
Rack-Scale Computing, OPCIe, and SiPh
Greg mentions Intel’s work on rack-scale computing and silicon photonics. Good! But then he suggests running Ethernet over this lovely next-generation interface. Bad!
The intent is to run PCIe over all those integrated silicon/optical interconnects and extend it to rack-scale, rather than injecting Ethernet. This has a whole raft of benefits, including better real-world performance (thanks to low latency and little protocol overhead) and easier integration, since PCIe is already in use at all points in a rack-scale infrastructure.
IT folks usually express some serious skepticism when I mention PCIe as an externally-exposed interconnect. But then I point out that this entire system is already in use! Thunderbolt, developed by Intel and popularized by Apple, is simply PCIe (and DisplayPort) over copper cables, and long-range optical Thunderbolt cables are on sale today. Intel's silicon photonics (SiPh) optical PCIe (OPCIe) technology has been sampling for over a year now, and Fujitsu has demonstrated a server using these optical interconnects for peripheral interconnection.
Proponents of rack-scale computing seem poised to adopt OPCIe as an interconnect within the rack in the next year or so. This will encroach on the market for current server-to-server and server-to-storage interconnects like Ethernet, Fibre Channel, and InfiniBand. Enterprise products based on OPCIe are being developed as well, though few if any have yet been announced.
You might like to read my Rack Endgame series:
Note that Fibre Channel, InfiniBand, RapidIO, and many other technologies besides have attempted to do just what Greg is suggesting: Unify internal and external connectivity with a “master” protocol. But none have succeeded. It seems more logical to standardize on a fast, scalable, low-latency bus like PCIe for short-range communication and a ubiquitous network like Ethernet for longer-range use.
Rather than pushing Ethernet into the server, the industry is pushing it out of the rack. Soon, racks will function like blade chassis, with high-speed interconnects for internal communication and Ethernet termination points for communication outside the rack. Probably the closest thing to reality in Greg’s vision is the concept of tunneling Ethernet over PCIe and integrating it into server chipsets. This would function something like FCoE, providing a path for a legacy interconnect (Ethernet) right into the heart of the new converged rack.
Note that the title of this piece is farcical, and based on Greg's title. No, we cannot replace all of Ethernet with PCIe. It's still the king of the campus and larger-radius networks. But it has no place in the world of PCIe!
I’ve done some more research and it seems that at least three different business units inside Intel are actually pursuing Ethernet, OPCIe, and InfiniBand products around this idea.
Personally, I don’t see that PCIe or IB will win. The protocol stacks that drive them are too niche to survive over time. I’ve learned to never bet against Ethernet. Over time it kills all other link layer protocols. Token Ring, ATM, Frame Relay, and Fibre Channel are just a few of the casualties.
I can see niche-ness killing IB, but PCIe is everywhere. It’s in everything. It’s not niche at all! Ethernet and PCIe are the dynamic duo of ubiquitous interconnects these days…
Geoff Arnold says
Help me out here. PCIe performs address allocation by enumeration, doesn’t it? So slot numbering is potentially different every time a new device is added. That may be OK for a host-based PCIe driver in the server OS, but how the hell does it work with ACS-enabled endpoint-to-endpoint transfers?
One reason for looking at Ethernet (or a derivative) rather than PCIe is to get away from the host-centric SPOF….