October 31, 2014

Why I Am Biased Against FCoE

FCoE Reality Check Series

I stirred up a lot of controversy last week after posting what I thought was a fairly innocuous question: Will 16 Gb Fibre Channel Derail FCoE? That short post focused on the prospects for 16 Gb FC and was the result of questions I received from the audience at Interop New York. Yet the resulting controversy is all about the fitness of FCoE and my personal motivations. So I suppose it’s time to clarify my position more fully. This will be a multi-part series, since it’s getting kind of long, but let me spoil the ending for you: I believe that FCoE will displace traditional Fibre Channel (“FCoFC”) in about a decade.

This Is Storage!

The enterprise storage industry and market is very, very different from other sectors of IT. It can seem illogical and even foolish to outsiders, but there’s a method to the madness. It isn’t easy to “do storage right” and failures are catastrophic.

This is why storage architects are in love with “best practices” and hardware compatibility lists. And it’s also why I never recommend any solution that isn’t prudent, low-risk, and in widespread usage. Read that again. As much as I love startups and cool new technologies, I never recommend their products for enterprise production use. Storage people have always been cautious, and I’m a storage guy.

There’s also an extremely long tail for storage protocols, since minor compatibility and interoperability issues can have major consequences. It’s crazy, really, that we still use SCSI as our primary protocol, but we do. Rather than replacing it with something more suited to virtualized environments, we extend it with nips and tucks to keep it working. As much as I’d like to just ditch SCSI and use a protocol that can handle unreliable networks, I know that won’t happen for a long time.

One must also consider the useful lifetime of enterprise storage devices and architectures. Today’s buyers will continue to use and grow their SAN for years to come. If history is a guide, it will take many years for anything to replace 8 Gb Fibre Channel as the majority datacenter SAN interconnect regardless of how awesome the replacement product is.

Just An Ignorant Old Man

So this is my axe, and this is how I grind it:

  1. I don’t care if anyone buys anything from anyone, let alone what they buy. I work for no man and I don’t need any vendor’s love. I take money from most companies from time to time, but none owns my loyalty.
  2. Although I love cutting-edge tech, I’m professionally very conservative and base all of my recommendations on my three-part definition of “Best Practice”. That includes the part about “widespread usage” – let someone else risk his job on cool new technology.
  3. I’ve watched all this happen before, and know it takes a long, long time for serious adoption of any storage technology. Real value is about way more than technical elegance, but it does eventually show through.
  4. I don’t care about protocols per se, I care about what customers do with them. I’m not an idiot or a luddite but I’m not a willfully-ignorant cheerleader, either.

I am biased against FCoE because it’s too new to be blithely and broadly recommended for production enterprise use. That’s all. Yes, the standards are standardized and there are products extant. But that’s not enough for me. Next, I’ll talk more about why FCoE is not ready for prime time.

Note: The original post, Will 16 Gb Fibre Channel Derail FCoE? was written as part of an ongoing paid contract with IBM Storage Community, a site funded by IBM. But that post was entirely my own conception and creation with no input from the site editors or IBM and should not be construed to reflect their strategy or opinion.

  • http://etherealmind.com Etherealmind

    I like your interpretation, but I doubt that 10 years is long enough for the storage industry to pick up a new idea. And by then, networking will have changed to a new model. 

    Ethernet enhancement: By the time FCoE manages to be production-ready, it will have been replaced by IP storage that is managed, controlled and improved by new protocols. Meanwhile, technologies such as TRILL dramatically improve the reliability and performance of Ethernet by improving bandwidth and resilience. 

    Commoditisation: the current fashion trend is toward commodity hardware and feature-rich software. FC/FCoE does not fit this trend, which threatens to obsolete all existing forms of storage. 

    SDN/Controller networking: The rise of OpenFlow/SDN will most likely change the networking landscape in such a way that FC will look like Token Ring within three years. 

    Finally, Ethernet networking is a very large market, and Ethernet’s scale will dwarf the price/performance of FC in a short period of time. Already, SME deployments with iSCSI are commonplace, and FC is shrinking. Say hello to the ITIL-compliant buggy whip. 

    Farewell FC/FCoE, thank goodness I never had to know you. 

  • http://blogstu.wordpress.com stu

    Hi Stephen,
    FCoE is not an all-or-nothing solution. If you want to talk about “widespread usage” – you seem to discount that the new generation of blade servers (including from HP, Cisco, IBM and others) all use FCoE (I’d agree that most customers don’t even know/care it’s there, but this was the first mission of FCoE – eliminating the two networks inside the embedded space). The other solutions are end-to-end FCoE and multi-hop FCoE – these are shipping, and early-adopter customers are using them. I’ve seen more small end-to-end native FCoE solutions (NetApp and EMC) than large multi-hop ones, but that is to be expected, since multi-hop is newer and smaller environments are easier and faster to deploy than a 1,000-node environment. 
    Where you raised some hackles was in your characterization of the standards and in stating that the technology isn’t ready for prime time. All of your arguments above are about how the storage industry is slow to adopt technology – of course you are right. But the standards are done (Cisco’s does have the only shipping multi-hop solution; it does meet standards and can even be used with other servers – see HP’s solution with Cisco FEX as an example) and there are no bugs or technology hurdles. My advice to users is always that they can go as fast or as slow as they like in deploying the new solutions – more of the products they buy will have the capability for FCoE (on both the Ethernet-only solutions and newer technologies that can do FC and Ethernet), and the promise of a single network and standardized configurations is compelling.
    The general trend from both the server and storage supplier-side is to offer flexibility while lowering costs – the convergence of FC and Ethernet (with FCoE being only one of many options) will continue.
    Stu Miniman
    Wikibon.org

  • http://blog.fosketts.net sfoskett

    Don’t wave bye to FC yet, Greg. It’s going to take a LONG time for it to decline even to be a minority in the datacenter, let alone for it to go away. FC will still be a multi-billion dollar market 10 years from now, mark my words!

    But I agree with your basic point: Things are changing, and post-SCSI IP-based storage will rise to be a real force in the coming decade. New applications will make SCSI-based block storage redundant and will eventually consign it to a corner of the datacenter.

  • http://blog.fosketts.net sfoskett

    I don’t discount edge-only FCoE for blades. In fact, I frequently go out of my way to applaud that use case. But I was writing about multi-hop FCoE, and don’t think a few products make that ready for prime time. See the next post…

  • http://twitter.com/the_socialist Jon Hudson

    I’m clapping and cheering (though you can’t hear me)

    Change = Risk. 

    One thing you learn after years and years of working in production data centers is that unneeded risk is just stupid. It results in meetings, reports, witch hunts and firing squads. 

    Change, however, is often good. So what do you do if Change = Risk?

    YOU LET OTHERS BLEED FOR YOU

    Some of these environments are really serious. Take one system that is used to authorize the release of live weapons so that planes can be armed and re-armed. If that goes down, so do all the planes tied to that system. Or how about systems that watch for nuclear launches? How about credit card authorization? This isn’t all video games and Facebook. 

    I mean, just look at poor RIM. No one even died, and look what they are being put through for their network outage. If you were the human that recommended the technology that led to that failure, how well do you think you’d be sleeping about now?

    FCoE freaking rocks. I’m very excited about it. Some of the performance I’ve gotten is just stellar: 9.7 Gb/s. I love FCoE. 

    However how long did it take people to run VMware in production? How many freaking years did we have all of QA/DEV/Test/etc happily running on VMware before Production even became an option?

    Production Datacenters need to be treated like hallowed ground. Where Uptime and the Cult of 9’s go to pray. 

    How long did it take VMware? MPLS? iSCSI? Linux? This is not a sprint, it’s a marathon. 

    The time will come when no net-new FC installs happen; they will probably all be FCoE. Could be less, but about 10 years is probably what it’s going to take. 

    When will the time come when existing stable FC networks are ripped out? How long will it take till FC is no longer supported or in use? 

    Well, let’s see: IBM last year made $2.5B on mainframes. 

    I understand someone needs to go first. I understand that FCoE Product Managers need to grow their business and make their numbers. Nothing wrong with that. 

    However “bleeding edge” and “Production” do not belong in the same rack.

  • http://twitter.com/BRCDbreams Brook Reams

    Hi Stu,

    I’m curious about the comment “Cisco’s does (sic) have the only shipping multi-hop solution …” which I assume applied to FCoE.  As I work for Brocade, why would you exclude our solution?  Always interested in the view from the other side, so to speak.

    Thanks.

  • http://blog.fosketts.net sfoskett

    I don’t want to speak for @Stu, but I think he was referring to an FCF “dense mode” multi-hop solution as opposed to Brocade’s multi-hop solution, which, while functional, doesn’t create VE_ports per the spec… Or at least that’s my understanding…

  • http://blog.fosketts.net sfoskett

    Awesome, awesome comment from someone who gets it. THIS IS STORAGE!!!

  • http://blogstu.wordpress.com stu

    Thanks – right, not counting FIP snooping as a hop. Looking for a VE_port solution to be multi-hop. Not looking to overlook anything.

  • http://datacenteroverlords.com/ tonybourke

    I must apologize for proliferating the terms “dense mode” and “sparse mode” :) I think better terms might be FC-forwarded and Ethernet-forwarded for the two technologies. 

  • http://blog.fosketts.net sfoskett

    “Dense Mode” is a really terrible name for it. I like “Full FCF” and “Forwarded” or something similar…

  • Erik Smith

    The terms FC-forwarded and Ethernet-forwarded will become a problem as soon as you start talking about FC-BB-6 and FDFs. 

    That having been said, I don’t have a better suggestion (yet)… 

  • http://stormcontrol.tumblr.com Pablo Carlier

    I believe the real beauty of “Real FCF Multihop” is that it is, in reality, the same old SAN design we have been working with for ages. The only change is the fact that the FC switch is now also working, at a separate level, as a LAN switch. But SAN designs remain untouched.

    FIP Snooping solutions, “Lossless Ethernet blind forwarders” and even to some extent NPV mean some kind of change to how SANs are designed and managed. This is not the case with Full FCF Multihop, as it just means you are using Ethernet to carry FC frames in your cabling.

    Mechanisms for FC traffic protection and lossless transmission have been there for ages too (DCBX and all). As you very cleverly state, Edge FCoE has been using it for years with success. The only difference is now they are being applied to ISLs, instead of edge N-F links.

    A true multi hop FCoE SAN is no different from a regular SAN except for the encapsulation of the carrier frames, and the fact that the Fabric switch is also behaving as LAN switch in a separate plane.

    I agree with you that implementations that don’t go the whole way actually represent a change in how SANs are perceived: is a FIP snooping bridge really a “hop” in my SAN, or is it not? How can I troubleshoot the FLOGI process in that scenario?

    Full FCF Multihop FCoE is not a leap of faith, though, nor a matter of waiting several years to get accustomed to a new technology, as happened with server virtualization. This is more akin to adding a new routing protocol like OSPF on top of routers already running RIPv2: it’s done in the same box as something else, but it’s the same old operation, the same old troubleshooting tools, the same old skill set that’s required with all other FC networks… because, in fact, it is just another FC network.

    (May it serve as a disclaimer that I work for Cisco) :)

  • Anonymous

    Amen to that! You can also reference a Network World article I published last month, which talks about Data Center Unified Fabric. I hope you find it useful as well. http://www.networkworld.com/news/2011/121311-converging-san-traffic-254023.html