A few weeks ago, I was in the midst of a conversation about EVPNs, how they work, and the use cases for deploying them, when one of the participants exclaimed: “This is so complicated… why don’t we stick with the older way of doing things with multi-chassis link aggregation and virtual chassis device?” Sometimes it does seem like we create complex solutions when a simpler solution is already available. Since simpler is always better, why not just use them? After all, simpler solutions are easier to understand, which means they are easier to deploy and troubleshoot.
The problem is we too often forget the other side of the simplicity equation—complexity is required to solve hard problems and adapt to demanding environments. While complex systems can be fragile (primarily through ossification), simple solutions can flat out fail just because they can’t cope with changes in their environment.
As an example, consider MLAG. On the surface, MLAG is a useful technology. If you have a server that has two network interfaces but is running an application that only supports a single IP address as a service access point, MLAG is a neat and useful solution. Connect a single (presumably high reliability) server to two different upstream switches through two different network interface cards, which act like one logical Ethernet network. If one of the two network devices, optics, cables, etc., fails, the server still has network connectivity.
But MLAG has well-known downsides, as well. There is a little problem with the physical locality of the cables and systems involved. If you have a service split among multiple services, MLAG is no longer useful. If the upstream switches are widely separated, then you have lots of cabling fun and various problems with jitter and delay to look forward to.
There is also the little problem of MLAG solutions being mostly proprietary. When something fails, your best hope is a clueful technical assistance engineer on the other end of a phone line. If you want to switch vendors for some reason, you have the fun of taking the entire server out of operation for a maintenance window to do the switch, along with the vendor lock-in in the case of failure, etc.
EVPN, with its ability to attach a single host through multiple virtual Ethernet connections across an IP network, is a lot more complex on the surface. But looks can be deceiving… In the case of EVPN, you see the complexity “upfront,” which means you can (largely) understand what it is doing and how it is doing it. There is no “MLAG black box;” useful in cases where you must troubleshoot.
And there will be cases where you will need to troubleshoot.
Further, because EVPN is standards-based and implemented by multiple vendors, you can switch over one virtual connection at a time; you can switch vendors without taking the server down. EVPN is a much more flexible solution overall, opening up possibilities around multihoming across different pods in a butterfly fabric, or allowing the use of default MAC addresses to reduce table sizes.
Virtual chassis systems can often solve some of the same problems as EVPN, but again—you are dealing with a black box. The black box will likely never scale to the same size, and cover the same use cases, as a community-built standard like EVPN.
The bottom line
Sometimes starting with a more complex set of base technologies will result in a simpler overall system. The right system will not avoid complexity, but rather reduce it where possible and contain it where it cannot be avoided. If you find your system is inflexible and difficult to manage, maybe its time to go back to the drawing board and start with something a little more complex, or where the complexity is “on the surface” rather than buried in an abstraction. The result might actually be simpler.