MegaSwitch: an interesting new data center fabric
Data center fabrics are built today using spine and leaf fabrics, lots of fiber, and a lot of routers. There has been a lot of research in all-optical solutions to replace current designs with something different; MegaSwitch is a recent paper that illustrates the research, and potentially a future trend, in data center design. The basic idea is this: give every host its own fiber in a ring that reaches to every other host. Then use optical multiplexers to pull off the signal from each ring any particular host needs in order to provide a switchable set of connections in near real time. The figure below will be used to explain.
In the illustration, there are four hosts, each of which is connected to an electrical switch (EWS). The EWS, in turn, connects to an optical switch (OWS). The OWS channels the outbound (transmitted) traffic from each host onto a single ring, where it is carried to every other OWS in the network. The optical signal is terminated at the hop before the transmitter to prevent any loops from forming (so A’s optical signal is terminated at D, for instance, assuming the ring runs clockwise in the diagram).
The receive side is where things get interesting; there are four full fibers feeding a single fiber towards the server, so it is possible for four times as much information to be transmitted towards the server as the server can receive. The reality is, however, that not every server needs to talk to every other server all the time; some form of switching seems to be in order to only carry the traffic towards the server from the optical rings.
To support switching, the OWS is dynamically programmed to only pull traffic from rings the attached host is currently communicating with. The OWS takes the traffic for each server sending to the local host, multiplexes it onto an optical interface, and sends it to the electrical switch, when then sends the correct information to the attached host. The OWS can increase the bandwidth between two servers by assigning more wavelengths on the OWS to EWS link to traffic being pulled off a particular ring, and reduce available bandwidth by assigning fewer wavelengths.
There are a number of possible problems with such a scheme; for instance—
- When a host sends its first packet to another host, or needs to send just a small stream, there is a massive amount of overhead in time and resources setting up a new wavelength allocation at the correct OWS. To resolve these problems, the researchers propose having a full mesh of connectivity at some small portion of the overall available bandwidth; they call this basemesh.
- This arrangement allows for bandwidth allocation as a per pair of hosts level, but much of the modern data networking world operates on a per flow basis. The researchers suggest this can be resolved by using the physical connectivity as a base for building a set of virtual LANs, and packets can be routed between these various vLANs. This means that traditional routing must stay in place to actually direct traffic to the correct destination in the network, so the EWS devices must either be routers, or there must be some centralized virtual router through which all traffic passes.
Is something like MegaSwitch the future of data center networks? Right now it is hard to tell—all optical fabrics have been a recurring idea in network design, but do not ever seem to have “broken out” as a preferred solution. The idea is attractive, but the complexity of what essentially amounts to a variable speed optical underlay combined with a more traditional routed overlay seems to add a lot of complexity into the mix, and it is hard to say if the complexity is really worth the tradeoff, which primarily seems to be simpler and cheaper cabling.
And people say you don’t need to know anything about token ring anymore! Rule 11 confirmed again.
Precisely! Thanks for stopping by and commenting.
[…] Wide-spread Communications on Optical Fabric with MegaSwitch (article): In this paper, we seek an optical interconnect that can enable unconstrained communications […]