I wonder how many times I’ve seen this sort of diagram across the many years I’ve been doing network design?
It’s usually held up as an example of how clever the engineer running the network is about resilience. “You see,” the diagram asserts, “I’m smart enough to purchase connectivity from two providers, rather than one.”
Can I point something out? Admittedly it might not be all that obvious from the diagram, but… Reality is just about as likely to squish your network connectivity like a bug on a windshield as it is any other network. Particularly if both of these connections are in the same regional area. The tricky part, of course, is knowing what a “regional area” might happen to mean for any particular provider.
The problem with this design is very basic, and tied to the concept of shared risk link groups. But let me start someplace a little simpler than that, with the basic, and important, point that putting fiber in the ground, and maintaining fiber that’s in the ground, is expensive. Unless you live in Greenland, fiber can be physically buried pretty easily (fiber in Greenland is generally buried with dynamite by a blasting crew, or run through conduit that’s bolted to the surface of the ubiquitous rock). But it’s not the burying that costs a lot of money; it’s the politics.
To bury a cable, you must get a right of way. Getting a right of way could well be very expensive in any given city. I remember encountering one particular situation where the land under consideration was owned, in theory, by a railroad. Well, it was close enough to an old station that it must have been. But it took several years of looking through old piles of paper to find the correct paper trail and figure out who, precisely, actually owned the land in a legally provable way. This is not a task for the faint of heart.
What has this to do with the image above? A lot, actually. It’s so expensive to install last mile fiber that providers often share it. To explain, let’s look at a small picture, just below, that might be helpful.
This is the way many providers actually build their last mile. There is a fiber ring (normally a pair of them), with a set of ROADMs, or reconfigurable optical add-drop multiplexers, at key locations in the region (ROADM actually means “randomly dropping all de traffic that matters,” but don’t tell anyone, it’s a secret). When a customer is connected to the network, they are assigned a wavelength on the fiber that carries their traffic, from the customer edge device, over a virtual layer 2 circuit (generally point-to-point, but not always), to a central office or exchange point. Here the different wavelengths are split up and handed to different providers through good old fashioned routing. One provider normally owns the fiber, and other providers lease wavelengths, or bandwidth, etc., to reach customers in the region.
Looking at this second image, you might be able to see what the problem is with the first. It’s possible (actually probable; I’ve seen it happen in real life) that a single backhoe fade within the same region will take out both providers’ circuits at the same time.
The problem here isn’t really the lack of diversity. Rather, it’s that the lack of diversity is hidden through the magical abstraction of virtualization. Two logical circuits that share the same fate because they both run on the same physical media, by the way, are said to belong to the same Shared Risk Link Group (SRLG). Providers aren’t likely to tell you when you’re at risk from this sort of problem for several reasons.
First, telling you who leased fiber from whom is bad business. Second, they may not actually know enough about their competitors to point this problem out. Third, it’s really in their business interest to try to convince you not to buy from two providers at all, but rather to buy all your upstream from them.
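To make the shared risk idea concrete, here is a minimal sketch in Python. It assumes you somehow have an inventory mapping each logical circuit to the physical fiber segments it rides on, which, as noted above, is exactly the information you usually can’t get. The circuit and segment names, and the functions shared_risk_groups and circuits_lost, are all hypothetical, made up for illustration.

```python
from collections import defaultdict

# Hypothetical inventory: each logical circuit mapped to the physical
# fiber segments it rides on. Every name here is invented for illustration.
circuits = {
    "provider_a_circuit": {"ring1_seg3", "ring1_seg4", "co_handoff_a"},
    "provider_b_circuit": {"ring1_seg4", "ring1_seg5", "co_handoff_b"},
    "provider_c_circuit": {"ring2_seg1", "co_handoff_c"},
}


def shared_risk_groups(circuits):
    """Map each segment carrying two or more circuits to those circuits."""
    by_segment = defaultdict(set)
    for name, segments in circuits.items():
        for segment in segments:
            by_segment[segment].add(name)
    # Only segments shared by more than one circuit form a shared risk group.
    return {
        segment: sorted(names)
        for segment, names in by_segment.items()
        if len(names) > 1
    }


def circuits_lost(circuits, cut_segment):
    """List the circuits that go down when one physical segment is cut."""
    return sorted(
        name for name, segments in circuits.items() if cut_segment in segments
    )


print(shared_risk_groups(circuits))
print(circuits_lost(circuits, "ring1_seg4"))
```

In this made-up inventory, the circuits from providers A and B both ride ring1_seg4, so one backhoe at that segment takes out both, even though the logical diagram shows two independent providers.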
So—what can you do about this?
If you’re going to connect to two providers, try to do so in two different regions. This is often difficult, as you don’t really know where the regions are, and connecting two sites that provide backup for one another across multiple regional rings can be a challenge for geographical reasons.
One alternative here is to connect to a local exchange point (an IXP), and from their fabric to the various providers. While the IXP will likely lease their circuits from others, they will have a much better idea of where the cables physically run, and how to provide diverse circuits (but only if you know what you’re asking for).
Another alternative is to simply stick with a single provider, and insist on physical diversity in any resilient links. This plays into the provider’s hand of trying to get you to buy from a single source, but it gets around the problem of trying to figure out what cable is where, and who uses what (information you’re not generally going to be able to find anyway), and puts it on the shoulders of the provider—who does know, at least for their network.
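Whoever holds the physical path information (the provider, or perhaps the IXP) can answer the diversity question with a check along these lines. This is only a sketch under that assumption: the candidate circuits, their segment names, and the diverse_pairs function are hypothetical, not drawn from any real provider’s tooling.

```python
from itertools import combinations

# Hypothetical candidate circuits and the physical segments each one uses.
candidates = {
    "a_metro": {"ring1_seg3", "ring1_seg4"},
    "b_metro": {"ring1_seg4", "ring1_seg5"},
    "b_other_region": {"ring2_seg1", "ring2_seg2"},
}


def diverse_pairs(candidates):
    """Return the pairs of circuits that share no physical segment."""
    return [
        (first, second)
        for first, second in combinations(sorted(candidates), 2)
        if candidates[first].isdisjoint(candidates[second])
    ]


print(diverse_pairs(candidates))
```

Here the two metro circuits are rejected as a pair because they share a segment, while either of them paired with the circuit in the other region passes. Asking for physical diversity amounts to asking the provider to run this kind of check against their real records.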
The next time you think you’ve solved the resilience problem by quickly and easily dual homing, remember shared risk, and remember to look for the deeper problem that’s been hidden away through an abstraction—an abstraction that far too often is leaky.