The Open Systems Interconnect (OSI) model is the most often taught model of data transmission—although it is not all that useful in terms of describing how modern networks work. What many engineers who have come into network engineering more recently do not know is there was an entire protocol suite that went with the OSI model. Each of the layers within the OSI model, in fact, had multiple protocols specified to fill the functions of that layer. For instance, X.25, while older than the OSI model, was adopted into the OSI suite to provide point-to-point connectivity over some specific kinds of physical circuits. Moving up the stack a little, there were several protocols that provided much the same service as the widely used Internet Protocol (IP). @ECI
One of the nice things about IS-IS is the ability to run IPv6 and IPv4 in the same protocol, over a single instance. So long as the two topologies are congruent, deploying v6 as dual stack is very simply. But what if your topologies are not congruent? The figure below illustrates the difference.
In this network, there are two topologies, and each topology has two different set of level 1/level 2 flooding domain boundaries. If topology 1 is running IPv4, and topology 2 is running IPv4, it is difficult to describe such a pair of topologies with “standard” IS-IS. The actual flooding process assumes the flooding domain boundaries are on the same intermediate systems, or that the two topologies are congruent.
One way to solve this problem today is to use IS-IS multi-topology, which allows the IPv6 and IPv4 routing information to be carried in separate TLVs so two different Link State Databases (LSDBs), so each IS can compute a different Shortest Path Tree (SPT), one for IPv4, and another for IPv6. Some engineers might find the concept of multi-topology confusing, and it seems like it might be overkill for other use cases. For instance, perhaps you do not care about incongruent topologies, but you do care about carrying IPv6 in a separate instance of IS-IS just to see how it all works, with the eventual goal of combining IPv4 and IPv6 into a single instance of IS-IS. Or perhaps there is some new feature you want to try on a production network in a different instance of IS-IS, without impacting the IS-IS instance that provides the production routing information. There are a number of use cases, of course—you can probably imagine a few.
How can these kind of use cases be solved in IS-IS? In EIGRP, for instance, the Autonomous System (AS) number is used as the EIGRP protocol port number on the wire, allowing multiple EIGRP processes to run in parallel on the same link (even though this capability has never been implemented, as far as I know). Some sort of parallel capability would need to be created for IS-IS; this is what RFC8202, IS-IS Multi-Instance, provides. Not only does RFC8202 provide this capability, it also illustrates an interesting use case for using TLVs in protocol design, rather than fixed length fields.
For readers not familiar with these concepts, fixed length field protocols marshal data into fields of a fixed length, each of which represents a single piece of information. The metadata required to interpret the data is carried entirely in the protocol specification; the protocol itself does not carry any information about the information, only the information itself. The positive attribute of working with fixed length fields is the amount of information carried on the wire is minimized. The negative is that any change in the protocol requires deploying a new version throughout the network. It is difficult to “ignore” bits that are carried without introducing failures. Further, in a fixed length field format, as new information is pushed into the protocol, either new packet formats must be created and handled, or the length of any given packet must be increased.
Type/Length/Value (TLV) formats carry the kind of information in the specification, but they carry information about the kind of information being carried, and the size of the information being carried, in the protocol itself. This means the packet format is larger, but the protocol is more flexible.
In the case of RFC8202, adding this kind of multi-instance capability in a fixed length field formatted protocol would require a shift in the packet format. In a TLV based protocol, like IS-IS, you can add new features can be added by adding a new TLV; this is precisely what RFC8202 does. To provide multi-instance capability, RFC8202 adds a new multi-instance TLV to the IS-IS PDU, which is the “outer packet format” used to carry every other kind of IS-IS information, including hellos, link state information, etc. This new TLV carries an instance ID, which differentiates each instance of IS-IS.
The instance IDs must be configured the same on each IS so they match in order to build adjacencies. Point to point and broadcast operation works the same as “standard” IS-IS, including Designated Intermediate System operation on each instance, etc. IS-IS would be implemented on each IS so each instance will have a separate LSDB, an a separate SPT would be computed across each of these LSDBs. The other key factor will be implementing multiple routing tables, and then finding some way to route traffic using the correct routing table. In the case of IPv4 and IPv6, this is fairly simple to sort out, but it would be more complex in other cases.
RFC8202 adds a new and interesting capability to IS-IS—it may take some time for vendors to implement and deploy this new capability, but this should make IS-IS more flexible in real world situations where multiple interior gateway protocol on a single network.
A couple of weeks ago, I attended a special segment routing Networking Field Day. This set me to thinking about how I would actually use segment routing in a live data center. As always, I’m not so concerned about the configuration aspects, but rather with what bits and pieces I would (or could) put together to make something useful out of these particular 0’s and 1’s. The fabric below will be used as our example; we’ll work through this in some detail (which is why there is a “first part” marker in the title).
This is a Benes fabric, a larger variation of which which you might find in any number of large scale data center. In this network, there are many paths between A and E; three of them are marked out with red lines to give you the idea. Normally, the specific path taken by any given flow would be selected on a somewhat random basis, using a hash across various packet headers. What if I wanted to pin a particular flow, or set of flows, to the path outlined in green?
Let’s ask a different question first—why would I want to do such a thing? There are a number of reasons, of course, such as pinning an elephant flow to a single path so I can move other traffic around it. Or perhaps I want to move a specific small (mouse) flow onto this path, while somehow preventing other traffic from taking it. Given I have a good reason, though, how could I do this with segment routing?
Let’s begin at the beginning. Assume I’m running IS-IS on this network, I have MPLS forwarding enabled, but I don’t have any form of label distribution enabled (LDP, BGP on top, etc.). To make my life simple, I’m going to assign a loopback address to each router in the fabric, and either—
- just allow IS-IS to use the IPv6 link local addresses (don’t assign any IPv6 address to the fabric links in an IPv6 only fabric)
- assign some private IPv4 address and configure IS-IS to advertise only passive interfaces, so the fabric link addresses aren’t actually advertised into IS-IS
If you’re uncertain what either of these two options mean, you might want to take a run through my recent IS-IS Livelesson to understand IS-IS as a routing protocol better).
With this background, let’s roll segment routing onto this network. Segment routing, in order to allow many different transports, contains the concept of a Segment Identifier, or a SID. The SID is used for many things, a point which can make reading the segment routing drafts a bit confusing to read. For this particular network, though, we’re going to simplify to two specific kinds of SIDs, because these are the only two we really care about—
- IGP-Prefix Segment, defined in section 3.2 of draft-ietf-spring-segment-routing, and variously called the Prefix-SID, the IGP-Prefix-SID, and the Prefix Segment Identifier in various places. IS-IS carries this in the Prefix Segment Identifier (Prefix-SID Sub-TLV), defined in section 2.1 of draft-ietf-isis-segment-routing-extensions. This sub-TLV can be added to any of the four main “wide metric” TLVs in IS-IS (again, I’ll refer you here to the recent Livelesson if you need to brush up on IS-IS packet formatting).
- IGP-Adjacency Segment, defined in defined in section 3.5 of draft-ietf-spring-segment-routing, and variously called the Adj-SID and the IGP-Adjacency-SID in various other documents. IS-IS carries this in the Adjacency Segment Identifier, defined in section 2.2 of draft-ietf-isis-segment-routing-extensions. This sub-TLV can be carried in any of the IS node reachability TLVs, such as 22, 222, 23, etc.
These SIDs, in the world of MPLS, are actually just MPLS labels. This means you don’t need a separate form of MPLS label distribution if you’re using the IS-IS segment routing extensions; these labels can be carried in IS-IS itself, along with the topology and reachability information.
To get segment routing up and running, I’ll need each router in the network to create two different MPLS labels, AKA SIDs, and advertise them through IS-IS (using the correct sub-TLV, of course)—
- An IGP-Prefix segment for each loopback address.
- An IGP-Adjacency segment for each fabric interface.
This means Router A would create an IGP-Prefix segment for its loopback address, and an IGP-Adjacency segment towards B, F, and its other neighbors.
There is, in fact, another type of SID described in the segment routing documentation, a IGP-Node-Segment. This actually describes a loopback address for a particular node, and hence describes the device itself. This is discussed in section 2.1 of draft-ietf-isis-segment-routing-extensions as a single flag within the IGP-Prefix segment. In reality, there is no functional difference between a node identifier and a prefix identifier in this case, so there’s no need to spend a lot of time on this here.
OSPF and IS-IS, both link state protocols, use mechanisms that manage flooding on a broadcast link, as well as simplify the shortest path tree passing through the broadcast link. OSPF elects a Designated Router (or DR) to simplify broadcast links, and IS-IS elects a Designated Intermediate System (or DIS—a topic covered in depth in the IS-IS Livelesson I recently recorded). Beyond their being used in two different protocols, there are actually subtle differences in the operation of the two mechanisms. So what is the difference?
Before we dive into differences, let’s discuss the similarities. We’ll use the illustration below as a basis for discussion.
Q1 and Q2 illustrate the operation of a link state protocol without any optimization on a broadcast network, with Q1 showing the network, and Q2 showing the resulting shortest path tree. Q3 and Q4 illustrate link state operation with optimization over a broadcast link. It’s important to differentiate between building a shortest path tree (SPT) across the broadcast link and flooding across the broadcast link—flooding is where the primary differences lie in the handling of broadcast links in the two protocols.
Let’s consider building the SPT first. Both protocols operate roughly the same in this area, so I’ll describe both at the same time. In Q1, there is no DIS (DR)—called a pseudonode from this point forward, so each pair of intermediate systems (routers) connected to the link will advertise connectivity to every other IS (router) connected to the broadcast link (so A will advertise B, C, and D as connected; B will advertise A, C, and D as connected, etc.). From E’s perspective, then, the broadcast link will appear to be a full mesh network, as shown in Q2. Full mesh connectivity adds a good bit of complexity to the tree, as you can see in the diagram.
To reduce this complexity, the intermediate systems (routers) connected to the broadcast link can elect an intermediate system (router) to generate a pseudonode, as shown in Q3. Regardless of their actual adjacency state, each intermediate system (router) reports it is only connected to the pseudonode, and the pseudonode reports what appears to be a set of point-to-point links to each of the intermediate systems (routers) connected to the broadcast link. The result is an SPT that looks like Q4; there is an extra hop in the SPT, but it is much simpler to calculate (even in reaction to topology changes).
For calculating the SPT, then, OSPF and IS-IS act much the same. The difference between the two actually lies in the way flooding is handled across the broadcast link.
Assume, for a moment, the illustrated network is running OSPF, and router A receives an updated LSA. Router A will flood this new LSA to a special multicast address that only the DR (and BDR) listens to. Once the DR has received (and acknowledged) the LSA, it will reflood the new LSA to the “all routers” multicast address on the broadcast segment.
Now let’s change the situation, and say the network is running IS-IS. Again, intermediate system A receives an updated LSP—but rather than sending this new information just to the DIS, it floods the LSP onto the entire link. So what does the DIS do in terms of flooding? The only thing a DIS does on the flooding side of things in IS-IS is to send out periodic packets describing its database (Complete Sequence Number Packets, or CSNPs). If an IS happens to fail to receive a particular LSP that was flooded by another IS connected to the same link, it will notice the missing LSP in its database, and request it from the DIS.
The flooding mechanisms, then, are completely different between the two protocols—differences that show up in the implementation details. For instance, IS-IS doesn’t elect a backup DIS, but OSPF does—why? Because if the OSPF DR fails, some router connected to the link must take over flooding changed LSPs, or the database can become desynchronized. On the other hand, if the DIS fails, there’s not much chance of anything bad happening. If one intermediate system drops an LSP, when a new DIS is elected and sends a CSNP, the loss will be noticed and taken care of. For much the same reason, the IS-IS can be preemptively replaced, while the OSPF DR cannot be.
At a fundamental level, OSPF and IS-IS are similar in operation. They both build neighbor adjacencies. They both use Dijkstra’s shortest path first (SPF) to find the shortest path to every destination in the network. They both advertise the state of each link connected to a network device. There are some differences, of course, such as the naming (OSI addresses versus IP addresses, intermediate systems versus routers). Many of the similarities and differences don’t play too much in the design of a network, though.
One difference that does play into network design, however, is the way in which the two protocols break up a single failure domain into multiple failure domains. In OSPF we have areas, while in IS-IS we have flooding domains. What’s the difference between these two, and how does it effect network design? Let’s use the illustration below as a helpful reference point for the two different solutions.
In the upper network, we have an illustration of how OSPF areas work. Each router at the border of a flooding domain (an Area Border Router, or ABR), has a certain number of interfaces in each area. Another way of saying this is that an OSPF ABR is never fully within a single area, but rather participates in many areas—thus each area is bounded within or on a router. Thus OSPF areas can be seen as a collection of flooding domains connected via hard boundaries at the ABRs.
In the lower network, we have an illustration of how IS-IS flooding domains work. Each intermediate system is entirely within a single flooding domain; every interface on the intermediate system is part of each flooding domain the IS itself is a part of. Any given IS may be a part of multiple flooding domains, and hence provide connectivity between the flooding domains it’s in. The easiest way to understand IS-IS flooding domains is as a set of overlapping instances of IS-IS; some intermediate systems just happen to be connected to more than one instance, and thus can provide routing between them.
The difference between these two is, by the way, related to the protocol suite in which each was developed. In the ISO suite, each device has a single address; in IP, each interface has an address. Hence, in OSPF, area boundaries are thought of as occurring between interfaces, while in ISO, flooding domain boundaries are thought of as occurring between devices. The old saw is “OSPF areas break on devices, IS-IS flooding domains break on wires.” This isn’t absolutely true, as IS-IS flooding domains still “break” on devices—it’s just that each device is entirely contained in a single flooding domain.
This fundamental difference leads to other, less obvious differences. For instance—
- Links in an IS-IS network can be configured in multiple flooding domains, as can a device. In OSPF, each link is in precisely one area (although there are workarounds to place a single link in multiple areas).
- OSPF assumes reachability information at the link (or interface) level, which means reachability information is automatically carried from one area to another. IS-IS assumes reachability at the device level, which means information about link or interface reachability is not automatically carried between flooding domains. Rather, interface level reachability must be leaked, or more properly redistributed, between flooding domains (remember, each flooding domain can be seen as a separate instance of IS-IS, and devices just happen to be in more than one flooding domain).
Learning to design with IS-IS isn’t just “OSPF on stilts;” it’s a different way of looking at the problem space. If you want to know more, take a look at my recently published LiveLesson on the IS-IS protocol.
Finally, let’s consider the first issue, the SPF run time. First, if you’ve been keeping track of the SPF run time in several locations throughout your network (you have been, right? Right?!? This should be a regular part of your documentation!), then you’ll know when there’s a big jump. But a big jump without a big change in some corresponding network design parameter (size of the network, etc.), isn’t a good reason to break up a flooding domain. Rather, it’s a good reason to go find out why the SPF run time changed, which means a good session of troubleshooting what’s probably an esoteric problem someplace.
Assume, however, that we’re not talking about a big jump. Rather, the SPF run time has been increasing over time, or you’re just looking at a particular network without any past history. My rule of thumb is to start really asking questions when the SPF run time gets to around 100ms. I don’t know where that number came from—it’s a “seat of the pants thing,” I suppose. Most networks today seem to run SPF in less than 10ms, though I’ve seen a few that seem to run around 30ms, so 100ms seems excessive. I know a lot of people do lots of fancy calculations here (the speed of the processor and the percentage of processor used for other things and the SPF run time and…), but I’m not one for doing fancy stuff when a simple rule of thumb seems to work to alert me to problems going into a situation.
But before reaching for my flooding domain slicing tools because of a 100ms SPF run time, I’m going to try and bring the time down in other ways.
First, I’m going to make certain incremental and partial SPF are enabled. There’s little to no cost here, so just do it. Second, I’m going to look at using exponential timers to batch up large numbers of changes. Third, I’m going to make certain I’m removing all the information I can from the link state database—see the answer to the third question on the LSDB size, above.
If you’ve done all this—keeping in mind that you need to consider the trade offs (if you don’t see the trade offs, you’re not looking hard enough), then I would consider splitting the flooding domain. If it sounds like I would never split a flooding domain for purely performance or technical reasons, you’ve come to the right conclusion on reading these two posts.
All that said, let me tell you the real reasons I would split a flooding domain.
First, just to make my life easier when troubleshooting the network. The router has a lot larger capacity for looking through screens full of link state information than I do. At 2AM, when the network is down, any little advantage I can give myself to troubleshoot the network faster is worth considering.
Second, again, to make my life easier in the troubleshooting process. Go back and think about the OODA loop. Where can I observe the network to best understand what’s going on? If you thought, “at the flooding domain boundary,” you earn a gold star. You can pick it up at the local office supply store.
Third, to break apart the network in case of a real failure—to provide a “firewall” (in the original sense of the word, rather than the appliance sense) to keep one part of the network from going down when another part falls apart.
Finally, to provide a “choke point” where you can implement policy.
So in the end—you shouldn’t build the world’s largest flooding domain just because you can, and you shouldn’t build a ton of tiny flooding domains just because you can. The technical reasons for slicing and dicing a flooding domain aren’t really that strong, but don’t discount using flooding domains on a more practical level.