Network engineers do not need to become full-time coders to succeed—but some coding skills are really useful. In this episode of the Hedge, David Barroso (you can find David’s github repositories here), Phill Simonds, and Russ White discuss which programming skills are useful for network engineers.


The Internet, and networking protocols more broadly, were grounded in a few simple principles. For instance, there is the end-to-end principle, which argues the network should be a simple fat pipe that does not modify data in transit. Many of these principles have tradeoffs—if you haven’t found the tradeoffs, you haven’t looked hard enough—and not looking for them can result in massive failures at the network and protocol level.

Another principle networking is grounded in is the Robustness Principle, which states: “Be liberal in what you accept, and conservative in what you send.” In protocol design and implementation, this means you should accept the widest range of inputs possible without negative consequences. A recent draft, however, challenges the robustness principle—draft-iab-protocol-maintenance.

According to the authors, the basic premise of the robustness principle lies in the problem of updating older software for new features or fixes at the scale of an Internet-sized network. The general idea is that a protocol designer can set aside some “reserved bits,” use them in a later version of the protocol, and not worry about older implementations misinterpreting them—new meanings of old reserved bits will be silently ignored. In a world where even a very old operating system, such as Windows XP, is still widely used, and people complain endlessly about forced updates, the robustness principle seems to be on solid ground in this regard.
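As a concrete (and entirely hypothetical) illustration, a receiver written in this spirit simply masks off the bits it does not understand rather than rejecting the message; the flags byte and mask below are made up for the example.

package main

import "fmt"

// Hypothetical flags byte for version 1 of some protocol:
// bits 0-1 are defined, bits 2-7 are reserved for future use.
const knownFlagsMask = 0b00000011

// parseFlags is "liberal in what it accepts": whatever a newer sender
// puts in the reserved bits, this receiver quietly ignores it, so a
// later protocol version can assign those bits new meanings without
// breaking this implementation.
func parseFlags(raw byte) byte {
  return raw & knownFlagsMask
}

func main() {
  // A newer implementation sets a bit this one has never heard of.
  fmt.Printf("accepted flags: %08b\n", parseFlags(0b00010001))
}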

The argument against this in the draft is that implementing the robustness principle allows a protocol to degenerate over time. Older implementations are not removed from service because they still work, implementations are not updated in a timely manner, and the protocol tends to accumulate an ever-increasing amount of “dead code” in the form of older expressions of data formats. Given an infinite amount of time, an infinite number of versions of any given protocol will be deployed. As a result, the protocol can and will break in an infinite number of ways.

The logic of the draft is something along the lines of: old ways of doing things should be removed from protocols which are actively maintained in order to unify and simplify the protocol. At least for actively maintained protocols, reliance on the robustness principle should be toned down a little.

Given the long list of examples in the draft, the authors make a good case.

There is another side to the argument, however. The robustness principle is not “just” about keeping older versions of software working “at scale.” All implementations, no matter how good their quality, have defects (or rather, unintended features). Many of these defects involve failing to release or initialize memory, failing to bounds-check inputs, and other similar oversights. A common way to find these errors is to fuzz test code—throw lots of different inputs at it and see whether it produces an error or crashes.
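Go has fuzzing built into its test tooling (since Go 1.18), which makes the idea easy to see in miniature. The sketch below exercises a made-up parseHeader function; the function itself is just a stand-in for whatever input-handling code you want to shake out.

package parser

import (
  "errors"
  "testing"
)

// parseHeader is a stand-in for real input-handling code; it decodes a
// length field from a (hypothetical) four-byte message header.
func parseHeader(data []byte) (int, error) {
  if len(data) < 4 {
    return 0, errors.New("short header")
  }
  return int(data[2])<<8 | int(data[3]), nil
}

// FuzzParseHeader throws mutated inputs at parseHeader, looking for
// panics, hangs, and out-of-range reads that hand-written test cases
// tend to miss. Run with: go test -fuzz=FuzzParseHeader
func FuzzParseHeader(f *testing.F) {
  f.Add([]byte{0xff, 0x00, 0x00, 0x13}) // seed corpus: one well-formed header
  f.Fuzz(func(t *testing.T, data []byte) {
    _, _ = parseHeader(data) // must never crash, whatever the input
  })
}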

The robustness principle runs deeper than infinite versions—it also helps implementations deal with defects in “the other end” that generate bad data. The robustness principle, then, can help keep a network running even in the face of an implementation defect.

Where does this leave us? Abandoning the robustness principle is clearly not a good thing—while the network might end up being more correct, it might also end up simply not running. Ever. The Internet is an interlocking system of protocols, hardware, and software; the robustness principle is the lubricant that makes it all work at all.

Clearly, then, there must be some sort of compromise position. Perhaps a two-pronged attack would work. First, don’t discard errors silently. Instead, build logging into software that catches all errors, regardless of how trivial they might seem. This will generate a lot of data, but we need to be clear on the difference between instrumenting something and actually paying attention to what is instrumented. Instrumenting code so that “unknown input” can be caught and logged periodically is not a bad thing.
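A minimal sketch of the idea follows; the attribute type codes are real BGP path attribute types, but the handler and logger are purely illustrative.

package main

import "log"

// handleAttribute processes the attribute types it understands and,
// rather than silently discarding anything else, records that an
// unknown type arrived, so the data is already there when someone
// goes looking for why a session is behaving oddly.
func handleAttribute(attrType uint8, value []byte) {
  switch attrType {
  case 1: // ORIGIN
    log.Printf("origin attribute: %v", value)
  case 2: // AS_PATH
    log.Printf("as-path attribute: %v", value)
  default:
    // Still liberal in what we accept (the message is not rejected),
    // but the unexpected input is no longer invisible.
    log.Printf("unknown attribute type %d (%d bytes); ignoring", attrType, len(value))
  }
}

func main() {
  handleAttribute(1, []byte{0x00})
  handleAttribute(42, []byte{0xde, 0xad})
}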

Second, perhaps protocols need some way to end-of-life older versions. Part of the problem with the robustness principle is that it allows an infinite number of versions of a single protocol to exist in the same network. Perhaps the IETF and other standards organizations should rethink this, explicitly taking older ways of doing things out of specs on a periodic basis. A draft that says “you shouldn’t do this any longer,” or “this is no longer in accordance with the specification,” would not be a bad thing.

For the more “average” network engineer, this discussion around the robustness principle should lead to some important lessons, particularly as we move ever more deeply into an automated world. Be clear about versioning of APIs and components. Deprecate older processes when they should no longer be used.

Control your technical debt, or it will control you.

The argument around learning to code, it seems, always runs something like this:

We don’t need network engineers any longer, or we won’t in five years. Everything is going to be automated. All we’ll really need is coders who can write a python script to make it all work. Forget those expert level certifications. Just go to a coding boot camp, or get a good solid degree in coding, and you’ll be set for the rest of your life!

It certainly seems plausible on the surface. The market is pretty clearly splitting into definite camps—cloud, disaggregated, and hyperconverged—and this split is certainly going to drive a lot of change in what network engineers do every day. But is this idea of abandoning network engineering skills and replacing them wholesale with coding skills really viable?

To think this question through, it’s best to start with another one. Assume everyone in the world decides to become a coder tomorrow. Every automotive engineer and mechanic, every civil engineer and architect, every chef, and every grocer moves into coding. The question that should rise just at this moment is: what is it that’s being coded? Back end coders code database systems and business logic. Front end coders code user interfaces. Graphics coders code ray tracing systems, artificial surfaces, and other such things. There is no way to do any of these things successfully if you don’t know the goal of the project. There’s no point in coding a GUI if you don’t understand user interface design. There’s no point in coding a back end system if you don’t understand accounting, or database design, or data analytics.

Given all of this, what piece of knowledge is missing from the path we are being urged to go down?

Network engineering.

If you want to code databases, you need to learn database theory. If you want to code accounting systems, you need to learn accounting. If you want to code networks, you need to learn network engineering.

But what do we mean when we say “network engineering?” Isn’t network engineering just buying some vendor gear, stringing it together, and then configuring it based on some set of arcane rules no-one really understands anyway? Isn’t network engineering much like building a castle out of plastic play blocks, just fitting them together in a way that makes sense, and ignoring or smoothing over the rough edges where things don’t quite fit right?

In short, no.

I’m not going to discourage you from learning to code—and I don’t just mean throwing around some python scripts to automate some odds and ends, or to complete a challenge on some web site. I truly believe that coding, real coding, is a good skill to have. But to believe we are going to eliminate network engineering through automation is to trade a skill set that has always been of rather limited value—purchasing, installing, and configuring vendor-built appliances—for another one that is probably of less value—automating the configuration of vendor-built devices. I fail to see how this is a good idea. If we all become coders, there will be no networks to code—because there will be no network engineers to build them.

Yes, siloed engineers are going to be in less demand in the future than they are today—but this is old news, at least as old as my time administering a Netware network at BASF, and even older. The market always wants specific skills right now, and engineers always need to build skills for the long term.

If you want to learn something now—if you want to learn something that will stand the test of time—if you want to learn something that will outlast vendors and appliances and white box and disaggregation and…

Learn network engineering.

Not how to configure devices—the skill that has stood in for real network engineering knowledge for far too long. Not how to automate the configuration of network devices—the skill that we are increasingly turning to, to replace our knowledge of the CLI.

Learn how the protocols really work, from theory to implementation, rather than how to configure them. Learn how devices switch packets, and why they work this way, rather than the available bandwidth on the latest gear. Learn how to design a network, rather than how to deploy vendor gear. Learn how to troubleshoot a network, rather than how to issue commands and look for responses.

It’s time we stopped spreading the “if you just learn to code, you’ll be in demand in five years” hype. If you have network engineering skills, then learning to code is a good thing. But I know plenty of really good coders who are not employed because they don’t have any skill other than coding (and to them, I say, learn network engineering). Learning to code is not a magic carpet that will take you to a field of dreams.

There is still hard work to do, there are still hard things to learn, there are still problems to be solved.

It’s fine—in fact crucial—to be an engineer who knows how to code. But you need to be an engineer, before learning to code is all that useful.

In the last post on this topic, we found the tail of the update chain. The actual event appears to be processed here—

case BGPEventUpdateMsg:
  st.fsm.StartHoldTimer()
  bgpMsg := data.(*packet.BGPMessage)
  st.fsm.ProcessUpdateMessage(bgpMsg)

—which is found around line 734 of fsm.go. The assignment bgpMsg := data.(*packet.BGPMessage) in this snippet is interesting; it’s a little difficult to understand what it’s actually doing. There are three crucial elements to figuring out what is going on here—

:=, in Go, declares a new variable and assigns it a value in a single step. But what, precisely, is being assigned to bgpMsg from data?

The * (asterisk) marks a type as a pointer type (here, a pointer to a packet.BGPMessage); in front of a variable, it dereferences a pointer. We’ve not talked about pointers before, so it’s worth spending just a moment with them. The example below will help a bit.

Each letter in the string “this is a string” is stored in a single memory location (this isn’t necessarily true, but let’s assume it is for this example). Further, each memory location has a location identifier, or rather some form of number that says, “this is memory location x.” This memory locator is, of course, a number—hence the memory locator itself can be assigned to a variable, which can then be treated as a separate object from the string itself.

This memory locator is called a pointer.

It only makes sense that the locator is called a pointer, because it points to the string. The question that should pop up in your head right now is—”but wait, if each letter is stored in a different memory location, then which memory location does the pointer actually point to?” If you’re trying to describe the entire string, the pointer would normally point to the first character in the string. You can, of course, also describe just some part of the string by pointing to a memory location that’s someplace in the middle of the string. For instance, you could point to just the part of the string “is a string” by finding the memory location of the second “i” in the string and storing that location.

How can you find the location of a string, or some other data structure? You place an & (ampersand) in front of it. So, if you do this—

myPointer := &aString

Now I have the pointer, but how do I get back to the value from the pointer? Like this—

aStringCopy := *myPointer

So the * takes the data that is pointed at by the pointer and pulls it out for assignment to another variable. In this case, then, this line of code—

bgpMsg := data.(*packet.BGPMessage)

—is—

  • asserting that the value stored in data is a pointer to a packet.BGPMessage
  • assigning that pointer to bgpMsg

In other words, data arrives at processEvent as an empty interface (interface{}), which can carry a value of any type; data.(*packet.BGPMessage) is a type assertion that recovers the *packet.BGPMessage pointer stored inside it. Nothing is copied out of the packet buffer here; bgpMsg simply points at the message that was already parsed when the packet was received and handed to the FSM as the event data, so it can be processed as an update. We need to look elsewhere for the code that works through the contents of this message—we will most likely find it when we start looking through ProcessUpdateMessage, which is where we will start next time.
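Before moving on, here is a small, self-contained sketch of the same mechanism: a pointer is taken with &, travels inside an interface{} value, and is recovered on the other side with a type assertion. The names are invented; only the syntax mirrors the snaproute code.

package main

import "fmt"

// BGPMessage stands in for packet.BGPMessage: some parsed message.
type BGPMessage struct {
  Type string
}

// handleEvent receives its payload as an empty interface, just as
// processEvent receives data interface{} in fsm.go.
func handleEvent(data interface{}) {
  // The type assertion says "the value stored in data is a
  // *BGPMessage; give me that pointer." If the value were some
  // other type, this form of the assertion would panic.
  msg := data.(*BGPMessage)
  fmt.Println("processing", msg.Type)
}

func main() {
  m := &BGPMessage{Type: "UPDATE"} // & takes the message's address
  handleEvent(m)                   // the pointer travels inside an interface{}
}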

Just in time for Halloween, the lucky thirteenth post in the BGP code dive series. In this series, we’re working through the Snaproute Go implementation of BGP just to see how a production, open source BGP implementation really works. Along the way, we’re learning something about how larger, more complex projects are structured, and also something about the Go programming language. The entire series can be found on the series page.

In the last post in this series, we left off with our newly established peer just sitting there sending and receiving keepalives. But BGP peers are not designed just to exchange random traffic; they’re actually designed to exchange reachability and topology information about the network. BGP carries routing information in updates, which are actually complicated containers for a lot of different kinds of reachability information. In BGP, a reachable destination is called an NLRI, or Network Layer Reachability Information. Starting with this code dive, we’re going to look at how the snaproute BGP implementation processes updates, sorting out NLRIs, etc.
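To make “NLRI” a little more concrete before we dig in: on the wire, each reachable destination in an update is just a prefix length in bits followed by enough bytes to hold the prefix. The sketch below illustrates that encoding; it follows the description in RFC 4271, not the snaproute data structures.

package main

import "fmt"

// NLRI describes one reachable destination carried in a BGP update:
// a prefix length in bits, plus the prefix packed into the minimum
// number of bytes needed to hold it.
type NLRI struct {
  PrefixLen uint8  // e.g. 24 for a /24
  Prefix    []byte // e.g. {203, 0, 113} for 203.0.113.0/24
}

// encode lays the NLRI out as it appears on the wire.
func (n NLRI) encode() []byte {
  return append([]byte{n.PrefixLen}, n.Prefix...)
}

func main() {
  n := NLRI{PrefixLen: 24, Prefix: []byte{203, 0, 113}}
  fmt.Printf("% x\n", n.encode()) // prints: 18 cb 00 71
}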

When you’re reading through code, whether looking for a better understanding of an implementation, a better understanding of a protocol, or even to figure out “what went wrong” on the wire or in the network, the hardest part is often just figuring out where to start. In the first post in this series (as my dog would say, “much time ago—in fact, forever, I counted!”), I put the code on a box and ran some logging. The logging gave me a string to search for in the code base, which then led to the beginning of the call chain, and the unraveling of the finite state machine.

This time, being more familiar with the code base, we’re going to start from a different place. We’re going to guess where to start. 🙂 Given—

  • BGP works entirely around a finite state machine in this implementation (which is, in fact, how it operates in most implementations)
  • Updates are really only processed while the peering BGP speaker is in the established state
  • Receiving an update is what we’d probably consider an event

We’ll take a look at the code in the FSM for processing events while the peering relationship is in the established state to see what we can find. The relevant function in the FSM begins around line 673 (note the line numbers can change for various reasons, so this is just a quick reference that might not remain constant over time)—

func (st *EstablishedState) processEvent(event BGPFSMEvent, data interface{}) {
 switch event {

The top of this function gives us a large switch statement; each case relates to a specific event. The events are—

case BGPEventManualStop:
case BGPEventAutoStop:
case BGPEventHoldTimerExp:
case BGPEventKeepAliveTimerExp:
case BGPEventTcpConnValid:
case BGPEventTcpCrAcked, BGPEventTcpConnConfirmed:
case BGPEventTcpConnFails, BGPEventNotifMsgVerErr, BGPEventNotifMsg:
case BGPEventBGPOpen:
case BGPEventOpenCollisionDump:
case BGPEventUpdateMsg:
case BGPEventUpdateMsgErr:
case BGPEventConnRetryTimerExp, BGPEventDelayOpenTimerExp, BGPEventIdleHoldTimerExp, 
  BGPEventOpenMsgErr, BGPEventBGPOpenDelayOpenTimer, BGPEventHeaderErr:

 

In fact, the event we’re interested in is included in the list of events while a peer is in the established state. It’s around line 734—

case BGPEventUpdateMsg:
  st.fsm.StartHoldTimer()
  bgpMsg := data.(*packet.BGPMessage)
  st.fsm.ProcessUpdateMessage(bgpMsg)

case BGPEventUpdateMsgErr:
  bgpMsgErr := data.(*packet.BGPMessageError)
  st.fsm.SendNotificationMessage(bgpMsgErr.TypeCode, bgpMsgErr.SubTypeCode, bgpMsgErr.Data)
  st.fsm.StopConnectRetryTimer()
  st.fsm.ClearPeerConn()
  st.fsm.StopConnToPeer()
  st.fsm.IncrConnectRetryCounter()
  st.fsm.ChangeState(NewIdleState(st.fsm))

 

The first of the two case statements here deals with receiving an actual update; the second with an update error. The process of handling an update is really only three lines of code.

st.fsm.StartHoldTimer() resets the hold timer. Updates count as keepalives in BGP, allowing active peers to skip sending keepalive messages, saving bandwidth and processing power.

bgpMsg := data.(*packet.BGPMessage) pulls the BGP message just received into a local variable. This is a little bit of magic we’ll look at more closely in the next post.

st.fsm.ProcessUpdateMessage(bgpMsg) processes the actual BGP message that was just copied into a local structure. Again, chasing through this is something that will need to wait until a later post.
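Resetting a hold timer every time any message arrives is a common pattern; here is a minimal sketch of the idea using Go’s time package. This is an illustration of the pattern only, not the snaproute StartHoldTimer.

package main

import (
  "fmt"
  "time"
)

func main() {
  holdTime := 90 * time.Second

  // If this timer ever fires, the peer has been silent too long and
  // the session should be torn down.
  holdTimer := time.AfterFunc(holdTime, func() {
    fmt.Println("hold timer expired: tear the session down")
  })

  // Every received message, keepalive or update, pushes expiry back out.
  onMessageReceived := func() {
    holdTimer.Reset(holdTime)
  }

  onMessageReceived() // an update arrives; the peer is clearly still alive
}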

For now, we’ve found the tail of the update processing chain in the snaproute BGP code—more next time.

In last week’s post, the new BGP peer we’re tracing through the snaproute BGP code moved from opensent to openconfirmed by receiving, and processing, the open message. In processing the open message, the list of AFIs this peer will support was built, the hold time was set, and the hold timer started. The next step is to move to established. RFC 4271, around page 70, describes the process as—

If the local system receives a KEEPALIVE message (KeepAliveMsg (Event 26)), the local system:
 - restarts the HoldTimer and
 - changes its state to Established.

In response to any other event (Events 9, 12-13, 20, 27-28), the local system:
 - sends a NOTIFICATION with a code of Finite State Machine Error,
 - sets the ConnectRetryTimer to zero,
 - releases all BGP resources,
 - drops the TCP connection,
 - increments the ConnectRetryCounter by 1,
 - (optionally) performs peer oscillation damping if the DampPeerOscillations attribute is set to TRUE, and
 - changes its state to Idle.

 

For a bit of review (because this series is running so long, you might have forgotten how the state machine works), the snaproute code is written as a state machine. There are a series of states the BGP peer must move through, each state being represented by its own set of functions in the fsm.go file. As the peer moves from one state to another, a function call “moves the pointer” from the current state to the next one, so that any event which occurs will call a different function, depending on the current state. I know this is rather difficult to follow, but what it means, in practical terms, is that if the underlying TCP session is acknowledged or confirmed while the peer is in the connect state, the following code from around line 272 in fsm.go is executed—

case BGPEventTcpCrAcked, BGPEventTcpConnConfirmed:
 st.fsm.StopConnectRetryTimer()
 st.fsm.SetPeerConn(data)
 st.fsm.sendOpenMessage()
 st.fsm.SetHoldTime(st.fsm.neighborConf.RunningConf.HoldTime,
  st.fsm.neighborConf.RunningConf.KeepaliveTime)
 st.fsm.StartHoldTimer()
 st.BaseState.fsm.ChangeState(NewOpenSentState(st.BaseState.fsm))

However, if this same event occurs—the underlying TCP connection is acknowledged or confirmed—while the peer is in openconfirm state, a different set of code is executed, from around line 593 in fsm.go

case BGPEventTcpCrAcked, BGPEventTcpConnConfirmed:
 st.fsm.HandleAnotherConnection(data)

This is a general characteristic of any FSM—the event is matched against the current state to determine what action to take next. With all of this in mind, any event received while the peer is in openconfirm state will be processed by func (st *OpenConfirmState) processEvent, which is around line 558 in fsm.go. This code consists of a switch statement, which looks like this—

func (st *OpenConfirmState) processEvent(event BGPFSMEvent, data interface{}) {
 switch event {
  case BGPEventManualStop:
   ....
  case BGPEventAutoStop:
   ....
  case BGPEventHoldTimerExp:
   ....
  case BGPEventKeepAliveTimerExp:
   ....
  case BGPEventTcpConnValid: // Supported later
  case BGPEventTcpCrAcked, BGPEventTcpConnConfirmed: // Collision Detection... needs work
   ....
  case BGPEventTcpConnFails, BGPEventNotifMsg:
   ....
  case BGPEventBGPOpen: // Collision Detection... needs work
  case BGPEventHeaderErr, BGPEventOpenMsgErr:
   ....
  case BGPEventOpenCollisionDump:
   ....
  case BGPEventNotifMsgVerErr:
   ....
  case BGPEventKeepAliveMsg:
   .... 
  case BGPEventConnRetryTimerExp, BGPEventDelayOpenTimerExp, BGPEventIdleHoldTimerExp,
   ....
  }
}

 

I’ve cut out the actions taken in each case to make it easier to see the structure of the entire switch statement in one sweep. Most of these options are actually error conditions that take exactly the same steps. Let’s look at one to see what it does—

case BGPEventHoldTimerExp:
 st.fsm.SendNotificationMessage(packet.BGPHoldTimerExpired, 0, nil)
 st.fsm.StopConnectRetryTimer()
 st.fsm.ClearPeerConn()
 st.fsm.StopConnToPeer()
 st.fsm.IncrConnectRetryCounter()
 st.fsm.ChangeState(NewIdleState(st.fsm))

 

If the hold timer expires while the peer is in openconfirmed state—

  • A notification is sent by SendNotificationMessage; this will tell the peer that the session is being torn down, so the two speakers can have synchronized state
  • The connect retry timer is stopped, so the local BGP speaker will not try to reconnect until the peer has passed through the idle state; this prevents any problems that might result from stepping outside the BGP state machine
  • The peer connection is cleared; this just empties the various data structures associated with the peer, so old information isn’t carried into a new peering session
  • The peering connection is stopped by StopConnToPeer
  • The connection retry counter is incremented, which allows the operator to see how many times this peer has been torn down and restarted
  • The state of the peer is changed to idle

This set of actions only changes slightly from state to state; if you search for this set of steps, you’re likely to find it at least a few dozen times throughout fsm.go.
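To see the shape of the state pattern in miniature: each state is a type, each type has its own processEvent, and changing state swaps which type handles the next event. The sketch below is a stripped-down illustration, not the snaproute code.

package main

import "fmt"

type event int

const (
  eventOpen event = iota
  eventKeepAlive
)

// state is anything that knows how to react to an event.
type state interface {
  processEvent(fsm *FSM, ev event)
}

// FSM just remembers which state currently owns incoming events.
type FSM struct{ current state }

func (f *FSM) changeState(s state) { f.current = s }
func (f *FSM) handle(ev event)     { f.current.processEvent(f, ev) }

type openSentState struct{}

func (openSentState) processEvent(f *FSM, ev event) {
  if ev == eventOpen {
    fmt.Println("open received: move to openconfirm")
    f.changeState(openConfirmState{})
  }
}

type openConfirmState struct{}

func (openConfirmState) processEvent(f *FSM, ev event) {
  if ev == eventKeepAlive {
    fmt.Println("keepalive received: move to established")
  }
}

func main() {
  fsm := &FSM{current: openSentState{}}
  fsm.handle(eventOpen)      // handled by openSentState
  fsm.handle(eventKeepAlive) // now handled by openConfirmState
}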

There is one other interesting point about this code worth mentioning. The folks at snaproute apparently haven’t implemented peer collision detection, as evidenced by the comments in the code itself. For instance—


  case BGPEventTcpConnValid: // Supported later
  case BGPEventTcpCrAcked, BGPEventTcpConnConfirmed: // Collision Detection... needs work
   ....
  case BGPEventTcpConnFails, BGPEventNotifMsg:
   ....
  case BGPEventBGPOpen: // Collision Detection... needs work

Each of these events—a new TCP connection becoming valid, a TCP connection request being acknowledged or confirmed, or an open message arriving, each while the peer is already in openconfirmed state—represents an event that should not take place. What should the snaproute code do here? According to section 6.8 of RFC 4271, it should—

Unless allowed via configuration, a connection collision with an existing BGP connection that is in the Established state causes closing of the newly created connection.

So when they eventually fill this bit of code in, you can be pretty certain what the actual code will do—it will reset the peering session in a way that’s similar to the other error cases already present. The bit of code that’s interesting in the context of moving from openconfirmed to established is around line 627 in fsm.go

case BGPEventKeepAliveMsg:
 st.fsm.StartHoldTimer()
 st.fsm.ChangeState(NewEstablishedState(st.fsm))

 

The actual processing to move from openconfirmed to established is simple: if the local peer receives a keepalive message while in the openconfirmed state, restart the hold timer and move the peer to established.

As we’ve reached established state, the next step is to understand how updates are received and processed for this new peer.

In the last post in this series, we began considering the BGP code that handles the open message, which begins moving a new peer to the openconfirmed state. This is the particular bit of code of interest—

case BGPEventBGPOpen:
  st.fsm.StopConnectRetryTimer()
  bgpMsg := data.(*packet.BGPMessage)
  if st.fsm.ProcessOpenMessage(bgpMsg) {
    st.fsm.sendKeepAliveMessage()
    st.fsm.StartHoldTimer()
    st.fsm.ChangeState(NewOpenConfirmState(st.fsm))
  }

We looked at how this code assigns the received message to bgpMsg; now we need to look at how this information is actually processed. bgpMsg is passed to st.fsm.ProcessOpenMessage() in the next line. This call is prefixed with st.fsm, which means the function is going to be found in the FSM, which means fsm.go. Indeed, func (fsm *FSM) ProcessOpenMessage... is around line 1172 in fsm.go—

func (fsm *FSM) ProcessOpenMessage(pkt *packet.BGPMessage) bool {
 body := pkt.Body.(*packet.BGPOpen)

 if uint32(body.HoldTime) < fsm.holdTime {
  fsm.SetHoldTime(uint32(body.HoldTime), uint32(body.HoldTime/3))
 }

 if body.MyAS == fsm.Manager.gConf.AS {
  fsm.peerType = config.PeerTypeInternal
 } else {
  fsm.peerType = config.PeerTypeExternal
 }

 afiSafiMap := packet.GetProtocolFromOpenMsg(body)
 for protoFamily, _ := range afiSafiMap {
  if fsm.neighborConf.AfiSafiMap[protoFamily] {
   fsm.afiSafiMap[protoFamily] = true
  }
 }

 return fsm.Manager.receivedBGPOpenMessage(fsm.id, fsm.peerConn.dir, body)
}

There are three “sections” in this function, each of which takes care of a different thing. The first section—

if uint32(body.HoldTime) < fsm.holdTime {
 fsm.SetHoldTime(uint32(body.HoldTime), uint32(body.HoldTime/3))
}

This is fairly simple; it compares the received hold time with the locally configured hold time, setting the final hold time to the lower of these two numbers. This is in line with the most recent BGP specification (RFC 4271), section 4.2, which states—

This 2-octet unsigned integer indicates the number of seconds the sender proposes for the value of the Hold Timer. Upon receipt of an OPEN message, a BGP speaker MUST calculate the value of the Hold Timer by using the smaller of its configured Hold Time and the Hold Time received in the OPEN message.
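Here is a quick sketch of that rule as a hypothetical helper (not the snaproute function), with the keepalive interval derived as a third of the hold time, matching the SetHoldTime call above.

package main

import "fmt"

// negotiateTimers applies the RFC 4271 rule: use the smaller of the
// locally configured hold time and the hold time offered in the
// peer's OPEN message, then derive the keepalive interval from it.
func negotiateTimers(configured, received uint32) (holdTime, keepalive uint32) {
  holdTime = configured
  if received < configured {
    holdTime = received
  }
  return holdTime, holdTime / 3
}

func main() {
  hold, keep := negotiateTimers(180, 90)
  fmt.Println(hold, keep) // prints: 90 30
}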

The second section of this code is a little more confusing—

if body.MyAS == fsm.Manager.gConf.AS {
 fsm.peerType = config.PeerTypeInternal
} else {
 fsm.peerType = config.PeerTypeExternal
}

This obviously sets the type of the peer, internal (iBGP) or external (eBGP), but how precisely does it work? The if statement is the crucial point here; if the statement evaluates to true, the first branch is executed, which sets the peer type to iBGP. If the if statement evaluates to false, the second branch is executed, setting the peer type to eBGP.

Note the difference between = and ==. In both C and Go, = assigns the value (or the contents of the variable) on the right side of the = to the variable on the left side. The == operator compares the two values: in Go it evaluates to true if the values (or contents of the two variables) are the same, and false if they do not match (in C, the corresponding results are 1 and 0).

The if statement itself is comparing body.MyAS to fsm.Manager.gConf.AS; what do these contain? body.MyAS is an element of the body structure, which is taken from the packet contents at the beginning of the function by the line body := pkt.Body.(*packet.BGPOpen); body.MyAS is, then, the AS number of the remote peer. On the other hand, fsm.Manager.gConf.AS is taken from the local FSM state, in particular the configuration of the local BGP process. Given these two definitions, these lines of code make sense: if the local and remote AS numbers match, the neighbor type should be set to iBGP; if they don’t match, the neighbor type should be set to eBGP.

The final section of code is the most complex of the three—

afiSafiMap := packet.GetProtocolFromOpenMsg(body)
for protoFamily, _ := range afiSafiMap {
 if fsm.neighborConf.AfiSafiMap[protoFamily] {
  fsm.afiSafiMap[protoFamily] = true
 }
}

The first line of code here grabs the set of address families (AFIs) and subsequent address families (SAFIs) the peer supports, as reported in its open message. The loop, for protoFamily, _ := range afiSafiMap {, then walks through each family the peer advertised, checking whether that family is also configured locally for this neighbor (fsm.neighborConf.AfiSafiMap). If a family advertised by the peer is configured locally, it is set to true in fsm.afiSafiMap, which serves as an indicator to any other process interacting with this particular peer of which AFIs/SAFIs are supported on this session.
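The net effect is the intersection of what the peer advertised and what is configured locally, something like the sketch below (the family names are made up for illustration).

package main

import "fmt"

func main() {
  // Families the peer advertised in its OPEN message.
  peerFamilies := map[string]bool{"ipv4-unicast": true, "ipv6-unicast": true}

  // Families configured locally for this neighbor.
  localFamilies := map[string]bool{"ipv4-unicast": true}

  // The session ends up supporting only the families both sides agree on.
  sessionFamilies := map[string]bool{}
  for family := range peerFamilies {
    if localFamilies[family] {
      sessionFamilies[family] = true
    }
  }

  fmt.Println(sessionFamilies) // prints: map[ipv4-unicast:true]
}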

At this point, the open message received from the new peer has been processed. Once ProcessOpenMessage finishes, it will return to the main FSM, and to the remainder of the switch statement above.

st.fsm.sendKeepAliveMessage() will now send the first BGP keepalive to this new peer; as there is no timer for sending keepalive messages set at this point, and there is no way to tell how long processing the open message has taken, the safest thing to do is to send this first keepalive immediately.

st.fsm.StartHoldTimer() will now start a hold timer. If this timer expires, the peer will be brought down—this is something we’ll look at later, when we consider the various error conditions the code might encounter, and the expiration (waking up) of the various timers set along the way.

Finally, st.fsm.ChangeState(NewOpenConfirmState(st.fsm)) sets the current state to open confirm, bringing us one step closer to exchanging databases, and transitioning this new peer into the normal state for BGP neighbors.

We’ll consider the next step in this process in the next code dive.