snaproute Go BGP Code Dive (8): Moving to Open

Last week we left off with our BGP peer in connect state, after looking through what this code does, around line 261 of fsm.go in snaproute’s Go BGP implementation:

func (st *ConnectState) processEvent(event BGPFSMEvent, data interface{}) {
  switch event {
  ....
    case BGPEventConnRetryTimerExp:
      st.fsm.StopConnToPeer()
      st.fsm.StartConnectRetryTimer()
      st.fsm.InitiateConnToPeer()
....

What we want to do this week is pick up our BGP peering process and figure out what the code does next. In this particular case, the next step in the process is fairly simple to find, because it’s just another case in the switch statement in (st *ConnectState) processEvent:

case BGPEventTcpCrAcked, BGPEventTcpConnConfirmed:
  st.fsm.StopConnectRetryTimer()
  st.fsm.SetPeerConn(data)
  st.fsm.sendOpenMessage()
  st.fsm.SetHoldTime(st.fsm.neighborConf.RunningConf.HoldTime,
    st.fsm.neighborConf.RunningConf.KeepaliveTime)
  st.fsm.StartHoldTimer()
  st.BaseState.fsm.ChangeState(NewOpenSentState(st.BaseState.fsm))
....

This looks like the right place—we’re looking at events that occur while in the connect state, and the result seems to be sending an open message. Before we move down this path, however, I’d like to be certain I’m chasing the right call chain, or logical thread. How can I do this? This code is called when (st *ConnectState) processEvent is called with an event called BGPEventTcpCrAcked or BGPEventTcpConnConfirmed. Let’s chase down where these events might come from to see if this really is the next step in the call chain we’re trying to chase.
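Before chasing, it may help to see the shape of this code in isolation. The following is a minimal sketch of the state pattern, not snaproute’s code: each state is a type with its own event handler, and handling an event can swap the FSM’s current state. The names Event, State, FSM, and the action strings are all hypothetical stand-ins; the real functions (sendOpenMessage, StartHoldTimer, and so on) are replaced with log entries.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for BGPFSMEvent.
type Event int

const (
	EventConnRetryTimerExp Event = iota
	EventTcpCrAcked
	EventTcpConnConfirmed
)

// State is a hypothetical interface: each state handles events its own way.
type State interface {
	Name() string
	ProcessEvent(f *FSM, ev Event)
}

// FSM holds the current state and a record of the actions taken so far.
type FSM struct {
	state   State
	actions []string
}

// ChangeState replaces the current state.
func (f *FSM) ChangeState(s State) { f.state = s }

// do records an action instead of performing real protocol work.
func (f *FSM) do(action string) { f.actions = append(f.actions, action) }

type ConnectState struct{}

func (ConnectState) Name() string { return "Connect" }

func (ConnectState) ProcessEvent(f *FSM, ev Event) {
	switch ev {
	case EventConnRetryTimerExp:
		f.do("restart connection attempt")
	case EventTcpCrAcked, EventTcpConnConfirmed:
		f.do("stop connect retry timer")
		f.do("send OPEN")
		f.do("start hold timer")
		f.ChangeState(OpenSentState{})
	}
}

type OpenSentState struct{}

func (OpenSentState) Name() string { return "OpenSent" }

// ProcessEvent is left empty; OpenSent is not modeled in this sketch.
func (OpenSentState) ProcessEvent(f *FSM, ev Event) {}

func main() {
	f := &FSM{state: ConnectState{}}
	f.state.ProcessEvent(f, EventTcpCrAcked)
	fmt.Println(f.state.Name(), f.actions)
}
```

The point to notice is that the same event can mean different things depending on which state type is current, which is why the same case line shows up more than once in a file like fsm.go.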

Note: Sometimes it’s easier to chase from the end result back towards the caller, and sometimes it’s not. There’s no way to know which is which until you have more experience in chasing through code. It takes time and practice to build these sorts of skills up, just like many other skills—but in chasing through code, you’re not only learning the protocols better, you’re also learning how to code better.

To find what we’re looking for, we can search through the project files for some instance of BGPEventTcpCrAcked, which seems to be the result of receiving an ACK for a TCP session initiated by BGP. We find a few places in fsm.go, as always, but most of them are using the event, rather than causing (or throwing) it—

272: case BGPEventTcpCrAcked, BGPEventTcpConnConfirmed:
371: case BGPEventTcpCrAcked, BGPEventTcpConnConfirmed:
475: case BGPEventTcpCrAcked, BGPEventTcpConnConfirmed:
592: case BGPEventTcpCrAcked, BGPEventTcpConnConfirmed:
709: case BGPEventTcpCrAcked, BGPEventTcpConnConfirmed:

Until we get to this one—

case inConnCh := <-fsm.inConnCh:

What does this do? This is a little complex, but let’s try to work through it. When starting a new peer, a port was cloned on which to send TCP packets to the peer. Since the port is cloned to a port the main FSM function, (fsm *FSM) StartFSM(), is watching, the main FSM function is going to be notified of any inbound TCP packets received on the local device. When one specific sort of packet is received, an acknowledgement in a new TCP session, the main FSM function wakes up and the case inConnCh := <-fsm.inConnCh: branch runs. This, in turn, calls (st *ConnectState) processEvent with BGPEventTcpCrAcked.

If you followed that, you know this verifies what it looked like in the first place—the code above is, in fact, the correct code to process the next phase of peering. The call chain looks something like this—

  • (fsm *FSM) StartFSM() is watching the TCP ports for any new packets
  • When (fsm *FSM) StartFSM() receives a new TCP ACK, it takes the case inConnCh := <-fsm.inConnCh: branch (a receive case like this belongs to a Go select statement rather than a switch)
  • This, in turn, calls (st *ConnectState) processEvent with BGPEventTcpCrAcked
  • (st *ConnectState) processEvent matches case BGPEventTcpCrAcked, BGPEventTcpConnConfirmed, which then calls the correct functions to move beyond connect state
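The call chain above can be sketched in a few lines of Go. This is only an illustration under assumptions, not the snaproute implementation: runFSM, Conn, deliver, and the handle callback are hypothetical names, and the real StartFSM watches more channels than this. What it shows is the mechanism: a loop blocked in select on a channel, which turns a newly reported connection into an event for the current state’s handler.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for the FSM event type.
type Event string

const EventTcpCrAcked Event = "TcpCrAcked"

// Conn stands in for whatever describes the newly established TCP session.
type Conn struct{ Peer string }

// runFSM blocks in select until a connection arrives or done is closed.
// Each connection is handed to handle as a TcpCrAcked event.
func runFSM(inConnCh <-chan Conn, done <-chan struct{}, handle func(Event, interface{})) {
	for {
		select {
		case c := <-inConnCh:
			handle(EventTcpCrAcked, c)
		case <-done:
			return
		}
	}
}

// deliver runs the loop, reports one connection from peer, and returns
// a description of every event the handler saw.
func deliver(peer string) []string {
	inConnCh := make(chan Conn)
	done := make(chan struct{})
	finished := make(chan struct{})
	var got []string

	go func() {
		defer close(finished)
		runFSM(inConnCh, done, func(ev Event, data interface{}) {
			got = append(got, fmt.Sprintf("%s from %s", ev, data.(Conn).Peer))
		})
	}()

	inConnCh <- Conn{Peer: peer} // the connection layer reports a session
	close(done)                  // ask the loop to exit
	<-finished                   // wait so reading got is safe
	return got
}

func main() {
	fmt.Println(deliver("192.0.2.1")) // prints [TcpCrAcked from 192.0.2.1]
}
```

Nothing calls the case branch directly. The code that sets up the TCP session only writes to a channel, and the FSM loop picks the value up whenever it is next waiting in select, which is why searching for the event name alone does not lead straight to the caller.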

It’s okay if you have to read all of that several times; FSMs (Finite State Machines, remember?) can be very difficult to follow. This means we need to chase down each of these functions to find out how this implementation of BGP actually moves beyond the connect state:

  • st.fsm.StopConnectRetryTimer()
  • st.fsm.SetPeerConn(data)
  • st.fsm.sendOpenMessage()
  • st.fsm.SetHoldTime(st.fsm.neighborConf.RunningConf.HoldTime, st.fsm.neighborConf.RunningConf.KeepaliveTime)
  • st.fsm.StartHoldTimer()
  • st.BaseState.fsm.ChangeState(NewOpenSentState(st.BaseState.fsm))

It’s pretty obvious what StopConnectRetryTimer does—it stops BGP from continuing to try to connect to this peer. Since the peer has acknowledged the initial TCP packet, we shouldn’t keep trying to send it initial TCP packets. SetPeerConn is a bit harder—

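func (fsm *FSM) SetPeerConn(data interface{}) {
  // A connection is already in place for this peer; keep it.
  if fsm.peerConn != nil {
    return
  }
  // data carries the connection and its direction; this single-value
  // assertion panics if data holds any other type.
  pConnDir := data.(PeerConnDir)
  fsm.peerConn = NewPeerConn(fsm, pConnDir.connDir, pConnDir.conn)
  // Read from the peer in a separate goroutine so the FSM is not blocked.
  go fsm.peerConn.StartReading()
}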

This does some general logging (which I’ve removed for clarity), returns early if the FSM already has a connection for this peer, and otherwise wraps the new connection in a PeerConn and starts a goroutine (that is what the go keyword does) to read packets from it. I’m not going to dive into these functions deeply here.

Next time, we’ll look at the four remaining functions, as these are where the action really is from a BGP perspective.