Using UPM and BFD to control routing behavior in an MLAG/VRRP topology

  • Idea
  • Updated 3 years ago
This is a request for comments on an idea for solving a specific problem.  We use UPM to take down a BFD session, which in turn changes routing behavior: an event on one switch triggers a routing change on another.

The asynchronous forwarding nature inherent in an MLAG with VRRP topology is well known and well tested.  In a highly-meshed design, the number of actively forwarding pathways can be numerous and difficult to troubleshoot.  In the topology shown below, the “LAN Core” consists of 4 switches in a cluster of 4 VRRP routers and 2 MLAG pairs.  Host VLANs exist only amongst ToR Stacks and this LAN core.  As the configuration is MSTR/BKUP/BKUP/BKUP, only one forwarder will be active at any time.  


The routing configuration between the “Upstream routers” and the “LAN Core” routers is static, protected by BFD sessions, to keep convergence time as low as possible.  The upstream routers learn the rest of their routes via OSPF.  Each upstream router has two static routes for each host VLAN: the higher-priority gateway is the VIP of the VRRP cluster, and the lower-priority gateway is the other upstream router.  Testing showed that this design, paired with BFD, provided very fast and reliable forwarding under any reasonable circumstance (that is, whenever at least one pathway existed).
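
The route preference described above can be sketched as a small Python model. This is only an illustration, not router code: the next-hop names, the numeric preference values (lower wins here), and the assumption that only the route via the VIP is tied to a BFD session are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StaticRoute:
    next_hop: str                 # hypothetical label, not a real address
    preference: int               # lower value wins (illustrative convention)
    bfd_protected: bool = False   # assumption: only the VIP route tracks BFD


def best_route(routes, bfd_up):
    """Return the preferred usable route, or None if nothing is usable.

    A BFD-protected route is usable only while its session is reported up.
    """
    usable = [
        r for r in routes
        if not r.bfd_protected or bfd_up.get(r.next_hop, False)
    ]
    return min(usable, key=lambda r: r.preference, default=None)


# Two static routes for one host VLAN, as seen from one upstream router.
routes = [
    StaticRoute("vrrp-vip", preference=1, bfd_protected=True),
    StaticRoute("other-upstream", preference=2),
]

print(best_route(routes, {"vrrp-vip": True}).next_hop)
print(best_route(routes, {"vrrp-vip": False}).next_hop)
```

While the BFD session is up the route via the VIP wins; once the session is down, the only usable route is the one via the other upstream router.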

Extensive testing showed very positive results, with just one failing scenario.  If either of the ISC links went completely down, we saw intermittent loss for some North-South flows; East-West flows were unaffected.  Investigating the root cause, we determined that the upstream router, unaware of the loss of the ISC, continued to load-balance packets sourced in “the rest of network” and destined for the host VLANs over the two ports connected to the VRRP cluster.  When the BKUP router in the cluster received a packet destined for a host VLAN and had no viable pathway to the VRRP MSTR, it dropped the packet.  Loss was intermittent because the hashing at the upstream router sent only some flows down the physical port that would ultimately be a dead end.  Flows hashed to the opposite side were unaffected, as that pathway always resolved to the MSTR and the packets were forwarded.
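
To make the "only some flows are lost" behavior concrete, here is a rough Python model of the failure. It is a sketch under stated assumptions: the port names and sample flows are made up, and a CRC32 over the flow tuple merely stands in for whatever hash the upstream router really uses.

```python
import zlib

# Hypothetical names for the upstream router's two ports toward the cluster.
PORTS = ("to_MSTR_side", "to_BKUP_side")


def pick_port(flow):
    """Deterministic stand-in for the upstream router's per-flow hash."""
    key = "|".join(str(field) for field in flow).encode()
    return PORTS[zlib.crc32(key) % len(PORTS)]


def delivered(flow, isc_up):
    """True if a North-South packet of this flow reaches its host VLAN."""
    if pick_port(flow) == "to_MSTR_side":
        return True    # this side always resolves to the VRRP MSTR
    return isc_up      # the BKUP side needs the ISC to reach the MSTR


# Made-up sample flows: (source IP, destination IP, source port).
flows = [("10.0.0.%d" % i, "192.168.10.20", 40000 + i) for i in range(1, 21)]

lost = [f for f in flows if not delivered(f, isc_up=False)]
print(f"{len(lost)} of {len(flows)} sample flows lost with the ISC down")
```

With the ISC up every flow is delivered; with it down, exactly the flows hashed onto the BKUP-side port are dropped, which is why the loss looked intermittent.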

Several solutions were considered, such as:

  • Adding redundant pathways for the ISC
  • Converting to a dynamic routing protocol throughout
  • Converting to Active/Active VRRP

Because the primary design requirement was very reliable and transparent interconnection of lots of links (pun intended), we looked for a solution to this admittedly extreme case.  We decided to inform the upstream router that it should prefer the opposite pathway whenever the locally connected MLAG peer was down.  We used the log events VSM.RmtMLAGPeerUp and VSM.RmtMLAGPeerDown to trigger BFD to invalidate the static route on the upstream router, thereby forcing traffic to the opposite side.  This results in temporarily inefficient forwarding, but that was determined to be acceptable in this case.

We considered other events (link down, LACP, etc.) as triggers, but the peer message was the reliable one.  We also attempted to “disable bfd” to bring the session down gracefully; this did not work, as EXOS took no action when the BFD socket was ended by a close message.  We changed to simply blocking UDP port 3784 at the MLAG peers, and this did the trick.

The very simple configuration is shown below:

#
# Module acl configuration.
#
create access-list disbfd " protocol udp ; destination-port 3784 ;" " deny  ;" application "Cli"
#
# Module ems configuration.
#
create log filter upm_ISC_LinkDown
create log filter upm_ISC_LinkUp
configure log filter upm_ISC_LinkDown add events VSM.RmtMLAGPeerDown 
configure log filter upm_ISC_LinkUp add events VSM.RmtMLAGPeerUp 
create log target upm disable_bfd2BB
enable log target upm disable_bfd2BB
configure log target upm disable_bfd2BB filter upm_ISC_LinkDown severity Info
create log target upm enable_bfd2BB
enable log target upm enable_bfd2BB
configure log target upm enable_bfd2BB filter upm_ISC_LinkUp severity Info
#
# Module upm configuration.
#
create upm profile disable_bfd2BB
conf access-list add disbfd last priority 0 zone SYSTEM vlan LAN-Interco-1 egress
.
create upm profile enable_bfd2BB
conf access-list del disbfd all
.
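
For readers who want to reason about the chain of events without a lab, the configuration above can be modeled roughly as follows. This is a simplified sketch, not EXOS behavior: the class and method names are hypothetical, the detect multiplier of 3 is an assumption (the post does not state the BFD timers), and a real BFD session returns to Up through a handshake rather than on the first packet received.

```python
class MlagPeer:
    """One LAN Core switch reacting to MLAG peer log events (simplified)."""

    def __init__(self):
        self.bfd_blocked = False   # True while the deny rule for UDP 3784 is applied

    def on_log_event(self, event):
        if event == "VSM.RmtMLAGPeerDown":
            self.bfd_blocked = True    # stands in for UPM profile disable_bfd2BB
        elif event == "VSM.RmtMLAGPeerUp":
            self.bfd_blocked = False   # stands in for UPM profile enable_bfd2BB


class UpstreamBfdSession:
    """The upstream router's view of the BFD session (simplified)."""

    def __init__(self, detect_mult=3):   # assumed multiplier, not from the post
        self.detect_mult = detect_mult
        self.missed = 0
        self.up = True

    def interval_elapsed(self, packet_received):
        if packet_received:
            self.missed = 0
            self.up = True               # simplification: no Init/Up handshake
        else:
            self.missed += 1
            if self.missed >= self.detect_mult:
                self.up = False          # detection time expired


def run(peer, session, intervals):
    """Advance the model by a number of BFD intervals; return session state."""
    for _ in range(intervals):
        session.interval_elapsed(packet_received=not peer.bfd_blocked)
    return session.up


peer, session = MlagPeer(), UpstreamBfdSession()
peer.on_log_event("VSM.RmtMLAGPeerDown")
print("session up after 2 silent intervals:", run(peer, session, 2))
print("session up after 3 silent intervals:", run(peer, session, 1))
peer.on_log_event("VSM.RmtMLAGPeerUp")
print("session up after peer recovery:", run(peer, session, 1))
```

The point of the model is that nothing is signaled explicitly: the upstream router simply stops hearing BFD control packets and declares the session down once its detection time expires, at which point the static route via the VIP is invalidated.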

Please reply with questions and comments.  I would like to refine this into a knowledge article once it has been clarified to the point of being simple to understand.

- Mike Lane, SE in Bavaria

Lane, Mike, Employee

Posted 3 years ago


Grosjean, Stephane, Employee

edit: I see you edited your post while I was answering...

First of all, it's highly recommended that you configure an alternate IP address for the ISC; it helps with your convergence scenario.
Then I would also consider Active/Active VRRP. Why aren't you doing it?

If for some reason you need to react specifically to certain remote events, then we can do it via a Python app (15.7+). We already have such an app that can sync several switches.

Lane, Mike, Employee

We are trying to convince the customer of the value and reliability of this.  This deployment is a complete replacement of a Cisco environment, and they wanted to emulate the previous configuration closely and change it later.  We will likely be going Active/Active next.

Python was definitely an option here, but as you can see, this choice was very simple and easy to sell to the customer.

An alternate IP for the ISC would not necessarily work here, because there would still be no pathway for traffic through the BKUP router, unless I missed something :)