VRRP best practices, preempt, tracking, fabric routing, accept-mode, host mobility

  • Question
  • Updated 3 months ago
  • Answered
Hi There,

Apologies for this question being a little long....

Just looking for the community's thoughts on some best practices for configuring VRRP.

Preempt

By default the preempt delay is 0 seconds, so the preempt to master would take about 3 hellos, which are sent every 1 second. So my question is: would a 3 second preempt be deemed sufficient? I've seen some set to 90 seconds, the logic being to give the network a chance to stabilise before going to master, to stop flapping. Is there a formula you could use? And what if you have more than 2 routers in the VRID?
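
For reference on the "3 hellos" figure, RFC 3768 (VRRPv2) defines how long a backup waits before declaring the master down: three advertisement intervals plus a priority-dependent skew. Below is a minimal sketch of that arithmetic. The function name is made up for illustration, it models only the standard timer, and it says nothing about how EXOS applies its own preempt delay on top.

```python
def master_down_interval(priority: int, advert_interval_s: float = 1.0) -> float:
    """Seconds a VRRPv2 backup waits without hearing the master (RFC 3768).

    Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
    Skew_Time            = (256 - Priority) / 256 seconds
    """
    if not 1 <= priority <= 255:
        raise ValueError("VRRP priority must be between 1 and 255")
    skew_time_s = (256 - priority) / 256
    return 3 * advert_interval_s + skew_time_s


# Default priority 100 with 1-second hellos
print(master_down_interval(100))  # 3.609375

# Hypothetical VRID with three backups: the highest priority times out first
for prio in (200, 150, 100):
    print(prio, round(master_down_interval(prio), 3))
```

With more than two routers in the VRID, the skew term is what staggers the backups: the higher the priority, the shorter the wait, so the best backup takes over first. Any configured preempt delay would be extra time beyond this timer.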

This article with VRRP and FREB shows a preempt delay of 5 seconds:

https://gtacknowledge.extremenetworks.com/articles/How_To/An-example-of-VRRP-fabric-routing-configur...

Accept Mode

In EOS I have in the past turned accept-mode on so that you are able to ping the VRRP VIP address, but in EXOS you do not need to do this. So I'm wondering what other practical / best practice reasons there would be for turning it on. One example might be to support NTP over the VIP as per the following GTAC article:

https://gtacknowledge.extremenetworks.com/articles/How_To/How-to-Configure-NTP-Support-over-VIP-of-V...

Fabric Routing

This was mentioned above, but given its own heading for comment. In that example preempt delay was set to 5 seconds, so just wondering if the inclusion of fabric routing, and even the number of participating routers in the same VRID should be something to consider?

Tracking

VRRP can track pings, IP routes and VLANs. There are probably some obvious cases where that is a good idea, but I'm interested in practical examples and / or best practices. As an example, the GTAC article below shows how to configure VLAN tracking so that if a VLAN fails, VRRP will fail over to the other router. That sounds great, but would it be considered good practice to do that on every VLAN?

https://gtacknowledge.extremenetworks.com/articles/Solution/Configuring-VRRP-and-VLAN-tracking

Host Mobility

An explanation for this is given here:

http://documentation.extremenetworks.com/exos/EXOS_21_1/VRRP/c_vrrp-host-mobility.shtml

I can see this possibly making sense when using fabric routing mode and when multiple routers are in the same VRID. In fabric routing mode with MLAG, my perception is that traffic could end up at either switch in the MLAG pair, determined by the hashing algorithm configured on the LAG, and then be routed from there. Both routers would essentially be advertising the same subnet, so asymmetric routing could take place, as return traffic could land at the other router (the other switch in the MLAG pair). I don't think that actually matters, though, because that switch would see the device as directly attached through the other link in the LAG and would therefore forward the traffic straight to the client instead of passing it back to the originating router.

Interested in your thoughts.

Many thanks in advance

Martin Flammia

Posted 6 months ago

Erik Auerswald, Embassador

Hi Martin,

thanks for posting a good question. :-) I'll try to add my 2¢ to your thoughts.

  1. Preempt Delay: The idea is to give routing protocols etc. time to start up and converge. This is generally a good idea, but I am not sure if this still helps in combination with fabric routing.
  2. Accept Mode: I would enable that only if the VRRP address is supposed to provide some kind of service. That is usually not the case for switches. With switches or routers it is often better to bind services to an anycast address in combination with dynamic routing.
  3. Fabric Routing: I would generally enable this for symmetric setups, e.g. MLAG. But there might be some time during fail-back when the returning switch will use layer 3 forwarding, but does not yet have a complete routing table, which may result in traffic blackholing. I have never analyzed the exact behavior in that case.
  4. Tracking: Important for optimal traffic flow in asymmetric setups, e.g. with two routers that are connected to one of two core or distribution switches only. Fabric Routing kind of defeats the idea of interface tracking, because the backup router will forward packets anyway.
  5. Host Mobility: That is part of a specific solution to VM mobility across data centers. It can be used if VRRP with Fabric Routing is used to implement an anycast gateway with optimal return path for leaf & spine designs with routing on the leaf switches.
Thanks,
Erik

Grosjean, Stephane, Employee

For point 3: since 22.3 and a recent patch of 21.1, we have introduced a "restore timer" for MLAG ports. In the event that an MLAG peer restarts, the MLAG ports will wait for that timer to expire before enabling themselves, to give L3 protocols time to converge.

Erik Auerswald, Embassador

Yes, that is a useful solution for MLAG setups. For symmetric non-MLAG (e.g. L3) setups Fabric Mode is useful during steady state, but something like a "Fabric Mode Forwarding Delay" for smooth fail-back would be nice. :-)

Erik Auerswald, Embassador

An equivalent of the EOS (S-Series) command vrrp interface-up-delay VRID SECONDS would be a great addition to EXOS, since it addresses just this problem.

Martin Flammia

Hi Stephane / Erik,

Was meant to say thanks for posting.... 

Interestingly I am currently experiencing an issue that I think Erik is describing. Was running version 21.1 and now 22.4.

Here is a very high level overview of the network:


If Core 1 is rebooted whilst pinging Core 3 & Core 4, there are 7 dropped pings before service is restored, but when pinging a device that hangs off Core 1 & Core 2, say a LAG'ed switch or a PC hanging off it, it's fine.

If Core 2 is rebooted in exactly the same scenario, only one ping is dropped in each case.

Below are some observations I made:

Core 1 powered off. The show ospf output below is the neighbour table before the power cycle (I still need to capture one during the process):
Farrer-Core2.1 # show ospf neighbor

Neighbor ID     Pri State              Up/Dead Time             Address         Interface
          BFD Session State
==========================================================================================
10.0.255.201      1 2WAY      /DROTHER 00:02:43:15/00:00:00:04  10.0.254.145    Building-Building     Goes Down
10.0.255.203      1 FULL      /DR      00:02:43:15/00:00:00:05  10.0.254.147    Building-Building      
10.0.255.204      1 FULL      /BDR     00:02:43:20/00:00:00:00  10.0.254.148    Building-Building      
10.0.255.201      0 FULL      /DROTHER 00:02:43:14/00:00:00:04  10.0.254.201    Core1-Core2   Goes Down

01/06/2018 15:20:36.66 <Noti:ospf.neighbor.ChgState> Changing the state of neighbor rtid 10.0.255.201 ipa 10.0.254.145 to state = DOWN due to inactivity timer expiry.
01/06/2018 15:20:36.66 <Noti:ospf.neighbor.ChgState> Changing the state of neighbor rtid 10.0.255.201 ipa 10.0.254.201 to state = DOWN due to inactivity timer expiry.
Core 2 powered off. The show ospf output below is the neighbour table before the power cycle (I still need to capture one during the process):
Farrer-Core1.1 # show ospf neighbor 

Neighbor ID     Pri State              Up/Dead Time             Address         Interface
          BFD Session State
==========================================================================================
10.0.255.202      1 2WAY      /DROTHER 00:02:45:52/00:00:00:01  10.0.254.146    Building-Building      Goes Down
10.0.255.204      1 FULL      /BDR     00:02:52:17/00:00:00:07  10.0.254.148    Building-Building      New Adjacency      
10.0.255.203      1 FULL      /DR      00:02:52:21/00:00:00:02  10.0.254.147    Building-Building
10.0.255.202      0 FULL      /DROTHER 00:02:45:51/00:00:00:01  10.0.254.202    Core1-Core2            Goes Down
Farrer-Core1.3 # show log

01/06/2018 15:27:04.30 <Noti:ospf.neighbor.ChgState> Changing the state of neighbor rtid 10.0.255.204 ipa 10.0.254.148 to state = FULL due to Loading done.
01/06/2018 15:27:04.30 <Noti:ospf.neighbor.ChgState> Changing the state of neighbor rtid 10.0.255.204 ipa 10.0.254.148 to state = LOADING due to exchange done event.
01/06/2018 15:27:04.30 <Noti:ospf.neighbor.ChgState> Changing the state of neighbor rtid 10.0.255.204 ipa 10.0.254.148 to state = EXCHANGE due to negotiation done event.
01/06/2018 15:26:59.30 <Noti:ospf.neighbor.ChgState> Changing the state of neighbor rtid 10.0.255.204 ipa 10.0.254.148 to state = EX_START due to AdjOK event.
01/06/2018 15:26:58.30 <Noti:ospf.neighbor.ChgState> Changing the state of neighbor rtid 10.0.255.202 ipa 10.0.254.146 to state = DOWN due to inactivity timer expiry.
01/06/2018 15:26:58.30 <Noti:ospf.neighbor.ChgState> Changing the state of neighbor rtid 10.0.255.202 ipa 10.0.254.202 to state = DOWN due to inactivity timer expiry.
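
To put a number on the log excerpt above (taken on Core 1 while Core 2 was powered off), here is a small sketch that measures the time from the first neighbor going DOWN to the new adjacency reaching FULL. It assumes the log line format shown above and month/day date ordering (the ordering does not affect same-day differences); the function names are hypothetical, not part of any EXOS tooling.

```python
import re
from datetime import datetime

# Three of the log lines from the excerpt above, copied as-is
LOG = """\
01/06/2018 15:27:04.30 <Noti:ospf.neighbor.ChgState> Changing the state of neighbor rtid 10.0.255.204 ipa 10.0.254.148 to state = FULL due to Loading done.
01/06/2018 15:26:59.30 <Noti:ospf.neighbor.ChgState> Changing the state of neighbor rtid 10.0.255.204 ipa 10.0.254.148 to state = EX_START due to AdjOK event.
01/06/2018 15:26:58.30 <Noti:ospf.neighbor.ChgState> Changing the state of neighbor rtid 10.0.255.202 ipa 10.0.254.146 to state = DOWN due to inactivity timer expiry.
"""

LINE = re.compile(
    r"^(\S+ \S+) <\S+> Changing the state of neighbor "
    r"rtid (\S+) ipa (\S+) to state = (\w+)"
)


def parse_events(log_text):
    """Return (timestamp, router_id, address, state) tuples, oldest first."""
    events = []
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S.%f")
            events.append((stamp, match.group(2), match.group(3), match.group(4)))
    return sorted(events)


def down_to_full_seconds(events):
    """Seconds from the earliest DOWN event to the latest FULL event, or None."""
    downs = [ts for ts, _, _, state in events if state == "DOWN"]
    fulls = [ts for ts, _, _, state in events if state == "FULL"]
    if not downs or not fulls:
        return None
    return (max(fulls) - min(downs)).total_seconds()


print(down_to_full_seconds(parse_events(LOG)))  # 6.0
```

Running the same measurement on logs captured during the Core 1 reboot would show whether the adjacency rebuild lines up with the 7 dropped pings; on its own this number does not prove the cause.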

Each core pair is running MLAG, as shown by the ISC link. The two core pairs are connected via a common LAG, and OSPF broadcast is configured between them, with Core 1 as the DR and Core 2 as the BDR.

VRRP is configured so that there is a common VRID between all four switches for each of the common VLANs that exist between them all. There is also a VRID for VLANs that only exist on one pair, and another VRID for the VLANs that only exist on the other. Fabric routing mode is enabled on all the VRRP instances.

What I think I am seeing is what Erik is describing: there is some delay, perhaps related to learning new routes via OSPF, before things normalise, and that is what is causing this 7 second outage.

Pretty sure the same thing is happening when doing the same test on Core 3 and Core 4.

1) If this is indeed what's happening, do you know what I can do about it?
2) Could I adjust the restore timer?
3) I haven't looked it up, but what is that restore timer default time?

Many thanks in advance

Erik Auerswald, Embassador

Hi Martin,

although you have provided lots of info, I cannot positively determine if the problem is related to VRRP forwarding starting too early on a booting switch. From where are you sending the ping packets that are lost? Do they need to be routed across the two-tier MLAG between the four switches? If yes, then this may well be missing routing information.

You can try enabling the MLAG restore delay (it is disabled by default, the default time if just enabled is 30s). You might want to add a point-to-point OSPF transfer VLAN/network between the MLAG peers using the same link as the ISC VLAN to give the routing process time to fill the FIB before MLAG ports are enabled and traffic is received on the booting switch. Please note that this can only help if both source and destination of the ping packets are connected via MLAG ports.

For a general solution, either active/passive VRRP with preempt delay (do not use this with MLAG!) or a forwarding delay for fabric mode VRRP backup routers is needed. The latter is not documented to exist in EXOS or EOS.

BTW, it might be good to have the DR on one MLAG pair and the BDR on the other, but I'm not sure, this is just a hunch. You can configure OSPF priorities to achieve this.

Thanks,
Erik

Martin Flammia

Hi Erik,

Thanks for taking the time to look at this closely.

The OSPF link between the MLAG pair (a /30 point-to-point across the ISC) is already in place, and currently there is no preempt delay. Good to hear your feedback, as I wasn't too sure whether I should add it. I did add it for testing, but it made no difference.

The pings are sent from a PC attached to a stack hanging off one core MLAG pair to a PC on a stack attached to the other core MLAG pair. When pinging different destinations, the drops seem to occur between these MLAG pairs.

I had considered splitting the DR and the BDR between the different MLAG pairs but couldn't make sense of why that would be of benefit. It's an absolutely worthwhile thing to try though, so I will follow your guidance and let you know the outcome.

Will play with the MLAG restore delay also, depending on results.

Many thanks