MLAG/2-Tier/VRRP problem

  • Question
  • Updated 4 years ago
I keep finding new ways to get myself into trouble...

I had a two-tier setup like this (EXOS 15.4):

Two 480s connected to each other via ISC, ports 1,2 (LAG).
Two BDs connected to each other via ISC, ports 1-4 (LAG).
Each 480 has one port connected to each of the BDs (LAG), which is then MLAGed on the BDs.
In reverse, the ports on each BD going to the 480s are LAGed and then MLAGed on the 480s.

From what I understand, that's the way to do this.

A vlan named "Internet" is present on all four switches.
All switches have an IP address in vlan "Internet":
480-1: 10.0.0.244
480-2: 10.0.0.245
BD-1: 10.0.0.252
BD-2: 10.0.0.253

Additionally, I have VRRP configured on the BDs (priority 250 on BD-1, default priority on BD-2):
create vrrp vlan Internet vrid 1
configure vrrp vlan Internet vrid 1 priority 250
configure vrrp vlan Internet vrid 1 add 10.0.0.254
enable vrrp vlan Internet vrid 1

I needed to move the ISC on the 480s from ports 1,2 to ports 11,12, so I created a new sharing group (LAG) and added that port to vlan "Internet". Then I disconnected the cables from ports 1 and 2 on both 480s and reconnected them to ports 11 and 12.

All hell broke loose on the BDs. BD-1 (but not BD-2) logged a flood of these notices:
VRRP.Advert: MSM-A: Advert for VR on vlan Internet vrid 1 ignored: ignoring lower priority advert
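
In VRRP (RFC 3768 / RFC 5798), only a router in Master state sends advertisements, and a Master discards any advert whose priority is lower than its own, which matches the message BD-1 logged. Adverts below BD-1's priority of 250 would presumably originate from BD-2 (default priority 100), so one quick check during an event like this is whether both BDs report themselves as Master at the same time. A minimal sketch using only standard show commands:

```
# Run on both BD-1 and BD-2 and compare the state each one
# reports for vlan Internet, vrid 1
show vrrp
show vrrp detail
```
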

After far too long trying to figure out what I did wrong, I just disabled ports 11,12 on the 480s and everything returned to normal. Sadly, people who needed the Virtual Router IP to get anywhere beyond the "Internet" vlan were, well, effectively shut down :(

I have no clue what happened. All I can think of is that, while the ISC on the 480s was on ports 1,2, something may have been blocking traffic in a way that kept the VRRP advertisements from going wrong (circular? odd?), and that disconnecting and reconnecting in ports 11,12 brought the links up in a way that caused the VRRP flood?

I'm not even sure whether my assumption is anywhere near correct: that the BD somehow saw its own VRRP advertisement arrive on the wrong port (going in circles?). I've seen some forum posts that make me think that's what's happening, and that I should perhaps use a policy to block VRRP advertisements on the ports between the BDs and the 480s.

I will need my ISC between the 480s back eventually, but I have to make sure that I don't break things in the process.

The good news is that I have another set of switches that I can play with and rebuild my considerably dented confidence.

I guess my questions are:
- Did I set up the two-tier config correctly?
- Am I even remotely close with my guess about VRRP advertisements kicking me in the shins?
- How do I fix this mess?
- Eventually I also need to enable VRRP between the 480s. If my guess about VRRP advertisements is right, I think the "480-fix" would be analogous to the "BD-fix" (policy-blocking VRRP advertisements outside of the BDs / outside of the 480s).

And lastly: why didn't things break much sooner? :( That could have been mere "luck" about which links were active when (it's really more of a curiosity question).

Thanks for your help!


Frank

  • lost and confused

Posted 4 years ago


rbrt_weiler


Did you build something like this, with each # being a switch, the top row being the 480s and the bottom row the BDs?

#-#
|X|
#-#

In that case you need MLAG for the links between the 480s and BDs on _both_ sides. To me it seems like your VRRP multicasts are looping, but I'm no expert on VRRP.


Frank

Yes, I have MLAGs configured on the BDs going "down" to the 480s with the other BD as the peer, and MLAGs on the 480s going "up" to the BDs with the other 480 as the peer.
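
For reference, the 480 side of the layout described here (one LAG going "up" to both BDs, MLAGed with the other 480) might look roughly like the sketch below on 480-1. Everything not stated in the thread is a placeholder: the ISC VLAN name "isc", its tag 4000 and 1.1.1.0/30 addressing, the peer name "peer480", uplink ports 21 (to BD-1) and 22 (to BD-2), and MLAG id 1. Static LAGs are used to keep it short. 480-2 would mirror it with the ISC IP addresses swapped, and the BDs would follow the same pattern with their own ISC.

```
# 480-1 (sketch; VLAN name, tag, IPs, peer name and uplink ports are placeholders)
# ISC: LAG on ports 1-2 in a dedicated VLAN that contains only the ISC
enable sharing 1 grouping 1-2
create vlan isc
configure vlan isc tag 4000
configure vlan isc add ports 1 tagged
configure vlan isc ipaddress 1.1.1.1/30
create mlag peer peer480
configure mlag peer peer480 ipaddress 1.1.1.2
# Uplink LAG: port 21 to BD-1, port 22 to BD-2
enable sharing 21 grouping 21-22
# Pair this LAG with 480-2's uplink LAG (same MLAG id on both 480s)
enable mlag port 21 peer peer480 id 1
# Data VLANs carry both the MLAG port and the ISC port
# (assumes Internet is tagged on the inter-switch links)
configure vlan Internet add ports 1,21 tagged
```
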

Stephane Grosjean

Hi Frank,

You wrote:
"I needed to switch the ISC on the 480s from ports 1,2 to ports 11,12 so I created a new share/lag, added that port to vlan "Internet" disconnected the cables in port 1 and 2 on both 480s and reconnected them in ports 11 and 12 on the 480s."

Was that new LAG part of the ISC? You don't mention it, and obviously if it wasn't declared as a member of the ISC, you just created a loop. Likewise, if you had both LAGs up in the ISC control vlan, that would be a loop as well.
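
Both failure modes can be checked from the CLI before the new links carry traffic. A minimal sketch, assuming the ISC VLAN is called "isc" (a placeholder; use whatever name the ISC VLAN actually has):

```
# The ISC VLAN should list exactly one port: the new ISC LAG
show vlan isc
# The MLAG peer should be up over that VLAN before any data VLAN
# is added to the new LAG
show mlag peer
# MLAG ports and their state relative to the peer
show mlag ports
```
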

One way to migrate an ISC LAG to another LAG could be to use SRP, temporarily.

Frank

If I remember correctly, then yes, it was part of the ISC vlan and nothing else. I also disconnected the old ISC LAG completely (physical fiber and SFPs) before connecting the new LAG, "to keep things easy" :( I thought it was all safe since the 480s don't have anything else connected to them yet.

Frank

I tried my best to reproduce the issue with a pair of 670s and a pair of 460s. The best explanation I could come up with is that I somehow added the "regular" vlans to the ISC (shared) port but failed to attach that port properly to the ISC vlan, which of course results in spectacular loops when the ISC vlan doesn't do its job ;)

The safest/sanest way I figure is to remove the ISC port from all participating vlans (and the ISC vlan), then add the new port(s) to the ISC vlan, then add that port to all other vlans. Nice and orderly!
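
The order described above, sketched as EXOS commands for one 480 (repeat on the other). The ISC VLAN name "isc" is a placeholder, "Internet" stands in for every data VLAN the ISC carries, and the new LAG uses ports 11-12 as in the original post:

```
# 1. Remove the old ISC LAG (master port 1) from every VLAN,
#    data VLANs and the ISC VLAN alike
configure vlan Internet delete ports 1
configure vlan isc delete ports 1
disable sharing 1
# 2. Build the new LAG and make it the ISC port first
enable sharing 11 grouping 11-12
configure vlan isc add ports 11 tagged
# 3. Only then add it to the data VLANs
configure vlan Internet add ports 11 tagged
# 4. Confirm the peer relationship is back up
show mlag peer
```
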

Stephane may have been onto something: even though the old and new shared ISC ports were never physically up at the same time, both were part of the ISC vlan and of the "regular" vlan(s).

Thank you all for looking at this and for all your help!