ELRP loop warnings over a 2-Tier MLAG architecture

  • Problem
  • Updated 3 years ago
  • Solved
We are having a problem in our data centre, where we have multiple 2-tier MLAG
setups between the two COREs and pairs of distribution switches.

Two 480s connected to each other via ISC, ports 51,52 (lag L3_L4 lacp - Master port 51).
Two BDs connected to each other via ISC, ports 1-4.

Two 480s connected to the core via MLAG, ports 53,54 (lag L3_L4 lacp - Master port 54).

Emulator connected to the two 480s via MLAG.

Like this: [network diagram]

Across the data centre we have a VLAN used for emulators, and this is the only
VLAN where this issue is occurring. The emulators themselves have built-in
switches, which are LAG'ed to each of the distribution-layer switches as above.

ELRP has been enabled on all ports, with the action to disable the port upon
seeing a loop. Ports 51 (ISC) and 54 (MLAG) are exception ports on the distro
switches: ELRP is configured on them, but they are set as exceptions so that
they are not disabled upon seeing a loop.

Now, when we have a loop, we see it ingress on port 51 (ISC) and egress out of
port 54 (MLAG). Sometimes it comes in and goes out on the same port, 54.

This occurs randomly and seems to go around the
data centre from one pair of MLAG peered 480s to another.

Now, from what I have read and the way I am looking at the network structure,
I'm thinking that ELRP doesn't need to be enabled on the uplink ports to the
CORE or on the ISC links, because by design there is a natural loop in a 2-tier
MLAG setup. But then the ISC is supposed to block ELRP messages, isn't it? The
concept guide says the following: "flood and multicast traffic will traverse the
ISC but will be dropped from MLAG peer port transmission by the ISC blocking
filter mechanism." (EXOS Concept Guide 15.3, p.

And the ingress port the ELRP loop is reporting is
sometimes port 51 (ISC). Also, when it reports that a loop egressed on port 54,
and ingressed on port 54, I think this could be going out 53 (to CORE1) and in
on 54 (from CORE2) but it only reports 54 as it is the master port of the LAG
created to go to each CORE1 and CORE2. This would make sense as I don't believe
an ELRP packet would come in and then be sent out the same port. This would
again mean that it is traversing the ISC link between the 2 core switches.
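
To make the master-port hypothesis above concrete, here is a minimal Python sketch. It assumes, as the post does, that a loop report names only the master port of a LAG. The mapping uses the port numbers given earlier in the post; the function names and the message wording are hypothetical and are not real EXOS log output.

```python
# LAG membership as described in the post (assumption: reports name the master port).
# Ports 51,52 form the ISC LAG (master 51); ports 53,54 form the uplink LAG (master 54).
LAG_MASTER = {51: 51, 52: 51, 53: 54, 54: 54}


def reported_port(physical_port: int) -> int:
    """Map a physical port to the port a report would show (its LAG master)."""
    return LAG_MASTER.get(physical_port, physical_port)


def describe_loop(tx_port: int, rx_port: int) -> str:
    """Build a hypothetical one-line summary of where an ELRP PDU left and returned."""
    return f"egress port {reported_port(tx_port)}, ingress port {reported_port(rx_port)}"


# A PDU leaving on 53 (to CORE1) and returning on 54 (from CORE2)
# would be summarised with port 54 on both sides.
print(describe_loop(53, 54))
```

Under that assumption, "out 54, in 54" in a report cannot be told apart from "out 53, in 54", which is the ambiguity described above.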

So it seems that, on occasion, ELRP packets go over the ISC links, thus making
it seem like a loop has occurred. As it only happens on this single VLAN, I
think it is caused by the heavy traffic on this VLAN, but that's just a hunch.
Can anyone with more knowledge than me confirm this or correct me? This is
beginning to drive me crazy.

Looking forward to your responses.

Ezeddean Osman Almansouri

Posted 3 years ago

Grosjean, Stephane, Employee

A network layout would be helpful.

Are you sure there isn't an emulator dual-homed to an MLAG pair without its LAG configured? This is a typical situation where a loop could happen, and ELRP couldn't help unless you go to 16.1 and use ELRP egress.

Paul Russo, Alum

Hello Ezeddean

Traffic is not blocked on the ISC link; it is always blocked on the MLAG peer port. The ISC is what the two switches use to communicate with each other. If a switch sees a broadcast or multicast frame coming across the ISC, it tells the local MLAG peer port to drop it.
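
The blocking rule described above can be sketched as a toy model in Python. This is only an illustration under stated assumptions, not how EXOS implements it: the class and field names are hypothetical, port 1 stands in for an emulator-facing MLAG port (the thread gives no number for it), port 10 is an invented non-MLAG port, and a single flag represents whether the peer's matching MLAG ports are up.

```python
from dataclasses import dataclass


@dataclass
class MlagSwitch:
    """Toy model of one MLAG peer; all names here are hypothetical."""
    isc_port: int
    mlag_ports: set             # local ports that belong to an MLAG
    other_ports: set            # ordinary, non-MLAG ports
    peer_mlag_up: bool = True   # assumption: one flag for all peer MLAG ports

    def flood_targets(self, ingress_port: int) -> set:
        """Return the ports a broadcast or multicast frame is flooded to."""
        all_ports = {self.isc_port} | self.mlag_ports | self.other_ports
        targets = all_ports - {ingress_port}
        # Blocking filter: a frame that arrived over the ISC is not sent out
        # of local MLAG ports while the peer's MLAG ports are up, since the
        # peer has already delivered it to the dual-homed device.
        if ingress_port == self.isc_port and self.peer_mlag_up:
            targets -= self.mlag_ports
        return targets


sw = MlagSwitch(isc_port=51, mlag_ports={1, 54}, other_ports={10})
print(sw.flood_targets(51))   # frame from the ISC: MLAG ports are filtered out
print(sw.flood_targets(10))   # frame from a local port: flooded everywhere else
```

In this model the frame still crosses the ISC; it is only the transmission out of the MLAG ports that is suppressed.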

As Stephane said a diagram would be helpful.
Grosjean, Stephane, Employee

Assuming there's no configuration error, this diagram is a valid (and quite typical) one.

The one reason I see for a loop to be detected is the server losing its LAG. In such a case, ELRP on the MLAG Peer I would detect the loop. However, standard ELRP (prior to EXOS 16.1) could only break the loop by disabling the ISC, which is bad, and which is prevented because you have excluded it: thus only a warning.

Starting with EXOS 16.1, you can change ELRP behavior so that it can block a port other than the ingress port. In your example, that would block one of the ports towards the server/switch. This is ELRP egress.
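
The difference between the two behaviors can be sketched in Python as a small decision function. This is a simplified model of the description above, not EXOS code: the function name, the `mode` parameter, and the messages are hypothetical, and port 1 stands in for the server-facing port, which is not numbered in this thread.

```python
def elrp_action(ingress_port, egress_port, excluded_ports, mode="ingress"):
    """Return (port_to_disable_or_None, message) for a detected loop.

    mode="ingress" models the behaviour before EXOS 16.1;
    mode="egress" models the ELRP egress option described above.
    """
    if mode not in ("ingress", "egress"):
        raise ValueError(f"unknown mode: {mode}")
    target = ingress_port if mode == "ingress" else egress_port
    if target in excluded_ports:
        # An excluded port is never disabled, so only a warning remains.
        return None, f"loop detected, port {target} is excluded: warning only"
    return target, f"loop detected, disabling port {target}"


excluded = {51, 54}  # ISC and uplink, excluded as in the original post

# ELRP PDU sent towards the server on port 1 comes back in over the ISC (51).
print(elrp_action(51, 1, excluded, mode="ingress"))
print(elrp_action(51, 1, excluded, mode="egress"))
```

With the ingress behaviour the only candidate is the excluded ISC port, so nothing is disabled; with the egress behaviour the server-facing port becomes the candidate.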

The emulator being a switch, it gives some room for a config mistake. Are you sure there isn't one emulator without a LAG configured?

We can also double check your MLAG config if you can paste it here. What EXOS release are you using?
Hi Grosjean,

Please see the network diagram above. Here is a screenshot from NetSight so you can get an idea of how many of these 2-tier setups there are.

How would a multi-homed emulator cause this, though? The emulator actually has a built-in switch which is set up as a LAG, so it's not like a multi-NIC server. The other strange thing is that the ELRP loop warnings rarely ever appear on the edge ports feeding the emulators; they are mostly on the MLAG peer ports over the ISC link, or on the uplink MLAG ports going to the two COREs.