
ELRP loop warnings over a 2-Tier MLAG architecture


Ezeddean_Osman_
New Contributor
We are having a problem in our data centre, where we have multiple 2-tier MLAG setups between the two COREs and pairs of distribution switches.

Two 480s connected to each other via ISC, ports 51,52 (lag L3_L4 lacp - Master port 51).
Two BDs connected to each other via ISC, ports 1-4 (lag)

Two 480s connected to the core via MLAG, ports 53,54 (lag L3_L4 lacp - Master port 54)

Emulator connected to Two 480s via MLAG



Looks like this;



Across the data centre we have a VLAN used for emulators, and this is the only VLAN where this issue is occurring. The emulators themselves have built-in switches, which are LAG'ed to each of the distribution layer switches as above.

ELRP has been enabled on all ports, with the action to disable the port upon seeing a loop. Ports 51 (ISC) and 54 (MLAG) are exception ports on the distro switches, so ELRP is configured on them, but they are set as exceptions so that they are not disabled upon seeing a loop.

Now, when we have a loop, we see it ingress on port 51 (ISC) and egress out of port 54 (MLAG). Sometimes it comes in and goes out the same port, 54.

This occurs randomly and seems to go around the data centre from one pair of MLAG peered 480s to another.

From what I have read and from the way I am looking at the network structure, I'm thinking that ELRP doesn't need to be enabled on the uplink ports to the CORE or on the ISC links, because by design there is a natural loop in a 2-tier MLAG setup. But then the ISC is supposed to block ELRP messages, isn't it? The concept guide says the following: "flood and multicast traffic will traverse the ISC but will be dropped from MLAG peer port transmission by the ISC blocking filter mechanism." (EXOS Concept Guide 15.3, p. 289)

And the ingress port that ELRP reports for the loop is sometimes port 51 (ISC). Also, when it reports that a loop egressed on port 54 and ingressed on port 54, I think this could be going out 53 (to CORE1) and in on 54 (from CORE2), but it only reports 54 because that is the master port of the LAG going to CORE1 and CORE2. This would make sense, as I don't believe an ELRP packet would come in and then be sent out the same port. This would again mean that it is traversing the ISC link between the two core switches.
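
To illustrate that guess, here is a minimal Python sketch. It assumes (this is my hypothesis, not confirmed switch behaviour) that anything happening on a LAG member port is logged under the LAG's master port. The port-to-master table uses the port numbers from this post; the function names are made up for illustration.

```python
# Assumption: events on any LAG member port are reported under the master port.
# Port numbers come from the post: 51,52 form the ISC LAG (master 51),
# 53,54 form the uplink LAG to the cores (master 54).
LAG_MASTER = {51: 51, 52: 51, 53: 54, 54: 54}


def reported_port(physical_port: int) -> int:
    """Return the port number a log line would show for a physical port."""
    # Ports that are not LAG members are reported as themselves.
    return LAG_MASTER.get(physical_port, physical_port)


def describe_loop(egress_physical: int, ingress_physical: int) -> str:
    """Format a loop report the way it would look after master-port mapping."""
    return (
        f"egress {reported_port(egress_physical)}, "
        f"ingress {reported_port(ingress_physical)}"
    )


# Physically out port 53 (to CORE1) and back in on port 54 (from CORE2)...
print(describe_loop(53, 54))
```

Under that assumption, a packet that leaves on 53 and returns on 54 would be reported as leaving and entering on 54, which matches what the logs show.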

So it seems that, on occasion, ELRP packets go over the ISC links, thus making it seem like a loop has occurred. As it only happens on this single VLAN, I think it is due to the heavy traffic on this VLAN, but that’s just a hunch. Can anyone with more knowledge than me confirm this or correct me? This is beginning to drive me crazy.

Looking forward to your responses.


Ezeddean
5 REPLIES

Ezeddean_Osman_
New Contributor
Hi Grosjean,

Please see the network diagram above. Here is a screenshot from NetSight so you can get an idea of how many of these 2-tier setups there are.

How would a multi-homed emulator cause this, though? The emulator actually has a built-in switch which is set up as a LAG, so it's not like a multi-NIC server. The other strange thing is that the ELRP loop warnings rarely ever appear on the edge ports feeding the emulators; they are mostly on the MLAG peer ports over the ISC link, or on the uplink MLAG ports going to the two COREs.

[Image: NetSight screenshot of the data centre (DC)]

Ezeddean_Osman_
New Contributor

[Image: 2-tier MLAG network diagram (MLAG2Tier)]


Assuming there's no configuration error, this diagram is a valid (and quite typical) one.

The one reason I see for a loop to be detected is the server losing its LAG. In such a case, ELRP on the MLAG Peer I would detect the loop. However, the usual ELRP (prior to EXOS 16.1) could only break the loop by disabling the ISC, which is bad, and which is prevented because you have excluded it: thus only a warning.

Starting with EXOS 16.1, you can change the ELRP behavior so that it can block a port other than the ingress port. In your example, that would block one of the ports towards the server/switch. This is ELRP egress.
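
The difference between the two modes can be sketched as a small decision function in Python. This is only a model of the logic described above, not how EXOS implements it: the function name, the return values, and emulator-facing port 1 are hypothetical, and what the switch really does when the egress port is itself excluded is an assumption here.

```python
def elrp_action(ingress_port, egress_port, excluded_ports, egress_mode=False):
    """Decide what to do when an ELRP packet sent on egress_port
    is received back on ingress_port.

    Returns ("disable", port) or ("log-only", None).
    """
    # Classic behaviour acts on the ingress port; egress mode acts on the
    # port the ELRP packet was originally sent from.
    candidate = egress_port if egress_mode else ingress_port
    if candidate in excluded_ports:
        # Assumption: an excluded port is never disabled, so only a warning is logged.
        return ("log-only", None)
    return ("disable", candidate)


excluded = {51, 54}  # ISC and uplink master ports, excluded as in the original post

# Server loses its LAG: packet leaves on hypothetical edge port 1, returns via the ISC (51).
print(elrp_action(ingress_port=51, egress_port=1, excluded_ports=excluded))
print(elrp_action(ingress_port=51, egress_port=1, excluded_ports=excluded, egress_mode=True))
```

In this model, ingress mode can only log a warning because the ISC is excluded, while egress mode disables the port towards the server/switch.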

The emulator being a switch, it gives some room for a config mistake. Are you sure there isn't one emulator without a LAG configured?

We can also double check your MLAG config if you can paste it here. What EXOS release are you using?

Paul_Russo
Extreme Employee
Hello Ezeddean

Traffic is not blocked on the ISC link; it is always blocked on the MLAG peer port. The ISC is what is used to communicate between the two switches. If a switch sees a broadcast or multicast that is coming across the ISC, it tells the local MLAG peer port to drop it.
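
As a rough illustration of that rule, here is a simplified Python model. The class, the method, and edge ports 1 and 10 are hypothetical, and the model ignores details such as what happens when the peer's MLAG port is down.

```python
from dataclasses import dataclass, field


@dataclass
class MlagSwitch:
    name: str
    isc_port: int
    mlag_ports: set[int]                                 # ports that belong to an MLAG
    other_ports: set[int] = field(default_factory=set)   # ordinary, non-MLAG ports

    def flood_targets(self, ingress_port: int) -> set[int]:
        """Return the ports a broadcast/multicast frame is flooded to."""
        all_ports = {self.isc_port} | self.mlag_ports | self.other_ports
        # Never send a frame back out of the port it arrived on.
        targets = all_ports - {ingress_port}
        if ingress_port == self.isc_port:
            # ISC blocking filter: a frame received over the ISC is not
            # transmitted on MLAG ports; it is dropped at those ports, not on the ISC.
            targets -= self.mlag_ports
        return targets


# ISC on 51, MLAG uplink on 54, MLAG to an emulator on hypothetical port 1,
# plus one ordinary edge port (10).
dist1 = MlagSwitch("dist1", isc_port=51, mlag_ports={54, 1}, other_ports={10})
print(sorted(dist1.flood_targets(51)))  # flood frame that arrived over the ISC
print(sorted(dist1.flood_targets(10)))  # flood frame that arrived on an edge port
```

In this model, a flood frame that arrives over the ISC only reaches non-MLAG ports, while a frame from an edge port is still sent across the ISC and out of the MLAG ports.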

As Stephane said, a diagram would be helpful.