We are having a problem in our data centre, where we have multiple 2-tier MLAG setups between the two cores and pairs of distribution switches.
Two 480s connected to each other via ISC, ports 51,52 (LAG, L3_L4, LACP, master port 51).
Two BDs connected to each other via ISC, ports 1-4 (LAG).
Two 480s connected to the core via MLAG, ports 53,54 (LAG, L3_L4, LACP, master port 54).
Emulator connected to the two 480s via MLAG.
Across the data centre we have a VLAN used for emulators, and this is the only VLAN where the issue occurs. The emulators themselves have built-in switches, which are LAG'ed to each of the distribution-layer switches as above.
ELRP has been enabled on all ports, with the action set to disable the port when a loop is detected. Ports 51 (ISC) and 54 (MLAG) are exception ports on the distribution switches: ELRP is configured on them, but they are set as exceptions so that they are not disabled when a loop is seen.
Now, when we have a loop, we see it ingress on port 51 (ISC) and egress on port 54 (MLAG). Sometimes it comes in and goes out on the same port, 54.
This occurs randomly and seems to move around the data centre from one pair of MLAG-peered 480s to another.
Now, from what I have read and the way I am looking at the network structure, I'm thinking that ELRP doesn't need to be enabled on the uplink ports to the core or on the ISC links, because by design there is a natural loop in a 2-tier MLAG setup. But then the ISC is supposed to block ELRP messages, isn't it? The concept guide says: "flood and multicast traffic will traverse the ISC but will be dropped from MLAG peer port transmission by the ISC blocking filter mechanism." (EXOS Concept Guide 15.3, p. 289)
The ingress port that ELRP reports for the loop is sometimes port 51 (ISC). Also, when it reports that a loop egressed on port 54 and ingressed on port 54, I think the packet could be going out on 53 (to CORE1) and coming in on 54 (from CORE2), with only 54 reported because it is the master port of the LAG to CORE1 and CORE2. This would make sense, as I don't believe an ELRP packet would come in and then be sent out of the same port. It would again mean that the packet is traversing the ISC link between the two core switches.
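To make that hypothesis concrete, here is a small toy model in Python. It is not EXOS code and does not reflect any EXOS API; it only assumes, as I am guessing above, that a loop seen on any member of a LAG is logged against that LAG's master port. The port numbers come from the setup described earlier, and the function names are made up for illustration.

```python
# Toy model of the hypothesis above; this is NOT EXOS code or an EXOS API.
# Assumption: loop events on a LAG member are logged against the LAG's master port.

# LAG membership on one distribution 480, taken from the port list in this post.
LAGS = {
    54: {53, 54},  # uplink LAG to the cores, master port 54
    51: {51, 52},  # ISC LAG to the MLAG peer, master port 51
}


def reported_port(physical_port: int) -> int:
    """Return the port a log line would show for a given physical port."""
    for master, members in LAGS.items():
        if physical_port in members:
            return master
    return physical_port  # not part of any LAG, reported as-is


def describe(egress: int, ingress: int) -> str:
    """Format a loop report as it would look if only master ports are shown."""
    return f"egress {reported_port(egress)}, ingress {reported_port(ingress)}"


# Packet leaves on 53 (to CORE1) and returns on 54 (from CORE2).
print(describe(egress=53, ingress=54))
# Packet leaves on the uplink LAG and returns over the ISC on port 52.
print(describe(egress=54, ingress=52))
```

If that assumption holds, a packet that leaves on 53 and returns on 54 would be indistinguishable in the log from one that left and returned on 54 itself, which would explain the "same port" reports.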
So it seems that, on occasion, ELRP packets go over the ISC links, making it look as though a loop has occurred. As it only happens on this single VLAN, I think it is caused by the heavy traffic on this VLAN, but that's just a hunch. Can anyone with more knowledge than me confirm this or correct me? This is beginning to drive me crazy.
Looking forward to your responses.