Question

MLAG setup - looks like hitting a L2 loop

  • 29 July 2019
  • 8 replies
  • 552 views

Hello,

We tried to set up MLAG between two x670 switches, and once we enabled the second "leg" (port 41 on sw1) it looks like we hit a loop. Unfortunately it's a production network, so we have very few opportunities to reproduce it.

MLAG related configs are as follows:

sw1:
code:
create mlag peer "sw2" 
configure mlag peer "sw2" ipaddress 192.168.128.242 vr VR-Default
enable mlag port 41 peer "sw2" id 202
enable sharing 41 grouping 41-48 algorithm address-based L2 lacp




sw2:
code:
create mlag peer "sw1" 
configure mlag peer "sw1" ipaddress 192.168.128.241 vr VR-Default
enable mlag port 41 peer "sw1" id 202
enable sharing 41 grouping 37-48 algorithm address-based L2 lacp





MLAG peers see each other, checkpoint status is 'Up'. What caught my attention is this. On sw1:

code:
sw1.118 # debug hal show vsm 

VSM Blocking Filters:
Ingress port: 1:1
Blocked ports:
Unit 1 (inst 1 Fid A553 l3_inst 1 l3_Fid A551 l3rem_inst 1 l3rem_Fid A552 pend 0):
41 42 43 44 45 46 47 48

VSM Redirection: (Enabled)




But on sw2:

code:
sw2.29 # debug hal show vsm 

VSM Blocking Filters:
Ingress port: 1:1
Blocked ports:

VSM Redirection: (Enabled)




Could this be the cause of the problem (that there are no blocked ports for the filter on sw2)? If so, why might they not have been added?

Both switches are running 16.2.4.5-patch1-6.
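
For anyone comparing the two outputs above by script rather than by eye, here is a minimal Python sketch. It assumes the `debug hal show vsm` output looks exactly like what is pasted in this post; the function name `blocked_ports` is made up for illustration and is not part of any EXOS tooling.

```python
import re

def blocked_ports(vsm_output):
    """Map each 'Ingress port' to the port numbers listed under 'Blocked ports:'."""
    result = {}
    ingress = None
    in_blocked = False
    for raw in vsm_output.splitlines():
        line = raw.strip()
        if line.startswith("Ingress port:"):
            ingress = line.split(":", 1)[1].strip()
            result[ingress] = []
            in_blocked = False
        elif line.startswith("Blocked ports:"):
            in_blocked = True
        elif line.startswith("VSM Redirection"):
            in_blocked = False
        elif in_blocked and ingress is not None and re.fullmatch(r"\d+(\s+\d+)*", line):
            # A line made only of port numbers, e.g. "41 42 43"
            result[ingress].extend(int(p) for p in line.split())
    return result

# Sample text copied from the outputs in this post
SW1_OUTPUT = """VSM Blocking Filters:
Ingress port: 1:1
Blocked ports:
Unit 1 (inst 1 Fid A553 l3_inst 1 l3_Fid A551 l3rem_inst 1 l3rem_Fid A552 pend 0):
41 42 43 44 45 46 47 48

VSM Redirection: (Enabled)
"""

SW2_OUTPUT = """VSM Blocking Filters:
Ingress port: 1:1
Blocked ports:

VSM Redirection: (Enabled)
"""

sw1_blocked = blocked_ports(SW1_OUTPUT)
sw2_blocked = blocked_ports(SW2_OUTPUT)
for name, blocked in (("sw1", sw1_blocked), ("sw2", sw2_blocked)):
    for ingress, ports in blocked.items():
        status = ports if ports else "NONE"
        print(f"{name}: ingress {ingress} blocked ports: {status}")
```

This only makes the difference between the two switches explicit; it does not say whether an empty list is the cause of the loop.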

8 replies

Userlevel 3
What do "show mlag peer" and "show mlag port" show on both switches?

Can you draw a network map of what you're trying to accomplish?

You created LAGs using ports 41-48 on one switch, and 37-48 on the other. Are you sure that is correct?

BTW 16.2 is end of service life in December. You should consider upgrading.
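
To make the grouping difference mentioned above concrete, here is a small Python sketch that expands the two `enable sharing` lines from the original post and lists the member ports that appear on only one switch. It assumes plain port numbers (no slot:port notation) as in the posted configs; `lag_members` is a hypothetical helper name, not an EXOS command or API.

```python
import re

def lag_members(config_line):
    """Return (master_port, member_ports) parsed from an 'enable sharing' line."""
    m = re.search(r"enable sharing (\d+) grouping (\S+)", config_line)
    if m is None:
        raise ValueError("not an 'enable sharing' line")
    master = int(m.group(1))
    members = []
    # The grouping may be a comma-separated mix of single ports and ranges
    for part in m.group(2).split(","):
        if "-" in part:
            low, high = part.split("-")
            members.extend(range(int(low), int(high) + 1))
        else:
            members.append(int(part))
    return master, members

sw1 = lag_members("enable sharing 41 grouping 41-48 algorithm address-based L2 lacp")
sw2 = lag_members("enable sharing 41 grouping 37-48 algorithm address-based L2 lacp")

only_on_sw1 = sorted(set(sw1[1]) - set(sw2[1]))
only_on_sw2 = sorted(set(sw2[1]) - set(sw1[1]))
print("only on sw1:", only_on_sw1)  # only on sw1: []
print("only on sw2:", only_on_sw2)  # only on sw2: [37, 38, 39, 40]
```

A difference is not necessarily a misconfiguration; the sketch just shows which ports to double-check against the intended design.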
Userlevel 3
Hi, we have the same problem with a pair of x870 devices connected to another pair of x690. The firmware version running is 22.6.1.4.

We have checked the MLAG configuration by running the MLAG script and it is all OK.

We have also tested the x870 switches on version 30.2.1.8 connected to a pair of x690 on 22.6.1.4, and the loop behaviour occurs again (not immediately, but after some days).

The next test is upgrading all four switches to 30.2.1.8.
Userlevel 5
Hi FES,

Is that a two-tier MLAG design?
If 'show mlag peer', 'show mlag port' and 'show sharing' all look good, is it possible that the loop is introduced elsewhere in the network (even on a VLAN that is not part of the MLAG setup) and hits the switches? You say that the loop doesn't happen immediately but after some days. Is the loop happening randomly, or was the test performed a while ago?

Kind regards,
Tomasz
Userlevel 2
Did you add the ISC vlan to the ports between the switches?

https://gtacknowledge.extremenetworks.com/articles/How_To/How-to-configure-MLAG-in-Extreme-switches/?q=mlag+config&l=en_US&c=Extreme_Software%3AExtremeXOS_EXOS&fs=Search&pn=1
Userlevel 3
Hi,
I have used the EXOS MLAG script to test the MLAG configuration. The problems (they seem to be loops) appear randomly and we don't know why.

We have updated to 30.2.1.8 to rule out a bug. We have also seen that this new version includes a protocol for detecting MLAG loops.
Userlevel 5
Hi FES,

You might want to try and use ELRP to spot the loop when it happens. Have a look at these:
https://gtacknowledge.extremenetworks.com/articles/How_To/How-to-configure-ELRP-to-disable-ports
https://gtacknowledge.extremenetworks.com/articles/Q_A/do-I-need-to-enable-ELRP-on-all-the-VLANs-where-physical-ports-are-identical

FYI, with EXOS 30.2 and older, the ELRP periodic test interval can be as small as 100 ms. With EXOS 30.3, hardware can be used for these tests, which allows the interval to be decreased to just a few milliseconds.

Hope that helps,
Tomasz
Userlevel 3
Hi,
We think we have finally found the problem. We still need to test it, but I'm sure that this is the cause.

All devices are now updated to 30.2.1.8. I have seen that 30.3 fixes some MLAG bugs. The logs show that the x870 devices don't have enough resources to manage the MLAG ACLs.

This article has some information:
https://gtacknowledge.extremenetworks.com/pkb_mobile#/articles/en_US/Solution/MLAG-possible-loop-for-BUM-traffic-on-X870

We are going to reassign IPv6 resources to MAC resources and test the MLAG behaviour again.

I hope this resolves the problem.
Userlevel 3
Hi,
Does anybody know how to show the ACL resources used by MLAG?

The command "show policy resource-profile" does not show any resources used by L2:

show policy resource-profile

Current Configured Profile: default
Current Profile Modifier : none

        MAC    IPv6   IPv4   L2
        Rules  Rules  Rules  Rules
        -----  -----  -----  -----
Max     512    512    512    440
Used    0      0      53     0


Has anyone tested the command "configure policy resource-profile more-mac-no-ipv6"?
