MLAG Testing in Campus EXOS S&R [Lab 6 Section D]


Tomasz
Valued Contributor II

Hello everyone,

 

Yesterday I had a late-hour struggle with the MLAG lab and a couple of students. Maybe it was lack of coffee, but this time I got confused, and I would like to get this straight so there is no room for future confusion.

The MLAG lab design assumes pings running between PC-A and PC-D, with Switch-A and Switch-B as MLAG peers and Core-A as a downstream device. The pings most likely run directly between the MLAG peers, omitting Core-A. In Lab 6, Section D, step 5 we are asked to disable both ISC ports. This literally breaks MLAG peering. The step note says that pings may or may not continue uninterrupted, but for the two students I discussed this with, the pings did not work at all. I did some troubleshooting with port statistics and the FDB, and these were my findings:

  • I was pretty sure MLAG would break when the ISC went down, and that the peers would again present themselves to the downstream Core-A with their own MAC addresses, so that Core-A could establish LACP with only one of them (a standard LAG cannot be formed with two different partners on the other side, which is why we have MLAG in the first place).
  • However, it didn’t happen that way. Although the show mlag commands showed that things were bad, Core-A still considered the LACP link fully operational.

In both situations the pings are IMHO unable to pass along the path PC-A / Switch-A / Core-A / Switch-B / PC-D. In the first situation, Core-A would consider one of the ports invalid for the LACP link and could therefore communicate with only Switch-A or Switch-B, so the path between the PCs would be incomplete. In the situation I actually had, when Core-A received an ICMP request from PC-A through Switch-A, even if it flooded the frame, it would not send it up to Switch-B, because Core-A treats the LAG as a single link and a frame received on a LAG is normally never sent back out of the same LAG. And that’s what I could see: on Switch-B there was no incoming traffic from Core-A during the ping attempts.
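The reasoning above can be sketched as a toy model. This is only an illustration under simplifying assumptions, not a description of how EXOS forwards frames: every frame is flooded, Core-A's two uplinks are one logical port named "lag", and a switch never sends a frame back out of the logical port it arrived on. The port names and the function pc_can_reach are made up for this example.

```python
from collections import deque

HOSTS = {"PC-A", "PC-D"}  # end stations never forward frames


def build_links(isc_up):
    """Return the lab links as pairs of (device, logical_port) endpoints."""
    links = [
        (("PC-A", "nic"), ("Switch-A", "edge")),
        (("PC-D", "nic"), ("Switch-B", "edge")),
        # Both Core-A uplinks terminate on the same logical port "lag",
        # because Core-A sees the bundle as a single link.
        (("Switch-A", "mlag"), ("Core-A", "lag")),
        (("Switch-B", "mlag"), ("Core-A", "lag")),
    ]
    if isc_up:
        links.append((("Switch-A", "isc"), ("Switch-B", "isc")))
    return links


def pc_can_reach(src_pc, dst, isc_up):
    """Flood a frame from a PC and report whether it arrives at dst."""
    neighbors = {}
    for a, b in build_links(isc_up):
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    src_port = next(p for (d, p) in neighbors if d == src_pc)
    queue = deque(neighbors[(src_pc, src_port)])
    seen = set()
    while queue:
        dev, in_port = queue.popleft()
        if dev == dst:
            return True
        if dev in HOSTS or (dev, in_port) in seen:
            continue
        seen.add((dev, in_port))
        for (d, out_port), peers in neighbors.items():
            if d != dev or out_port == in_port:
                continue  # never send back out of the ingress logical port
            if in_port == "isc" and out_port == "mlag":
                continue  # assumed MLAG rule: ISC traffic is not sent to MLAG ports
            queue.extend(peers)
    return False


if __name__ == "__main__":
    for isc_up in (True, False):
        print("ISC up:", isc_up,
              "| PC-A -> PC-D:", pc_can_reach("PC-A", "PC-D", isc_up),
              "| PC-A -> Core-A:", pc_can_reach("PC-A", "Core-A", isc_up))
```

In this model PC-A reaches PC-D only while the ISC is up; with the ISC down the frame stops at Core-A, which matches the missing traffic on Switch-B.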

To overcome this, I walked them through the troubleshooting and explained that the ISC is indeed crucial for MLAG operation. I then showed them a slightly different scenario in which PC-A pings Core-A and we break one of the uplinks, to see that failover happens thanks to MLAG and an operational ISC.

 

I’d really appreciate some comments on where I am wrong. If my conclusions are correct, I’d recommend adjusting step 5 to ping Core-A (172.16.x1.1) from PC-A and to disable port 1 on (most likely) Switch-A instead. This resembles a more “natural” use case in which a downstream device is dual-homed for network access redundancy, and IMHO this downstream device shouldn’t be considered a failover path for horizontal network traffic.

 

Thanks,

Tomasz

6 REPLIES 6

Tomasz
Valued Contributor II

Hi Patrick,

 

That’s an amazing take on the topic! Thanks for sharing!

Beyond the default class routine, I have recently begun using my own X440-G2 to demonstrate things between slides, along with useful commands for someone’s cheat sheet that are not listed in the courseware (for example, for inline power or transceivers). It’s slightly limited, but that’s something. 😉

It’s also good to have for discussing features that are not (or not yet) in the courseware (Policy, Telemetry and so on).


Cheers,

Tomasz

Patrick_Koppen
Contributor

Hi Tomasz,

 

There are a few rules about MLAG:

  1. every host/switch has to be connected to both peers
  2. you can disable ISC without any impact
  3. you can reboot one peer without any impact
  4. EXOS 22.5 and 22.6 are broken; you need 22.7 or later

Unfortunately, you cannot recreate this in the course lab, because it violates rule 1: the PCs are only single-attached. We use the following lab for the MLAG exercise:

 

[Image: topology of the lab used for the MLAG exercise]

 

Core-A and Core-B are set up by the trainer; SW1 and SW2 are the student switches. We cannot test with PC1 and PC2 because they are single-attached, so we build another MLAG to Core-B. On Core-B there are 4 VRFs, so it simulates a host with a different IP for every group. Core-A/B act as the access switches, and SW1/2 are the distribution/core switches. For testing, I share my screen from the server and start something like:

watch -n 1 fping -q -c 1 10.{1,2,3,4}.1.100

That way the students can see connectivity during all tests. Now they can disable any link (both ports, because each link is a LAG). There should be no errors. The usual tests are (one at a time!):

  1. disable link (both ports) between SW1 and Core-A (or any other of the 4 links)
  2. disable ISC
  3. reboot SW1 or SW2

There should be no ping loss during the tests.
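As a rough sketch, the "no ping loss" check could also be automated by parsing what fping prints. The code below assumes fping's usual per-target summary line (`host : xmt/rcv/%loss = sent/received/loss%`), which fping writes to stderr in quiet count mode. The helper names targets and lossy_targets are hypothetical, and the sample text is made up for illustration.

```python
import re

# Matches the per-target summary line, e.g.
# "10.1.1.100 : xmt/rcv/%loss = 1/1/0%, min/avg/max = 0.41/0.41/0.41"
SUMMARY = re.compile(r"^(\S+)\s*:\s*xmt/rcv/%loss = (\d+)/(\d+)/(\d+)%")


def targets():
    """Same addresses the shell brace expansion 10.{1,2,3,4}.1.100 produces."""
    return [f"10.{group}.1.100" for group in (1, 2, 3, 4)]


def lossy_targets(fping_output):
    """Return the targets whose summary line reports any packet loss."""
    lossy = []
    for line in fping_output.splitlines():
        match = SUMMARY.match(line.strip())
        if match and int(match.group(4)) > 0:
            lossy.append(match.group(1))
    return lossy


if __name__ == "__main__":
    # Made-up sample: group 3 loses its ping, the others answer.
    sample = "\n".join([
        "10.1.1.100 : xmt/rcv/%loss = 1/1/0%, min/avg/max = 0.41/0.41/0.41",
        "10.2.1.100 : xmt/rcv/%loss = 1/1/0%, min/avg/max = 0.39/0.39/0.39",
        "10.3.1.100 : xmt/rcv/%loss = 1/0/100%",
        "10.4.1.100 : xmt/rcv/%loss = 1/1/0%, min/avg/max = 0.44/0.44/0.44",
    ])
    print("targets:", targets())
    print("loss on:", lossy_targets(sample))
```

During a real test run the output of the fping command would be fed to lossy_targets instead of the sample text; an empty list means no loss was reported.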

 

If you did the STP or EAPS lab before, you have to reboot the switches, depending on the software version.

 

And that’s why we use our own labs to do real-life examples.

 

Greetings

Patrick

 

Tomasz
Valued Contributor II

Hi Tom,

 

That’s a good one, thank you! It was definitely my mistake, following a few students’ mistake. I have no issues with the PC-to-Core scenario. I got confused during that training; now it’s clearer why hardly anyone has reported a similar issue over the past few years. At least it let us have some discussion around the topic and make sure the students reach the right conclusion. 🙂

As we said in our recent meeting, such nice opportunities to use the glitter pen occur when something has to be explained beyond the slides… 😉

[Image attachment]

 

Cheers,

Tomasz

TLizotte
New Contributor

In Testing MLAG, step 1 says: “On your PC-A and PC-D start a continuous ping to the Core-A switch.”

As written, PC-A should never be pinging PC-B as part of the test.

For the return ping packet from Core-A, both links are active and the outbound hash selects a port, to the left or to the right. The packet either goes back to the source PC on the direct path (success), or it goes to the other peer and is lost, because the ISC ports are down and there is no return path. From our point of view, it’s a 50/50 chance that the packet picks the correct port. The SMAC/DMAC formula for path selection isn’t examined in the courseware, and I’m not necessarily recommending that it should be.

So sometimes students say it works, and sometimes students say it’s broken.
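A minimal sketch of why the result differs from student to student. The hash below (XOR of the last byte of each MAC address) is purely hypothetical and is not the EXOS path-selection formula; the MAC addresses and function names are also invented. It only shows that a fixed per-flow hash sends one PC's replies to the correct peer and another PC's replies to the wrong one.

```python
LAG_MEMBERS = ["Switch-A", "Switch-B"]  # the two uplinks of Core-A's LAG


def pick_member(src_mac, dst_mac, members=LAG_MEMBERS):
    """Hypothetical SMAC/DMAC hash: XOR the last bytes, modulo member count."""
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return members[(src_last ^ dst_last) % len(members)]


def reply_delivered(core_mac, pc_mac, pc_switch, isc_up):
    """Does Core-A's echo reply reach a PC that is attached to pc_switch?"""
    chosen = pick_member(core_mac, pc_mac)
    # If the hash picks the other peer, only the ISC can carry the reply across.
    return chosen == pc_switch or isc_up


if __name__ == "__main__":
    core = "02:00:00:00:00:01"  # made-up Core-A MAC
    for pc_mac in ("02:00:00:00:00:0a", "02:00:00:00:00:0b"):  # two made-up PCs on Switch-A
        print(pc_mac,
              "hashes to", pick_member(core, pc_mac),
              "| delivered with ISC down:",
              reply_delivered(core, pc_mac, "Switch-A", isc_up=False))
```

With the ISC up, both replies arrive regardless of the hash result, which is the behaviour the lab is meant to show.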

And step 2 says: “On Switch-A, disable port 1”, just as you suggest.

Let me add that it’s unfortunate that the lab uses the term “core” for what is really positioned as the edge switch. The test would probably make a bit more sense if there were a PC on “Core-A” generating a ping. Then again, you’d get the same results.

Conclusion: DON’T let the ISC break or you’ll get unpredictable results.
