MLAG with VMware or Hyper-V - tons of DUP packets

  • Problem
  • Updated 2 years ago
  • Solved
We deployed our first MLAG scenario with two X670s at the core and VMware 6.0 downstream.  We saw unusual amounts of what appeared to be packet loss, overall slowness, and VMs showing online, then offline, etc.  A packet trace revealed a ton of duplicate packets and retransmissions.  When we took the second peer offline, these ceased.  We've since confirmed that both sides of each MLAG are using the L3 algorithm (IP hash on the vSwitch), we removed all unused/standby adapters from the vSwitch, and we ensured that beacon probing was off.  We otherwise followed the 2012 white paper from Extreme on deploying MLAG in an ESXi environment.  Not sure if all of the duplicate traffic is expected in this config or if we're doing something wrong.  Ideas?
Eric Burke
Posted 2 years ago
Eric Burke
As a side note, when pinging devices from a downstream Extreme user switch (also connected upstream via MLAG), we received two responses for each ping (one labeled as "DUP").  When we tried a two-tier MLAG setup, we actually got three (two DUPs) per ping request.
Patrick Voss, Alum
Hello Eric,

Can you please provide the following outputs from both cores and answer the questions below:

  • "show config vsm"
  • "show mlag ports"
  • "show mlag peer"
  • What port# is the VMWARE host connected to on both cores.
  • Are both connections on the VMWARE host setup as a lag.
  • Are you using LACP?
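For reference, a minimal sketch of the kind of core-side config that "show configuration vsm" reflects is below. The VLAN name, ports, peer name, and IP addresses are placeholders for illustration only, not taken from this setup; core 2 would mirror it with the addresses reversed.

create vlan isc
configure vlan isc add ports 48 tagged
configure vlan isc ipaddress 10.1.1.1/30
create mlag peer "core2"
configure mlag peer "core2" ipaddress 10.1.1.2
enable mlag port 1 peer "core2"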
Eric Burke
We had to tear it down (at 4 a.m., to be sure we made the business day), so I don't have any "show" stats available.  On the VMware side, we could not use LACP as this is not an Enterprise Plus client.  We simply had two NICs in a single vSwitch, both active, using L3 hash (although the white paper says any type is okay, a tech note found on VMware says they must match and be IP based).  Basically two 10GB NICs, DAC connected to port 1 of each MLAG peer.  All VLANs on the MLAG were on the ISC.  The end result was that some VMs appeared offline but would then suddenly come online (via ping).  That, combined with the retransmits and dupe traffic, had us concerned enough to drop the idea.
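For anyone trying to reproduce or verify this, the teaming policy described above can be checked and set from the ESXi shell on a standard vSwitch; the vSwitch name here is just an example, not the actual one used:

esxcli network vswitch standard policy failover get -v vSwitch0
esxcli network vswitch standard policy failover set -v vSwitch0 -l iphash

Keep in mind VMware's guidance that "Route based on IP hash" assumes the physical switch presents those uplinks as a single static (non-LACP) link aggregation.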
Ty Kolff
Eric,

We have not been configuring any LAG on our VMware servers that are connected to an MLAG pair.  We simply plug one NIC from the VMware servers into each switch and leave the default VMware option, which I believe is 'Route based on originating port ID'.

Here is another post discussing this:
https://community.extremenetworks.com/extreme/topics/dual-x670v-stacks-mlag-and-vmware-esx
Eric Burke
Thanks for the link Ty, I'll check it out...

BTW:  Are you saying that on the Extreme side you're not setting those ports for MLAG either (just standalone trunks in the same VLANs on each side)?  I get the feeling that's our problem - one side is a LAG and the other is not.
Ty Kolff
Exactly.  I just left the ports as single trunk ports that the VMware servers are plugged into.  On the VMware side we left the default teaming option as 'Route based on originating virtual port ID'; VMware handles the failover.

I have this installed in a production environment and have tested failover on multiple servers.  I was in the same position as you a few months back.
(Edited)
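To make that approach concrete, the switch side would be nothing more than plain tagged ports, with no sharing and no MLAG on the VMware-facing ports. A hypothetical example on each core (VLAN names and port number made up):

configure vlan Prod add ports 1 tagged
configure vlan vMotion add ports 1 tagged

With 'Route based on originating virtual port ID', each vNIC is pinned to one uplink at a time, so a given MAC address only ever shows up on one core's port and no aggregation is needed.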
Eric Burke
Thanks for that.  Did you leave your other connections in MLAG?  For example, our design includes 8 standalone user switches (all configured as LACP L3_L4 via MLAG to the X670s), a couple of Windows servers (which we can use LACP on), etc.  My guess is that you probably have a bit more traffic spanning the ISC, since traffic is likely to need to cross that link to reach its destination?
simon bingham
Here's an idea: you get something similar when one end is an aggregation and the other is not.

Imagine a 4-port aggregation: the end that is not an aggregation loops back three copies of each frame into the aggregation, so you see four of every packet if you capture with Wireshark.


Simon
(Edited)
Eric Burke
Thanks Simon.  Agreed, that's what had us thinking that the MLAGs were improperly configured (on one end or the other).  We were pretty confident in the Extreme side, but not in the VMware side.  Reading their article made it seem that we'd made a mistake in the method of aggregating (wrong hash, beacon probing originally on), but later tests showed the same results.  I feel like I'm missing something, in that to me a LAG is simply two uplinks active in the same vSwitch, but other comments are leading me to think there is an added layer to making an actual LAG on the VMware side.  Am I missing something?
Eric Burke
Okay, so we did some extensive testing this morning.  Two X440s as MLAG peers.  VMware host with a NIC to each (port 1), both in the same vSwitch.  Left teaming at the default (originating port ID) and configured MLAG port 1 on each switch.  This matches the old 2012 best practices document for the most part.  We don't have Enterprise Plus, so we cannot use LAG/LACP.

Findings:

- No observed IPs on one NIC in VMware
- If we disable a port on one switch, pings to the mgmt address continue from a node, but actual application access (like hitting the host via HTTP) fails most times
- The switch with the inactive port cannot ping the mgmt address across the ISC via the switch that still has an active port
- With IP hash, similar issues (also dupe packets in Wireshark) - left it that way for the next test.


If we remove sharing on the switch (which is set up as a single port on each switch, port 1) but leave IP hash on VMware, still with MLAG enabled (MLAG 1 on both, but no sharing underneath), it works as expected.
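In config terms, that working combination is roughly the following on each core (port number and peer name are placeholders, not the actual config): the one-port sharing group is removed and MLAG is applied directly to the single physical port.

disable mlag port 1
disable sharing 1
enable mlag port 1 peer "core2"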

If we remove MLAG (peers/ISC still up), but leave port 1 on each as simply a trunk to each pNIC, it works as it did with MLAG.  Not sure MLAG is helping in this scenario.

So our plan (tentatively) is to use separate links to each core switch (trunks, no MLAG), knowing that since VMware is load balancing, it will result in traffic across the ISC (which is way oversized in our scenario, so not a big deal).  For dual-connected Windows servers, we'll instead use LACP with both sharing and MLAG, with a hash type of "address" on the server and L3 on the LACP side of Extreme.  For downstream user switches, we'll do the same: MLAG, LACP, L3 on both sides.  In theory (as I understand it), this will result in only one link actively handling traffic unless one core fails; then the address table will move to the remaining peer.
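A sketch of the EXOS side of that server/user-switch design, with hypothetical port numbers and peer name (the matching commands would also go on the second core):

enable sharing 2 grouping 2 algorithm address-based L3 lacp
enable mlag port 2 peer "core2"

The server or downstream switch runs LACP across its two uplinks; the hash algorithm chosen on each end of an LACP LAG only affects how outbound flows are distributed, so the two ends do not have to match.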

Wish us luck!
Ty Kolff
That sounds like a good plan.  We did use MLAG on all of the other connections (primarily IDFs and other switch stacks), just not on the VMware servers.