Okay, so we did some extensive testing this morning. Two 440s as MLAG peers. VMware host with one NIC to each switch (port 1 on each), both NICs in the same vSwitch. We left teaming at the default (route based on originating virtual port ID) and configured MLAG port 1 on each switch. This matches the old 2012 best practices document for the most part. We don't have Enterprise Plus, so we can't use LAG/LACP on the VMware side.
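For reference, the switch side of that test looked roughly like this on core1 (a sketch from memory; core2 mirrors it with the addresses and peer name swapped, and the ISC port 48, the 10.0.0.x addresses, and the peer name are placeholders for our actual values):

  # placeholders: ISC on port 48, peer "core2", 10.0.0.x for the ISC subnet
  create vlan ISC
  configure vlan ISC add ports 48 tagged
  configure vlan ISC ipaddress 10.0.0.1/30
  create mlag peer "core2"
  configure mlag peer "core2" ipaddress 10.0.0.2
  enable sharing 1 grouping 1
  enable mlag port 1 peer "core2" id 1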
Findings:
- VMware shows no observed IP ranges on one of the NICs
- If we disable a port on one switch, pings to the host's mgmt address from a node keep working, but actual application access (e.g., hitting the host over HTTP) fails most of the time
- The switch with the disabled port cannot ping the mgmt address across the ISC via the peer switch that still has an active port
- With IP hash we see similar issues (plus duplicate packets in Wireshark); we left it that way for the next test.
If we remove sharing on the switch (it was set up as a single-port LAG on each switch, port 1) but leave IP hash on VMware and keep MLAG enabled (MLAG 1 on both, just no sharing underneath), it works as expected.
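So the working end state on each core for the ESXi-facing port is simply the MLAG port with no LAG configured under it, roughly ("Prod" stands in for our VLANs and "core2" for the peer name):

  # placeholders: VLAN "Prod", peer "core2"; no 'enable sharing' on port 1 at all
  configure vlan Prod add ports 1 tagged
  enable mlag port 1 peer "core2" id 1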
If we remove MLAG entirely (peers/ISC still up) and just leave port 1 on each switch as a plain trunk to its pNIC, it works the same as it did with MLAG. We're not sure MLAG is helping in this scenario.
So our plan (tentatively) is to use separate links to each core switch for the VMware hosts (plain trunks, no MLAG), knowing that since VMware is load balancing, some traffic will end up crossing the ISC (which is way oversized in our scenario, so not a big deal). For dual-connected Windows servers, we'll instead use LACP with both sharing and MLAG: a hash type of "address" on the server and an L3 address-based algorithm on the Extreme LACP side. For downstream user switches we'll do the same: MLAG, LACP, L3 on both sides. In theory (as I understand it), this will result in only one link actively handling traffic unless one core fails; then the address table will move to the remaining peer.
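For the Windows servers and downstream switches, the Extreme side of that plan would look something like this on each core (a sketch; port 10, the LAG/MLAG id, the VLAN name, and the peer name are placeholders, and "address" hash on the server means the Windows NIC team's Address Hash load-balancing mode):

  # placeholders: server LAG on port 10, VLAN "Prod", peer "core2"
  enable sharing 10 grouping 10 algorithm address-based L3 lacp
  configure vlan Prod add ports 10 tagged
  enable mlag port 10 peer "core2" id 10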
Wish us luck!