MLAG ISC VRRP asymmetric routing possible

  • Problem
  • Updated 2 years ago
  • Solved
We are having a problem with the ISC between two x460s. VRRP is configured as ACTIVE/STANDBY. Everything looked fine during our initial tests, but we only used ICMP. I configured separate "external" switches with IPs I could ping, to test MLAG failover on the access switches connected to the two x460 core switches. The test dropped pings as expected, and VRRP transitioned properly on failover. MLAG to the access switches worked as well.

Now the problem: TCP and UDP traffic does not establish any kind of connection. We connected the 460s to the internet and were able to ping 8.8.8.8, but could not telnet to port 53 or to HTTP ports. Needless to say, no internet. When I disconnect the ISC between the two 460s, internet works flawlessly. I have no idea why this is and have not opened a ticket yet. I was plugged into the active VRRP switch when I tested, so the traffic shouldn't have been affected by the ISC in the first place.

VRRP is balanced on the switches, half ACTIVE and half STANDBY.
I figure if I change the configuration to ACTIVE/ACTIVE, then the traffic would flow correctly.
I have followed the Extreme guides to configure the ISC and MLAG as well. That is how the switches are configured.

Here is a link to a topology similar to ours. Instead of the server, we have access switches.
https://d2r1vs3d9006ap.cloudfront.net...

This image shows the traffic flowing over the ISC, so I would not think that would be an issue.

Justin Metts

Posted 2 years ago


Grosjean, Stephane, Employee

Hi,

Assuming your config is correct, do you have a FW somewhere that could block the traffic, when it has to switch because of VRRP?

Justin Metts

That is a good point. The firewall should create the ARP entry with the virtual MAC address for the VIP and should transition seamlessly from active to backup. I would think that the firewall would also block ICMP if that were the case. I would provide the switch config if I were at the site. Here is the topology we are implementing. The workstation works on VLAN 3 with the ISC connections connected, but not on VLAN 20.

Dorian Perry, Employee

Hi Justin,

Is the default gateway (DG) of the workstation the VIP of the X460s? If so, get the IP of the X460s' own DG and check which port its ARP entry is being programmed on. Use "show iproute" to find the DG of the switch and "show iparp <IP_Address>" to determine the port on which the switch is learning the DG.

Jan Steinbach

To be sure:

Does the ISC link also transport the access VLANs (20)?

And is UDP really affected as well? A "telnet to 53" is still TCP, even though you use the well-known DNS port.

The firewalls also fail over, correct? Are they using MAC masquerading (virtual MAC), or do they send a GARP (gratuitous ARP) via broadcast?

And did you check the firewall log for state-related drops (due to the ingress interface changing after a firewall failover)? I had a similar situation a few years ago, in another context, with Cisco ASA firewalls. ICMP worked well because it is stateless, but TCP was tracked in the connection table and the connection was mapped to the initial ingress interface.

Cheers,
Jan
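
The stateless-versus-tracked distinction Jan describes can be illustrated with a toy model. This is only a sketch of the idea, not how an ASA or any real firewall is implemented; the class name, interface names, and addresses are all made up, and ICMP is treated as untracked purely to mirror the description above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStatefulFirewall:
    """Toy model: a TCP flow is pinned to the interface it was first seen on."""

    # Maps a flow key (src, dst, dport) to the ingress interface that created it.
    connections: dict = field(default_factory=dict)

    def accept(self, proto: str, flow: tuple, ingress: str) -> bool:
        if proto == "icmp":
            # Untracked in this model: every packet is judged on its own.
            return True
        # First packet of a flow records its ingress interface.
        known = self.connections.setdefault(flow, ingress)
        # Later packets of that flow are dropped if they arrive elsewhere.
        return known == ingress


fw = ToyStatefulFirewall()
flow = ("10.0.0.10", "8.8.8.8", 53)         # hypothetical addresses
print(fw.accept("tcp", flow, "inside1"))    # True: first packet creates the state
print(fw.accept("tcp", flow, "inside2"))    # False: same flow, different ingress
print(fw.accept("icmp", flow, "inside2"))   # True: ICMP is not tracked here
```

In such a model, pings keep working after the path changes while established TCP sessions break, which is the symptom pattern being asked about.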

Justin Metts

Yes, DNS is affected. nslookup with the 8.8.8.8 server timed out on queries. Telnet is TCP; I just don't think about it sometimes.

We did not fail over the firewalls. The same firewall was active the whole time. We did not have access to look at the logs. The ingress ports should have remained the same from the primary x460, since we didn't fail over. I even disconnected the second link from the backup 460 to the internet and still had the problem.

I am going to set up a test tomorrow and get close to the production environment. I will also grab the configs.

VLAN 20 is tagged; I forgot to mention that.
(Edited)

Pala, Zdenek, Employee

Hi,

As far as I remember, you must use the Active-Active VRRP approach. If the LAG-connected switch sends the traffic to the backup VRRP router, there is no one to route it...

As far as I remember, there is a section in the manual regarding this.

regards

Zdenek

Justin Metts

I have seen the manuals on how to configure ACTIVE/ACTIVE, but have not seen anything mandating that configuration. I was directly connected to the primary x460 with the workstation, so there was nothing to send traffic to the backup.

Frank

Hi,

Can you ping (both if applicable) firewalls' internal (vlan3) IP addresses from the workstation in vlan20? With and without the ISC?

If I understand correctly, the VLAN 20 PC is connected to an access switch, which is MLAGed to both 460s. "Switch Inside 1" and "..2", are they MLAGed to both 460s as well?

And just to be sure, both firewalls have the appropriate route back to the vlan20 IP space with the gateway address of the vlan3 VRRP address that's on the 460?

Frank

Justin Metts

The workstation was directly connected to the ACTIVE VRRP x460. ICMP would work to the firewalls and to the internet. Session traffic would not flow with the ISC in place. Once the ISC was removed, session traffic would flow. I am going to find time to mimic the configuration, set up a "server" on the switch, and test. I might end up configuring VRRP as ACTIVE/ACTIVE.

Grosjean, Stephane, Employee

VLAN 3 is also on the ISC, right?

Justin Metts

Correct.

I did forget that part when I first configured the switches. In order for VRRP to be MASTER/BACKUP, the VLANs have to be tagged across the ISC. That is what I learned, since I am new to Extreme. I would have thought all traffic would be synchronized across the ISC link.

Grosjean, Stephane, Employee

Are you 100% sure the "standby" FW doesn't act as an "active" one, so that traffic doesn't go directly to it when the VRRP failover happens on the x460? That would explain your session issues.

Justin Metts

I have isolated the switches from the entire network. Only the two switches are connected.

Brandon Clay, Escalation Support Engineer

What version of EXOS are the X460s running?

If it is 15.6 or higher, it might be worth running the MLAG config check script found here. This will check that the VLANs on the MLAG ports are also added to the ISC.
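
The kind of check described above (every VLAN on an MLAG port should also be on the ISC) boils down to a set comparison. The following is only an illustration of that idea and not the actual script; the function name, port names, and VLAN IDs are invented for the example.

```python
def vlans_missing_from_isc(
    mlag_port_vlans: dict[str, set[int]], isc_vlans: set[int]
) -> dict[str, set[int]]:
    """Return, per MLAG port, the VLANs that are not also carried on the ISC."""
    return {
        port: vlans - isc_vlans
        for port, vlans in mlag_port_vlans.items()
        if vlans - isc_vlans
    }


# Hypothetical data: port names and VLAN membership are made up.
mlag_ports = {"1:1": {3, 20}, "1:2": {20, 30}}
isc = {3, 20}
print(vlans_missing_from_isc(mlag_ports, isc))  # {'1:2': {30}}
```

An empty result would mean every MLAG VLAN is also present on the ISC.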

Justin Metts

16.1.  I'll try the script tomorrow and also get the configs. I will post what I find.

Justin Metts

* Slot-1 # run script mlag_config_check.py
Local and remote FDB checksums match.
MLAG config check completed.

Same results on both switches. Posting a better diagram and the configs.

Justin Metts

The top diagram is exactly how I have the switches configured now. Both cores have an additional switch stacked.

The bottom of the diagram shows how I think traffic should flow. A packet ingresses VLAN 20 on CORE 1 with destination 10.200.3.252. CORE 1 knows the route to 10.200.3.252 is on VLAN 3, so it sends the packet out VLAN 3, which reaches CORE 2 directly at layer 2 because VLAN 3 is tagged across the ISC. CORE 2 receives the packet on VLAN 3 and sends it to the host on VLAN 3. The return path from PC2 would be the same in reverse.

The ARP table is correct, since ICMP works and not even the first ping is dropped while the ARP entries are filled.
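
The forwarding decision on CORE 1 in that flow is a lookup of the destination against the directly connected subnets. A rough sketch follows; the /24 masks and the VLAN 20 subnet are guesses (only 10.200.3.252 appears in the thread), and the function name is hypothetical.

```python
import ipaddress

# Assumed connected routes on CORE 1. The /24 masks and the VLAN 20
# subnet are guesses made for this sketch.
CONNECTED = {
    ipaddress.ip_network("10.200.3.0/24"): "VLAN 3",
    ipaddress.ip_network("10.200.20.0/24"): "VLAN 20",
}


def egress_vlan(dst: str):
    """Longest-prefix match over the connected routes; None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in CONNECTED if addr in net]
    if not matches:
        return None
    return CONNECTED[max(matches, key=lambda net: net.prefixlen)]


print(egress_vlan("10.200.3.252"))  # VLAN 3
```

The lookup only says which VLAN the packet leaves on; whether it physically crosses the ISC depends on where the destination MAC is learned.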


Justin Metts

Going over the configuration in depth to post it here, I found the issue. There is an ACL on the Public VLAN that does not allow the traffic. It allows ICMP, but nothing else. I would have bet money that I had removed the ACL earlier to ease troubleshooting during the initial implementation, but apparently not. Sorry for wasting anyone's time.
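
For anyone hitting the same symptom, a first-match rule list that permits ICMP and denies everything else reproduces exactly this behavior: pings succeed while TCP and UDP never connect. The model below is a generic sketch, not EXOS policy syntax and not the actual ACL from these switches; the rule layout and the fall-through default are assumptions.

```python
# Hypothetical rule list: (protocol to match, or None for "any", action).
ACL = [
    ("icmp", "permit"),
    (None, "deny"),
]


def evaluate(acl, protocol: str) -> str:
    """Return the action of the first rule that matches the packet's protocol."""
    for match_proto, action in acl:
        if match_proto is None or match_proto == protocol:
            return action
    return "permit"  # assumed default for this sketch when no rule matches


for proto in ("icmp", "tcp", "udp"):
    print(proto, evaluate(ACL, proto))
# icmp permit
# tcp deny
# udp deny
```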

Grosjean, Stephane, Employee

That was your "FW" ;)
Good you found the issue.

Frank

Glad everything works now (or is underway). I'm also glad I'm not the only one who's done something like this before :D