11-08-2019 08:47 PM
Hi
Currently have a strange networking issue, see the diagram below:
This shows two stacked EXOS switches in each data centre which are joined via MLAG.
There is an active/passive firewall pair. There are four /30 P2P links on the active firewall: two go to one stack and the other two go to the other stack. All works well, ECMP is configured and the routing table reflects the routes correctly. The passive firewall has all its interfaces shut down.
The problem happens when the firewalls are flipped. You see both P2P links go down on each core, and then the links that go to the other firewall come up as it goes active.
You then see all the new P2P links form a full adjacency.
What is essentially happening is that the /30 subnet on each P2P link moves from one core to the other.
The problem is that all the new links form a full adjacency, but you cannot ping the other end of each of the point-to-point links and traffic stops passing through the firewalls!?
Well, in fact, it seems one random P2P link out of the 4 will work and the others will not, and sometimes none work at all. If you fail the firewalls back, sometimes all the links are restored, and sometimes the link that successfully moved will not fail back.
There is a workaround though. Whenever a link stops working (in whatever scenario), you can simply disable and then re-enable the ports on the switch and it all starts working!?
All OSPF settings on both EXOS and the firewall are default and match, i.e. timers, P2P, etc.
Both firewalls and switches were upgraded, which made no difference.
Enabled graceful restart, still no difference.
Can’t make sense of the issue. What could be causing it?
Ideas:
Even if one of those ideas was true I’m not sure what I can do about it, so hoping the community can help?
EXOS Version: 22.7.1.2 patch1-11
Palo Alto Version: 8.1.11
11-14-2019 09:25 PM
This is now fixed.
Seems I have been looking at the issue in reverse...
The MAC addresses of each of the ports presented to the Palo Alto are based on the Extreme switch MAC address and are as follows:
Col-CEF-Core1
02:04:96:9F:94:D8
Col-2A22-Core2
02:04:96:9F:A4:74
To view the ARP table of each of the P2P ports on the Palo Altos, you can use the following commands:
show arp ethernet1/5
show arp ethernet1/6
show arp ethernet1/7
show arp ethernet1/8
This is the state of the ARP table on the active firewall before failover:
Firewall A
interface ip address hw address port status ttl
--------------------------------------------------------------------------------
ethernet1/5 172.20.251.66 02:04:96:9f:94:d8 ethernet1/5 c 1535
ethernet1/6 172.20.251.70 02:04:96:9f:94:d8 ethernet1/6 c 1528
ethernet1/7 172.20.251.74 02:04:96:9f:a4:74 ethernet1/7 c 1525
ethernet1/8 172.20.251.78 02:04:96:9f:a4:74 ethernet1/8 c 1526
This is the state of the ARP table after failover:
Firewall B
interface ip address hw address port status ttl
--------------------------------------------------------------------------------
ethernet1/5 172.20.251.66 02:04:96:9f:94:d8 ethernet1/5 c 1245
ethernet1/6 172.20.251.70 02:04:96:9f:94:d8 ethernet1/6 c 1245
ethernet1/7 172.20.251.74 02:04:96:9f:a4:74 ethernet1/7 c 1245
ethernet1/8 172.20.251.78 02:04:96:9f:a4:74 ethernet1/8 c 1250
The issue here is that the P2P ports move to the other switch when the passive firewall becomes active. That means the MAC addresses should have swapped around, but they have not!
When you issue the ‘clear arp all’ command on the Palo Alto, it refreshes the ARP entries to the now-correct order and all works, see below:
Clear ARP All on active firewall:
ethernet1/5 172.20.251.66 02:04:96:9f:a4:74 ethernet1/5 c 1782
ethernet1/6 172.20.251.70 02:04:96:9f:a4:74 ethernet1/6 c 1782
ethernet1/7 172.20.251.74 02:04:96:9f:94:d8 ethernet1/7 c 1777
ethernet1/8 172.20.251.78 02:04:96:9f:94:d8 ethernet1/8 c 1735
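The comparison above can be sketched in a few lines of Python. This is only an illustration, assuming the `show arp` rows are pasted in as text exactly as shown in this post; the function names `parse_arp` and `stale_entries` are made up for the example and are not part of any Palo Alto or Extreme tooling.

```python
# MAC address presented by each switch, as listed earlier in this post.
SWITCH_MACS = {
    "02:04:96:9f:94:d8": "Col-CEF-Core1",
    "02:04:96:9f:a4:74": "Col-2A22-Core2",
}

# ARP table on Firewall B straight after failover (copied from the post).
AFTER_FAILOVER = """\
ethernet1/5 172.20.251.66 02:04:96:9f:94:d8 ethernet1/5 c 1245
ethernet1/6 172.20.251.70 02:04:96:9f:94:d8 ethernet1/6 c 1245
ethernet1/7 172.20.251.74 02:04:96:9f:a4:74 ethernet1/7 c 1245
ethernet1/8 172.20.251.78 02:04:96:9f:a4:74 ethernet1/8 c 1250
"""

# ARP table on the same firewall after 'clear arp all' (copied from the post).
AFTER_CLEAR = """\
ethernet1/5 172.20.251.66 02:04:96:9f:a4:74 ethernet1/5 c 1782
ethernet1/6 172.20.251.70 02:04:96:9f:a4:74 ethernet1/6 c 1782
ethernet1/7 172.20.251.74 02:04:96:9f:94:d8 ethernet1/7 c 1777
ethernet1/8 172.20.251.78 02:04:96:9f:94:d8 ethernet1/8 c 1735
"""


def parse_arp(text):
    """Return {interface: mac} from rows of: interface ip mac port status ttl."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and separators; data rows have six fields.
        if len(fields) == 6 and fields[0].startswith("ethernet"):
            table[fields[0]] = fields[2].lower()
    return table


def stale_entries(cached, fresh):
    """Interfaces whose cached MAC differs from the freshly learned MAC."""
    return sorted(i for i in cached if i in fresh and cached[i] != fresh[i])


cached = parse_arp(AFTER_FAILOVER)
fresh = parse_arp(AFTER_CLEAR)
for iface in stale_entries(cached, fresh):
    print(iface, "cached:", SWITCH_MACS[cached[iface]],
          "actual:", SWITCH_MACS[fresh[iface]])
```

With the tables from this thread, every one of the four interfaces is reported as pointing at the wrong core until the ARP cache is cleared.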
The answer was to move the cables around on the passive firewall so that the same P2P subnet becomes active on the same switch, so the MAC presented to the Palo stays the same!
This may well be the proper method, or some alternative configuration may have helped, but this sorted the issue for me.
Below is a diagram showing the previous connections in the top row and, below that in red, what I moved them to:
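As a rough sanity check of the recabling, the idea can be written as a small Python sketch. The Firewall A and original Firewall B mappings are read off the ARP tables in this post; the "fixed" Firewall B mapping is an assumption that simply mirrors Firewall A, and the names `FW_A`, `FW_B_ORIGINAL`, `FW_B_FIXED` and `ports_that_change_switch` are hypothetical.

```python
# Which core each firewall port lands on (from the ARP tables in this post).
FW_A = {
    "ethernet1/5": "Col-CEF-Core1",
    "ethernet1/6": "Col-CEF-Core1",
    "ethernet1/7": "Col-2A22-Core2",
    "ethernet1/8": "Col-2A22-Core2",
}

# Original cabling of Firewall B: every P2P subnet lands on the other core.
FW_B_ORIGINAL = {
    "ethernet1/5": "Col-2A22-Core2",
    "ethernet1/6": "Col-2A22-Core2",
    "ethernet1/7": "Col-CEF-Core1",
    "ethernet1/8": "Col-CEF-Core1",
}

# Assumed cabling after the fix: same port, same core as Firewall A.
FW_B_FIXED = dict(FW_A)


def ports_that_change_switch(active, passive):
    """Ports whose P2P subnet would move to a different core on failover."""
    return sorted(p for p in active if passive.get(p) != active[p])


print("before recabling:", ports_that_change_switch(FW_A, FW_B_ORIGINAL))
print("after recabling: ", ports_that_change_switch(FW_A, FW_B_FIXED))
```

An empty list after recabling means the firewall sees the same switch MAC on each port before and after a failover, so a cached ARP entry stays valid.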
11-10-2019 04:56 PM
It does look like an ARP issue. If I clear the ARP entries for each of the P2P VLANs, the connections all start working.
The firewall is meant to send a gratuitous ARP when it moves, and it looks like the entries are all moving correctly and nothing is getting stuck, but it is clearly an ARP-related issue?
Just need to figure out a solution.
Will post back if I find a fix unless someone in the community has an answer beforehand.
Many thanks.
11-08-2019 11:30 PM
OK, so some new information...
Turns out if I just leave it, after about 15 minutes all the links spontaneously start working?!
11-08-2019 10:19 PM
Apologies, I realised I didn’t answer your question fully, but it seems OK:
Firewall A Active
Slot-1 Col-CEF-Core1.5 # show iparp | inc b4:0c:25:e2:c0:44
VR-Default 172.20.251.65 b4:0c:25:e2:c0:44 3 NO FW1-Link1-Core1 3501 1:42
Slot-1 Col-CEF-Core1.6 # show iparp | inc b4:0c:25:e2:c0:45
VR-Default 172.20.251.69 b4:0c:25:e2:c0:45 3 NO FW1-Link2-Core1 3502 2:42
Slot-1 Col-CEF-Core1.7 # show iparp | inc b4:0c:25:e2:c0:46
Slot-1 Col-CEF-Core1.8 # show iparp | inc b4:0c:25:e2:c0:47
* Slot-1 Col-2A22-Core2.1 # show iparp | inc b4:0c:25:e2:c0:44
* Slot-1 Col-2A22-Core2.2 # show iparp | inc b4:0c:25:e2:c0:45
* Slot-1 Col-2A22-Core2.3 # show iparp | inc b4:0c:25:e2:c0:46
VR-Default 172.20.251.73 b4:0c:25:e2:c0:46 4 NO FW1-Link3-Core2 3503 1:43
* Slot-1 Col-2A22-Core2.4 # show iparp | inc b4:0c:25:e2:c0:47
VR-Default 172.20.251.77 b4:0c:25:e2:c0:47 4 NO FW1-Link4-Core2 3504 2:43
Firewall B Active
Slot-1 Col-CEF-Core1.8 # show iparp | inc b4:0c:25:e2:c0:44
Slot-1 Col-CEF-Core1.9 # show iparp | inc b4:0c:25:e2:c0:45
Slot-1 Col-CEF-Core1.10 # show iparp | inc b4:0c:25:e2:c0:46
VR-Default 172.20.251.73 b4:0c:25:e2:c0:46 0 NO FW2-Link3-Core1 3503 1:43
Slot-1 Col-CEF-Core1.11 # show iparp | inc b4:0c:25:e2:c0:47
VR-Default 172.20.251.77 b4:0c:25:e2:c0:47 0 NO FW2-Link4-Core1 3504 2:43
* Slot-1 Col-2A22-Core2.5 # show iparp | inc b4:0c:25:e2:c0:44
VR-Default 172.20.251.65 b4:0c:25:e2:c0:44 1 NO FW2-Link1-Core2 3501 1:42
* Slot-1 Col-2A22-Core2.6 # show iparp | inc b4:0c:25:e2:c0:45
VR-Default 172.20.251.69 b4:0c:25:e2:c0:45 1 NO FW2-Link2-Core2 3502 2:42
* Slot-1 Col-2A22-Core2.7 # show iparp | inc b4:0c:25:e2:c0:46
* Slot-1 Col-2A22-Core2.8 # show iparp | inc b4:0c:25:e2:c0:47
Thanks,
Martin