Extreme Networks
Solved

OSPF establishes full adjacency but unable to ping P2P link.

  • 8 November 2019
  • 6 replies
  • 508 views

Userlevel 5

Hi

I currently have a strange networking issue; see the diagram below:

This shows two stacked EXOS switches in each data centre, joined via MLAG.

There is an active/passive firewall pair. The active firewall has four /30 P2P links: two go to one stack and the other two go to the other stack. Everything works well: ECMP is configured and the routing table reflects the routes correctly. The passive firewall has all of its interfaces shut down.

The problem happens when the firewalls fail over. Both P2P links on each core go down, and then the links to the other firewall come up as it becomes active.

All the new P2P links then form a full adjacency.

Essentially, the /30 subnet on each P2P link moves from one core to the other.
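The addresses that appear in the ARP output later in this thread (firewall ends .65/.69/.73/.77 on the switches, switch ends .66/.70/.74/.78 on the Palo Alto) imply four consecutive /30 subnets. As a minimal sketch, assuming those subnets are carved from 172.20.251.64/28 (an inference from the outputs, not something stated in the post), Python's standard `ipaddress` module lists each /30 and its two usable ends:

```python
import ipaddress

# Assumption: the four P2P /30s sit inside 172.20.251.64/28. This is inferred
# from the ARP entries quoted in this thread, not stated by the poster.
block = ipaddress.ip_network("172.20.251.64/28")
p2p_links = list(block.subnets(new_prefix=30))

for net in p2p_links:
    # A /30 has exactly two usable hosts: one for each end of the P2P link.
    firewall_end, switch_end = net.hosts()
    print(f"{net}: firewall {firewall_end}, switch {switch_end}")
```

On failover it is the switch end of each of these subnets that moves to the other core, while the firewall end keeps the same address.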

 

The problem is that although all the new links form a full adjacency, you cannot ping the far end of any of the point-to-point links, and traffic stops passing through the firewalls!?

In fact, it seems that one random P2P link out of the four will work and the others will not; sometimes none of them work at all. If you fail the firewalls back, sometimes all the links are restored, and sometimes the link that moved successfully will not fail back.

There is a workaround, though. Whenever a link stops working (in whatever scenario), you can simply disable and then re-enable the ports on the switch, and everything starts working!?

All OSPF settings on both EXOS and the firewall are default and match, i.e. timers, P2P network type, etc.

Both the firewalls and the switches were upgraded; it made no difference.

I enabled graceful restart; still no difference.

I can’t make sense of the issue. What could be causing it?

Ideas:

  • Timing issue: the firewall failover is too quick, and adjacencies form before something has had time to catch up?
  • Adjacencies form using multicast; perhaps layer 2 is fine but there is a layer 3 issue?
  • The P2P /30 subnets move from one core to the other; could that somehow be causing an issue?

Even if one of those ideas is true, I’m not sure what I can do about it, so I’m hoping the community can help.

EXOS Version:  22.7.1.2 patch1-11

Palo Alto Version: 8.1.11

Best answer by Martin Flammia 14 November 2019, 22:25

This is now fixed.

It seems I had been looking at the issue in reverse...

The MAC addresses that the switch ports present to the Palo Alto are based on each Extreme switch’s MAC address, as follows:

Col-CEF-Core1: 02:04:96:9F:94:D8

Col-2A22-Core2: 02:04:96:9F:A4:74

To view the ARP table for each of the P2P ports on the Palo Altos, you can use the following commands:

show arp ethernet1/5
show arp ethernet1/6
show arp ethernet1/7
show arp ethernet1/8

This is the state of the ARP table on the active firewall before failover:

Firewall A

interface         ip address      hw address        port              status   ttl
--------------------------------------------------------------------------------
ethernet1/5       172.20.251.66   02:04:96:9f:94:d8 ethernet1/5         c      1535
ethernet1/6       172.20.251.70   02:04:96:9f:94:d8 ethernet1/6         c      1528
ethernet1/7       172.20.251.74   02:04:96:9f:a4:74 ethernet1/7         c      1525
ethernet1/8       172.20.251.78   02:04:96:9f:a4:74 ethernet1/8         c      1526
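With four ports to check after every failover, it can help to parse the `show arp` output rather than eyeball it. A minimal Python sketch, assuming the whitespace-separated column layout of the Firewall A table above (the helper name `parse_show_arp` is hypothetical, not a PAN-OS tool):

```python
import re

# Firewall A's ARP table as pasted above (header and separator included).
FIREWALL_A = """\
interface         ip address      hw address        port              status   ttl
--------------------------------------------------------------------------------
ethernet1/5       172.20.251.66   02:04:96:9f:94:d8 ethernet1/5         c      1535
ethernet1/6       172.20.251.70   02:04:96:9f:94:d8 ethernet1/6         c      1528
ethernet1/7       172.20.251.74   02:04:96:9f:a4:74 ethernet1/7         c      1525
ethernet1/8       172.20.251.78   02:04:96:9f:a4:74 ethernet1/8         c      1526
"""

# Data rows start with an interface name and contain a colon-separated MAC;
# the header and dashed separator lines do not match.
ROW = re.compile(r"^(ethernet\S+)\s+(\S+)\s+([0-9a-f:]{17})\s+(\S+)\s+(\S+)\s+(\d+)$")

def parse_show_arp(text):
    """Return {interface: (ip, mac, ttl)} for each data row (hypothetical helper)."""
    entries = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            iface, ip, mac, _port, _status, ttl = m.groups()
            entries[iface] = (ip, mac, int(ttl))
    return entries

for iface, (ip, mac, ttl) in parse_show_arp(FIREWALL_A).items():
    print(iface, ip, mac, ttl)
```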

 

This is the state of the ARP table after failover:

Firewall B

interface         ip address      hw address        port              status   ttl
--------------------------------------------------------------------------------
ethernet1/5       172.20.251.66   02:04:96:9f:94:d8 ethernet1/5         c      1245
ethernet1/6       172.20.251.70   02:04:96:9f:94:d8 ethernet1/6         c      1245
ethernet1/7       172.20.251.74   02:04:96:9f:a4:74 ethernet1/7         c      1245
ethernet1/8       172.20.251.78   02:04:96:9f:a4:74 ethernet1/8         c      1250

 

The issue here is that the switch end of each P2P link moves to the other switch when the passive firewall becomes active, which means the MAC addresses should have swapped around, but they have not!

When you issue the ‘clear arp all’ command on the Palo Alto, the ARP entries are refreshed with the now-correct MAC addresses and everything works; see below:

After ‘clear arp all’ on the active firewall:

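interface         ip address      hw address        port              status   ttl
--------------------------------------------------------------------------------
ethernet1/5       172.20.251.66   02:04:96:9f:a4:74 ethernet1/5         c      1782
ethernet1/6       172.20.251.70   02:04:96:9f:a4:74 ethernet1/6         c      1782
ethernet1/7       172.20.251.74   02:04:96:9f:94:d8 ethernet1/7         c      1777
ethernet1/8       172.20.251.78   02:04:96:9f:94:d8 ethernet1/8         c      1735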
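Comparing the Firewall B table with the table after ‘clear arp all’ makes the failure mode explicit: every cached next-hop MAC still points at the core that used to own that /30. A short Python sketch of that comparison, with the values copied from the tables and the switch MAC list in this post (the helper name `stale_entries` is illustrative, and the post-clear table is treated as the ground truth):

```python
# Core switch MACs presented to the Palo Alto (from the list earlier in this post).
CORE_BY_MAC = {
    "02:04:96:9f:94:d8": "Col-CEF-Core1",
    "02:04:96:9f:a4:74": "Col-2A22-Core2",
}

# Firewall B's cache right after failover, and the same table after 'clear arp all'.
after_failover = {
    "172.20.251.66": "02:04:96:9f:94:d8",
    "172.20.251.70": "02:04:96:9f:94:d8",
    "172.20.251.74": "02:04:96:9f:a4:74",
    "172.20.251.78": "02:04:96:9f:a4:74",
}
after_clear = {
    "172.20.251.66": "02:04:96:9f:a4:74",
    "172.20.251.70": "02:04:96:9f:a4:74",
    "172.20.251.74": "02:04:96:9f:94:d8",
    "172.20.251.78": "02:04:96:9f:94:d8",
}

def stale_entries(cached, fresh):
    """Return {ip: (cached core, actual core)} for every IP whose cached MAC is outdated."""
    return {
        ip: (CORE_BY_MAC[cached[ip]], CORE_BY_MAC[fresh[ip]])
        for ip in sorted(cached)
        if ip in fresh and cached[ip] != fresh[ip]
    }

for ip, (old, new) in stale_entries(after_failover, after_clear).items():
    print(f"{ip}: cached {old}, now answered by {new}")
```

All four entries come back as stale, which matches the symptom of every P2P link failing until the ARP cache was cleared or the entries aged out.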

 

The answer was to move the cables around on the passive firewall so that each P2P subnet becomes active on the same switch as before, meaning the MAC address presented to the Palo Alto stays the same!

This may well be the proper method, or some alternative configuration might also have helped, but this sorted the issue for me.
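The fix boils down to one rule: each P2P VLAN should terminate on the same core whichever firewall is active, so the switch MAC the Palo Alto has cached stays valid. A minimal Python sketch of that check; the "before" mapping comes from the FW1-/FW2- VLAN names in the `show fdb` output in this thread, while the "after" mapping is a hypothetical plan, since the recabled layout is only shown in the diagram:

```python
# Which core terminates each P2P VLAN while each firewall is active.
# "before" is taken from the FW1-LinkN-CoreN / FW2-LinkN-CoreN VLAN names in this thread.
before = {
    "A": {3501: "Core1", 3502: "Core1", 3503: "Core2", 3504: "Core2"},
    "B": {3501: "Core2", 3502: "Core2", 3503: "Core1", 3504: "Core1"},
}
# Hypothetical recabled plan: each VLAN stays on the same core after failover.
after = {
    "A": {3501: "Core1", 3502: "Core1", 3503: "Core2", 3504: "Core2"},
    "B": {3501: "Core1", 3502: "Core1", 3503: "Core2", 3504: "Core2"},
}

def vlans_that_change_core(plan):
    """VLANs whose switch (and therefore next-hop MAC) differs between the two firewalls."""
    return sorted(v for v in plan["A"] if plan["A"][v] != plan["B"][v])

print(vlans_that_change_core(before))
print(vlans_that_change_core(after))
```

Any VLAN the check reports is one where the firewall will keep sending to the old core's MAC until its ARP entry is refreshed.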

 

Below is a diagram showing the previous connections in the top row and, below that in red, what I moved them to:

6 replies

Userlevel 6

Is the ARP entry getting stuck with the old, wrong MAC?

Userlevel 5

Hi David,

Thanks for posting.

Just checked, and the MAC addresses seem to be moving OK:

Firewall A Active

Slot-1 Col-CEF-Core1.1 # show fdb ports 1:42-43,2:42-43
MAC                                      VLAN Name( Tag)  Age  Flags          Port / Virtual Port List
------------------------------------------------------------------------------------------------------
b4:0c:25:e2:c0:44                  FW1-Link1-Core1(3501) 0000  d mi           1:42
b4:0c:25:e2:c0:45                  FW1-Link2-Core1(3502) 0000  d mi           2:42

 

* Slot-1 Col-2A22-Core2.1 # show fdb ports 1:42-43,2:42-43
MAC                                      VLAN Name( Tag)  Age  Flags          Port / Virtual Port List
------------------------------------------------------------------------------------------------------
b4:0c:25:e2:c0:46                  FW1-Link3-Core2(3503) 0000  d mi           1:43
b4:0c:25:e2:c0:47                  FW1-Link4-Core2(3504) 0000  d mi           2:43


Firewall B Active

Slot-1 Col-CEF-Core1.2 # show fdb ports 1:42-43,2:42-43
MAC                                      VLAN Name( Tag)  Age  Flags          Port / Virtual Port List
------------------------------------------------------------------------------------------------------
b4:0c:25:e2:c0:46                  FW2-Link3-Core1(3503) 0000  d mi           1:43
b4:0c:25:e2:c0:47                  FW2-Link4-Core1(3504) 0000  d mi           2:43

 

* Slot-1 Col-2A22-Core2.2 # show fdb ports 1:42-43,2:42-43
MAC                                      VLAN Name( Tag)  Age  Flags          Port / Virtual Port List
------------------------------------------------------------------------------------------------------
b4:0c:25:e2:c0:44                  FW2-Link1-Core2(3501) 0000  d mi           1:42
b4:0c:25:e2:c0:45                  FW2-Link2-Core2(3502) 0000  d mi           2:42

 

Checked on Core 1 that the previous MAC addresses did not get stuck; they are no longer showing:


Slot-1 Col-CEF-Core1.3 # show fdb b4:0c:25:e2:c0:44 
MAC                                      VLAN Name( Tag)  Age  Flags          Port / Virtual Port List
------------------------------------------------------------------------------------------------------

Slot-1 Col-CEF-Core1.3 # show fdb b4:0c:25:e2:c0:45
MAC                                      VLAN Name( Tag)  Age  Flags          Port / Virtual Port List

 

Thanks

Martin

Userlevel 5

Apologies, I realised I didn’t answer your question fully, but it seems OK:

Firewall A Active

Slot-1 Col-CEF-Core1.5 # show iparp | inc b4:0c:25:e2:c0:44

VR-Default    172.20.251.65    b4:0c:25:e2:c0:44    3      NO  FW1-Link1-Core1  3501  1:42

Slot-1 Col-CEF-Core1.6 # show iparp | inc b4:0c:25:e2:c0:45 

VR-Default    172.20.251.69    b4:0c:25:e2:c0:45    3      NO  FW1-Link2-Core1  3502  2:42

Slot-1 Col-CEF-Core1.7 # show iparp | inc b4:0c:25:e2:c0:46

Slot-1 Col-CEF-Core1.8 # show iparp | inc b4:0c:25:e2:c0:47

 

* Slot-1 Col-2A22-Core2.1 # show iparp | inc b4:0c:25:e2:c0:44

* Slot-1 Col-2A22-Core2.2 # show iparp | inc b4:0c:25:e2:c0:45 

* Slot-1 Col-2A22-Core2.3 # show iparp | inc b4:0c:25:e2:c0:46

VR-Default    172.20.251.73    b4:0c:25:e2:c0:46    4      NO  FW1-Link3-Core2  3503  1:43

* Slot-1 Col-2A22-Core2.4 # show iparp | inc b4:0c:25:e2:c0:47

VR-Default    172.20.251.77    b4:0c:25:e2:c0:47    4      NO  FW1-Link4-Core2  3504  2:43

 

Firewall B Active

Slot-1 Col-CEF-Core1.8 # show iparp | inc b4:0c:25:e2:c0:44

Slot-1 Col-CEF-Core1.9 # show iparp | inc b4:0c:25:e2:c0:45 

Slot-1 Col-CEF-Core1.10 # show iparp | inc b4:0c:25:e2:c0:46

VR-Default    172.20.251.73    b4:0c:25:e2:c0:46    0      NO  FW2-Link3-Core1  3503  1:43

Slot-1 Col-CEF-Core1.11 # show iparp | inc b4:0c:25:e2:c0:47

VR-Default    172.20.251.77    b4:0c:25:e2:c0:47    0      NO  FW2-Link4-Core1  3504  2:43

 

VR-Default    172.20.251.65    b4:0c:25:e2:c0:44    1      NO  FW2-Link1-Core2  3501  1:42

* Slot-1 Col-2A22-Core2.6 # show iparp | inc b4:0c:25:e2:c0:45 

VR-Default    172.20.251.69    b4:0c:25:e2:c0:45    1      NO  FW2-Link2-Core2  3502  2:42

* Slot-1 Col-2A22-Core2.7 # show iparp | inc b4:0c:25:e2:c0:46

* Slot-1 Col-2A22-Core2.8 # show iparp | inc b4:0c:25:e2:c0:47

 

Thanks,

Martin

Userlevel 5

OK, so some new information...

It turns out that if I just leave it, after about 15 minutes all the links spontaneously start working?!

Userlevel 5

It does look like an ARP issue. If I clear the ARP entries for each of the P2P VLANs, the connections all start working.

The firewall is meant to send a gratuitous ARP when it fails over, and the entries all seem to move correctly with nothing getting stuck, but this is clearly an ARP-related issue?
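For reference, a gratuitous ARP is an ordinary ARP packet (RFC 826 format) broadcast with the sender and target IP both set to the announcing host's own address, so that neighbours already caching that IP can update the entry. The Python sketch below builds one only to show the field layout; the function name is hypothetical, and the MAC/IP pair is simply Firewall A's end of VLAN 3501 taken from the `show iparp` output above. Note that a gratuitous ARP can only refresh other devices' entries for the sender's own IP, so one sent by the firewall would not by itself correct the firewall's cached MAC for the switch end of each link.

```python
import ipaddress
import struct

def gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request (hypothetical helper).

    Field layout follows RFC 826; "gratuitous" means the target IP equals the
    sender IP and the frame goes to the broadcast MAC.
    """
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = ipaddress.IPv4Address(ip).packed
    # Ethernet header: broadcast destination, our source MAC, EtherType 0x0806 (ARP).
    eth_header = b"\xff" * 6 + mac_bytes + struct.pack("!H", 0x0806)
    arp_body = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # operation: request
        mac_bytes,    # sender hardware address
        ip_bytes,     # sender protocol address
        b"\x00" * 6,  # target hardware address (unused in a gratuitous ARP)
        ip_bytes,     # target protocol address == sender protocol address
    )
    return eth_header + arp_body

# Example values: Firewall A's end of FW1-Link1-Core1, from the show iparp output above.
frame = gratuitous_arp("b4:0c:25:e2:c0:44", "172.20.251.65")
print(len(frame), frame.hex())
```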

I just need to figure out a solution.

I will post back if I find a fix, unless someone in the community has an answer beforehand.

Many thanks.
