IPv6 forwarding issue with OSPFv3 in a user VR

  • Problem
  • Updated 11 months ago
  • Solved
Hi all

Before I open a TAC case about this, I'm just trying to find out if anyone else is doing something similar and has the same issue.

Somewhat simplifying, I have four devices physically connected like this:

[ Cisco 1 ] -------- [ X460G2 "RX1" ]
     :                       :
     :                       :
     :                       :
[ Cisco 2 ] -------- [ X460G2 "RX2" ]

On each physical link, there is a point-to-point VLAN in a user VR, 'VR-Internet', with v4 and v6 addresses.  This appears to be the important point.

These all run OSPF, OSPFv3 and iBGP between them.  Happy little routing network :)

Except that recently we had a failure of Cisco 2 - and I noticed that the IPv6 iBGP sessions between RX2 and RX1, and between RX2 and Cisco 1 went down.

Some troubleshooting later and it seems that the following case is true if all links are up:
1) It all works as expected.
2) You can ping RX1's loopback from RX2, v4 and v6.
3) OSPFv3 shows adjacencies all up.  iBGP all up.
4) On RX2, the IPv6 next-hop for RX1's loopback is via the link directly to RX1 as expected.

If you break the link between RX2 and Cisco 2 (or Cisco 2 goes away):
1) Things don't work as expected.
2) You can ping RX1's loopback (and everything else in the network) from RX2 on IPv4.
3) OSPFv3 still shows adjacencies are up between RX1 and RX2.
4) On RX2, the IPv6 next-hop for RX1's loopback remains the same, the directly connected link to RX1 as expected.
5) However, you cannot ping RX1's loopback from RX2 on IPv6.
6) Nor can you ping Cisco 1's loopback from RX2 on IPv6.
7) Unsurprisingly, given (5) and (6) above, iBGP goes down.
8) If I work in VR-Default on these switches, and configure some IPv6 between RX1 and RX2, things work as expected.  This is a key point, it seems that the forwarding inside the VR is the problem and not OSPFv3 in general.

I'm in the process of trying to reproduce in the lab on a much simpler config (in theory, this can be done with two X460 G2s with two VRFs and one point to point link).

Has anyone else seen anything similar?  I'm running 21.1.3.7 on these X460s.

Paul.
Paul Thornton


Posted 12 months ago

Jarek

Hi Paul,

What do you see from 'sh iproute vr VR-Internet' on RX2 and RX1?
Do you see any route to RX1's loopback?

--
Jarek
Paul Thornton

Hi Jarek,

Absolutely.  The route to RX2's loopback is present and correct in all cases, so OSPFv3 is doing the right thing.  It is almost like I've forgotten to 'enable ipforwarding ipv6' on RX1 (which I haven't - it works - using the same route - if the two Ciscos are up and connected).

Paul.

Edit to add: I meant the route to RX1's loopback in the first line.  Typo!
(Edited)
Jarek

Are the routes provided to the FIB, or are they only visible in the RIB?

Can you, for example, ping from RX2 to RX1? (I mean the directly connected IPs.)

--
Jarek
Paul Thornton

I think the issue may be a RIB -> FIB problem.

If I ping across the directly connected link, it works fine.  The moment I go 'one hop' further (eg: to the loopback of the switch) it fails.

Both devices have sensible routes to each other - using the directly connected link's addresses (well, it is v6 - so the link local addr is the next hop).

The part that really confuses me is that in this broken state, if you reconnect the Ciscos, it becomes possible to ping RX1's loopback from RX2 again.  With no change of best next hop or anything else I can see from the output of 'rtlook xxx'.  Very, very odd.

Paul.
Jarek

When you run 'show iproute', do you see the 'f' flag (provided to FIB)?
Example:
#oa   172.16.20.0/24     172.12.2.21         1     UG-D---um--f-         526d:1h:52m:59s

Maybe you can change the IPs and paste the 'show iproute' output from the devices?
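
For anyone wanting to script the flag check described above, here is a minimal Python sketch. It assumes the flags field is the 13-character string shown in the example, that 'f' means "provided to FIB" as stated above, and that 'c' marks a compressed route (my reading of the EXOS flag legend, so verify it against the legend printed by 'show iproute'). `fib_status` is a hypothetical helper name, not an EXOS tool.

```python
def fib_status(flags: str) -> str:
    """Classify an EXOS route flags field such as 'UG-D---um--f-'.

    Assumption: 'f' = provided to FIB, 'c' = compressed (not installed
    as its own FIB entry). Flag letters are case-sensitive.
    """
    if "f" in flags:
        return "in FIB"
    if "c" in flags:
        return "compressed, not in FIB"
    return "not in FIB"


if __name__ == "__main__":
    # Flags field taken from the example route line above.
    print(fib_status("UG-D---um--f-"))
```

Feeding it the flags column from each line of 'show iproute' output is enough to spot routes that are in the RIB but were never handed to hardware.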

--
Jarek
Paul Thornton

Hi

Ah-ha, progress.
No, the OSPFv3 route isn't making it to the FIB.
However, if I configure static routes, they *do* make it to the FIB:

:79 is the loopback address of RX1
:78 is the loopback address of RX2

The link between them (VLAN rx1rx2_pri) is 2001:xxxx:0:14::/64

* (vr VR-Internet) rx2.thn.40 # rtlook 2001:xxxx::79
Destination Mtr Flags Origin
Gateway Interface
2001:xxxx::79/128 2 UGHD---um--c- OSPFv3Intra
fe80::204:96ff:fe98:8701 rx1rx2_pri

* (vr VR-Internet) rx2.thn.41 # ping 2001:xxxx::79
Ping(ICMP6) 2001:xxxx::79: 4 packets, 8 data bytes, interval 1 second(s).
--- 2001:xxxx::79 ping statistics ---
4 packets transmitted, 0 packets received, 100% loss
round-trip min/avg/max = 0/0/0 ms

If I configure a static route across the link, I can ping loopback to loopback:

* (vr VR-Internet) rx1.thn.19 # config iproute add 2001:xxxx::78/128 2001:xxxx:0:14::2

* (vr VR-Internet) rx2.thn.53 # config iproute add 2001:xxxx::79/128 2001:xxxx:0:14::1


* (vr VR-Internet) rx2.thn.54 # rtlook 2001:xxxx::79
Destination Mtr Flags Origin
Gateway Interface
2001:xxxx::79/128 1 UG---S-um--f- Static
2001:xxxx:0:14::1 rx1rx2_pri

* (vr VR-Internet) rx2.thn.55 # ping 2001:xxxx::78
Ping(ICMP6) 2001:xxxx::78: 4 packets, 8 data bytes, interval 1 second(s).
16 bytes from 2001:xxxx::78: icmp_seq=0 ttl=64 time=0.238 ms
16 bytes from 2001:xxxx::78: icmp_seq=1 ttl=64 time=0.134 ms
16 bytes from 2001:xxxx::78: icmp_seq=2 ttl=64 time=0.147 ms
^C
--- 2001:xxxx::78 ping statistics ---
4 packets transmitted, 4 packets received, 0% loss
round-trip min/avg/max = 0/0/0 ms

So something strange is happening with the nexthop address.

00:04:96:98:87:01 is the switch MAC address of rx1, so that link-local nexthop looks sensible.
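
The match between the MAC and the link-local next hop can be checked mechanically. The sketch below assumes the address was formed with the standard modified EUI-64 rule (flip the universal/local bit of the first octet, insert ff:fe in the middle, prefix fe80::/64); `mac_to_link_local` is a hypothetical helper name.

```python
import ipaddress


def mac_to_link_local(mac: str) -> ipaddress.IPv6Address:
    """Derive the modified EUI-64 link-local IPv6 address for a MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"expected 6 colon-separated octets, got {mac!r}")
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]  # insert ff:fe
    interface_id = int.from_bytes(bytes(eui64), "big")
    # fe80::/64 prefix combined with the 64-bit interface identifier
    return ipaddress.IPv6Address((0xFE80 << 112) | interface_id)


if __name__ == "__main__":
    print(mac_to_link_local("00:04:96:98:87:01"))
    # prints fe80::204:96ff:fe98:8701
```

That is the same address shown as the gateway in the OSPFv3 'rtlook' output, so the next hop itself points at rx1.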

This is where I swear loudly that OSPFv3 uses link-local addresses and not global addresses for next-hops, thus making troubleshooting that bit harder :/

Paul.

Jarek

Could you paste from RX1 and RX2:

- sh iproute
- sh configuration ospf

--
Jarek
Paul Thornton

Hi

I made an interesting discovery just now whilst collecting the routing information and ospfv3 config.  The cause of this problem appears to be iproute compression somehow.

If I disable iproute compression, everything works.  I need to check this properly out of hours (as it is part of our production network) but I think the combination of features that causes this issue is:
1) User VR
2) OSPFv3
3) iproute compression enabled

Paul.
Jarek
