Preparing for BGP neighbor outage

Frank
Contributor
OK, so let's say I'm BGP multihomed with multiple providers, using two routers (480s), and that I have my own ASN (12345). My BGP is happily trucking away, and I'm advertising my networks to all my peers.

Now provider X tells me that they'll have to do maintenance on my circuit. If that BGP peer drops, I know that I'll still have Internet access, but I also know that I will have a 3-4 minute window where BGP re-shuffles routes, and everything that used to come in or go out through provider X drops connectivity - essentially a short service outage, and there are people out there that (a) notice, and (b) aren't too happy when that happens.

So how do I best prepare for that? In the past (on Cisco gear), I've prepended my advertised AS path to neighbor X with "10 more of 12345". Essentially, that keeps existing connections alive (I hope), but within 10 minutes or so, nobody should use that peer for incoming traffic anymore.
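To illustrate why prepending should drain inbound traffic, here is a minimal sketch of a single best-path tie-break, assuming the remote network compares only AS-path length. Real BGP checks local preference first, so a remote operator's policy can still override the prepend. The provider ASNs 64500 and 64501 are hypothetical values from the documentation range, and the function names are made up for illustration.

```python
# Toy model of one BGP tie-break step: the shortest AS path wins.
# Real best-path selection checks local preference first, so a remote
# network's policy can still override prepending.

LOCAL_ASN = 12345      # the ASN used in this post
PROVIDER_X = 64500     # hypothetical provider ASN (documentation range)
PROVIDER_Y = 64501     # hypothetical provider ASN (documentation range)


def advertised_path(provider_asn, prepends=0):
    """AS path a remote network would see via the given provider."""
    return [provider_asn] + [LOCAL_ASN] * (1 + prepends)


def preferred(paths):
    """Pick the provider with the shortest path (ties go to the first listed)."""
    return min(paths, key=lambda name: len(paths[name]))


before = {"X": advertised_path(PROVIDER_X), "Y": advertised_path(PROVIDER_Y)}
after = {"X": advertised_path(PROVIDER_X, prepends=10), "Y": advertised_path(PROVIDER_Y)}

print("before prepend:", preferred(before), len(before["X"]), len(before["Y"]))
print("after prepend: ", preferred(after), len(after["X"]), len(after["Y"]))
```

With ten prepends the path via X grows from 2 to 12 hops, so in this simplified model the remote side switches to Y while the session to X stays up.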

Is there a better way other than AS-prepend? (I don't think anyone has implemented RFC 6198 yet.)

I already have a policy in place for adverts out:
configure bgp neighbor 1.2.3.4 route-policy out AS-Localonly
to ensure that I only advertise locally originating paths. I'm thinking I could use that for prepends like this:

AS-Prepend.pol:

entry prepend-localonly {
    if {
        as-path "^$";
    } then {
        as-path "12345 12345 12345 12345 12345 12345 12345 12345 12345 12345";
        permit;
    }
}
entry DenyRest {
    if {
    } then {
        deny;
    }
}
Would this be a working policy? And I could just activate it with

configure bgp neighbor 1.2.3.4 route-policy out AS-Prepend
(possibly after an unconfigure bgp neighbor 1.2.3.4 route-policy out AS-Localonly, or whatever the proper syntax for that is)

Would that work? Is there a better way to do this? Again, the goal is to not have disruption due to BGP route convergence when one peer drops, because I'm shuffling traffic away before the drop.

Thanks!

P.S.: Bonus points - how do I script that (or whatever alternative), if I even can? If I know that the window is from 1am to 3am, I could automatically do the "config bgp neigh..." thing at 12:30am, and re-set it at 4:00am and never lose any sleep 🙂
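
As a rough sketch of the timing side of that idea, the snippet below only computes when each command would run for a given window. It does not talk to the switch; how the commands get delivered (cron plus SSH, an on-switch scheduler, or something else) is left open. The function name, the date, and the default lead and restore times are assumptions chosen to match the 12:30am and 4:00am example above.

```python
from datetime import datetime, timedelta

NEIGHBOR = "1.2.3.4"  # placeholder neighbor address from the post


def maintenance_plan(window_start, window_end,
                     drain_lead=timedelta(minutes=30),
                     restore_delay=timedelta(hours=1)):
    """Return (time, command) pairs for draining and restoring one neighbor."""
    prefix = f"configure bgp neighbor {NEIGHBOR} route-policy out"
    return [
        # Switch to the prepend policy before the window opens.
        (window_start - drain_lead, f"{prefix} AS-Prepend"),
        # Go back to the normal outbound policy after the window closes.
        (window_end + restore_delay, f"{prefix} AS-Localonly"),
    ]


start = datetime(2024, 1, 10, 1, 0)  # hypothetical date, window opens 1:00am
end = datetime(2024, 1, 10, 3, 0)    # window closes 3:00am
plan = maintenance_plan(start, end)

for when, command in plan:
    print(when.strftime("%H:%M"), command)
```

For the 1am to 3am window this yields the prepend policy at 00:30 and the normal policy at 04:00.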

Stephane_Grosj1
Extreme Employee
Don't confuse the FIB and the RIB.
The FIB (hardware) has 512K IPv4 LPM entries (~524,000); these are the unique best routes.
The RIB (software) can be much bigger and can hold several full tables.

Compression will reduce the FIB size by 40-50%.
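
A quick worked check of those numbers, as a sketch only: it takes the 512K FIB capacity and the 40-50% reduction stated above, plus the 531,000-route table size quoted elsewhere in this thread. Actual compression depends on the routes in the table, so treat the output as a ballpark.

```python
FIB_CAPACITY = 512 * 1024   # 512K LPM entries = 524,288
FULL_TABLE = 531_000        # global IPv4 table size quoted elsewhere in this thread


def compressed_size(routes, reduction):
    """FIB entries left after compression removes the given fraction."""
    return round(routes * (1 - reduction))


print("fits uncompressed:", FULL_TABLE <= FIB_CAPACITY)

for reduction in (0.40, 0.50):
    used = compressed_size(FULL_TABLE, reduction)
    print(f"{reduction:.0%} reduction: {used:,} entries, fits: {used <= FIB_CAPACITY}")
```

Uncompressed, the table is slightly larger than the FIB; at either end of the quoted compression range it fits with room to spare.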

Frank
Contributor
Thank you for all your help! 🙂

I'm less concerned with convergence times on my routers than with convergence in 'the world' for inbound traffic. From what I understand, "disable neighbor" or "unplug cable" has the same effect - inbound traffic through that neighbor takes a 3-minute hit.
If I prepend to that neighbor, existing traffic will still come in, but after a few minutes other neighbors will be preferred (from the world to us), and we don't fall into a partial 3-minute "hole" (where connections that think the disabled neighbor is still up go down that path and then die).
I think I'll prepend-announce for 15-30 minutes, then do a 'disable neighbor' before the link drops.

As to the full table, it's been my experience that over 500K pre-compressed routes do actually upset the 480 rather much, causing all kinds of difficulties. I'm not sure if it's 500,000 or 512,000 or what the exact number is, but there appears to be a limit in that range to which the 480 takes somewhat grave exceptions - especially if I get them from two neighbors on one router - and then have another router with another neighbor with full routes. Setting the limit to 500K seems to keep the 480s happy - and I still have defaults for whatever I missed.

Stephane_Grosj1
Extreme Employee
Hi,

You can have a full view on an X480; just enable iproute compression, which has a big effect.

As for convergence time with a full table, I'd expect ~3 minutes based on previous tests.

Best Regards,
Stephane

Paul_Thornton
New Contributor III
Hi Frank,

Sorry, missed your reply earlier this week.

I would normally just disable the neighbour to 'cleanly' shut down the peer.

Compare this with either the port going down (which should, in theory, result in the same thing for all sane BGP implementations) or where the end-to-end link between two BGP speaking routers goes down without loss of link at both ends. In this case, the timers need to expire and with default timer settings you have 3 minutes of one device sending packets down a link that will never work.
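
To put rough numbers on that silent-failure case, here is a small sketch. It assumes a 180-second hold time (the 3 minutes above) and a keepalive interval of one third of that, which is a common default but is an assumption here; check the timers actually negotiated on your session.

```python
HOLD_TIME = 180              # seconds; the "3 minutes" mentioned above
KEEPALIVE = HOLD_TIME // 3   # assumed: keepalive is one third of the hold time


def detection_delay(seconds_since_last_keepalive):
    """Seconds from a silent link failure until the hold timer expires."""
    if not 0 <= seconds_since_last_keepalive <= KEEPALIVE:
        raise ValueError("failure must fall within one keepalive interval")
    # The hold timer was last reset when the previous keepalive arrived.
    return HOLD_TIME - seconds_since_last_keepalive


worst = detection_delay(0)          # link dies right after a keepalive arrives
best = detection_delay(KEEPALIVE)   # link dies just before the next keepalive

print(f"traffic is blackholed for {best} to {worst} seconds")
```

Under these assumptions the blackhole lasts between two and three minutes, which is why a clean shutdown of the session beforehand is preferable.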

So I haven't tried a full table in an X480 - I know in theory it can do it, but given that the global IPv4 table is now standing at 531,000 routes you're forced to only use a subset anyway. In the places I've worked with X480s doing Internet-connected BGP, it has been with default routes from a couple of transit providers and somewhere between 1,000 and 80,000 local routes from peering.

Maybe Stephane has an idea of convergence time with two X480s with 500K routes in them - I've not seen any convergence issues with smaller BGP RIB sizes.

If you are seeing convergence time issues between your switches when a path changes, you'll almost certainly see them regardless of why the route is withdrawn: because one transit provider went away (BGP session disabled or fibre pulled, for example), because you're changing the local preference, or because you're no longer accepting the route through a policy filter.

Paul.
