Preparing for BGP neighbor outage

OK, so let's say I'm BGP multihomed with multiple providers, using two routers (480s), and that I have my own ASN (12345). My BGP is happily trucking away, and I'm advertising my networks to all my peers.

Now provider X tells me that they'll have to do maintenance on my circuit. If that BGP peer drops, I know that I'll still have Internet access, but I also know that I will have a 3-4 minute window where BGP re-shuffles routes, and everything that used to come in or go out through provider X drops connectivity - essentially a short service outage, and there are people out there that (a) notice, and (b) aren't too happy when that happens.

So how do I best prepare for that? In the past (on Cisco gear), I've prepended my AS path toward neighbor X with "10 more of 12345". Essentially, that keeps existing connections alive (I hope), but within 10 minutes or so, nobody should be using that peer for incoming traffic anymore.

Is there a better way other than AS-prepend? (I don't think anyone implemented RFC 6198 yet)

I already have a policy in place for adverts out:
configure bgp neighbor 1.2.3.4 route-policy out AS-Localonly
to ensure that I only advertise locally originating paths. I'm thinking I could use that for prepends like this:

AS-Prepend.pol:
entry prepend-localonly {
    if {
        as-path "^$";
    } then {
        as-path "12345 12345 12345 12345 12345 12345 12345 12345 12345 12345";
        permit;
    }
}
entry DenyRest {
    if {
    } then {
        deny;
    }
}
Would this be a working policy? And I could just activate it with

configure bgp neighbor 1.2.3.4 route-policy out AS-Prepend
(possibly after an unconfigure bgp neighbor 1.2.3.4 route-policy out AS-Localonly, or whatever the proper syntax for that is)
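The intended effect of the policy can be modelled outside the router. The following is a minimal Python sketch, not EXOS code: it assumes that `as-path "^$"` matches only locally originated routes (empty AS path), that the prepend action adds the listed ASNs, and that the router then adds one more copy of its own AS when it sends the route over eBGP. The function names are hypothetical, and 12345 is the placeholder ASN from the question.

```python
from typing import List, Optional

LOCAL_AS = 12345      # placeholder ASN used in the question
PREPEND_COUNT = 10    # extra copies added by the prepend entry


def apply_prepend_policy(as_path: List[int]) -> Optional[List[int]]:
    """Return the AS path after the outbound policy, or None if denied."""
    if not as_path:
        # Mirrors `as-path "^$"`: an empty path means locally originated.
        return [LOCAL_AS] * PREPEND_COUNT
    # Mirrors the DenyRest entry: everything else is not advertised.
    return None


def path_seen_by_neighbor(as_path: List[int]) -> Optional[List[int]]:
    """Add the one copy of the local AS that an eBGP speaker prepends on send."""
    advertised = apply_prepend_policy(as_path)
    if advertised is None:
        return None
    return [LOCAL_AS] + advertised


if __name__ == "__main__":
    print(len(path_seen_by_neighbor([])))   # locally originated route: prints 11
    print(path_seen_by_neighbor([64500]))   # learned route: prints None
```

Under those assumptions the neighbor sees an 11-hop path for local prefixes and nothing else, which is the behaviour the question is after.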


Would that work? Is there a better way to do this? Again, the goal is to not have disruption due to BGP route convergence when one peer drops, because I'm shuffling traffic away before the drop.

Thanks!

P.S.: Bonus points - how do I script that (or whatever alternative), if I even can? If I know that the window is from 1am to 3am, I could automatically do the "config bgp neigh..." thing at 12:30am, and re-set it at 4:00am and never lose any sleep :)
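As a rough illustration of the timing in the P.S., here is a small Python sketch that only computes when each command would need to run; it does not talk to the switch. The two commands are the ones quoted in the question, while the function name, the 30-minute lead, the 1-hour tail, and the example date are assumptions made for the example. How the commands are delivered to the router is left open.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

NEIGHBOR = "1.2.3.4"  # neighbor address used in the question


def maintenance_plan(
    window_start: datetime,
    window_end: datetime,
    lead: timedelta = timedelta(minutes=30),
    tail: timedelta = timedelta(hours=1),
) -> List[Tuple[datetime, str]]:
    """Return (when, command) pairs: drain before the window, restore after it."""
    if window_end <= window_start:
        raise ValueError("window_end must be after window_start")
    drain = f"configure bgp neighbor {NEIGHBOR} route-policy out AS-Prepend"
    restore = f"configure bgp neighbor {NEIGHBOR} route-policy out AS-Localonly"
    return [(window_start - lead, drain), (window_end + tail, restore)]


if __name__ == "__main__":
    # Hypothetical date; the 1am to 3am window is the one from the question.
    plan = maintenance_plan(datetime(2024, 1, 2, 1, 0), datetime(2024, 1, 2, 3, 0))
    for when, command in plan:
        print(when.strftime("%H:%M"), command)
```

With a 1am to 3am window this yields 00:30 for the prepend policy and 04:00 for the switch back to AS-Localonly.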

Frank

Posted 3 years ago

Grosjean, Stephane, Employee

Hi,

You're prepending your own AS a lot of times; hopefully not as many as the prepends that crashed the Internet in past years.

AS-Prepend will influence the traffic coming into your AS.
For the traffic going out of your AS, you want to modify local-pref.

Is that going to be applied to a full view (your main feed) or only to some routes? The 3-minute convergence is for a full-view failover.

You can easily script your CLI commands and have them launched by a UPM timer.

What EXOS version do you use?

Frank

That sounds as if my policy should do what I want it to do - thanks!

I'm using 15.3 in one location, and 15.5 in another - and I may have been exaggerating the prepends. Past experience on my Cisco routers (which we'll eventually phase out) has shown that 3-4 path prepends are enough to kill inbound traffic from that provider and force the world through the others. Interestingly enough that has also throttled outbound traffic through that neighbor, but I'm not clear why.

Changing local-pref doesn't seem to be much of an option, as I'd have to disable bgp to change it. It also seems that I can't put that preference in on a per-neighbor basis. I think if I need to mess with outbound paths (to move things away from a neighbor), I could probably prepend their BGP path similarly with an "as-path '123 123 123' " where "123" is that neighbor's ASN. I'd just do it as a "route-policy in".

Thanks for the answer!

Grosjean, Stephane, Employee

Hi,

In my experience, the average AS-path length is 4-5, so prepending your AS three times should have a strong enough impact.

For local-pref, you're right: the CLI knob changes it globally and requires disabling BGP first. However, you can create policies that set a given local-pref per prefix.

Paul Thornton

Bear in mind that whether you prepend once or 20 times, another ISP can be using local-pref to select what you consider the 'bad' route. This is particularly the case for other customers of your direct transit provider: if I am an ISP and you are my customer, I will always prefer the direct connection I'm selling you over a peer or other connection. Some ISPs allow you to use communities for traffic engineering, to limit how they propagate the routes you announce to them, but I think that is getting a bit complex for your use case.
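To illustrate why prepending cannot override a remote local-pref, here is a deliberately simplified Python sketch of the first two steps of BGP best-path selection: highest local-pref first, shortest AS path only as a tie-breaker. Real implementations compare more attributes after these two. The `Route` type, the local-pref values, and ASNs 64500 and 64501 are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    via: str
    local_pref: int
    as_path: Tuple[int, ...]


def best_route(routes: List[Route]) -> Route:
    """Pick the route with the highest local-pref, then the shortest AS path."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


if __name__ == "__main__":
    # The ISP's view: a heavily prepended customer route vs. a short peer route.
    customer = Route("customer-link", 200, (12345,) * 11)
    peer = Route("peer-link", 100, (64500, 64501, 12345))
    print(best_route([customer, peer]).via)   # prints customer-link
```

If both routes carried the same local-pref, the 3-hop peer route would win instead, which is the case where prepending does what you want.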

The way we tend to do this is either: (a) configure the peering session to announce zero routes, and accept no routes from the peer, around 30 minutes before any maintenance; or (b) simply shut down the session cleanly before any work starts.

Whatever you do, you are going to cause some recalculation. The important part is that when a BGP peering is closed gracefully (i.e., by one end shutting it down), no hold timer has to expire, and the routers at both ends can immediately send withdrawals for any routes learned via the peer.

How many routes are you getting from your providers, and how many do you announce? If it is a lot (I hope not a full table on an X480!), a 'don't announce anything / don't accept anything' policy before maintenance may work well. If you only have a default route, then cleanly closing the session should work OK.

I can see that prepending may seem a good idea, but I'm not sure that you'll actually benefit much more doing that than simply ensuring that the BGP neighbour is cleanly disabled before the maintenance.

One thing to be aware of - if you stop announcing prefixes to one of your upstreams in preparation for maintenance, of course you now have a single point of failure with the other one!

Paul.

Frank

Paul,

Could you elaborate on the "clean shutdown"? Would a "disable bgp neighbor X" do that without incurring the 3-minute problem? Or is there a better command (sequence) to do that?

How would I announce zero routes to a specific neighbor? "Deny everything" on an outbound bgp policy?

We're only announcing a handful of blocks. About 6-8 IPv4 blocks and one IPv6 block.
Yes, we're getting "full routes plus default" from our upstream providers, but we limit them to 500K routes so as not to overload the routing table. The 480s were advertised to us as "datacenter ingress/egress routers" to replace Cisco 7606s.

We have 3-4 providers between two routers in our datacenters, so fortunately when one does maintenance, we can still afford another one to fail without warning. "Backhoe mating season" might still catch us, though ;)

Thanks!

    Frank

Paul Thornton

Hi Frank,

Sorry, missed your reply earlier this week.

I would normally just disable the neighbour to 'cleanly' shut down the peer.

Compare this with either the port going down (which should, in theory, have the same result in all sane BGP implementations) or the end-to-end link between two BGP-speaking routers going down without loss of link at either end. In the latter case, the hold timers need to expire, and with default timer settings you have 3 minutes of one device sending packets down a link that will never work.
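The difference between the two failure modes comes down to simple timer arithmetic, sketched here in Python. The 180-second hold time corresponds to the 3 minutes mentioned above; the 60-second keepalive interval is an assumption (a common default, so check your own configuration), and the function name is hypothetical. The sketch covers only failure detection on the session itself, not propagation of withdrawals through the rest of the Internet.

```python
from typing import Tuple


def blackhole_window(hold_time_s: int, keepalive_s: int, clean_shutdown: bool) -> Tuple[int, int]:
    """Return (min, max) seconds a peer keeps forwarding into a dead session."""
    if keepalive_s <= 0 or keepalive_s >= hold_time_s:
        raise ValueError("keepalive must be positive and shorter than the hold time")
    if clean_shutdown:
        # The session is closed explicitly, so no timer has to expire.
        return (0, 0)
    # Silent failure: the hold timer was last reset by a keepalive received
    # somewhere between 0 and keepalive_s seconds before the failure.
    return (hold_time_s - keepalive_s, hold_time_s)


if __name__ == "__main__":
    print(blackhole_window(180, 60, clean_shutdown=False))  # prints (120, 180)
    print(blackhole_window(180, 60, clean_shutdown=True))   # prints (0, 0)
```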

So I haven't tried a full table in an X480 - I know in theory it can do it, but given that the global IPv4 table is now standing at 531,000 routes you're forced to only use a subset anyway.  In the places I've worked with X480s doing Internet-connected BGP, it has been with default routes from a couple of transit providers and somewhere between 1,000 and 80,000 local routes from peering.

Maybe Stephane has an idea of convergence time with two X480s with 500K routes in them - I've not seen any convergence issues with smaller BGP rib sizes.

If you are seeing convergence-time issues between your switches when a path changes, you'll almost certainly see them regardless of why the route is withdrawn: because one transit provider went away (BGP session disabled or fibre pulled, for example), because you're changing the local preference, or because you're no longer accepting the route through a policy filter.

Paul.

Grosjean, Stephane, Employee

Hi,

You can have a full view on an X480: just enable iproute compression; the effect is big.

As for convergence time with a full table, I'd expect ~3 minutes based on previous tests.

Best Regards,
Stephane

Frank

Thank you for all your help! :)

I'm not concerned with the convergence times on my routers as much, but the convergence on 'the world' for inbound traffic. From what I understand, "disable neighbor" or "unplug cable" has the same effect - inbound traffic through that neighbor takes a 3 minute hit.
If I prepend to that neighbor, existing traffic will still come in, but after a few minutes other neighbors will be preferred (from the world to us), and we don't fall into a partial 3-minute "hole" (where connections that think the disabled neighbor is still up go down that path and then die).
I think I'll prepend-announce for 15-30 minutes, then do a 'disable neighbor' before the link drops.

As to the full table, it's been my experience that over 500K pre-compressed routes do actually upset the 480 rather much, causing all kinds of difficulties. I'm not sure if it's 500,000 or 512,000 or what the exact number is, but there appears to be a limit in that range to which the 480 takes somewhat grave exceptions - especially if I get them from two neighbors on one router - and then have another router with another neighbor with full routes. Setting the limit to 500K seems to keep the 480s happy - and I still have defaults for whatever I missed.

Grosjean, Stephane, Employee

Don't confuse the FIB and the RIB.
The FIB (hardware) has 512K IPv4 LPM entries (~524,000); it holds only the best unique routes.
The RIB (software) can be much bigger and can hold several full tables.

Compression will reduce the FIB size by 40-50%.
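A quick back-of-the-envelope check of those numbers, as a Python sketch. It takes 512K as 512 x 1024 entries, uses the 531,000-route table size quoted earlier in the thread, and assumes the 40-50% reduction applies uniformly, which a real table will not do exactly. The function name is hypothetical.

```python
FIB_SLOTS = 512 * 1024      # 524,288 IPv4 LPM entries
FULL_TABLE = 531_000        # global IPv4 table size quoted in the thread


def fib_entries_after_compression(routes: int, reduction_pct: int) -> int:
    """Estimate FIB entries left after compression, rounded up."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    remaining = routes * (100 - reduction_pct)
    return -(-remaining // 100)   # ceiling division on integers


if __name__ == "__main__":
    print(FULL_TABLE <= FIB_SLOTS)                        # prints False
    for pct in (40, 50):
        needed = fib_entries_after_compression(FULL_TABLE, pct)
        print(pct, needed, needed <= FIB_SLOTS)
```

On these assumptions an uncompressed full table does not fit in the FIB, while a compressed one needs roughly 265,500 to 318,600 entries and fits with room to spare.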