BGP Process taking up all the CPU

Cavan
New Contributor II
Running several BlackDiamond 10Ks with ExtremeXOS version 15.1.4.3 patch1-4. The routers randomly become very slow when running any BGP-related commands. The CPU runs dcbgp at 50+% for hours on end, with no flapping of BGP neighbors. The switch still seems to route traffic OK, but I have had to reboot after seeing this go on for over 24 hours at times. We are currently receiving about 250,000 routes. Is this a known issue? Any advice or thoughts on what might be causing this?
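For anyone wanting to check the same thing, something like the following shows where the CPU is going (15.x CLI; exact output varies by release):

# Per-process CPU utilization; dcbgp is the BGP task on EXOS
show cpu-monitoring

# State and resource usage of the BGP process itself
show process bgp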
5 REPLIES

Cavan
New Contributor II
Thanks for that. Have you tried terminating unused processes like netlogin, isis, vrrp, etc., as the other user in this thread recommended?

Dave_E_Martin
New Contributor
I'll add that on a 480 with full internet routes from multiple providers, changing a policy caused it to go to 50% CPU (presumably the 480 has two cores?) for about 20 to 40 minutes as it worked through applying the policy change and propagating it out to its peers. If the policy was updated again before this process was complete, it would end up "stuck" at 50% CPU until a reboot or a restart of BGP or its peers. This 480 was serving as a reflector for several dozen peers. Typically, when it got stuck, we could disable the reflector peer group, wait until CPU dropped to normal, then re-enable the peer group, and verify after an hour or so that CPU had dropped back to normal.
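For reference, the recovery sequence is roughly the following (the peer-group name rr-clients is just a placeholder; substitute your own):

# Take the reflector clients down so dcbgp can drain its work queue
disable bgp peer-group rr-clients

# Watch "show cpu-monitoring" until CPU returns to normal, then:
enable bgp peer-group rr-clients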

Dave_E_Martin
New Contributor
This problem happened to us on Summit 480/460 switches. We have opened several TAC cases over it; it is not resolved.

We have found:

The more BGP routes you have, the more likely it is to occur.

The more BGP peers you have (such as if the switch is serving as a reflector) the more likely it is to occur.

If a BGP peer goes up or down while the switch is still processing another BGP peer that recently went up or down, it is more likely to occur.

If a policy change occurs while BGP is still processing updates, it might occur.

The problem is hard to reproduce consistently.

It occurred to us once on a 460 with only 2 BGP peers and 2,500 or so routes. Usually it occurs on our 480s with several hundred thousand routes.

You might be able to fix it by disabling and then re-enabling one or more of your BGP peers (or peer groups, if you are using them), and then waiting about an hour (at least on a Summit 480 with full internet routes). Alternatively, you can restart process bgp (note the impact these actions may have on your network).

Essentially, you can do "show bgp route summary" (and/or "show bgp route ipv6 summary") and wait until the counters stabilize. At that point, CPU usage should drop to normal. If it doesn't, try again (stopping/starting BGP peers or BGP itself, or rebooting).
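In command form, the loop looks something like this (the keyword may be "routes" rather than "route" depending on the release, so check your CLI):

# Repeat until the route counters stop changing between runs
show bgp route summary
show bgp route ipv6 summary

# If CPU stays pinned after the counters stabilize, restart BGP
# (this drops all BGP sessions, so plan for the disruption)
restart process bgp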

This has been a very frustrating problem, and I feel vindicated to hear that it is happening to someone else. It has happened throughout many 15.x versions.

PARTHIBAN_CHINN
Contributor
See if it is possible to reduce the size of the routing table.
Terminate unused sessions.
Terminate processes that are not needed on the switch: if it is a core switch and netlogin, isis, and vrrp are not in use, terminate those processes.
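For example, something like the following (graceful keeps the shutdown orderly; only terminate a process if you are certain the feature is unused):

# Stop feature processes that are not in use on this switch
terminate process netlogin graceful
terminate process isis graceful
terminate process vrrp graceful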