X440 90+% CPU usage

davidj_cogliane
Contributor
We have multiple closets at an EDU customer's high school, many of which are seeing 90+% CPU spikes on a regular basis. I have read the articles about 8-20% CPU usage and the CLI-related CPU levels of 30+%.

The customer is experiencing random network issues for the second day in a row, and it is hard to argue that the CPU usage is unrelated. Yes, I understand switching should not be affected by CPU usage.

These are stacks of 3-6 X440-G2-48p.
They are running summitX-22.5.1.7-patch1-7.
ELRP is configured:
configure elrp-client periodic vlan NOLOOP ports all disable-port duration 600
configure elrp-client disable-port exclude ports 1:52
NOLOOP is on every port on every stack.

FredrikB
Contributor II
Hi!

configure forwarding ipmc lookup-key group-vlan used to be more of a pain with the non-G2 X440, but surely it can affect the G2 too. It does have very small tables, even if they are orders of magnitude better than the old X440's.

I'd start by monitoring the switches with some ping tracer to see where the problem starts. Do you have in-band management on the X440-G2s? If so, setting up SmokePing (by Tobi Oetiker, who also wrote MRTG and RRDtool) on a Linux PC or VM could let you trace where and when the problems occur. You have a lot of work in front of you finding the cause, but if you're lucky, tools like this can help. A TAC case will certainly be helpful too, as suggested.

Simply pinging a lot of switches (if management is in-band) might be a good starting point, but SmokePing is my favourite! Remember to ping with a payload size of 1472 bytes, as that produces 1500-byte IP packets. If you use a smaller packet size, MTU problems may go undetected (not that this seems to be the case here, but...).
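The 1472-byte figure follows from the header sizes: a 1500-byte IPv4 packet carries a 20-byte IPv4 header (assuming no IP options) and an 8-byte ICMP echo header. A minimal Python sketch of that arithmetic is below; the function name is made up for illustration.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header, assuming no IP options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small to carry an ICMP echo")
    return payload


if __name__ == "__main__":
    # Standard Ethernet MTU of 1500 bytes gives the 1472-byte ping size.
    print(icmp_payload_for_mtu(1500))
```

On a Linux host with iputils ping, this would typically be used as "ping -M do -s 1472 <switch-ip>", where -M do forbids fragmentation so that an MTU problem shows up as an error instead of being hidden. Other ping implementations use different flags.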

Check optical levels if you have fiber links, using "show ports transceiver information".

Check for RX errors (and collisions and TX errors while you're at it), mainly on uplinks. You should have a tool to monitor that along with utilization.

Check for congestion (tail drops, buffers running over) with "show ports qosmonitor congestion".
You'll be surprised to see how soon some switches drop packets. I've seen heavy tail drops on X460 (I think) with 22% utilization (as reported by the CLI's 2-second update).

/Fredrik

davidj_cogliane
Contributor
The CPU utilization messages have gone away but the customer is still reporting intermittent issues with web traffic.
Waiting to hear if they see anything with their webfilter or firewall.

davidj_cogliane
Contributor
I found that the IPv4 MCast entries in the L3 hash table were running between 1750 and 1800 out of the theoretical maximum of 2048.

I ran [configure forwarding ipmc lookup-key group-vlan] and the entries are down to 8-20. Since this change was made, the CPU utilization messages have stopped and the customer issues appear to have stopped.
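To put those counts in perspective, here is a small Python sketch that converts them to a percentage of the 2048-entry maximum quoted above. The function name is hypothetical, and the entry counts are the ones reported in this thread, not values read from a switch.

```python
def table_utilization(entries: int, capacity: int = 2048) -> float:
    """Return table occupancy as a percentage of capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * entries / capacity


if __name__ == "__main__":
    # Entry counts before (1750, 1800) and after (8, 20) the lookup-key change.
    for entries in (1750, 1800, 8, 20):
        print(f"{entries:>4} / 2048 = {table_utilization(entries):.1f}%")
```

By this measure the table was roughly 85-88% full before the change and about 1% full afterwards.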

We will continue monitoring...

BradP
Extreme Employee
It might be best to call GTAC to see if we can get a tcpdump from the CPU to find out what's actually causing it.

BradP
Extreme Employee
Some movement is normal as clients roam from AP to AP. But if they are going back and forth a lot, it can signify that the MU is between two different APs and the signal strength is such that the MU connects to both in rapid succession.