High CPU on X450 switch


Hi everybody



We have an X450 switch in our network that is showing high CPU usage in the following processes:



1- ./hal

2- ./FDB

3- bcmRX



Also, when I ping the switch's IP address, I noticed that the CPU usage of the bcmRX process increases.

I need to know whether this is normal.



Mem: 246820K used, 7816K free, 0K shrd, 32580K buff, 72532K cached

CPU: 0.0% usr 100% sys 0.0% nic 0.0% idle 0.0% io 0.0% irq 0.0% sirq

Load average: 4.11 4.11 4.08 3/182 2021

PID PPID USER STAT RSS %MEM CPU %CPU COMMAND

1433 1 root S 3316 1.3 0 26.3 ./fdb

1405 1 root S < 11140 4.3 0 21.0 ./hal

1261 2 root SW 0 0.0 0 21.0 [bcmRX]

2021 2020 root R 780 0.3 0 15.7 top -d 3

1474 1 root S 1920 0.7 0 5.2 ./bfd

1407 1 root S 1596 0.6 0 5.2 ./nodemgr

1506 1505 root S 764 0.3 0 5.2 ./telnetd -e

1409 1 root S 31584 12.3 0 0.0 ./cliMaster

1415 1 root S 5456 2.1 0 0.0 ./snmpSubagent

1536 1 root S 4688 1.8 0 0.0 ./xmld

1561 1 root S 3964 1.5 0 0.0 ./idMgr

1514 1 root S 3920 1.5 0 0.0 ./etmon

1401 1 root S 3732 1.4 0 0.0 ./emsServer

1472 1 root S 3464 1.3 0 0.0 ./mcmgr

1413 1 root S 3292 1.2 0 0.0 ./snmpMaster

1423 1 root S 3288 1.2 0 0.0 ./vlan

1492 1 root S 3076 1.2 0 0.0 ./pim

1466 1 root S 2916 1.1 0 0.0 ./rtmgr update

1411 1 root S 2848 1.1 0 0.0 ./cfgmgr

1501 1 root S 2688 1.0 0 0.0 ./netTools

1499 1 root S 2640 1.0 0 0.0 ./acl



regards,

Hasan

It's hard to tell just from this, but you might have a loop on the network. I would suggest opening a case with GTAC to have this investigated further.
Hi Hasan,
As Patrick stated, a loop is possible.
The FDB process consumes high CPU when the switch is doing too much MAC learning, which can happen when MAC addresses are moving between ports.

Please share the output of the following command:

show log counters fdb occurred
I agree with Patrick; it seems that there may be a loop. The article linked below explains how to determine whether there is a loop and which ports it is on.
https://gtacknowledge.extremenetworks.com/articles/Q_A/Which-commands-can-be-used-to-detect-a-loop

-Brandon
Hi everybody



Thanks for your support.

For your information, when I run the "top" command without pressing 1, the CPU looks normal, but when I press 1, the CPU shows as high. Can somebody tell me why?

I also ran the l2stats command and found 5 VLANs that are copying a lot of packets to the CPU. I then used ELRP to check for loops in those VLANs, but no loop was detected.

Bridge interface on VLAN MW_MGMT_3511:
Total number of packets to CPU = 4628.
Total number of packets learned = 882939.
Total number of IGMP control packets snooped = 11364.
Total number of IGMP data packets switched = 104.
Total number of MLD control packets snooped = 0.
Total number of MLD data packets switched = 0.

Bridge interface on VLAN MW_MGMT_501:
Total number of packets to CPU = 3482.
Total number of packets learned = 39523.
Total number of IGMP control packets snooped = 21646.
Total number of IGMP data packets switched = 1094.
Total number of MLD control packets snooped = 0.
Total number of MLD data packets switched = 0.

Bridge interface on VLAN KAEFER_Yanbu_L2VPN_HO:
Total number of packets to CPU = 149768.
Total number of packets learned = 391638.
Total number of IGMP control packets snooped = 23648.
Total number of IGMP data packets switched = 2854.
Total number of MLD control packets snooped = 0.
Total number of MLD data packets switched = 0.

Bridge interface on VLAN SGS_DIA:
Total number of packets to CPU = 3760.
Total number of packets learned = 2209.
Total number of IGMP control packets snooped = 0.
Total number of IGMP data packets switched = 0.
Total number of MLD control packets snooped = 0.
Total number of MLD data packets switched = 0.

Bridge interface on VLAN STS_2Mbps-DIA:
Total number of packets to CPU = 5499.
Total number of packets learned = 18835.
Total number of IGMP control packets snooped = 129.
Total number of IGMP data packets switched = 56.
Total number of MLD control packets snooped = 0.
Total number of MLD data packets switched = 0.

* JUB_020.21 # configure elrp-client one-shot "MW_MGMT_3511" ports all print-and-log

Starting ELRP Poll . . .
# NO LOOP DETECTED # --- vlan "MW_MGMT_3511" elrp statistics ---
3 packets transmitted, 0 received, ingress port (nil)

* JUB_020.22 # configure elrp-client one-shot "MW_MGMT_501" ports all print-and-log

Starting ELRP Poll . . .
# NO LOOP DETECTED # --- vlan "MW_MGMT_501" elrp statistics ---
3 packets transmitted, 0 received, ingress port (nil)

* JUB_020.20 # configure elrp-client one-shot "KAEFER_Yanbu_L2VPN_HO" ports all print-and-log

Starting ELRP Poll . . .
# NO LOOP DETECTED # --- vlan "KAEFER_Yanbu_L2VPN_HO" elrp statistics ---
3 packets transmitted, 0 received, ingress port (nil)

* JUB_020.23 # configure elrp-client one-shot "SGS_DIA" ports all print-and-log

Starting ELRP Poll . . .
# NO LOOP DETECTED # --- vlan "SGS_DIA" elrp statistics ---
3 packets transmitted, 0 received, ingress port (nil)

* JUB_020.24 # configure elrp-client one-shot "STS_2Mbps-DIA" ports all print-and-log

Starting ELRP Poll . . .
# NO LOOP DETECTED # --- vlan "STS_2Mbps-DIA" elrp statistics ---
3 packets transmitted, 0 received, ingress port (nil)



Regards,

Hasan
The l2stats counters are cumulative since the switch booted or since someone last cleared the counters (there is no way to find out which). Some packets going to the CPU is normal. The problem is when a large number of packets are going to the CPU constantly.

I would recommend contacting GTAC going forward, since there may be diagnostic steps they need to take that require debug mode.
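
Because the counters are cumulative, one way to tell "constant" traffic from old history is to capture the l2stats output twice and compare. The following is only a rough Python sketch of that idea, assuming the output has the same "Bridge interface on VLAN ..." and "Total number of packets to CPU = ..." lines as posted in this thread. The function names and the capture interval are made up for illustration and are not part of EXOS.

```python
import re

# Line patterns taken from the l2stats output posted in this thread.
VLAN_RE = re.compile(r"Bridge interface on VLAN (\S+):")
CPU_RE = re.compile(r"Total number of packets to CPU = (\d+)")


def packets_to_cpu(l2stats_text):
    """Return {vlan_name: packets_to_cpu} parsed from l2stats output text."""
    counts = {}
    vlan = None
    for line in l2stats_text.splitlines():
        match = VLAN_RE.search(line)
        if match:
            vlan = match.group(1)
            continue
        match = CPU_RE.search(line)
        if match and vlan is not None:
            counts[vlan] = int(match.group(1))
    return counts


def cpu_packet_rates(before_text, after_text, interval_seconds):
    """Per-VLAN packets-to-CPU per second between two captures, busiest first.

    interval_seconds is the time between the two captures (an assumption
    the caller has to supply; the output itself carries no timestamp).
    """
    before = packets_to_cpu(before_text)
    after = packets_to_cpu(after_text)
    rates = {
        vlan: (after[vlan] - before.get(vlan, 0)) / interval_seconds
        for vlan in after
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

Calling cpu_packet_rates(first_capture, second_capture, 60) on two captures taken a minute apart would list the VLANs whose CPU-bound traffic is growing right now, which is more telling than the raw totals.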
Hi Hasan,

Looking at the l2stats output, the count of packets learned is large, which indicates that too much MAC learning is happening on the switch. My bet is still on MAC moves, or on entries being added and deleted too frequently.

Example: Total number of packets learned = 391638.

As requested before, please collect the output of show log counters fdb occurred and check the FDB.macadd, FDB.macdel and FDB.macmove counters.

Use the link below to configure MAC tracking and identify the MAC learning on the switch.

https://gtacknowledge.extremenetworks.com/articles/How_To/How-to-configure-MAC-tracking-in-EXOS/

Regarding your question about top, could you please share the output both with and without pressing 1 so that we can understand it?

Hope this helps!!
Keep us updated.
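
For anyone who wants to pull those counters out of saved output, here is a small Python sketch. It assumes each data row of show log counters ends with the Condition, Severity, Occurred, I and Ntfd columns, so it reads fields from the right and does not care whether the SubComponent column is filled in. The helper names are hypothetical.

```python
def parse_log_counters(text, component="FDB"):
    """Return {condition: occurred} for one component's counter rows.

    Fields are read from the right-hand side of each row:
    ... <Condition> <Severity> <Occurred> <I> <Ntfd>
    """
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the prompt, the header, the dashed rule and other components.
        if len(fields) < 6 or fields[0] != component:
            continue
        occurred = fields[-3]
        if not occurred.isdigit():
            continue
        counters[fields[-5]] = int(occurred)
    return counters


def mac_churn(counters):
    """Pick out the MAC tracking counters; a missing counter is reported as 0."""
    return {name: counters.get(name, 0) for name in ("MACAdd", "MACDel", "MACMove")}
```

Running mac_churn(parse_log_counters(saved_output)) on two captures taken some time apart shows whether adds, deletes or moves are the ones climbing.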
Hi Prashanth,


Thanks for your support.

I have configured logging for MAC add, MAC delete and MAC move events on the switch, and I found a lot of log entries related to MACs being added and deleted.

Below is the output of the FDB log counters:





JUB_020.22 # show log counters FDB occurred

Component   SubComponent Condition               Severity      Occurred I Ntfd
----------- ------------ ----------------------- ------------- -------- - ----
FDB                      ArpDebugSummary         Debug-Summary    14638 N    0
FDB                      FdbDebugSummary         Debug-Summary  2067684 N    0
FDB                      FilterDebugSummary      Debug-Summary        9 N    0
FDB         MACTracking  MACAdd                  Notice              51 Y   47
FDB         MACTracking  MACDel                  Notice              49 Y   46





I need to know whether we can prevent MAC addresses from being added and deleted in the VLANs. I also need to know whether a large volume of MAC additions and deletions in the VLANs will affect the switch.





As you requested, below is the output of the top command before pressing 1 and while pressing 1.

Before pressing 1



Mem: 247840K used, 6796K free, 0K shrd, 32804K buff, 73132K cached

CPU: 2.6% usr 1.9% sys 0.0% nic 94.3% idle 0.0% io 0.3% irq 0.6% sirq

Load average: 4.04 4.09 4.08 3/182 2301

PID PPID USER STAT RSS %MEM CPU %CPU COMMAND

1433 1 root S 3324 1.3 0 1.9 ./fdb

1405 1 root S < 11432 4.4 0 0.9 ./hal

1490 1 root S 2096 0.8 0 0.6 ./ripng

2301 2300 root R 780 0.3 0 0.6 top -d 3

1413 1 root S 3296 1.2 0 0.3 ./snmpMaster

1407 1 root S 1660 0.6 0 0.3 ./nodemgr

1261 2 root SW 0 0.0 0 0.3 [bcmRX]

1409 1 root S 31460 12.2 0 0.0 ./cliMaster

1415 1 root S 5460 2.1 0 0.0 ./snmpSubagent

1536 1 root S 4688 1.8 0 0.0 ./xmld

1561 1 root S 3964 1.5 0 0.0 ./idMgr

1514 1 root S 3920 1.5 0 0.0 ./etmon

1401 1 root S 3732 1.4 0 0.0 ./emsServer

1472 1 root S 3484 1.3 0 0.0 ./mcmgr

1423 1 root S 3336 1.3 0 0.0 ./vlan

1492 1 root S 3076 1.2 0 0.0 ./pim

1466 1 root S 2936 1.1 0 0.0 ./rtmgr update

1411 1 root S 2852 1.1 0 0.0 ./cfgmgr

1501 1 root S 2688 1.0 0 0.0 ./netTools

1499 1 root S 2640 1.0 0 0.0 ./acl

1486 1 root S 2640 1.0 0 0.0 ./ospfv3





While pressing 1





Mem: 247840K used, 6796K free, 0K shrd, 32804K buff, 73132K cached

CPU: 0.0% usr 100% sys 0.0% nic 0.0% idle 0.0% io 0.0% irq 0.0% sirq

Load average: 4.18 4.14 4.10 3/182 2301

PID PPID USER STAT RSS %MEM CPU %CPU COMMAND

1433 1 root S 3324 1.3 0 30.4 ./fdb

2301 2300 root R 780 0.3 0 26.0 top -d 3

1405 1 root S < 11432 4.4 0 21.7 ./hal

1492 1 root S 3076 1.2 0 8.6 ./pim

1261 2 root RW 0 0.0 0 8.6 [bcmRX]

1472 1 root S 3484 1.3 0 4.3 ./mcmgr

1409 1 root S 31460 12.2 0 0.0 ./cliMaster

1415 1 root S 5460 2.1 0 0.0 ./snmpSubagent

1536 1 root S 4688 1.8 0 0.0 ./xmld

1561 1 root S 3964 1.5 0 0.0 ./idMgr

1514 1 root S 3920 1.5 0 0.0 ./etmon

1401 1 root S 3732 1.4 0 0.0 ./emsServer

1423 1 root S 3336 1.3 0 0.0 ./vlan

1413 1 root S 3296 1.2 0 0.0 ./snmpMaster

1466 1 root S 2936 1.1 0 0.0 ./rtmgr update

1411 1 root S 2852 1.1 0 0.0 ./cfgmgr

1501 1 root S 2688 1.0 0 0.0 ./netTools

1499 1 root S 2640 1.0 0 0.0 ./acl

1486 1 root S 2640 1.0 0 0.0 ./ospfv3

1056 1 root S 2624 1.0 0 0.0 /exos/bin/epm -t 40 -f /exos/confi

1484 1 root S 2576 1.0 0 0.0 ./ospf



Regards,

Hasan
Hi Hasan,

Thanks a lot for your effort in collecting the requested outputs.
I see that you have enabled the log counters. If you now issue the command show log, you will be able to see which MAC addresses are being added and deleted, along with the port numbers and VLAN information.
See if you can find a pattern involving a specific port or specific MAC addresses. That should help narrow things down.
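
If the log is long, counting events per MAC address and per port can make such a pattern easier to spot. Below is a rough Python sketch. It assumes each MAC tracking log line contains a tag of the form FDB.MACTracking.MACAdd, FDB.MACTracking.MACDel or FDB.MACTracking.MACMove, a colon-separated MAC address, and the word "port" followed by the port number. The exact message wording on your software version may differ, so treat the regular expressions as placeholders to adjust against real output.

```python
import re
from collections import Counter

# Assumed shapes; check them against the actual show log output.
EVENT_RE = re.compile(r"FDB\.MACTracking\.(MACAdd|MACDel|MACMove)")
MAC_RE = re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b")
PORT_RE = re.compile(r"\bport (\S+)", re.IGNORECASE)


def tally_mac_events(log_text):
    """Count MAC tracking events per MAC address and per port.

    Returns two Counters: (events_by_mac, events_by_port).
    """
    by_mac = Counter()
    by_port = Counter()
    for line in log_text.splitlines():
        if not EVENT_RE.search(line):
            continue  # not a MAC tracking entry
        for mac in MAC_RE.findall(line):
            by_mac[mac.lower()] += 1
        for port in PORT_RE.findall(line):
            # Drop punctuation that trails the port number in a sentence.
            by_port[port.rstrip(".,")] += 1
    return by_mac, by_port
```

Printing by_mac.most_common(10) and by_port.most_common(10) from the returned Counters would show the busiest MAC addresses and ports first.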

If you are using STP or any other L2 loop prevention protocol, please check whether there are frequent topology changes. These can cause FDB flushes and forced re-learning.

Regarding the impact, it is hard to say without knowing the network completely. However, sustained high CPU on a switch is not desirable, so it would be good to sort this out.

I think this would be the right time to open a GTAC case with all this information as Brandon and Patrick suggested!

Thanks!
