High CPU on X450 switch

  • Question
  • Updated 3 years ago
  • Answered

Hi everybody,

We have an X450 switch in our network that is showing high CPU usage in the following processes:

1- ./hal

2- ./fdb

3- bcmRX

Also, when I ping the IP address of the switch, I notice that the CPU usage of the bcmRX process increases.

I need to know whether this is normal.

Mem: 246820K used, 7816K free, 0K shrd, 32580K buff, 72532K cached

CPU:  0.0% usr  100% sys  0.0% nic  0.0% idle  0.0% io  0.0% irq  0.0% sirq

Load average: 4.11 4.11 4.08 3/182 2021

PID  PPID USER     STAT   RSS %MEM CPU %CPU COMMAND

1433     1 root     S     3316  1.3   0 26.3 ./fdb

1405     1 root     S <  11140  4.3   0 21.0 ./hal

1261     2 root     SW       0  0.0   0 21.0 [bcmRX]

2021  2020 root     R      780  0.3   0 15.7 top -d 3

1474     1 root     S     1920  0.7   0  5.2 ./bfd

1407     1 root     S     1596  0.6   0  5.2 ./nodemgr

1506  1505 root     S      764  0.3   0  5.2 ./telnetd -e

1409     1 root     S    31584 12.3   0  0.0 ./cliMaster

1415     1 root     S     5456  2.1   0  0.0 ./snmpSubagent

1536     1 root     S     4688  1.8   0  0.0 ./xmld

1561     1 root     S     3964  1.5   0  0.0 ./idMgr

1514     1 root     S     3920  1.5   0  0.0 ./etmon

1401     1 root     S     3732  1.4   0  0.0 ./emsServer

1472     1 root     S     3464  1.3   0  0.0 ./mcmgr

1413     1 root     S     3292  1.2   0  0.0 ./snmpMaster

1423     1 root     S     3288  1.2   0  0.0 ./vlan

1492     1 root     S     3076  1.2   0  0.0 ./pim

1466     1 root     S     2916  1.1   0  0.0 ./rtmgr update

1411     1 root     S     2848  1.1   0  0.0 ./cfgmgr

1501     1 root     S     2688  1.0   0  0.0 ./netTools

1499     1 root     S     2640  1.0   0  0.0 ./acl

regards,

Hasan

hasan issa


Posted 3 years ago


Patrick Voss, Employee

It's hard to tell just from this, but you might have a loop on the network. I would suggest opening a case with GTAC to have this investigated further.

Prashanth KG, Employee

Hi Hasan,
As Patrick stated, there is a possibility of a loop.
The FDB process consumes a lot of CPU when too much MAC learning is happening on the switch. This can occur when MAC addresses are moving between ports.

Please share the output of the following command:

show log counters fdb occurred

Brandon Clay, Escalation Support Engineer

I agree with Patrick; it seems that there may be a loop. The article linked below explains how to determine whether there is a loop and which ports it is on.
https://gtacknowledge.extremenetworks.com/articles/Q_A/Which-commands-can-be-used-to-detect-a-loop

-Brandon

hasan issa


Hi everybody,

Thanks for your support.

Just for your information, when I run the "top" command without pressing the number 1 key, the CPU looks normal, but after pressing 1 the CPU shows as high. Can somebody tell me why?

Also, I ran the l2stats command and found five VLANs that are copying a lot of packets to the CPU. After that I used ELRP to check for loops in those VLANs, but no loop was detected.

Bridge interface on VLAN MW_MGMT_3511:
Total number of packets to CPU = 4628.
Total number of packets learned = 882939.
Total number of IGMP control packets snooped = 11364.
Total number of IGMP data packets switched = 104.
Total number of MLD control packets snooped = 0.
Total number of MLD data packets switched = 0.

Bridge interface on VLAN MW_MGMT_501:
Total number of packets to CPU = 3482.
Total number of packets learned = 39523.
Total number of IGMP control packets snooped = 21646.
Total number of IGMP data packets switched = 1094.
Total number of MLD control packets snooped = 0.
Total number of MLD data packets switched = 0.

Bridge interface on VLAN KAEFER_Yanbu_L2VPN_HO:
Total number of packets to CPU = 149768.
Total number of packets learned = 391638.
Total number of IGMP control packets snooped = 23648.
Total number of IGMP data packets switched = 2854.
Total number of MLD control packets snooped = 0.
Total number of MLD data packets switched = 0.


Bridge interface on VLAN SGS_DIA:
Total number of packets to CPU = 3760.
Total number of packets learned = 2209.
Total number of IGMP control packets snooped = 0.
Total number of IGMP data packets switched = 0.
Total number of MLD control packets snooped = 0.
Total number of MLD data packets switched = 0.

Bridge interface on VLAN STS_2Mbps-DIA:
Total number of packets to CPU = 5499.
Total number of packets learned = 18835.
Total number of IGMP control packets snooped = 129.
Total number of IGMP data packets switched = 56.
Total number of MLD control packets snooped = 0.
Total number of MLD data packets switched = 0.

* JUB_020.21 # configure elrp-client one-shot "MW_MGMT_3511" ports all print-and-log

Starting ELRP Poll . . .
# NO LOOP DETECTED # --- vlan "MW_MGMT_3511" elrp statistics ---
3 packets transmitted, 0 received, ingress port (nil)

 * JUB_020.22 # configure elrp-client one-shot "MW_MGMT_501" ports all print-and-log

Starting ELRP Poll . . .
# NO LOOP DETECTED # --- vlan "MW_MGMT_501" elrp statistics ---
3 packets transmitted, 0 received, ingress port (nil)

 * JUB_020.20 #  configure elrp-client one-shot "KAEFER_Yanbu_L2VPN_HO" ports all print-and-log

Starting ELRP Poll . . .
# NO LOOP DETECTED # --- vlan "KAEFER_Yanbu_L2VPN_HO" elrp statistics ---
3 packets transmitted, 0 received, ingress port (nil)

* JUB_020.23 # configure elrp-client one-shot "SGS_DIA" ports all print-and-log

Starting ELRP Poll . . .
# NO LOOP DETECTED # --- vlan "SGS_DIA" elrp statistics ---
3 packets transmitted, 0 received, ingress port (nil)

* JUB_020.24 # configure elrp-client one-shot "STS_2Mbps-DIA" ports all print-and-log

Starting ELRP Poll . . .
# NO LOOP DETECTED # --- vlan "STS_2Mbps-DIA" elrp statistics ---
3 packets transmitted, 0 received, ingress port (nil)

 

Regards,

Hasan


Patrick Voss, Employee

l2stats shows counts accumulated since the switch came up or since the last time someone cleared the counters (there is no way to find out when that was). Packets going to the CPU are normal. The problem is when a large number of packets go to the CPU constantly.

I would recommend contacting GTAC going forward, since some of the diagnostic steps they may need to take require debug mode.
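
Because these counters are cumulative, a single l2stats output says little about the current load; the difference between two outputs taken a known time apart is more telling. The following is a minimal Python sketch of that comparison, run off-box on saved output. The function names and the 60-second interval are illustrative assumptions, and the second snapshot uses made-up numbers; only the first snapshot's values come from this thread.

```python
import re

def parse_l2stats(text):
    """Return {vlan: {"to_cpu": int, "learned": int}} from l2stats-style text."""
    stats = {}
    vlan = None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"Bridge interface on VLAN (\S+):", line)
        if m:
            vlan = m.group(1)
            stats[vlan] = {}
            continue
        if vlan is None:
            continue
        m = re.match(r"Total number of packets to CPU = (\d+)\.", line)
        if m:
            stats[vlan]["to_cpu"] = int(m.group(1))
        m = re.match(r"Total number of packets learned = (\d+)\.", line)
        if m:
            stats[vlan]["learned"] = int(m.group(1))
    return stats

def rates(before, after, seconds):
    """Per-second growth of each counter for VLANs present in both snapshots."""
    out = {}
    for vlan in before.keys() & after.keys():
        out[vlan] = {
            key: (after[vlan][key] - before[vlan][key]) / seconds
            for key in before[vlan].keys() & after[vlan].keys()
        }
    return out

# First snapshot: values posted in this thread.
first = """Bridge interface on VLAN SGS_DIA:
Total number of packets to CPU = 3760.
Total number of packets learned = 2209.
"""
# Second snapshot: hypothetical values, assumed to be taken 60 seconds later.
second = """Bridge interface on VLAN SGS_DIA:
Total number of packets to CPU = 4360.
Total number of packets learned = 2239.
"""
print(rates(parse_l2stats(first), parse_l2stats(second), seconds=60))
```

A VLAN whose per-second rate stays high across several intervals is a better candidate for investigation than one that merely has a large total.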

Prashanth KG, Employee

Hi Hasan,

Looking at the l2stats output, the count of packets learned is high, which indicates that too much learning is happening on the switch. I still suspect MAC movement, or too many FDB entries being added and deleted.

Example: Total number of packets learned = 391638.

As requested before, please collect the output of show log counters fdb occurred. Check the counters for FDB.macadd, FDB.macdel and FDB.macmove.

Use the link below to configure MAC tracking and identify the MAC learning on the switch.

https://gtacknowledge.extremenetworks.com/articles/How_To/How-to-configure-MAC-tracking-in-EXOS/

Regarding your question about top, can you please share the output both with and without pressing 1, so that we can understand what you are seeing?

Hope this helps!!
Keep us updated.

hasan issa


Hi Prashanth,

Thanks for your support.

I have configured logging for MAC add, MAC delete and MAC move events on the switch, and I found a lot of entries related to MAC additions and deletions.

Below is the output of the FDB log counters:

 

 

JUB_020.22 # show log counters FDB occurred

Component   SubComponent Condition               Severity      Occurred I Ntfd

----------- ------------ ----------------------- ------------- -------- - ----

FDB                      ArpDebugSummary         Debug-Summary    14638 N    0

FDB                      FdbDebugSummary         Debug-Summary  2067684 N    0

FDB                      FilterDebugSummary      Debug-Summary        9 N    0

FDB         MACTracking  MACAdd                  Notice              51 Y   47

FDB         MACTracking  MACDel                  Notice              49 Y   46
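
For anyone who wants to compare these counters over time, the table above can be read with a few lines of Python. This is only a sketch under one assumption: that every data row ends with the Severity, Occurred, I and Ntfd columns, as in the rows shown here. The function name is illustrative, and the I and Ntfd values are kept as-is without interpretation.

```python
def parse_log_counters(text):
    """Parse rows of a 'show log counters' table into dictionaries."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows end with: Severity, Occurred (number), I (flag), Ntfd (number).
        if len(parts) < 6 or not parts[-3].isdigit() or not parts[-1].isdigit():
            continue  # header, separator, prompt, or blank line
        rows.append({
            "component": parts[0],
            "subcomponent": " ".join(parts[1:-5]),  # empty when the column is blank
            "condition": parts[-5],
            "severity": parts[-4],
            "occurred": int(parts[-3]),
            "i": parts[-2],
            "ntfd": int(parts[-1]),
        })
    return rows

# Rows copied from the output above.
sample = """Component   SubComponent Condition               Severity      Occurred I Ntfd
----------- ------------ ----------------------- ------------- -------- - ----
FDB                      ArpDebugSummary         Debug-Summary    14638 N    0
FDB                      FdbDebugSummary         Debug-Summary  2067684 N    0
FDB                      FilterDebugSummary      Debug-Summary        9 N    0
FDB         MACTracking  MACAdd                  Notice              51 Y   47
FDB         MACTracking  MACDel                  Notice              49 Y   46
"""
rows = parse_log_counters(sample)
# Total MAC add and delete events recorded so far.
total = sum(r["occurred"] for r in rows if r["condition"] in ("MACAdd", "MACDel"))
print(total)
```

Running the same parse on a later capture and subtracting the two totals gives the number of add and delete events in between.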

 

 

I need to know whether we can prevent MAC addresses from being added and deleted in the VLANs. I also need to know whether a large number of MAC additions and deletions in the VLANs will affect the switch.

As you requested, below is the output of the top command before pressing 1 and with 1 pressed.

Before pressing 1:

 

 Mem: 247840K used, 6796K free, 0K shrd, 32804K buff, 73132K cached

CPU:  2.6% usr  1.9% sys  0.0% nic 94.3% idle  0.0% io  0.3% irq  0.6% sirq

Load average: 4.04 4.09 4.08 3/182 2301

  PID  PPID USER     STAT   RSS %MEM CPU %CPU COMMAND

 1433     1 root     S     3324  1.3   0  1.9 ./fdb

 1405     1 root     S <  11432  4.4   0  0.9 ./hal

 1490     1 root     S     2096  0.8   0  0.6 ./ripng

 2301  2300 root     R      780  0.3   0  0.6 top -d 3

 1413     1 root     S     3296  1.2   0  0.3 ./snmpMaster

 1407     1 root     S     1660  0.6   0  0.3 ./nodemgr

 1261     2 root     SW       0  0.0   0  0.3 [bcmRX]

 1409     1 root     S    31460 12.2   0  0.0 ./cliMaster

 1415     1 root     S     5460  2.1   0  0.0 ./snmpSubagent

 1536     1 root     S     4688  1.8   0  0.0 ./xmld

 1561     1 root     S     3964  1.5   0  0.0 ./idMgr

 1514     1 root     S     3920  1.5   0  0.0 ./etmon

 1401     1 root     S     3732  1.4   0  0.0 ./emsServer

 1472     1 root     S     3484  1.3   0  0.0 ./mcmgr

 1423     1 root     S     3336  1.3   0  0.0 ./vlan

 1492     1 root     S     3076  1.2   0  0.0 ./pim

 1466     1 root     S     2936  1.1   0  0.0 ./rtmgr update

 1411     1 root     S     2852  1.1   0  0.0 ./cfgmgr

 1501     1 root     S     2688  1.0   0  0.0 ./netTools

 1499     1 root     S     2640  1.0   0  0.0 ./acl

 1486     1 root     S     2640  1.0   0  0.0 ./ospfv3

 

 

With 1 pressed:

 

 

Mem: 247840K used, 6796K free, 0K shrd, 32804K buff, 73132K cached

CPU:  0.0% usr  100% sys  0.0% nic  0.0% idle  0.0% io  0.0% irq  0.0% sirq

Load average: 4.18 4.14 4.10 3/182 2301

  PID  PPID USER     STAT   RSS %MEM CPU %CPU COMMAND

 1433     1 root     S     3324  1.3   0 30.4 ./fdb

 2301  2300 root     R      780  0.3   0 26.0 top -d 3

 1405     1 root     S <  11432  4.4   0 21.7 ./hal

 1492     1 root     S     3076  1.2   0  8.6 ./pim

 1261     2 root     RW       0  0.0   0  8.6 [bcmRX]

 1472     1 root     S     3484  1.3   0  4.3 ./mcmgr

 1409     1 root     S    31460 12.2   0  0.0 ./cliMaster

 1415     1 root     S     5460  2.1   0  0.0 ./snmpSubagent

 1536     1 root     S     4688  1.8   0  0.0 ./xmld

 1561     1 root     S     3964  1.5   0  0.0 ./idMgr

 1514     1 root     S     3920  1.5   0  0.0 ./etmon

 1401     1 root     S     3732  1.4   0  0.0 ./emsServer

 1423     1 root     S     3336  1.3   0  0.0 ./vlan

 1413     1 root     S     3296  1.2   0  0.0 ./snmpMaster

 1466     1 root     S     2936  1.1   0  0.0 ./rtmgr update

 1411     1 root     S     2852  1.1   0  0.0 ./cfgmgr

 1501     1 root     S     2688  1.0   0  0.0 ./netTools

 1499     1 root     S     2640  1.0   0  0.0 ./acl

 1486     1 root     S     2640  1.0   0  0.0 ./ospfv3

 1056     1 root     S     2624  1.0   0  0.0 /exos/bin/epm -t 40 -f /exos/confi

 1484     1 root     S     2576  1.0   0  0.0 ./ospf

 

Regards,

Hasan


Prashanth KG, Employee

Hi Hasan,

Thanks a lot for your effort in collecting the requested outputs.
I see that you have added the log counters. If you now issue the command show log, you will be able to see which MAC addresses are being added and deleted, along with the port numbers and VLAN information.
See if you can find a pattern involving a specific port or specific MAC addresses. That should help narrow it down.
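
One way to look for such a pattern is to save the show log output to a file and count how often each MAC address appears. The Python sketch below does only that. It does not assume any particular EXOS message format: it simply matches colon-separated MAC addresses anywhere in a line. The sample lines are made-up placeholders, not real switch log messages, and the function name is illustrative.

```python
import re
from collections import Counter

# Matches MAC addresses written as six colon-separated hex pairs.
MAC_RE = re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b")

def mac_event_counts(log_text):
    """Count how many times each MAC address appears in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        for mac in MAC_RE.findall(line):
            counts[mac.lower()] += 1
    return counts

# Placeholder lines for illustration only (not the real EXOS log format).
sample_log = """event A mac 00:11:22:33:44:55 placeholder
event B mac 66:77:88:99:AA:BB placeholder
event C mac 00:11:22:33:44:55 placeholder
"""
for mac, n in mac_event_counts(sample_log).most_common(10):
    print(mac, n)
```

A MAC address that shows up far more often than the rest is a good starting point when checking which port and VLAN it keeps being learned on.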

If you are using STP or any L2 loop prevention protocol, please check if there are any frequent topology changes. This could result in the FDB flush and forced re-learning. 

Regarding the impact, it is hard to say unless we know the network completely. However it is not recommended to have high CPU in a switch. So, it would be good to sort this out. 

I think this would be the right time to open a GTAC case with all this information as Brandon and Patrick suggested! 

Thanks!