I have a C5 that is giving me some grief. It is the core L3 for a medium sized network. There are 2 C5Gs and 1 C5K stacked.
Every so often when adding new hardware to the network the CPU goes nuts on the device and the only resolution is to randomly disconnect trunk ports to reset STP, essentially.
Today we added a new HP stack to the mix to act as an L2 for a VM network. This all went fine. The uplinks are trunked on both sides and we have a good link. I plugged in a VM server without issue. I then plugged in a simple DHCP device (APC PDU) and it completely brought down the network. CPU went to 95% and brought down pretty much all traffic. The process breakdown is below:
Total CPU Utilization:
Switch CPU 5 sec 1 min 5 min
-------------------------------------------------
3 1 95% 96% 96%
Switch:3 CPU:1
TID Name 5Sec 1Min 5Min
----------------------------------------------------------
3eb5430 tNet0 0.20% 0.17% 0.13%
3f53ea0 tXbdService 0.00% 0.08% 0.02%
4713b20 osapiTimer 2.20% 2.16% 2.13%
4a79ff0 bcmL2X.0 0.60% 0.53% 0.57%
4b26eb0 bcmCNTR.0 1.00% 0.94% 0.96%
4b9f490 bcmTX 1.00% 1.01% 1.19%
53b9f40 bcmRX 16.00% 15.57% 16.38%
54042f0 bcmATP-TX 25.60% 22.90% 23.34%
54097f0 bcmATP-RX 0.00% 0.08% 0.14%
59fb7f0 MAC Send Task 0.20% 0.20% 0.20%
5a0ccf0 MAC Age Task 0.20% 0.06% 0.05%
6e02f30 bcmLINK.0 0.40% 0.40% 0.40%
90e38d0 osapiMemMon 2.20% 2.47% 2.63%
91177f0 SysIdleTask 2.40% 1.64% 1.74%
920dce0 C5IntProc 0.00% 0.11% 0.07%
9dfe8b0 hapiRxTask 2.00% 1.81% 1.86%
9e33d40 tEmWeb 0.40% 0.32% 0.18%
b61e280 EDB BXS Req 0.00% 4.58% 2.32%
b763a90 SNMPTask 0.00% 1.30% 0.68%
b7ab5d0 RMONTask 0.00% 0.31% 1.24%
e2f2e30 dot1s_timer_task 1.00% 1.00% 1.00%
106fa4a0 fftpTask 0.00% 0.04% 0.01%
10793cc0 ipMapForwardingTask 42.60% 39.87% 40.37%
10c3a880 ARP Timer 0.20% 0.03% 0.00%
And this is what we saw in the logs. There was a topo change, but it had happened almost 2 hours before.
<166>Mar 1 07:21:45 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2257 %% Setting Port(130) instance(4095) State: DISCARDING<166>Mar 1 07:21:45 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2258 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 07:21:45 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2259 %% Setting Port(130) instance(0) State: DISCARDING
<166>Mar 1 07:21:45 10.10.1.1-3 DOT1S[238044000]: dot1s_sm.c(4253) 2260 %% Setting Port(130) Role: ROLE_DESIGNATED | STP Port(130) | Int Cost(2000) | Ext Cost(2000)
<166>Mar 1 07:21:47 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1360) 2261 %% Setting Port(130) instance(0) State: LEARNING
<166>Mar 1 07:21:47 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1424) 2262 %% Setting Port(130) instance(0) State: FORWARDING
<166>Mar 1 09:00:27 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2274 %% Setting Port(130) instance(0) State: DISCARDING
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2277 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2278 %% Setting Port(123) instance(0) State: DISCARDING
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2279 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2280 %% Setting Port(124) instance(0) State: DISCARDING
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1346) 2281 %% Setting Port(445) instance(4095) State: DISABLED
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1346) 2282 %% Setting Port(446) instance(4095) State: DISABLED
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2283 %% Setting Port(123) instance(4095) State: DISCARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2284 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2285 %% Setting Port(123) instance(0) State: DISCARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_sm.c(4253) 2286 %% Setting Port(123) Role: ROLE_DESIGNATED | STP Port(123) | Int Cost(20000) | Ext Cost(20000)
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1360) 2287 %% Setting Port(123) instance(0) State: LEARNING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1424) 2288 %% Setting Port(123) instance(0) State: FORWARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1503) 2289 %% Setting Port(445) instance(4095) State: MANUAL_FORWARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1503) 2290 %% Setting Port(446) instance(4095) State: MANUAL_FORWARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2291 %% Setting Port(124) instance(4095) State: DISCARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2292 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2293 %% Setting Port(124) instance(0) State: DISCARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_sm.c(4253) 2294 %% Setting Port(124) Role: ROLE_DESIGNATED | STP Port(124) | Int Cost(20000) | Ext Cost(20000)
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1360) 2295 %% Setting Port(124) instance(0) State: LEARNING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1424) 2296 %% Setting Port(124) instance(0) State: FORWARDING
<166>Mar 1 09:04:52 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2297 %% Setting Port(130) instance(4095) State: DISCARDING
<166>Mar 1 09:04:52 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2298 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 09:04:52 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2299 %% Setting Port(130) instance(0) State: DISCARDING
<166>Mar 1 09:04:52 10.10.1.1-3 DOT1S[238044000]: dot1s_sm.c(4253) 2300 %% Setting Port(130) Role: ROLE_DESIGNATED | STP Port(130) | Int Cost(2000) | Ext Cost(2000)
<166>Mar 1 09:04:54 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1360) 2301 %% Setting Port(130) instance(0) State: LEARNING
<166>Mar 1 09:04:54 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1424) 2302 %% Setting Port(130) instance(0) State: FORWARDING
We had this happen in the past with a single Netgear junk switch that someone had plugged in at their desk. It turned out that it would cause the same symptoms every day when the person would boot it up. There was no loop anywhere on the switch at that time.
Any ideas on where to begin?