C5 high CPU utilization - ipMapForwardingTask

Jason_Wisniewsk
New Contributor
I have a C5 that is giving me some grief. It is the core L3 for a medium-sized network. There are 2 C5Gs and 1 C5K stacked.

Every so often, when we add new hardware to the network, the CPU on the device goes nuts, and the only fix is to disconnect trunk ports at random, essentially to reset STP.

Today we added a new HP stack to act as the L2 for a VM network. This all went fine: the uplinks are trunked on both sides and we have a good link. I plugged in a VM server without issue. I then plugged in a simple DHCP device (an APC PDU) and it brought down the entire network. The CPU went to 95% and pretty much all traffic stopped. The process breakdown is below:

Total CPU Utilization:
Switch  CPU  5 sec  1 min  5 min
-------------------------------------------------
3       1    95%    96%    96%

Switch:3 CPU:1

TID       Name                  5Sec     1Min     5Min
----------------------------------------------------------
3eb5430   tNet0                 0.20%    0.17%    0.13%
3f53ea0   tXbdService           0.00%    0.08%    0.02%
4713b20   osapiTimer            2.20%    2.16%    2.13%
4a79ff0   bcmL2X.0              0.60%    0.53%    0.57%
4b26eb0   bcmCNTR.0             1.00%    0.94%    0.96%
4b9f490   bcmTX                 1.00%    1.01%    1.19%
53b9f40   bcmRX                 16.00%   15.57%   16.38%
54042f0   bcmATP-TX             25.60%   22.90%   23.34%
54097f0   bcmATP-RX             0.00%    0.08%    0.14%
59fb7f0   MAC Send Task         0.20%    0.20%    0.20%
5a0ccf0   MAC Age Task          0.20%    0.06%    0.05%
6e02f30   bcmLINK.0             0.40%    0.40%    0.40%
90e38d0   osapiMemMon           2.20%    2.47%    2.63%
91177f0   SysIdleTask           2.40%    1.64%    1.74%
920dce0   C5IntProc             0.00%    0.11%    0.07%
9dfe8b0   hapiRxTask            2.00%    1.81%    1.86%
9e33d40   tEmWeb                0.40%    0.32%    0.18%
b61e280   EDB BXS Req           0.00%    4.58%    2.32%
b763a90   SNMPTask              0.00%    1.30%    0.68%
b7ab5d0   RMONTask              0.00%    0.31%    1.24%
e2f2e30   dot1s_timer_task      1.00%    1.00%    1.00%
106fa4a0  fftpTask              0.00%    0.04%    0.01%
10793cc0  ipMapForwardingTask   42.60%   39.87%   40.37%
10c3a880  ARP Timer             0.20%    0.03%    0.00%
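Sorting that table makes the picture clearer. Below is a minimal Python sketch, assuming the per-task output is pasted as plain text in the TID / Name / 5Sec / 1Min / 5Min layout shown above; the regex and function names are my own, not anything provided by the switch. On these numbers, ipMapForwardingTask, bcmATP-TX and bcmRX together account for about 80% of the 5-minute average.

```python
import re

# Per-task rows from the "Switch:3 CPU:1" output above (TID, Name, 5Sec, 1Min, 5Min).
TASK_TABLE = """\
3eb5430 tNet0 0.20% 0.17% 0.13%
3f53ea0 tXbdService 0.00% 0.08% 0.02%
4713b20 osapiTimer 2.20% 2.16% 2.13%
4a79ff0 bcmL2X.0 0.60% 0.53% 0.57%
4b26eb0 bcmCNTR.0 1.00% 0.94% 0.96%
4b9f490 bcmTX 1.00% 1.01% 1.19%
53b9f40 bcmRX 16.00% 15.57% 16.38%
54042f0 bcmATP-TX 25.60% 22.90% 23.34%
54097f0 bcmATP-RX 0.00% 0.08% 0.14%
59fb7f0 MAC Send Task 0.20% 0.20% 0.20%
5a0ccf0 MAC Age Task 0.20% 0.06% 0.05%
6e02f30 bcmLINK.0 0.40% 0.40% 0.40%
90e38d0 osapiMemMon 2.20% 2.47% 2.63%
91177f0 SysIdleTask 2.40% 1.64% 1.74%
920dce0 C5IntProc 0.00% 0.11% 0.07%
9dfe8b0 hapiRxTask 2.00% 1.81% 1.86%
9e33d40 tEmWeb 0.40% 0.32% 0.18%
b61e280 EDB BXS Req 0.00% 4.58% 2.32%
b763a90 SNMPTask 0.00% 1.30% 0.68%
b7ab5d0 RMONTask 0.00% 0.31% 1.24%
e2f2e30 dot1s_timer_task 1.00% 1.00% 1.00%
106fa4a0 fftpTask 0.00% 0.04% 0.01%
10793cc0 ipMapForwardingTask 42.60% 39.87% 40.37%
10c3a880 ARP Timer 0.20% 0.03% 0.00%
"""

# Names can contain spaces ("MAC Send Task"), so anchor on the hex TID at the
# start of the row and the three percentage columns at the end. Requiring 6+
# hex digits keeps the "3 1 95% 96% 96%" total row from matching.
ROW_RE = re.compile(
    r"^(?P<tid>[0-9a-f]{6,})\s+(?P<name>.+?)\s+"
    r"(?P<s5>[\d.]+)%\s+(?P<m1>[\d.]+)%\s+(?P<m5>[\d.]+)%$"
)


def parse_tasks(text):
    """Return (name, 5sec, 1min, 5min) tuples; header and ruler lines are skipped."""
    tasks = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            tasks.append((m["name"], float(m["s5"]), float(m["m1"]), float(m["m5"])))
    return tasks


def top_tasks(tasks, n=3):
    """Busiest n tasks by 5-minute average."""
    return sorted(tasks, key=lambda t: t[3], reverse=True)[:n]


if __name__ == "__main__":
    top = top_tasks(parse_tasks(TASK_TABLE))
    for name, _, _, five_min in top:
        print(f"{name:<22}{five_min:6.2f}%")
    print(f"Top {len(top)} combined: {sum(t[3] for t in top):.2f}%")
```

Using the 5-minute column for ranking smooths out short spikes; swap the sort key to `t[1]` to rank by the 5-second column instead.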

And this is what we saw in the logs. There was a topo change, but it had happened almost 2 hours before.

<166>Mar 1 07:21:45 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2257 %% Setting Port(130) instance(4095) State: DISCARDING
<166>Mar 1 07:21:45 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2258 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 07:21:45 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2259 %% Setting Port(130) instance(0) State: DISCARDING
<166>Mar 1 07:21:45 10.10.1.1-3 DOT1S[238044000]: dot1s_sm.c(4253) 2260 %% Setting Port(130) Role: ROLE_DESIGNATED | STP Port(130) | Int Cost(2000) | Ext Cost(2000)
<166>Mar 1 07:21:47 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1360) 2261 %% Setting Port(130) instance(0) State: LEARNING
<166>Mar 1 07:21:47 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1424) 2262 %% Setting Port(130) instance(0) State: FORWARDING
<166>Mar 1 09:00:27 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2274 %% Setting Port(130) instance(0) State: DISCARDING
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2277 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2278 %% Setting Port(123) instance(0) State: DISCARDING
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2279 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2280 %% Setting Port(124) instance(0) State: DISCARDING
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1346) 2281 %% Setting Port(445) instance(4095) State: DISABLED
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1346) 2282 %% Setting Port(446) instance(4095) State: DISABLED
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2283 %% Setting Port(123) instance(4095) State: DISCARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2284 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2285 %% Setting Port(123) instance(0) State: DISCARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_sm.c(4253) 2286 %% Setting Port(123) Role: ROLE_DESIGNATED | STP Port(123) | Int Cost(20000) | Ext Cost(20000)
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1360) 2287 %% Setting Port(123) instance(0) State: LEARNING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1424) 2288 %% Setting Port(123) instance(0) State: FORWARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1503) 2289 %% Setting Port(445) instance(4095) State: MANUAL_FORWARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1503) 2290 %% Setting Port(446) instance(4095) State: MANUAL_FORWARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2291 %% Setting Port(124) instance(4095) State: DISCARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2292 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2293 %% Setting Port(124) instance(0) State: DISCARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_sm.c(4253) 2294 %% Setting Port(124) Role: ROLE_DESIGNATED | STP Port(124) | Int Cost(20000) | Ext Cost(20000)
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1360) 2295 %% Setting Port(124) instance(0) State: LEARNING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1424) 2296 %% Setting Port(124) instance(0) State: FORWARDING
<166>Mar 1 09:04:52 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2297 %% Setting Port(130) instance(4095) State: DISCARDING
<166>Mar 1 09:04:52 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2298 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 09:04:52 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2299 %% Setting Port(130) instance(0) State: DISCARDING
<166>Mar 1 09:04:52 10.10.1.1-3 DOT1S[238044000]: dot1s_sm.c(4253) 2300 %% Setting Port(130) Role: ROLE_DESIGNATED | STP Port(130) | Int Cost(2000) | Ext Cost(2000)
<166>Mar 1 09:04:54 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1360) 2301 %% Setting Port(130) instance(0) State: LEARNING
<166>Mar 1 09:04:54 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1424) 2302 %% Setting Port(130) instance(0) State: FORWARDING
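Grouping the STP state changes by port makes it easier to see which ports moved and when. A rough Python sketch, assuming the DOT1S lines keep the format shown above; it only looks at "Setting Port(N) instance(M) State: X" lines and ignores role messages. In this excerpt only ports 123, 124, 130, 445 and 446 change state, and port 130 drops to DISCARDING on instance 0 at 09:00:27 before cycling again at 09:04:52. How these internal port numbers map to physical interfaces is not shown in the thread.

```python
import re
from collections import defaultdict

# DOT1S syslog excerpt from the post above (the fused first line split in two).
LOG = """\
<166>Mar 1 07:21:45 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2257 %% Setting Port(130) instance(4095) State: DISCARDING
<166>Mar 1 07:21:45 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2258 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 07:21:45 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2259 %% Setting Port(130) instance(0) State: DISCARDING
<166>Mar 1 07:21:45 10.10.1.1-3 DOT1S[238044000]: dot1s_sm.c(4253) 2260 %% Setting Port(130) Role: ROLE_DESIGNATED | STP Port(130) | Int Cost(2000) | Ext Cost(2000)
<166>Mar 1 07:21:47 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1360) 2261 %% Setting Port(130) instance(0) State: LEARNING
<166>Mar 1 07:21:47 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1424) 2262 %% Setting Port(130) instance(0) State: FORWARDING
<166>Mar 1 09:00:27 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2274 %% Setting Port(130) instance(0) State: DISCARDING
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2277 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2278 %% Setting Port(123) instance(0) State: DISCARDING
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2279 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2280 %% Setting Port(124) instance(0) State: DISCARDING
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1346) 2281 %% Setting Port(445) instance(4095) State: DISABLED
<166>Mar 1 09:03:31 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1346) 2282 %% Setting Port(446) instance(4095) State: DISABLED
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2283 %% Setting Port(123) instance(4095) State: DISCARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2284 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2285 %% Setting Port(123) instance(0) State: DISCARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_sm.c(4253) 2286 %% Setting Port(123) Role: ROLE_DESIGNATED | STP Port(123) | Int Cost(20000) | Ext Cost(20000)
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1360) 2287 %% Setting Port(123) instance(0) State: LEARNING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1424) 2288 %% Setting Port(123) instance(0) State: FORWARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1503) 2289 %% Setting Port(445) instance(4095) State: MANUAL_FORWARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1503) 2290 %% Setting Port(446) instance(4095) State: MANUAL_FORWARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2291 %% Setting Port(124) instance(4095) State: DISCARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2292 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2293 %% Setting Port(124) instance(0) State: DISCARDING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_sm.c(4253) 2294 %% Setting Port(124) Role: ROLE_DESIGNATED | STP Port(124) | Int Cost(20000) | Ext Cost(20000)
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1360) 2295 %% Setting Port(124) instance(0) State: LEARNING
<166>Mar 1 09:03:40 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1424) 2296 %% Setting Port(124) instance(0) State: FORWARDING
<166>Mar 1 09:04:52 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2297 %% Setting Port(130) instance(4095) State: DISCARDING
<166>Mar 1 09:04:52 10.10.1.1-3 DOT1S[238044000]: dot1s_txrx.c(485) 2298 %% dot1sMstpTx(): CIST Role Disabled
<166>Mar 1 09:04:52 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1485) 2299 %% Setting Port(130) instance(0) State: DISCARDING
<166>Mar 1 09:04:52 10.10.1.1-3 DOT1S[238044000]: dot1s_sm.c(4253) 2300 %% Setting Port(130) Role: ROLE_DESIGNATED | STP Port(130) | Int Cost(2000) | Ext Cost(2000)
<166>Mar 1 09:04:54 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1360) 2301 %% Setting Port(130) instance(0) State: LEARNING
<166>Mar 1 09:04:54 10.10.1.1-3 DOT1S[238044000]: dot1s_ih.c(1424) 2302 %% Setting Port(130) instance(0) State: FORWARDING
"""

# Only "Setting Port(N) instance(M) State: X" lines are state changes;
# role lines and "CIST Role Disabled" lines don't match and are skipped.
STATE_RE = re.compile(
    r"^<\d+>(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) .*?"
    r"Setting Port\((?P<port>\d+)\) instance\((?P<inst>\d+)\) State: (?P<state>\w+)"
)


def port_timeline(text):
    """Map port number -> [(timestamp, instance, state), ...] in log order."""
    timeline = defaultdict(list)
    for line in text.splitlines():
        m = STATE_RE.match(line)
        if m:
            timeline[int(m["port"])].append((m["ts"], int(m["inst"]), m["state"]))
    return dict(timeline)


if __name__ == "__main__":
    for port, events in sorted(port_timeline(LOG).items()):
        print(f"Port({port})")
        for ts, inst, state in events:
            print(f"  {ts}  instance({inst})  {state}")
```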

We had this happen in the past with a single Netgear junk switch that someone had plugged in at their desk. It turned out that it would cause the same symptoms every day when the person would boot it up. There was no loop anywhere on the switch at that time.

Any ideas on where to begin?

Jeremy_Gibbs
Contributor
Hmm, I was thinking it could be a routing loop. This is just what I do, but for any address space I don't have routes for, I create a generic black hole route.

Also, you can 'set dos-control ....' and maybe get some idea of what is going on; surely the C will identify this as a DoS-style event. It usually logs these events.
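The black hole idea relies on longest-prefix matching: a broad route covering unused private space (for example 10.0.0.0/8) is more specific than the default route, so traffic to an unused 10.x address is dropped locally instead of following the default toward the firewall, while the more specific routes for subnets actually in use still win. The Python below is a toy model of that lookup, not switch configuration. The routing table is hypothetical: 10.10.3.100 is the firewall cluster mentioned later in the thread, but 10.10.0.0/16 as the "internal" prefix and the "blackhole" next hop are assumptions for illustration.

```python
import ipaddress

# Hypothetical routing table for illustration only. 10.10.3.100 is the
# firewall cluster from the thread; 10.10.0.0/16 as "internal" is an assumption.
BASE_ROUTES = {
    "0.0.0.0/0": "10.10.3.100",   # default route toward the firewall
    "10.10.0.0/16": "connected",  # assumed in-use internal space
}

# Black-hole routes covering all of RFC 1918; "blackhole" stands in for a
# null/discard next hop.
BLACKHOLES = {
    "10.0.0.0/8": "blackhole",
    "172.16.0.0/12": "blackhole",
    "192.168.0.0/16": "blackhole",
}

WITH_BLACKHOLES = {**BASE_ROUTES, **BLACKHOLES}


def lookup(routes, dest):
    """Longest-prefix match: the most specific prefix containing dest wins.

    Assumes a default route is present, so at least one prefix always matches.
    """
    addr = ipaddress.ip_address(dest)
    matching = [ipaddress.ip_network(p) for p in routes if addr in ipaddress.ip_network(p)]
    best = max(matching, key=lambda net: net.prefixlen)
    return routes[str(best)]


if __name__ == "__main__":
    for dest in ("10.244.244.244", "10.230.24.32", "10.10.11.1", "74.126.23.89"):
        print(f"{dest:<16} without: {lookup(BASE_ROUTES, dest):<12} "
              f"with: {lookup(WITH_BLACKHOLES, dest)}")
```

The in-use /16 still beats the /8 black hole, and public destinations still fall through to the default route, which is why this kind of catch-all is generally safe to add. The exact C5 command for a null route is not given in the thread.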

Jason_Wisniewsk
New Contributor
Hi Jeremy-

Here are the outputs. Note that 10.10.3.100 is my firewall cluster and 10.10.11.1 is the L3 with the CPU issue. I ran a few traces to get a complete overview. None of these IPs fall anywhere in my subnets.

C:\Users\jwisniewski>tracert -d 10.3.1.1
Tracing route to 10.3.1.1 over a maximum of 30 hops

1 1 ms 1 ms 2 ms 10.10.11.1
2 <1 ms <1 ms <1 ms 10.10.3.100
3 1 ms 1 ms 2 ms 74.126.23.89
4 2 ms 2 ms 2 ms 216.234.118.33
5 2 ms 2 ms 2 ms 216.234.96.1
6 ^C
C:\Users\jwisniewski>ping 10.244.244.244

Pinging 10.244.244.244 with 32 bytes of data:
Control-C
^C

C:\Users\jwisniewski>tracert -d 10.244.244.244

Tracing route to 10.244.244.244 over a maximum of 30 hops

1 2 ms 1 ms 1 ms 10.10.11.1
2 1 ms <1 ms <1 ms 74.126.4.9
3 2 ms 1 ms * 74.126.23.89
4 3 ms 3 ms 3 ms 216.234.118.33
5 2 ms 2 ms 2 ms 216.234.96.1
6 * ^C
C:\Users\jwisniewski>tracert -d 10.230.24.32

Tracing route to 10.230.24.32 over a maximum of 30 hops

1 2 ms 1 ms 1 ms 10.10.11.1
2 <1 ms <1 ms <1 ms 10.10.3.100
3 1 ms 2 ms 1 ms 74.126.23.89
4 2 ms 2 ms 2 ms 216.234.118.33
5 3 ms 2 ms 2 ms 216.234.96.1
6 ^C
C:\Users\jwisniewski>
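In all three traces the private 10.x destinations are forwarded past the core to public hops (74.126.x.x, 216.234.x.x) rather than being dropped locally. A small Python sketch for spotting that in tracert output, assuming the Windows format shown above (hop number first, responding address last); it uses `ipaddress.ip_address(...).is_global` to find the first hop outside private space. The sample is the 10.244.244.244 trace from this post.

```python
import ipaddress
import re

# The 10.244.244.244 trace from the post above.
TRACE = """\
Tracing route to 10.244.244.244 over a maximum of 30 hops

1 2 ms 1 ms 1 ms 10.10.11.1
2 1 ms <1 ms <1 ms 74.126.4.9
3 2 ms 1 ms * 74.126.23.89
4 3 ms 3 ms 3 ms 216.234.118.33
5 2 ms 2 ms 2 ms 216.234.96.1
6 * ^C
"""

# A hop line starts with the hop number and ends with an IPv4 address;
# lines like "6 * ^C" (no reply) don't match.
HOP_RE = re.compile(r"^(\d+)\s.*?(\d{1,3}(?:\.\d{1,3}){3})\s*$")


def hops(text):
    """Return [(hop_number, ip)] for hop lines that end in an IPv4 address."""
    result = []
    for line in text.splitlines():
        m = HOP_RE.match(line.strip())
        if m:
            result.append((int(m.group(1)), m.group(2)))
    return result


def first_public_hop(text):
    """First hop whose address is globally routable, or None if all are private."""
    for number, ip in hops(text):
        if ipaddress.ip_address(ip).is_global:
            return number, ip
    return None


if __name__ == "__main__":
    print(first_public_hop(TRACE))
```

Run against the other two traces, the first public hop is hop 3 (74.126.23.89), one hop later because the path goes through 10.10.3.100 first.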

Some further info: we had another power outage that brought down the building after hours. This did not cause a problem at all, which my SNMP monitoring tool confirms. That leaves me with two possible explanations:

1. The device isn't online after hours
2. The network is so calm after hours that whatever traffic tries to pass manages to do so without issue.

Jason_Wisniewsk
New Contributor
Hi Mike-

There is no such thing as diminishing returns when it comes to problem solving; any input is MUCH appreciated. I hope this discussion gets indexed and helps others in the future. Chasing problems that seem impossible is no fun.

Jeremy_Gibbs
Contributor
Try this: from a computer on your LAN, run a traceroute to an RFC 1918 (private) IP address that doesn't exist.

I know it sounds weird, but just give it a try and let me know the results. I am just going on a hunch here.

Edit: When I say "IP that doesn't exist", I mean pick an IP in a subnet that you aren't using and shouldn't technically have an L3 interface for.
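A quick way to vet a target for that test is to check the candidate against the RFC 1918 blocks and against the subnets you actually route. A minimal Python sketch; `is_valid_test_target` is a hypothetical helper and the `USED` list is a placeholder assumption to replace with your own subnets.

```python
import ipaddress

# RFC 1918 private address blocks.
RFC1918 = tuple(
    ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)

# Hypothetical: replace with the subnets actually routed on your L3 core.
USED = ["10.10.0.0/16"]


def is_valid_test_target(ip, used_subnets):
    """True if ip is RFC 1918 space but outside every subnet in use."""
    addr = ipaddress.ip_address(ip)
    if not any(addr in block for block in RFC1918):
        return False
    return not any(addr in ipaddress.ip_network(s) for s in used_subnets)


if __name__ == "__main__":
    for ip in ("10.3.1.1", "10.244.244.244", "10.230.24.32", "10.10.11.1", "74.126.4.9"):
        print(f"{ip:<16} {is_valid_test_target(ip, USED)}")
```

The RFC 1918 blocks are listed explicitly because `ipaddress`'s `is_private` also covers other reserved ranges such as loopback and link-local, which would not make sensible targets here.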

Mike_D
Extreme Employee
Hi Jason,
you're right, the command for egress is set vlan egress (tag options).

As far as the trace goes, I definitely think you should follow your gut here. If you don't think the IDFs are introducing reflection or flood behavior, I'm in no position to fault your reasoning.
As the network admin, your time should go where it does the most good, and right now that's the shortest path to the root cause and a solution; intuition is often a big part of that.

Looks like my input on this thread has reached the point of diminishing returns.
Worst case, my posts may keep others from adding fresh perspectives, so I'll pipe down.
You will get to the bottom of this if you keep at it, and I predict sooner rather than later. Good luck,

Mike