
Virtual XOS killing ESX host CPU


After initial install and configuration of the vSummit, the VM stays at around 60-100 MHz of host CPU usage, but after about 3 minutes sitting idle, CPU utilization jumps to 3 GHz+ and in many cases maxes out the host's CPU resources.

The Extreme VM then becomes sluggish and in many cases the CLI freezes.

Tested on multiple hosts in the Datacenter with the same outcome.

No other VMs on the hosts (Linux, Windows, Juniper, Fortinet, Cisco, Checkpoint, Avaya) produce the same issue.

Has anyone experienced this behavior and, if so, is there a fix or workaround for it?

Specs:

Extreme vXOS ver 16.1.2.14 - same issue with earlier releases.
VMware vSphere 5.5 with latest patches.
CPU: Xeon 3.2GHz E3-v1225 - 4 Cores

Thank you.


What is using the CPU on the xos switch? Run "top" to find out. Post the top processes.
Mem: 221472K used, 32364K free, 0K shrd, 51284K buff, 56156K cached
CPU: 0.3% usr 0.6% sys 0.0% nic 14.9% idle 0.0% io 0.0% irq 84.1% sirq
Load average: 26.60 27.59 25.90 14/178 2034
PID PPID USER STAT RSS %MEM CPU %CPU COMMAND
1650 1 root S 1480 0.5 0 27.1 ./nodemgr
4 2 root RW< 0 0.0 0 11.5 [ksoftirqd/0]
1376 1 root S 2488 0.9 0 7.9 /exos/bin/epm -t 40 -f /exos/config/epmrc -d /exos/config/epmdprc
1664 1 root S 3096 1.2 0 6.9 ./fdb
1700 1 root S 2072 0.8 0 1.6 ./ripng
1762 1 root S 2460 0.9 0 0.6 ./isis
1688 1 root S 8668 3.4 0 0.3 ./dcbgp
2027 1 root S 8364 3.2 0 0.3 ./dcbgp -v 3
1678 1 root S 2096 0.8 0 0.3 ./esrp
1642 1 root S 1368 0.5 0 0.3 ./ds
1648 1 root S < 17312 6.8 0 0.0 ./hal
1652 1 root S 13288 5.2 0 0.0 ./cliMaster
2018 1 root S 8328 3.2 0 0.0 ./dcbgp -v 4
1708 1 root S 5444 2.1 0 0.0 ./netTools
1658 1 root S 4860 1.9 0 0.0 ./snmpSubagent
1827 1 root S 4664 1.8 0 0.0 ./policy
1737 1 root S 4636 1.8 0 0.0 ./xmld
1656 1 root S 3900 1.5 0 0.0 ./snmpMaster
1766 1 root S 3868 1.5 0 0.0 ./idMgr
1644 1 root S 3772 1.4 0 0.0 ./emsServer
1660 1 root S 3596 1.4 0 0.0 ./aaa -t random
1684 1 root S 3156 1.2 0 0.0 ./mcmgr
1682 1 root S 3080 1.2 0 0.0 ./rtmgr update
1662 1 root S 3016 1.1 0 0.0 ./vlan
1702 1 root S 2940 1.1 0 0.0 ./pim
2021 1 root S 2856 1.1 0 0.0 ./ospf -v 3
1692 1 root S 2820 1.1 0 0.0 ./ospf
2012 1 root S 2816 1.1 0 0.0 ./ospf -v 4
1706 1 root S 2804 1.1 0 0.0 ./acl
1722 1 root S 2552 1.0 0 0.0 ./etmon
1768 1 root S 2496 0.9 0 0.0 ./vmt
1694 1 root S 2480 0.9 0 0.0 ./ospfv3
1696 1 root S 2436 0.9 0 0.0 ./rip
1654 1 root S 2428 0.9 0 0.0 ./cfgmgr
1680 1 root S 2332 0.9 0 0.0 ./stp
1764 1 root S 2268 0.8 0 0.0 ./dot1ag
1788 1 root S 2264 0.8 0 0.0 ./erps
1676 1 root S 2196 0.8 0 0.0 ./eaps
1674 1 root S 2192 0.8 0 0.0 ./lacp
1710 1 root S 2176 0.8 0 0.0 ./netLogin
1770 1 root S 2104 0.8 0 0.0 ./vsm
1690 1 root S 2076 0.8 0 0.0 ./msdp
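
In the output above, the `CPU:` summary line is more telling than the per-process list: most of the time is spent in `sirq` (software interrupts), which on a system like this often points at packet handling rather than at a single user process. A minimal Python sketch for pulling that out of a pasted summary line follows. It assumes the BusyBox-style format shown above; `parse_cpu_summary` is a hypothetical helper name, not part of EXOS.

```python
import re

def parse_cpu_summary(line):
    """Map each CPU state name to its percentage from a 'CPU:' summary line."""
    # Each field looks like "84.1% sirq": a number, a percent sign, a state name.
    return {name: float(value)
            for value, name in re.findall(r"([\d.]+)%\s+(\w+)", line)}

# Summary line copied from the top output posted in this thread.
summary = parse_cpu_summary(
    "CPU: 0.3% usr 0.6% sys 0.0% nic 14.9% idle 0.0% io 0.0% irq 84.1% sirq"
)

# Find the busiest non-idle state.
busiest = max((state for state in summary if state != "idle"), key=summary.get)
print(busiest, summary[busiest])  # prints: sirq 84.1
```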
Are you sure there is no loop?
A loop would probably show the BCMrX process running high, but a loop could still be the cause if you added several ports that all connect to the same vSwitch and sit in the same VLAN in EXOS (the default VLAN, for example).
Have you added ports to the VM, and if so, how many? Does EXOS see these ports?

The EXOS-VM doesn't have any BCM process 🙂
I just created a loop in a vSwitch on one of the EXOS VMs in the lab and saw mcmgr, fdb, and ksoftirqd/0 spike, but not nodemgr. As soon as I disabled one of the ports, all processes were back to normal.

Konstantin, if you can confirm there is no network loop, I might suggest re-creating the VM from scratch. Something may not have initialized properly.
Thank you all for the suggestions. I have created a diagram to better visualize one of the segments in the topology. Please see below. I also did further testing and noticed that when I remove vlan 10, which is the only VLAN that spans all 3 switches in VR-Default, CPU utilization returns to normal.

Try this:

configure elrp-client one-shot vlan-10 ports all print
Starting ELRP Poll . . .
# NO LOOP DETECTED # --- vlan "vlan10" elrp statistics ---
3 packets transmitted, 0 received, ingress port (nil)
What kind of traffic is on vlan-10?
OSPF and BGP protocols.

There is more: running ELRP on vXOS1 and vXOS3 never reports a loop.

On vXOS2:

Starting ELRP Poll . . .
# NO LOOP DETECTED # --- vlan "vlan10" elrp statistics ---
3 packets transmitted, 0 received, ingress port (nil)

2 sec later

Starting ELRP Poll
# LOOP DETECTED # --- vlan "vlan10" elrp statistics ---
1 packets transmitted, 1 received, ingress port 1
. . .

2 sec later:

Starting ELRP Poll
# LOOP DETECTED # --- vlan "vlan10" elrp statistics ---
1 packets transmitted, 1 received, ingress port 2
. . .
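
Because the loop on vXOS2 only shows up in some polls, it can help to repeat the one-shot poll and tally which ingress ports ever report one. The following is a rough Python sketch that parses pasted poll output. It assumes the output format shown in the samples above, and `parse_elrp_poll` is a hypothetical helper name, not an EXOS tool.

```python
import re

def parse_elrp_poll(text):
    """Summarize one ELRP one-shot poll: loop flag, packet counts, ingress port."""
    stats = re.search(
        r"(\d+) packets transmitted, (\d+) received, ingress port (\S+)", text
    )
    port = stats.group(3)
    return {
        "loop": "NO LOOP DETECTED" not in text and "LOOP DETECTED" in text,
        "sent": int(stats.group(1)),
        "received": int(stats.group(2)),
        "port": None if port == "(nil)" else port,
    }

# The three polls from vXOS2, as posted above.
polls = [
    '# NO LOOP DETECTED # --- vlan "vlan10" elrp statistics ---\n'
    "3 packets transmitted, 0 received, ingress port (nil)",
    '# LOOP DETECTED # --- vlan "vlan10" elrp statistics ---\n'
    "1 packets transmitted, 1 received, ingress port 1",
    '# LOOP DETECTED # --- vlan "vlan10" elrp statistics ---\n'
    "1 packets transmitted, 1 received, ingress port 2",
]

results = [parse_elrp_poll(p) for p in polls]
looping_ports = sorted({r["port"] for r in results if r["loop"]})
print(looping_ports)  # prints: ['1', '2']
```

Here the tally shows ELRP packets returning on both port 1 and port 2, so both ports are worth checking for a shared path on the vSwitch side.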

Double check your network adapter configs on the server and verify your VLAN config on the switch.
It may be that your tags aren't actually making it to the wire or to the vSwitch. It's been a while since I have used tagged ports on EXOS-VM, so I can't remember if there are any nuances to how they work.
