Virtual XOS killing ESX host CPU

  • Problem
  • Updated 2 years ago
  • Not a Problem
After the initial install and configuration of the vSummit, the VM stays at around 60-100 MHz of host CPU usage, but after about 3 minutes of sitting idle, CPU utilization jumps to 3 GHz+ and in many cases maxes out the host's CPU resources.

The Extreme VM becomes sluggish, and in many cases the CLI freezes.

Tested on multiple hosts in the Datacenter with the same outcome.

No other VMs on the hosts (Linux, Windows, Juniper, Fortinet, Cisco, Checkpoint, Avaya) produce the same issue.

Has anyone experienced this behavior and, if so, is there a fix or workaround for it?

Specs:

Extreme vXOS ver 16.1.2.14 - same issue with earlier releases.
VMware vSphere 5.5 with latest patches.
CPU: Xeon 3.2GHz E3-v1225 - 4 Cores

Thank you.

Konstantin Mikholap

Posted 2 years ago

Jeremy, Embassador

What is using the CPU on the XOS switch? Run "top" to find out, and post the top processes.

Konstantin Mikholap

Mem: 221472K used, 32364K free, 0K shrd, 51284K buff, 56156K cached
CPU:  0.3% usr  0.6% sys  0.0% nic 14.9% idle  0.0% io  0.0% irq 84.1% sirq
Load average: 26.60 27.59 25.90 14/178 2034
  PID  PPID USER     STAT   RSS %MEM CPU %CPU COMMAND
 1650     1 root     S     1480  0.5   0 27.1 ./nodemgr
    4     2 root     RW<      0  0.0   0 11.5 [ksoftirqd/0]
 1376     1 root     S     2488  0.9   0  7.9 /exos/bin/epm -t 40 -f /exos/config/epmrc -d /exos/config/epmdprc
 1664     1 root     S     3096  1.2   0  6.9 ./fdb
 1700     1 root     S     2072  0.8   0  1.6 ./ripng
 1762     1 root     S     2460  0.9   0  0.6 ./isis
 1688     1 root     S     8668  3.4   0  0.3 ./dcbgp
 2027     1 root     S     8364  3.2   0  0.3 ./dcbgp -v 3
 1678     1 root     S     2096  0.8   0  0.3 ./esrp
 1642     1 root     S     1368  0.5   0  0.3 ./ds
 1648     1 root     S <  17312  6.8   0  0.0 ./hal
 1652     1 root     S    13288  5.2   0  0.0 ./cliMaster
 2018     1 root     S     8328  3.2   0  0.0 ./dcbgp -v 4
 1708     1 root     S     5444  2.1   0  0.0 ./netTools
 1658     1 root     S     4860  1.9   0  0.0 ./snmpSubagent
 1827     1 root     S     4664  1.8   0  0.0 ./policy
 1737     1 root     S     4636  1.8   0  0.0 ./xmld
 1656     1 root     S     3900  1.5   0  0.0 ./snmpMaster
 1766     1 root     S     3868  1.5   0  0.0 ./idMgr
 1644     1 root     S     3772  1.4   0  0.0 ./emsServer
 1660     1 root     S     3596  1.4   0  0.0 ./aaa -t random
 1684     1 root     S     3156  1.2   0  0.0 ./mcmgr
 1682     1 root     S     3080  1.2   0  0.0 ./rtmgr update
 1662     1 root     S     3016  1.1   0  0.0 ./vlan
 1702     1 root     S     2940  1.1   0  0.0 ./pim
 2021     1 root     S     2856  1.1   0  0.0 ./ospf -v 3
 1692     1 root     S     2820  1.1   0  0.0 ./ospf
 2012     1 root     S     2816  1.1   0  0.0 ./ospf -v 4
 1706     1 root     S     2804  1.1   0  0.0 ./acl
 1722     1 root     S     2552  1.0   0  0.0 ./etmon
 1768     1 root     S     2496  0.9   0  0.0 ./vmt
 1694     1 root     S     2480  0.9   0  0.0 ./ospfv3
 1696     1 root     S     2436  0.9   0  0.0 ./rip
 1654     1 root     S     2428  0.9   0  0.0 ./cfgmgr
 1680     1 root     S     2332  0.9   0  0.0 ./stp
 1764     1 root     S     2268  0.8   0  0.0 ./dot1ag
 1788     1 root     S     2264  0.8   0  0.0 ./erps
 1676     1 root     S     2196  0.8   0  0.0 ./eaps
 1674     1 root     S     2192  0.8   0  0.0 ./lacp
 1710     1 root     S     2176  0.8   0  0.0 ./netLogin
 1770     1 root     S     2104  0.8   0  0.0 ./vsm
 1690     1 root     S     2076  0.8   0  0.0 ./msdp
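For readers skimming the output above, the CPU summary line is the most telling part: almost all of the time is spent in `sirq` (software interrupts, which on Linux commonly include network packet processing) rather than in user or system time. A minimal Python sketch that pulls this out of the summary line, assuming the BusyBox-style `top` format shown in this post; the function names are illustrative only:

```python
import re

# CPU summary line copied from the top output above.
CPU_LINE = "CPU:  0.3% usr  0.6% sys  0.0% nic 14.9% idle  0.0% io  0.0% irq 84.1% sirq"


def parse_cpu_line(line):
    """Return a dict mapping each CPU state (usr, sys, ...) to its percentage."""
    pairs = re.findall(r"([\d.]+)%\s+(\w+)", line)
    return {state: float(value) for value, state in pairs}


def busiest_state(states):
    """Return the non-idle state with the largest share of CPU time."""
    busy = {name: pct for name, pct in states.items() if name != "idle"}
    return max(busy, key=busy.get)


states = parse_cpu_line(CPU_LINE)
top_state = busiest_state(states)
print(f"{top_state}: {states[top_state]}%")
```

With the line from this post, the busiest state is `sirq` at 84.1%, which points toward packet handling rather than a runaway user-space process.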

Pala, Zdenek, Employee

Are you sure there is no loop?

OscarK, ESE

A loop would probably show the BCMrX process running high, but a loop could still be the cause if you added several ports that all connect to the same vSwitch and all sit in the same VLAN in EXOS (the default VLAN, for example).
Have you added ports to the VM? If so, how many? Does EXOS see these ports?

Drew C., Community Manager

The EXOS-VM doesn't have any BCM process :)
I just created a loop in a vSwitch on one of the EXOS VMs in the lab and saw mcmgr, fdb, and ksoftirqd/0 spike, but not nodemgr.  As soon as I disabled one of the ports, all processes were back to normal.

Konstantin, if you can confirm there is no network loop, I might suggest re-creating the VM from scratch.  Something may not have initialized properly.
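As a rough way to compare the loop signature described here with the `top` output posted earlier in the thread, the sketch below checks which busy processes overlap with the ones that spiked in the lab loop. The %CPU values are copied from that output; the 5% "busy" threshold is an arbitrary assumption chosen only for illustration:

```python
# %CPU values copied from the top output earlier in the thread
# (the four busiest rows, plus mcmgr for comparison).
POSTED_CPU = {
    "nodemgr": 27.1,
    "ksoftirqd/0": 11.5,
    "epm": 7.9,
    "fdb": 6.9,
    "mcmgr": 0.0,
}

# Processes that spiked when a loop was created on purpose in the lab.
LOOP_SIGNATURE = {"mcmgr", "fdb", "ksoftirqd/0"}

# Assumption: treat anything at or above 5% CPU as "busy".
THRESHOLD = 5.0

busy = {name for name, cpu in POSTED_CPU.items() if cpu >= THRESHOLD}
matching = sorted(busy & LOOP_SIGNATURE)   # busy and part of the loop signature
missing = sorted(LOOP_SIGNATURE - busy)    # expected in a loop, but idle here
extra = sorted(busy - LOOP_SIGNATURE)      # busy, but not seen in the lab loop

print("matching:", matching)
print("missing:", missing)
print("extra:", extra)
```

The partial overlap (fdb and ksoftirqd/0 busy, mcmgr idle, nodemgr and epm busy) is consistent with the observation that the posted output does not look exactly like the lab loop.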

Konstantin Mikholap

Thank you all for the suggestions. I have created a diagram to better visualize one of the segments in the topology. Please see below. I also did further testing and noticed that when I remove VLAN 10, which is the only VLAN that spans all 3 switches in VR-Default, CPU utilization returns to normal.

Jeremy, Embassador

Try this:

configure elrp-client one-shot vlan-10 ports all print 

Konstantin Mikholap

Starting ELRP Poll . . .
# NO LOOP DETECTED # --- vlan "vlan10" elrp statistics ---
3 packets transmitted, 0 received, ingress port (nil)

Drew C., Community Manager

What kind of traffic is on vlan-10?

Konstantin Mikholap

OSPF and BGP protocols.

There is more: running ELRP on vXOS1 and vXOS3 never reports a loop.

On vXOS2:

Starting ELRP Poll . . .
# NO LOOP DETECTED # --- vlan "vlan10" elrp statistics ---
3 packets transmitted, 0 received, ingress port (nil)

2 sec later

Starting ELRP Poll
# LOOP DETECTED # --- vlan "vlan10" elrp statistics ---
1 packets transmitted, 1 received, ingress port 1
. . .

2 sec later:

Starting ELRP Poll
# LOOP DETECTED # --- vlan "vlan10" elrp statistics ---
1 packets transmitted, 1 received, ingress port 2
. . .
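The three polls above can be summarized with a short script, which makes the pattern easier to see: the loop is intermittent, and the looped packet comes back on a different ingress port each time. This is a sketch that assumes the ELRP one-shot output format exactly as pasted in this post; the helper name `summarize` is illustrative:

```python
import re
from collections import Counter

# The three one-shot poll results from vXOS2, copied from the post above.
POLLS = [
    '# NO LOOP DETECTED # --- vlan "vlan10" elrp statistics ---\n'
    "3 packets transmitted, 0 received, ingress port (nil)",
    '# LOOP DETECTED # --- vlan "vlan10" elrp statistics ---\n'
    "1 packets transmitted, 1 received, ingress port 1",
    '# LOOP DETECTED # --- vlan "vlan10" elrp statistics ---\n'
    "1 packets transmitted, 1 received, ingress port 2",
]

STATS = re.compile(r"(\d+) packets transmitted, (\d+) received, ingress port (\S+)")


def summarize(poll):
    """Return (loop_detected, sent, received, ingress_port) for one poll."""
    match = STATS.search(poll)
    sent, received, port = int(match.group(1)), int(match.group(2)), match.group(3)
    # Take the verdict from the banner line printed by the switch.
    loop_detected = poll.startswith("# LOOP DETECTED #")
    return loop_detected, sent, received, port


results = [summarize(poll) for poll in POLLS]
loop_ports = Counter(port for loop, _, _, port in results if loop)

for index, (loop, sent, received, port) in enumerate(results, start=1):
    print(f"poll {index}: loop={loop} sent={sent} received={received} port={port}")
print("ingress ports seen in loops:", dict(loop_ports))
```

Seeing both port 1 and port 2 as ingress ports suggests the ELRP packets are returning over more than one path, which fits a vSwitch or port-group level issue more than a single miscabled link.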

Drew C., Community Manager

Double-check your network adapter configs on the server and verify your VLAN config on the switch.
It may be that your tags aren't actually making it to the wire or to the vSwitch.  It's been a while since I have used tagged ports on EXOS-VM, so I can't remember if there are any nuances to their function.