x770 EXOS 21.1.2.14 memory utilization - what is normal and how to interpret output of mem-stats.py


My customer is complaining of a memory leak on an x770 running EXOS 21.1.2.14 patch 1-2. I see no known issues documented in TAC for this version, and I have also referred to https://gtacknowledge.extremenetworks.com/articles/How_To/Troubleshooting-memory-issue-s?q=exos+memory+leak+21.1.2.14&l=en_US&fs=Search&pn=3 to help me troubleshoot.

Based upon these outputs that were recently collected (see below), my questions are:

1. Can anybody guide me on what is "normal" memory utilization for this switch and EXOS version?

2. If I interpret the results correctly, I believe the script mem-stats.py is telling me that roughly 145MB of memory has been consumed during the 1 month sample period and that xmld is responsible for most of that (135MB).

3. If I'm correct on point #2, is this normal and correct behavior or is this indeed pointing to a leak?
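
As a sanity check on point 2, the arithmetic can be sketched in a few lines of Python. The three input figures are copied from the mem-stats.py output further down (the MemFree and xmld BASELINE columns and "Min since baseline"); the variable names are my own and are not part of mem-stats.py. It assumes 1 MB = 1024 KB, which gives roughly 142 MB rather than 145 MB, so "roughly 145MB" still holds either way.

```python
# Figures copied from the mem-stats.py output in this post (all in KB).
mem_free_delta_kb = -145508       # MemFree change since baseline
xmld_delta_kb = 135376            # xmld change since baseline
minutes_since_baseline = 40563.0  # "Min since baseline"

KB_PER_MB = 1024  # assumption: binary megabytes

# Memory that disappeared from the free pool, and xmld's part of it.
consumed_mb = -mem_free_delta_kb / KB_PER_MB
xmld_mb = xmld_delta_kb / KB_PER_MB
xmld_share = xmld_delta_kb / -mem_free_delta_kb

# Average xmld growth per day over the sample period.
days = minutes_since_baseline / (60 * 24)
xmld_kb_per_day = xmld_delta_kb / days

print(f"Free memory consumed : {consumed_mb:.1f} MB over {days:.1f} days")
print(f"xmld growth          : {xmld_mb:.1f} MB ({xmld_share:.0%} of the total)")
print(f"xmld average growth  : {xmld_kb_per_day:.0f} KB/day")
```

A steady average like this does not prove a leak by itself; it only shows that xmld accounts for almost all of the drop in free memory.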

Thanks,

Brad

System Memory Information
-------------------------
Total DRAM (KB): 1048576
System (KB): 27080
User (KB): 524572
Free (KB): 496924

Memory Utilization Statistics
-----------------------------

Process Name Memory (KB)
-----------------------------
aaa 2059
acl 1916
bfd 959
bgp 0
brm 862
cfgmgr 3031
cli 22368
devmgr 768
dirser 418
dosprotect 375
dot1ag 1168
eaps 1129
edp 980
elrp 916
elsm 922
ems 2309
epm 1408
erps 1176
esrp 1051
ethoam 913
etmon 6593
exacl 0
exdhcpsnoop 0
exdos 0
exfib 0
exfipSnoop 0
exosmc 0
exosq 0
exsflow 0
exsnoop 0
exsshd 4715
exvlan 0
fcoe 961
fdb 2295
gptp 0
hal 19964
hclag 929
idMgr 2793
ipSecurity 1049
ipfix 952
isis 0
ismb 8095
lacp 1399
lldp 1174
mcmgr 2202
mpls 0
mrp 1095
msdp 603
netLogin 1187
netTools 4138
nettx 0
nodemgr 460
ntp 609
openflow 0
ospf 1106
ospfv3 4514
otm 1010
pim 1791
polMgr 420
policy 6942
ptpV2 0
pwmib 365
rip 830
ripng 625
rtmgr 2586
snmpMaster 2598
snmpSubagent 3590
stp 1301
techSupport 580
telnetd 830
tftpd 290
throw 5802
thttpd 8813
twamp 401
upm 1067
vlan 2775
vmt 1381
vrrp 1173
vsm 1334
xmlc 735
xmld 143221

TRAFFIC_SWA_X770.2 # run script mem-stats.py
##[ Baseline ]##########################################################
- Baseline found : /usr/local/cfg/mem-stats.pckl
- Current time : 2017-03-23 06:50:30.497816
- Baseline time : 2017-02-23 02:47:25.890948
- Min since baseline : 40563.0

##[ Memory overview ]###################################################
[ COUNTER ] [ MEM (KB) ] [ BASELINE ]

MemFree 217028 -145508
MemFreeAct 495084 -144476
MemTotal 1021496 0
Slab 91580 1508

##[ Processes ]#########################################################
--[ Sorted by memory allocation ]---------------------------------------
[ PID ] [ PROCESS ] [ MEM (KB) ] [ BASELINE ]

2000 xmld 144688 135376
1874 cliMaster 24620 19500
1870 hal 22052 -7172
2062 expy 12780 10256
2004 expy 9488 7288
2058 policy 8876 -3412
1972 netTools 6044 3788
1880 snmpSubagent 5032 608
2016 idMgr 4736 44
1876 cfgmgr 4528 -476

--[ Sorted by increased memory allocation ]-----------------------------
[ PID ] [ PROCESS ] [ MEM (KB) ] [ BASELINE ]

2000 xmld 144688 135376
1874 cliMaster 24620 19500
2062 expy 12780 10256
2004 expy 9488 7288
1972 netTools 6044 3788
1984 vrrp 2880 2388
1886 vlan 4384 1992
1930 rtmgr 4208 1796
1969 acl 3760 1064
1954 ripng 2228 692

##[ Kernel slab cache ]#################################################
--[ Sorted by slabs ]---------------------------------------------------
[ NAME ] [ SLABS ] [ SIZE (KB) ] [ BASELINE ]

buffer_head 4317 17268 21
jffs2_refblock 1212 4848 12
size-4096 1081 4324 16
dentry 539 2156 -20
jffs2_node_frag 396 1584 1
size-1024 393 1572 -20
radix_tree_node 354 1416 1
UNIX 315 1260 -4
sock_inode_cache 309 1236 -3
size-16384 299 4784 -8

--[ Sorted by cache size ]----------------------------------------------
[ NAME ] [ SLABS ] [ SIZE (KB) ] [ BASELINE ]

buffer_head 4317 17268 21
size-65536 173 11072 0
size-524288 21 10752 0
size-262144 32 8192 0
jffs2_refblock 1212 4848 12
size-16384 299 4784 -8
size-4096 1081 4324 16
size-131072 28 3584 0
dentry 539 2156 -20
jffs2_node_frag 396 1584 1

--[ Sorted by increased slabs ]-----------------------------------------
[ NAME ] [ SLABS ] [ SIZE (KB) ] [ BASELINE ]

FldPatFld 221 884 112
ip_dst_cache 114 456 108
size-512 196 784 72
FldActAct 67 268 36
arp_cache

1 reply

Sorry there hasn't been a reply yet. Below are the answers to your questions.

1. What is "normal" memory utilization?
  • It depends on what the switch is doing. I would just make sure the memory is being released during quieter periods on your network, and is not always increasing.
2. 145MB more memory has been consumed during the 1 month sample period?
  • It increased by 135MB in one month. This is a problem unless the baseline was taken before the switch was in production.
3. Is this normal or is this indeed pointing to a leak?
Hope this helps.
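
The check suggested in answer 1 (memory should drop back at quieter times rather than only ever increase) can be sketched roughly as follows. This is a minimal illustration, not something built into EXOS or mem-stats.py: the function name, the 1024 KB threshold, and the intermediate sample values are all hypothetical. Only the first and last xmld figures are derived from the output in this thread (144688 KB now, 135376 KB above baseline, so 9312 KB at baseline).

```python
from typing import List, Tuple


def looks_like_leak(samples: List[Tuple[str, int]], min_growth_kb: int = 1024) -> bool:
    """Return True if memory never drops between samples and grew noticeably.

    samples: (label, memory in KB) pairs in chronological order.
    min_growth_kb: hypothetical threshold below which growth is ignored.
    """
    values = [kb for _, kb in samples]
    if len(values) < 3:
        return False  # too few points to say anything about a trend

    # True only if every sample is at least as large as the one before it.
    never_released = all(later >= earlier for earlier, later in zip(values, values[1:]))
    total_growth = values[-1] - values[0]
    return never_released and total_growth >= min_growth_kb


# Hypothetical weekly readings for xmld; the middle three values are made up.
xmld_samples = [
    ("baseline", 9312),
    ("week 1", 43000),
    ("week 2", 77000),
    ("week 3", 111000),
    ("now", 144688),
]

# A process that grows under load but gives memory back afterwards (made-up data).
healthy_samples = [("t0", 20000), ("t1", 26000), ("t2", 21000), ("t3", 25000)]

print(looks_like_leak(xmld_samples))
print(looks_like_leak(healthy_samples))
```

In practice the samples would come from repeated runs of the memory commands over several days, including off-peak hours, since two data points a month apart cannot show whether memory was ever released in between.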
