X460G2 (Stack) - stack node crash

Admin_ZML
New Contributor II
Today one node crashed in a two-node X460G2-48p-10G4 stacking configuration. Things began to become unresponsive. After checking the Chalet GUI and then checking via the serial port, I saw that node 1 was unresponsive.

The error occurred with ExtremeXOS 15.7.1.4; I have since installed 15.7.2.9.

Event logs:

2015-09-07 19:24:04.46 Slot-1: Epm application wdg timer warning - 111 sec, kepc 0xffffffff805fa5f4(__cond_resched+0x20/0x44) uepc 0x2acdb150.
2015-09-07 19:24:03.48 Slot-1: 2acdb164 00000000 nop
2015-09-07 19:24:03.48 Slot-1: 2acdb160 8f8393ac lw v1,-27732(gp)
2015-09-07 19:24:03.48 Slot-1: 2acdb158 7c03e83b Unknown at 0x2acdb158, 0x7c03e83b, op 31
2015-09-07 19:24:03.48 Slot-1: 2acdb154 00408021 addu s0,v0,zero
2015-09-07 19:24:03.48 Slot-1: 2acdb150 <10e00008>beq a3,zero,0x2acdb174
2015-09-07 19:24:03.48 Slot-1: 2acdb14c 0000000c syscall 0
2015-09-07 19:24:03.48 Slot-1: 2acdb15c 00601021 addu v0,v1,zero
2015-09-07 19:24:03.42 Slot-1: 2acdb148 24020fa7
2015-09-07 19:24:03.42 Slot-1: 2acdb144 02003021 addu a2,s0,zero
2015-09-07 19:24:03.42 Slot-1: Code:
2015-09-07 19:24:03.42 Slot-1:
2015-09-07 19:24:03.42 Slot-1: Process epm pid 1141 died with signal 6
2015-09-07 19:24:03.42 Slot-1: Application watchdog killing process 1141(epm) in state 1.
2015-09-07 19:24:03.41 Slot-1: App timer for index 0 app: (epm) expired, delta 12031 timeout: 120000
2015-09-07 19:23:53.70 Slot-1: Epm application wdg timer warning - 111 sec, kepc 0xffffffff802dcca8(do_wait+0x2d0/0x478) uepc 0x2acdb150.
2015-09-07 19:23:43.62 Slot-1: Epm application wdg timer warning - 101 sec, kepc 0xffffffff802dcca8(do_wait+0x2d0/0x478) uepc 0x2acdb150.
2015-09-07 19:23:33.51 Slot-1: Epm application wdg timer warning - 90 sec, kepc 0xffffffff802dcca8(do_wait+0x2d0/0x478) uepc 0x2acdb150.
2015-09-07 19:23:23.36 Slot-1: Epm application wdg timer warning - 80 sec, kepc 0xffffffff802dcca8(do_wait+0x2d0/0x478) uepc 0x2acdb150.
2015-09-07 19:23:13.56 Slot-1: Epm application wdg timer warning - 70 sec, kepc 0xffffffff802dcca8(do_wait+0x2d0/0x478) uepc 0x2acdb150.
2015-09-07 19:23:03.34 Slot-1: Epm application wdg timer warning - 60 sec, kepc 0xffffffff802dcca8(do_wait+0x2d0/0x478) uepc 0x2acdb150.
2015-09-07 19:22:53.04 Slot-1: Epm application wdg timer warning - 50 sec, kepc 0xffffffff802dcca8(do_wait+0x2d0/0x478) uepc 0x2acdb150.
2015-09-07 19:22:42.90 Slot-1: Epm application wdg timer warning - 40 sec, kepc 0xffffffff802dcca8(do_wait+0x2d0/0x478) uepc 0x2acdb150.
2015-09-07 19:22:32.83 Slot-1: Epm application wdg timer warning - 30 sec, kepc 0xffffffff802dcca8(do_wait+0x2d0/0x478) uepc 0x2acdb150.
2015-09-07 19:22:22.68 Slot-1: Epm application wdg timer warning - 20 sec, kepc 0xffffffff802dcca8(do_wait+0x2d0/0x478) uepc 0x2acdb150.
2015-09-07 19:22:02.36 Slot-1: CPU utilization monitor: process epm consumes 99 % CPU
2015-09-07 19:21:57.60 Slot-1: Epm application wdg timer warning - 60 sec, kepc 0xffffffff805fee1c(schedule_timeout+0x64/0xe0) uepc 0x2aaec2e8.
2015-09-07 19:21:47.48 Slot-1: Epm application wdg timer warning - 50 sec, kepc 0xffffffff805fee1c(schedule_timeout+0x64/0xe0) uepc 0x2aaec2e8.
2015-09-07 19:21:37.35 Slot-1: Epm application wdg timer warning - 40 sec, kepc 0xffffffff805fee1c(schedule_timeout+0x64/0xe0) uepc 0x2aaec2e8.
2015-09-07 19:21:27.22 Slot-1: Epm application wdg timer warning - 30 sec, kepc 0xffffffff805fee1c(schedule_timeout+0x64/0xe0) uepc 0x2aaec2e8.
2015-09-07 19:21:17.09 Slot-1: Epm application wdg timer warning - 20 sec, kepc 0xffffffff805fee1c(schedule_timeout+0x64/0xe0) uepc 0x2aaec2e8.
2015-09-07 19:20:53.76 Slot-1: Process mrp sends hello too often, expected once in 5 secs
2015-09-07 19:20:53.76 Slot-1: Received hellos from process mrp 2 more often then expected 3
2015-09-07 19:20:53.76 Slot-1: Process elsm sends hello too often, expected once in 10 secs
2015-09-07 19:20:53.76 Slot-1: Received hellos from process elsm 2 more often then expected 3
2015-09-07 19:20:53.76 Slot-1: Process mcmgr sends hello too often, expected once in 10 secs
2015-09-07 19:20:53.76 Slot-1: Received hellos from process mcmgr 2 more often then expected 3
2015-09-07 19:20:47.92 Slot-1: Process mrp sends hello too often, expected once in 5 secs
2015-09-07 19:20:47.92 Slot-1: Received hellos from process mrp 2 more often then expected 3
2015-09-07 19:20:47.50 Slot-1: Epm application wdg timer warning - 30 sec, kepc 0xffffffff805fee1c(schedule_timeout+0x64/0xe0) uepc 0x2aaec2e8.
2015-09-07 19:20:37.34 Slot-1: Epm application wdg timer warning - 20 sec, kepc 0xffffffff805fee1c(schedule_timeout+0x64/0xe0) uepc 0x2aaec2e8.
2015-09-07 19:19:58.79 Slot-1: Process mrp sends hello too often, expected once in 5 secs
2015-09-07 19:19:58.79 Slot-1: Received hellos from process mrp 2 more often then expected 3
2015-09-07 19:19:24.23 Slot-1: Process mcmgr sends hello too often, expected once in 10 secs
2015-09-07 19:19:24.23 Slot-1: Received hellos from process mcmgr 2 more often then expected 3
2015-09-07 19:19:23.80 Slot-1: Received hellos from process elsm 2 more often then expected 3
2015-09-07 19:19:23.80 Slot-1: Process elsm sends hello too often, expected once in 10 secs
2015-09-07 19:19:08.82 Slot-1: Received hellos from process mrp 2 more often then expected 3
2015-09-07 19:19:08.82 Slot-1: Process mrp sends hello too often, expected once in 5 secs
2015-09-07 19:19:05.23 Slot-1: Epm application wdg timer warning - 20 sec, kepc 0xffffffff802dcca8(do_wait+0x2d0/0x478) uepc 0x2acdb150.
2015-09-07 19:18:31.06 Slot-1: Received hellos from process mrp 2 more often then expected 3
2015-09-07 19:18:31.06 Slot-1: Process mrp sends hello too often, expected once in 5 secs
2015-09-07 19:18:31.06 Slot-1: Received hellos from process mrp 2 more often then expected 3
2015-09-07 19:18:31.06 Slot-1: Process elsm sends hello too often, expected once in 10 secs
2015-09-07 19:18:31.06 Slot-1: Received hellos from process elsm 2 more often then expected 3
2015-09-07 19:18:31.06 Slot-1: Process mcmgr sends hello too often, expected once in 10 secs
2015-09-07 19:18:31.06 Slot-1: Process mrp sends hello too often, expected once in 5 secs
2015-09-07 19:18:31.06 Slot-1: Received hellos from process mcmgr 2 more often then expected 3
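
For anyone who wants to check a saved log for the same pattern, here is a minimal Python sketch that pulls the "Epm application wdg timer warning" entries out of log lines and reports how long the watchdog counter climbed. The regular expression is based only on the line format shown above; the function names `parse_wdg_warnings` and `max_stall` are made up for this illustration and are not part of any EXOS tooling.

```python
import re
from datetime import datetime

# Pattern for the EPM watchdog warnings, based only on the log format shown above.
WDG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{2}) "
    r"Slot-(?P<slot>\d+): Epm application wdg timer warning - (?P<sec>\d+) sec"
)


def parse_wdg_warnings(lines):
    """Return (timestamp, slot, seconds) tuples for watchdog warnings, oldest first."""
    hits = []
    for line in lines:
        m = WDG_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            hits.append((ts, int(m.group("slot")), int(m.group("sec"))))
    return sorted(hits, key=lambda h: h[0])


def max_stall(lines):
    """Largest watchdog counter seen, or None if no warning line matched."""
    return max((sec for _, _, sec in parse_wdg_warnings(lines)), default=None)


# Three lines copied from the event log above; the last one is not a watchdog warning.
sample = [
    "2015-09-07 19:23:53.70 Slot-1: Epm application wdg timer warning - 111 sec, "
    "kepc 0xffffffff802dcca8(do_wait+0x2d0/0x478) uepc 0x2acdb150.",
    "2015-09-07 19:23:43.62 Slot-1: Epm application wdg timer warning - 101 sec, "
    "kepc 0xffffffff802dcca8(do_wait+0x2d0/0x478) uepc 0x2acdb150.",
    "2015-09-07 19:22:02.36 Slot-1: CPU utilization monitor: process epm consumes 99 % CPU",
]

print(max_stall(sample))  # prints 111
```

In the log above the counter reaches 111 sec shortly before the "App timer for index 0 app: (epm) expired" entry, so a steadily rising value like this is a sign that epm has stopped answering the application watchdog.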
4 REPLIES

Drew_C
Valued Contributor III
Thanks for coming back to update the thread. I've marked this post as "solved."

Admin_ZML
New Contributor II
GTAC support answered as follows:


Hello,

My name is Christopher and this case has just been escalated to me.

From the show tech information I can see that process epm crashed on slot 1 on the 7th of September at 19:24:02.
What I also can see are additional memory depletion messages due to process climaster following this process crash at 19:27:16, 19:27:22, and 19:27:29.

I can see that you have webhttp enabled. Can you tell me whether you are using the web interface of this switch?

Given your comment that you were running EXOS 15.7.1 at this point: there is a known issue (xos0062016) in this version of code that causes reboots due to memory depletion of process CliMaster, so (with the web interface enabled) I'm quite certain that this is the cause of your reboot. Process EPM is responsible for handling all the running processes, and I'm quite certain that it crashed because the known issue left it without sufficient memory. This would explain the memory depletion messages showing up right after the process crash.

xos0062016 has been fixed in EXOS 15.7.2, which coincidentally is the version that you have already upgraded to.

kind regards,

Christopher Henrich
EMEA TAC Sr. Escalation Support Engineer / Extreme Networks

Admin_ZML
New Contributor II
I have opened a GTAC case and will post the outcome once it is solved.

Patrick_Voss
Extreme Employee
From the looks of it, you ran into a process crash. Can you paste the output of "ls" and "ls internal memory"? We may be able to assist you here, but ultimately a GTAC case may have to be opened to see what can be done.