XOS 12.6.1.3 stack crashes after 1555 days uptime

  • 4 December 2015

We have two Extreme stacks that have shown the same behavior at 1555 days uptime:

The slave switch breaks out of the stack and becomes a second master.
All ports on the slave switch go dark (they can be bounced directly from switch 2 to re-enable them).

Has anyone else experienced this? I just love uptime bugs.....

12 replies

Hi Matthew,

What type of switches are in the stack and what EXOS version are they running?
X460V-48T in stack 1 and X670V-48x in stack 2. Both are on XOS 12.6.1.3
Edit - should've refreshed before posting. Thanks Matthew
---
That's an odd one, Matthew. I don't think we've seen it before.
12.6 is unsupported now but we can test and see if it's something that has been fixed in a later version. Can you give me some more information about your gear?

The output of "show switch" and "show slot" should be enough for now.
Sure, here you go - I'll paste the output from both switches in the X670 stack so you can see the difference now:

Switch 1

Slot-1 stor-cwc-sw1.iad.1 # sh switch
SysName: stor-cwc-sw1.iad
SysLocation:
SysContact: support@extremenetworks.com, +1 888 257 3000
System MAC: 02:04:96:52:9B:13
System Type: X670V-48x (Stack)

SysHealth check: Enabled (Normal)
Recovery Mode: All
System Watchdog: Enabled

Current Time: Fri Dec 4 12:58:37 2015
Timezone: [Auto DST Enabled] GMT Offset: -360 minutes, name is CST.
DST of 60 minutes is currently not in effect, name is EDT.
DST begins every second Sunday March at 2:00
DST ends every first Sunday November at 2:00

Boot Time: Thu Sep 1 05:18:03 2011
Boot Count: 6
Next Reboot: None scheduled
System UpTime: 1555 days 8 hours 40 minutes 34 seconds

Slot: Slot-1 * No Backup
------------------------ ------------------------
Current State: MASTER

Image Selected: secondary
Image Booted: secondary
Primary ver: 12.6.0.31
Secondary ver: 12.6.1.3

Config Selected: primary.cfg
Config Booted: primary.cfg

primary.cfg Created by ExtremeXOS version 12.6.1.3
401039 bytes saved on Mon Aug 10 09:28:54 2015
Slot-1 stor-cwc-sw1.iad.2 # sh slot
Slots Type Configured State Ports
--------------------------------------------------------------------
Slot-1 X670V-48x X670V-48x Operational 64
Slot-2 X670V-48x X670V-48x Failed 64
Slot-3 Empty 0
Slot-4 Empty 0
Slot-5 Empty 0
Slot-6 Empty 0
Slot-7 Empty 0
Slot-8 Empty 0

Slot-1 stor-cwc-sw1.iad.3 #

Switch 2:

* Slot-2 stor-cwc-sw1.iad.1 # sh switch
SysName: stor-cwc-sw1.iad
SysLocation:
SysContact: support@extremenetworks.com, +1 888 257 3000
System MAC: 02:04:96:52:9B:13
System Type: X670V-48x (Stack)

SysHealth check: Enabled (Normal)
Recovery Mode: All
System Watchdog: Enabled

Current Time: Fri Dec 4 12:59:34 2015
Timezone: [Auto DST Enabled] GMT Offset: -360 minutes, name is CST.
DST of 60 minutes is currently not in effect, name is EDT.
DST begins every second Sunday March at 2:00
DST ends every first Sunday November at 2:00

Boot Time: Thu Sep 1 18:29:15 2011
Boot Count: 6
Next Reboot: None scheduled
System UpTime: 1555 days 8 hours 40 minutes 51 seconds

Slot: Slot-2 * No Backup
------------------------ ------------------------
Current State: MASTER

Image Selected: secondary
Image Booted: secondary
Primary ver: 12.6.0.31
Secondary ver: 12.6.1.3

Config Selected: primary.cfg
Config Booted: primary.cfg

primary.cfg Created by ExtremeXOS version 12.6.1.3
401039 bytes saved on Mon Aug 10 09:28:54 2015
* Slot-2 stor-cwc-sw1.iad.2 # sh slot
Slots Type Configured State Ports
--------------------------------------------------------------------
Slot-1 X670V-48x X670V-48x Failed 64
Slot-2 X670V-48x X670V-48x Operational 64
Slot-3 Empty 0
Slot-4 Empty 0
Slot-5 Empty 0
Slot-6 Empty 0
Slot-7 Empty 0
Slot-8 Empty 0
Here are the X460s:

Switch 1:

Slot-1 aggr-cwc-sw1.1 # sh sw
SysName: aggr-cwc-sw1
SysLocation:
SysContact: support@extremenetworks.com, +1 888 257 3000
System MAC: 02:04:96:51:F9:C6
System Type: X460-48t (Stack)

SysHealth check: Enabled (Normal)
Recovery Mode: All
System Watchdog: Enabled

Current Time: Sun Nov 29 15:58:05 2015
Timezone: [Auto DST Enabled] GMT Offset: -360 minutes, name is CST.
DST of 60 minutes is currently not in effect, name is EDT.
DST begins every second Sunday March at 2:00
DST ends every first Sunday November at 2:00

Boot Time: Mon Aug 22 21:52:09 2011
Boot Count: 5
Next Reboot: None scheduled
System UpTime: 1559 days 19 hours 5 minutes 55 seconds

Slot: Slot-1 * No Backup
------------------------ ------------------------
Current State: MASTER

Image Selected: secondary
Image Booted: secondary
Primary ver: 12.5.0.14
Secondary ver: 12.6.1.3

Config Selected: primary.cfg
Config Booted: primary.cfg

primary.cfg Created by ExtremeXOS version 12.6.1.3
582582 bytes saved on Sun Nov 29 14:27:51 2015
Slot-1 aggr-cwc-sw1.2 # sh sl
Slots Type Configured State Ports
--------------------------------------------------------------------
Slot-1 X460-48t X460-48t Operational 54
Slot-2 X460-48t X460-48t Failed 54
Slot-3 Empty 0
Slot-4 Empty 0
Slot-5 Empty 0
Slot-6 Empty 0
Slot-7 Empty 0
Slot-8 Empty 0

Switch 2:

* Slot-2 aggr-cwc-sw1.1 # sh sw
SysName: aggr-cwc-sw1
SysLocation:
SysContact: support@extremenetworks.com, +1 888 257 3000
System MAC: 02:04:96:51:F9:C6
System Type: X460-48t (Stack)

SysHealth check: Enabled (Normal)
Recovery Mode: All
System Watchdog: Enabled

Current Time: Sun Nov 29 15:54:34 2015
Timezone: [Auto DST Enabled] GMT Offset: -360 minutes, name is CST.
DST of 60 minutes is currently not in effect, name is EDT.
DST begins every second Sunday March at 2:00
DST ends every first Sunday November at 2:00

Boot Time: Tue Aug 23 16:05:18 2011
Boot Count: 5
Next Reboot: None scheduled
System UpTime: 1559 days 19 hours 2 minutes 18 seconds

Slot: Slot-2 * No Backup
------------------------ ------------------------
Current State: MASTER

Image Selected: secondary
Image Booted: secondary
Primary ver: 12.5.0.14
Secondary ver: 12.6.1.3

Config Selected: primary.cfg
Config Booted: primary.cfg

primary.cfg Created by ExtremeXOS version 12.6.1.3
580097 bytes saved on Fri Nov 20 03:14:46 2015
* Slot-2 aggr-cwc-sw1.2 # sh sl
Slots Type Configured State Ports
--------------------------------------------------------------------
Slot-1 X460-48t X460-48t Failed 54
Slot-2 X460-48t X460-48t Operational 54
Slot-3 Empty 0
Slot-4 Empty 0
Slot-5 Empty 0
Slot-6 Empty 0
Slot-7 Empty 0
Slot-8 Empty 0
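
As a sanity check on the numbers above, here is a minimal Python sketch that compares the reported "System UpTime" with the wall-clock time elapsed between "Boot Time" and "Current Time". It uses only the Slot-1 values from stor-cwc-sw1; the helper names (parse_uptime, wall_clock_elapsed) are made up for illustration and are not part of any EXOS tooling. The timestamps are treated as naive local times, so a DST change between boot and now shows up as an offset, which is an assumption about the cause and not something confirmed in this thread.

```python
import re
from datetime import datetime, timedelta

# Timestamp layout used by "show switch", e.g. "Thu Sep 1 05:18:03 2011".
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_uptime(line: str) -> timedelta:
    """Turn a 'System UpTime: D days H hours M minutes S seconds' line into a timedelta."""
    match = re.search(r"(\d+) days (\d+) hours (\d+) minutes (\d+) seconds", line)
    if match is None:
        raise ValueError(f"no uptime found in: {line!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def wall_clock_elapsed(boot_time: str, current_time: str) -> timedelta:
    """Naive local-time difference between the two timestamps (no timezone handling)."""
    boot = datetime.strptime(boot_time, TIME_FORMAT)
    now = datetime.strptime(current_time, TIME_FORMAT)
    return now - boot


# Values copied from the Slot-1 "sh switch" output of stor-cwc-sw1 above.
reported = parse_uptime("System UpTime: 1555 days 8 hours 40 minutes 34 seconds")
elapsed = wall_clock_elapsed("Thu Sep 1 05:18:03 2011", "Fri Dec 4 12:58:37 2015")

print("reported uptime:   ", reported)
print("wall-clock elapsed:", elapsed)
print("difference:        ", reported - elapsed)
```

For these Slot-1 values the two figures agree on 1555 days and differ by exactly one hour, which would fit a switch booted while DST was in effect and queried after it ended.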
Thanks - now I need to see how I can fake uptime in EXOS so we don't have to wait 4 years :)

Meanwhile, I would suggest a reboot of the stack to bring things back in proper order. Go ahead and grab the output of "show tech all" from both stacks to have on hand. I would also recommend opening a case with GTAC. Just know that one of the first things they're going to ask you to do is upgrade your stacks to a supported version.

Once a case is opened, GTAC can help track this to resolution.
Thanks Drew - I already have a TAC case open but nothing conclusive has come of it. I turned over the sh tech output to them yesterday and they've sorted through it.

I'm not able to reboot the whole stack, but I will work on a window to reboot each of the slave switches and let you know what happens. That was also support's suggestion.
Excellent. I found your case in the system and will see if I can or need to work with the owner to help replicate this.
Hey, I appreciate it! This is my first time posting anything to the hub and only my 2nd ticket in years with Extreme, so dunno if I went about it backwards, but I do appreciate the assistance.
No worries. It's okay that you've gone through two of our support channels. Since there is a case opened, most of your updates are going to come from the case owner.
If this is, in fact, a software issue (and it seems that it is) it's a rare one. Most systems get at least one reboot for some reason or other in 4 years - so I'm impressed by that!
Ha, yeah, understood - these switches are absolutely critical to us but even I was surprised by that. We'll take this as an opportunity to update them once we review what the newest rock-solid version is.
Hello.
Has the vendor solved the problem of the stack crashing after 1554 days of uptime?
If so, in which version?

We hit this problem on stacks running versions 12.6.3.2 and 15.3.3.5.
We have now installed version 16.2.4.5. Is the problem solved in that version?

We opened a ticket with our partner about the problem on version 12.6.3.2 in December 2017, and the answer was to update to a new version.
The ticket for the problem on version 15.3.3.5 has only just been opened, and there is no answer yet.
