XOS 12.6.1.3 stack crashes after 1555 days uptime

  • Problem
  • Updated 4 months ago
  • Solved
We have two Extreme stacks that have shown the same behavior at 1555 days uptime:

The slave switch breaks out of the stack and becomes a second master.
All ports on the slave switch go dark (they can be bounced directly from switch 2 to re-enable them).

Has anyone else experienced this?  I just love uptime bugs.....
Matthew Tedder

Posted 3 years ago

Dorian Perry, Employee

Hi Matthew,

What type of switches are in the stack and what EXOS version are they running?

Matthew Tedder

X460-48t in stack 1 and X670V-48x in stack 2.  Both are on XOS 12.6.1.3

Drew C., Community Manager

Edit - should've refreshed before posting.  Thanks Matthew
---
That's an odd one, Matthew.  I don't think we've seen it before.
12.6 is unsupported now but we can test and see if it's something that has been fixed in a later version.  Can you give me some more information about your gear?

"show switch" and "show slot" should be enough for now.

Matthew Tedder

Sure, here you go - I'll paste both switches' outputs for the X670s so you can see the difference now:

Switch 1

Slot-1 stor-cwc-sw1.iad.1 # sh switch
SysName:          stor-cwc-sw1.iad
SysLocation:
SysContact:       support@extremenetworks.com, +1 888 257 3000
System MAC:       02:04:96:52:9B:13
System Type:      X670V-48x (Stack)

SysHealth check:  Enabled (Normal)
Recovery Mode:    All
System Watchdog:  Enabled

Current Time:     Fri Dec  4 12:58:37 2015
Timezone:         [Auto DST Enabled] GMT Offset: -360 minutes, name is CST.
                  DST of 60 minutes is currently not in effect, name is EDT.
                  DST begins every second Sunday March at 2:00
                  DST ends every first Sunday November at 2:00

Boot Time:        Thu Sep  1 05:18:03 2011
Boot Count:       6
Next Reboot:      None scheduled
System UpTime:    1555 days 8 hours 40 minutes 34 seconds

Slot:             Slot-1 *                     No Backup
                  ------------------------     ------------------------
Current State:    MASTER

Image Selected:   secondary
Image Booted:     secondary
Primary ver:      12.6.0.31
Secondary ver:    12.6.1.3

Config Selected:  primary.cfg
Config Booted:    primary.cfg

primary.cfg       Created by ExtremeXOS version 12.6.1.3
                  401039 bytes saved on Mon Aug 10 09:28:54 2015
Slot-1 stor-cwc-sw1.iad.2 # sh slot
Slots    Type                 Configured           State       Ports
--------------------------------------------------------------------
Slot-1   X670V-48x            X670V-48x            Operational   64
Slot-2   X670V-48x            X670V-48x            Failed        64
Slot-3                                             Empty          0
Slot-4                                             Empty          0
Slot-5                                             Empty          0
Slot-6                                             Empty          0
Slot-7                                             Empty          0
Slot-8                                             Empty          0

Slot-1 stor-cwc-sw1.iad.3 #


Switch 2:

* Slot-2 stor-cwc-sw1.iad.1 # sh switch
SysName:          stor-cwc-sw1.iad
SysLocation:
SysContact:       support@extremenetworks.com, +1 888 257 3000
System MAC:       02:04:96:52:9B:13
System Type:      X670V-48x (Stack)

SysHealth check:  Enabled (Normal)
Recovery Mode:    All
System Watchdog:  Enabled

Current Time:     Fri Dec  4 12:59:34 2015
Timezone:         [Auto DST Enabled] GMT Offset: -360 minutes, name is CST.
                  DST of 60 minutes is currently not in effect, name is EDT.
                  DST begins every second Sunday March at 2:00
                  DST ends every first Sunday November at 2:00

Boot Time:        Thu Sep  1 18:29:15 2011
Boot Count:       6
Next Reboot:      None scheduled
System UpTime:    1555 days 8 hours 40 minutes 51 seconds

Slot:             Slot-2 *                     No Backup
                  ------------------------     ------------------------
Current State:    MASTER

Image Selected:   secondary
Image Booted:     secondary
Primary ver:      12.6.0.31
Secondary ver:    12.6.1.3

Config Selected:  primary.cfg
Config Booted:    primary.cfg

primary.cfg       Created by ExtremeXOS version 12.6.1.3
                  401039 bytes saved on Mon Aug 10 09:28:54 2015
* Slot-2 stor-cwc-sw1.iad.2 # sh slot
Slots    Type                 Configured           State       Ports
--------------------------------------------------------------------
Slot-1   X670V-48x            X670V-48x            Failed        64
Slot-2   X670V-48x            X670V-48x            Operational   64
Slot-3                                             Empty          0
Slot-4                                             Empty          0
Slot-5                                             Empty          0
Slot-6                                             Empty          0
Slot-7                                             Empty          0
Slot-8                                             Empty          0
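
The two views above disagree: each node lists itself as Operational and its peer as Failed. As a rough sketch only (the function names and the shortened sample strings are made up for illustration, and the row layout is assumed to match the "show slot" output pasted in this thread), that condition could be checked from captured output like this:

```python
import re

# Populated rows look like:
#   Slot-1   X670V-48x            X670V-48x            Operational   64
# Empty rows have fewer columns and are deliberately not matched.
ROW = re.compile(r"^(Slot-\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)$")

def slot_states(show_slot_output):
    """Return {slot name: state} for the populated slots in 'show slot' text."""
    states = {}
    for line in show_slot_output.splitlines():
        match = ROW.match(line.strip())
        if match:
            states[match.group(1)] = match.group(4)
    return states

def looks_split(view_a, view_b):
    """True when two nodes of one stack each report a different, non-overlapping
    set of Operational slots, as in the outputs pasted above."""
    up_a = {s for s, st in slot_states(view_a).items() if st == "Operational"}
    up_b = {s for s, st in slot_states(view_b).items() if st == "Operational"}
    return bool(up_a) and bool(up_b) and up_a.isdisjoint(up_b)

# Shortened copies of the two X670 views from this thread.
SLOT1_VIEW = """\
Slot-1   X670V-48x            X670V-48x            Operational   64
Slot-2   X670V-48x            X670V-48x            Failed        64
Slot-3                                             Empty          0
"""
SLOT2_VIEW = """\
Slot-1   X670V-48x            X670V-48x            Failed        64
Slot-2   X670V-48x            X670V-48x            Operational   64
Slot-3                                             Empty          0
"""

if __name__ == "__main__":
    print(slot_states(SLOT1_VIEW))
    print("split stack suspected:", looks_split(SLOT1_VIEW, SLOT2_VIEW))
```

This only compares text that has already been collected from each node; how the output is gathered (console, SSH, and so on) is left open.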

Matthew Tedder

Here are the X460s:

Switch 1:

Slot-1 aggr-cwc-sw1.1 # sh sw
SysName:          aggr-cwc-sw1
SysLocation:
SysContact:       support@extremenetworks.com, +1 888 257 3000
System MAC:       02:04:96:51:F9:C6
System Type:      X460-48t (Stack)

SysHealth check:  Enabled (Normal)
Recovery Mode:    All
System Watchdog:  Enabled

Current Time:     Sun Nov 29 15:58:05 2015
Timezone:         [Auto DST Enabled] GMT Offset: -360 minutes, name is CST.
                  DST of 60 minutes is currently not in effect, name is EDT.
                  DST begins every second Sunday March at 2:00
                  DST ends every first Sunday November at 2:00

Boot Time:        Mon Aug 22 21:52:09 2011
Boot Count:       5
Next Reboot:      None scheduled
System UpTime:    1559 days 19 hours 5 minutes 55 seconds

Slot:             Slot-1 *                     No Backup
                  ------------------------     ------------------------
Current State:    MASTER

Image Selected:   secondary
Image Booted:     secondary
Primary ver:      12.5.0.14
Secondary ver:    12.6.1.3

Config Selected:  primary.cfg
Config Booted:    primary.cfg

primary.cfg       Created by ExtremeXOS version 12.6.1.3
                  582582 bytes saved on Sun Nov 29 14:27:51 2015
Slot-1 aggr-cwc-sw1.2 # sh sl
Slots    Type                 Configured           State       Ports
--------------------------------------------------------------------
Slot-1   X460-48t             X460-48t             Operational   54
Slot-2   X460-48t             X460-48t             Failed        54
Slot-3                                             Empty          0
Slot-4                                             Empty          0
Slot-5                                             Empty          0
Slot-6                                             Empty          0
Slot-7                                             Empty          0
Slot-8                                             Empty          0

Switch 2:

* Slot-2 aggr-cwc-sw1.1 # sh sw
SysName:          aggr-cwc-sw1
SysLocation:
SysContact:       support@extremenetworks.com, +1 888 257 3000
System MAC:       02:04:96:51:F9:C6
System Type:      X460-48t (Stack)

SysHealth check:  Enabled (Normal)
Recovery Mode:    All
System Watchdog:  Enabled

Current Time:     Sun Nov 29 15:54:34 2015
Timezone:         [Auto DST Enabled] GMT Offset: -360 minutes, name is CST.
                  DST of 60 minutes is currently not in effect, name is EDT.
                  DST begins every second Sunday March at 2:00
                  DST ends every first Sunday November at 2:00

Boot Time:        Tue Aug 23 16:05:18 2011
Boot Count:       5
Next Reboot:      None scheduled
System UpTime:    1559 days 19 hours 2 minutes 18 seconds

Slot:             Slot-2 *                     No Backup
                  ------------------------     ------------------------
Current State:    MASTER

Image Selected:   secondary
Image Booted:     secondary
Primary ver:      12.5.0.14
Secondary ver:    12.6.1.3

Config Selected:  primary.cfg
Config Booted:    primary.cfg

primary.cfg       Created by ExtremeXOS version 12.6.1.3
                  580097 bytes saved on Fri Nov 20 03:14:46 2015
* Slot-2 aggr-cwc-sw1.2 # sh sl
Slots    Type                 Configured           State       Ports
--------------------------------------------------------------------
Slot-1   X460-48t             X460-48t             Failed        54
Slot-2   X460-48t             X460-48t             Operational   54
Slot-3                                             Empty          0
Slot-4                                             Empty          0
Slot-5                                             Empty          0
Slot-6                                             Empty          0
Slot-7                                             Empty          0
Slot-8                                             Empty          0
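
For anyone checking other stacks on similar code, here is a small sketch that turns the "System UpTime" and "Boot Time" lines from "show switch" into values that can be compared, and estimates when a given boot time reaches the 1555 days reported in this thread. The helper names are hypothetical, the 1555-day figure is only what was observed here (not a confirmed threshold), and the date math uses the switch's local clock without any DST adjustment:

```python
import re
from datetime import datetime, timedelta

UPTIME = re.compile(r"(\d+) days (\d+) hours (\d+) minutes (\d+) seconds")

def parse_uptime(line):
    """Parse a 'System UpTime:' line into a timedelta."""
    match = UPTIME.search(line)
    if match is None:
        raise ValueError("no uptime found in: %r" % line)
    days, hours, minutes, seconds = map(int, match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)

def parse_boot_time(line):
    """Parse a 'Boot Time:' line into a naive datetime (switch local time)."""
    text = line.split(":", 1)[1]
    # Collapse the padding around single-digit days ("Sep  1") before parsing.
    return datetime.strptime(" ".join(text.split()), "%a %b %d %H:%M:%S %Y")

def reaches_days(boot, days=1555):
    """Local time at which uptime reaches `days`; 1555 is the value observed
    in this thread, an assumption rather than a documented limit."""
    return boot + timedelta(days=days)

if __name__ == "__main__":
    uptime = parse_uptime("System UpTime:    1555 days 8 hours 40 minutes 34 seconds")
    boot = parse_boot_time("Boot Time:        Thu Sep  1 05:18:03 2011")
    print("uptime in seconds:", int(uptime.total_seconds()))
    print("1555 days reached around:", reaches_days(boot))
```

Fed the Boot Time of a stack that has not failed yet, the same function gives an approximate date to plan a maintenance window before.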

Drew C., Community Manager

Thanks - now I need to see how I can fake uptime in EXOS so we don't have to wait 4 years :)

Meanwhile, I would suggest a reboot of the stack to bring things back in proper order.  Go ahead and grab the output of "show tech all" from both stacks to have on hand.  I would also recommend opening a case with GTAC.  Just know that one of the first things they're going to ask you to do is upgrade your stacks to a supported version.

Once a case is opened, GTAC can help track this to resolution.

Matthew Tedder

Thanks Drew - I already have a TAC case open but nothing conclusive has come of it.  I turned over the sh tech output to them yesterday and they've sorted through it.

I'm not able to reboot the stack, but will work on a window to reboot each of the slave switches and let you know what happens.  That was also support's suggestion.

Drew C., Community Manager

Excellent. I found your case in the system and will see if I can or need to work with the owner to help replicate this.

Matthew Tedder

Hey, I appreciate it!  This is my first time posting anything to the hub and only my 2nd ticket in years with Extreme, so dunno if I went about it backwards, but I do appreciate the assistance.

Drew C., Community Manager

No worries.  It's okay that you've gone through two of our support channels.  Since there is a case opened, most of your updates are going to come from the case owner.
If this is, in fact, a software issue (and it seems that it is) it's a rare one.  Most systems get at least one reboot for some reason or other in 4 years - so I'm impressed by that!

Matthew Tedder

Ha, yeah, understood - these switches are absolutely critical to us but even I was surprised by that.  We'll take this as an opportunity to update them once we review what the newest rock-solid version is. 

Alexander Kazakov

Hello.
Has the vendor solved the problem of the stack crashing after 1554 days of uptime?
If so, in which version?

We have hit this problem on stacks running versions 12.6.3.2 and 15.3.3.5.
We have now installed version 16.2.4.5. Is the problem solved in this version?

We opened a ticket with our partner about the problem on version 12.6.3.2 in December 2017, and the answer was to update to a newer version.
The ticket for the problem on version 15.3.3.5 has only just been opened, and there is no answer yet.