1x X670-V with 2x X460-V stack - switch 2 fails?

  • Problem
  • Updated 3 years ago
  • Solved
Hi all,

I am hoping someone can shed some light on an issue that a client is experiencing.


They have this stack, and switch 2 keeps failing every now and then.  I have gone through some pages here, but couldn't find a page that gives an answer.

Here are the details, as well as extracts from the log. I have the entire show tech-support output if you would like to check anything else.

I have asked the client to check the environmental aspects (power cables, swapping PSUs, swapping SFP transceivers) and I am waiting for their reply.

If you have any other ideas for me to try, I would be happy to try them. I have thus far mainly worked on EOS devices, so XOS is still a little new to me. Is there something I am missing, or would this ultimately just be a switch fault for which a swap-out should be arranged?

The switches are located in different areas and connected via the native 10Gb SFP ports. Any suggestions?

This is the error from the log:

10/21/2015 00:13:38.70 <Info:HAL.Card.Info> Slot-1: Slot-2 down, resetting all TCP connections to it
10/21/2015 00:13:38.70 <Info:HAL.Card.Info> Slot-1: Module in Slot-2 is removed
10/21/2015 00:13:38.29 <Info:HAL.Port.Info> Slot-1: Stacking port 3:1 link down.
10/21/2015 00:13:38.29 <Info:HAL.Port.Info> Slot-1: Stacking port 2:2 link down.
10/21/2015 00:13:38.29 <Info:HAL.Port.Info> Slot-1: Stacking port 2:1 link down.
10/21/2015 00:13:38.26 <Info:HAL.Card.Info> Slot-1: Slot-2 down, resetting all TCP connections to it
10/21/2015 00:13:38.26 <Info:HAL.Card.Info> Slot-1: Module in Slot-2 is removed
10/21/2015 00:13:37.86 <Info:HAL.Port.Info> Slot-1: Stacking port 1:2 link down.
10/21/2015 00:13:34.90 <Info:LACP.RemPortFromAggr> Slot-1: Remove port 3:14 from aggregator
10/21/2015 00:13:34.71 <Noti:HAL.Sys.Notice> Slot-1: Module in fan slot 2 is removed
10/21/2015 00:13:31.07 <Warn:DM.Warning> Slot-3: Slot-2 FAILED (1) No Master
10/21/2015 00:13:31.06 <Erro:DM.Error> Slot-2: Node State[3] = FAIL (No Master)
10/21/2015 00:13:31.06 <Warn:DM.Warning> Slot-2: PRIMARY NODE (Slot-1) DOWN
10/21/2015 00:13:22.79 <Info:HAL.Sys.Info> Slot-1: Internal PSU-2 in slot 2 is disconnected.
10/21/2015 00:13:22.79 <Info:HAL.Sys.Info> Slot-1: Internal PSU-1 in slot 2 is disconnected.

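
The excerpt above is listed newest first, which makes the order of events easy to misread. As a rough aid, the following Python sketch re-sorts such entries oldest first. It assumes each entry has the shape "MM/DD/YYYY HH:MM:SS.cc <Severity:Component> Slot-N: message" as seen in this excerpt; the names `LOG`, `ENTRY` and `parse` are illustrative only and not part of any Extreme tool.

```python
import re
from datetime import datetime

# A few entries copied from the excerpt above, newest first as pasted.
LOG = """\
10/21/2015 00:13:38.70 <Info:HAL.Card.Info> Slot-1: Module in Slot-2 is removed
10/21/2015 00:13:37.86 <Info:HAL.Port.Info> Slot-1: Stacking port 1:2 link down.
10/21/2015 00:13:31.06 <Erro:DM.Error> Slot-2: Node State[3] = FAIL (No Master)
10/21/2015 00:13:22.79 <Info:HAL.Sys.Info> Slot-1: Internal PSU-1 in slot 2 is disconnected.
"""

# Assumed entry layout; whitespace (including line wraps) between fields is tolerated.
ENTRY = re.compile(
    r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d+)\s+"  # timestamp
    r"<(\w+):([\w.]+)>\s+"                             # severity and component
    r"(Slot-\d+):\s+"                                  # slot that logged the entry
    r"(.*?)(?=\s+\d{2}/\d{2}/\d{4} |\s*\Z)",           # message, up to the next timestamp
    re.DOTALL,
)

def parse(text):
    """Return (time, severity, component, slot, message) tuples, oldest first."""
    events = []
    for ts, sev, comp, slot, msg in ENTRY.findall(text):
        when = datetime.strptime(ts, "%m/%d/%Y %H:%M:%S.%f")
        # Collapse wrapped lines inside a message into single spaces.
        events.append((when, sev, comp, slot, " ".join(msg.split())))
    return sorted(events, key=lambda e: e[0])

for when, sev, comp, slot, msg in parse(LOG):
    print(when.strftime("%H:%M:%S.%f")[:-4], sev, slot, msg)
```

Read oldest first, the PSU disconnect messages for slot 2 (00:13:22.79) come roughly eight seconds before the "No Master" entries (00:13:31.06), and the stacking port link-down messages follow after that.
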
Here is the output from the config and the show tech-support command that I think is relevant. Let me know if there is anything else I should be looking at.

System Type:      X670V-48x (Stack)
SysHealth check:  Enabled (Normal)
Recovery Mode:    All
System Watchdog:  Enabled
Slot:             Slot-1 *

Current State:    MASTER                                               
Image Selected:   secondary                                            
Image Booted:     secondary                                            
Primary ver:      15.5.2.9                                             
Secondary ver:    15.5.3.4                                             
                  patch1-5

Slot-1         : X670V-48x
Slot-2         : X460-48x  
Slot-3         : X460-48x 

Slot Port Select Node MAC Address  Port State  Flags Speed
---- ---- ------ ----------------- ----------- ----- -----
*1   1    47     00:04:96:8b:xxxx  Operational C-    10G
*1   2    48     00:04:96:8b:xxxx  Operational C-    10G
 2   1    S1     00:04:96:8b:xxxx  Operational C-    10G
 2   2    S2     00:04:96:8b:xxxx  Operational CB    10G
 3   1    S1     00:04:96:8b:xxxx  Operational CB    10G
 3   2    S2     00:04:96:8b:xxxx  Operational C-    10G

Stacking protocol: Enhanced on all switches

configure stack-ports 1:1 debounce time 300
configure stack-ports 1:2 debounce time 300
configure stack-ports 2:1 debounce time 300
configure stack-ports 2:2 debounce time 300
configure stack-ports 3:1 debounce time 300
configure stack-ports 3:2 debounce time 300

As always, much appreciated!

Dewald Botha


Posted 3 years ago


Patrick Voss, Employee

Hi Dewald,

I would recommend looking into the slot that keeps failing. This can be done by using the "telnet slot <slot#>" command and logging in with the same credentials. That switch should have its own logs, which should give you more information about the crash.

I see that you have configured debounce on the stack-ports. Was this a recommendation from a GTAC engineer or an SE?

Dewald Botha

Yes, it was. I will go back to the client to get the output from the switch. Thanks!

Dewald Botha

Hi Patrick,

Some interesting things. First off, the time on stack.sw2 is different.

I also noticed the following in the log:
10/20/2015 22:15:40.36 <Noti:EPM.wd_warm_reset> Slot-2: Changing to watchdog warm reset mode
10/20/2015 22:13:31.62 <Warn:EPM.all_shutdown> Slot-2: Shutting down all processes
10/20/2015 22:13:31.58 <Warn:DM.Warning> Slot-2: Slot-2 FAILED (1) No Master
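
To line these entries up with the first excerpt, the clock difference can be estimated from a pair of matching messages. The Python sketch below assumes that the "Slot-2 FAILED (1) No Master" entry at 10/21/2015 00:13:31.07 in the first excerpt and the one at 10/20/2015 22:13:31.58 in slot 2's own log describe the same event; that pairing is an assumption, and `to_master_time` is a hypothetical helper name.

```python
from datetime import datetime

FMT = "%m/%d/%Y %H:%M:%S.%f"

# "Slot-2 FAILED (1) No Master" as it appears in the first excerpt (logged by Slot-3).
master_view = datetime.strptime("10/21/2015 00:13:31.07", FMT)
# The same message as it appears in slot 2's own log.
slot2_view = datetime.strptime("10/20/2015 22:13:31.58", FMT)

# Estimated clock offset, assuming both entries record the same event.
offset = master_view - slot2_view
print(offset)  # 1:59:59.490000

def to_master_time(slot2_timestamp):
    """Shift a slot 2 log timestamp onto the first excerpt's clock."""
    return datetime.strptime(slot2_timestamp, FMT) + offset

# The watchdog warm reset entry, expressed on the first excerpt's clock.
print(to_master_time("10/20/2015 22:15:40.36"))
```

The estimate comes out at about two hours, which may point to a time zone or clock configuration difference on slot 2 rather than two separate incidents.
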

I checked online, and where this line appears, people indicate that it is related to software glitches. Is that the case, or is this expected?

This is the firmware version currently in the stack:
Slot-1      : xxxx Rev 10.0 BootROM: 2.0.1.7    IMG: 15.5.3.4
Slot-2      : xxxx Rev 2.0  BootROM: 2.0.1.7    IMG: 15.5.3.4
Slot-3      : xxxx Rev 2.0  BootROM: 2.0.1.7    IMG: 15.5.3.4

Thanks!

Patrick Voss, Employee

Hi Dewald,

As far as the time goes, you can attempt a "synchronize stacking" from the master and then reboot the entire stack. The time difference should not matter, since any information is pulled from the master.

A "no master" log could mean a couple of things. Ultimately, I believe this should be looked at by GTAC (if entitlement is applied on the switches).

If that is not an option and you have not seen this issue since the original occurrence, you can try upgrading to the latest patch of the recommended version of code listed below:

https://gtacknowledge.extremenetworks.com/articles/Q_A/What-Is-The-Recommended-Release-of-EXOS-For-My-Platform

Please keep in mind that this is a mixed stack and the recommended version might be different.

Hope this helps!