Switch in stack keeps resetting

  • Problem
  • Updated 6 months ago
  • Solved
I have a stack of five X450e-48p switches. Back in January I replaced the switch in slot 3 after it failed, and all was good after that. A couple of weeks ago slot 3 failed again, so I replaced the switch, and all was good until this past Friday, when it started randomly resetting itself. I don't want to replace the switch again because I don't know whether the switch itself is the issue. Could a bad stacking cable or module cause this? The stack is running older firmware, 12.4.1.7; could that be causing it?

DH

Posted 10 months ago

Ash Curtis, Employee

Are there any logs from slot 3 which indicate why the node has failed? Are you seeing any CRC errors or link flaps on your stack ports? 

The version you are running could certainly use an upgrade. The latest recommended release for an X450e is 15.5.3.2-patch1-14.

If you have access to this EXOS release, I would certainly recommend upgrading this stack as per the directions in this KB article:

https://gtacknowledge.extremenetworks.com/articles/How_To/How-to-upgrade-EXOS-on-a-Summit-Stack

DH

10/09/2017 04:33:37.21 <Warn:HAL.Card.Warning> Slot-2: Slot 3 is not present to do card exec cmd POWER_OFF
10/08/2017 19:22:29.80 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is operational
10/08/2017 19:21:44.31 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is inserted
10/08/2017 19:21:40.08 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is removed
10/08/2017 19:20:31.23 <Warn:HAL.Card.Warning> Slot-2: Slot 3 is not present to do card exec cmd POWER_OFF
10/06/2017 22:26:08.08 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is operational
10/06/2017 22:25:26.25 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is inserted
10/06/2017 22:23:46.25 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is removed
10/06/2017 13:10:17.48 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is operational
10/06/2017 13:09:35.26 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is inserted
10/06/2017 13:08:47.62 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is removed
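
For anyone reading a log like this, a short script can turn the removed/inserted/operational events into outage durations per slot. This is only a sketch: it assumes the timestamps are MM/DD/YYYY, that each line has exactly the layout shown above, and the names `parse_events` and `outages` are made up for illustration.

```python
import re
from datetime import datetime

# Log excerpt copied from the post above (newest entry first).
LOG = """\
10/09/2017 04:33:37.21 <Warn:HAL.Card.Warning> Slot-2: Slot 3 is not present to do card exec cmd POWER_OFF
10/08/2017 19:22:29.80 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is operational
10/08/2017 19:21:44.31 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is inserted
10/08/2017 19:21:40.08 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is removed
10/08/2017 19:20:31.23 <Warn:HAL.Card.Warning> Slot-2: Slot 3 is not present to do card exec cmd POWER_OFF
10/06/2017 22:26:08.08 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is operational
10/06/2017 22:25:26.25 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is inserted
10/06/2017 22:23:46.25 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is removed
10/06/2017 13:10:17.48 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is operational
10/06/2017 13:09:35.26 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is inserted
10/06/2017 13:08:47.62 <Info:HAL.Card.Info> Slot-1: Module in slot 3 is removed
"""

# Matches only the "Module in slot N is <state>" lines; other lines are skipped.
EVENT_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{2}) "
    r"<\w+:HAL\.Card\.\w+> Slot-\d+: "
    r"Module in slot (\d+) is (removed|inserted|operational)$"
)


def parse_events(text):
    """Return (timestamp, slot, state) tuples sorted oldest first."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            # Assumption: the date format is MM/DD/YYYY.
            stamp = datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S.%f")
            events.append((stamp, int(match.group(2)), match.group(3)))
    return sorted(events)


def outages(events, slot):
    """Pair each 'removed' event with the next 'operational' event for a slot."""
    result = []
    removed_at = None
    for stamp, event_slot, state in events:
        if event_slot != slot:
            continue
        if state == "removed":
            removed_at = stamp
        elif state == "operational" and removed_at is not None:
            result.append((removed_at, (stamp - removed_at).total_seconds()))
            removed_at = None
    return result


for start, seconds in outages(parse_events(LOG), slot=3):
    print(f"slot 3 removed at {start}, operational again after {seconds:.2f} s")
```

On the excerpt above this reports three outages for slot 3, each lasting between roughly 50 and 142 seconds.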

I am not seeing any errors on the stacking ports either. As of now, slot 3 is in the Present state and active in the stack, but none of its connections are active.

Node MAC Address    Slot  Stack State  Role     Flags
------------------  ----  -----------  -------  ---
*00:04:96:34:f0:dc  1     Active       Master   CA-
 00:04:96:34:f1:2e  2     Active       Backup   CA-
 00:04:96:35:07:d6  3     Active       Standby  CA-
 00:04:96:34:f1:64  4     Active       Standby  CA-
 00:04:96:27:a1:b6  5     Active       Standby  CA-
* - Indicates this node
Flags:  (C) Candidate for this active topology, (A) Active Node
        (O) node may be in Other active topology
*  # sh slot 3

Slot-3 information:
     State:               Present
     Download %:          100
     Restart count:       1 (limit 5)
     Serial number:       800190-00-07 0820G-80558
     Hw Module Type:      X450e-48p
     SW Version:          12.4.1.7
     SW Build:            v1241b7
     Configured Type:     X450e-48p
     Ports available:     50
     Recovery Mode:       Reset
     Node MAC:            02:04:96:34:F0:DC
     Current State:       STANDBY
     Image Selected:      secondary
     Image Booted:        secondary
     Primary ver:         15.1.1.6
     Secondary ver:       12.4.1.7
     Config Selected:     secondary.cfg

DH

I was able to reboot the slot last night and the node became operational. I did notice that the time is off by about 4 hours. Isn't the time synced from the master node? How do I get it to show the correct time? All the other nodes are correct. 

DH

I have upgraded EXOS to the version recommended above, but slot 3 keeps failing and resetting. It states that it lost communication with the master node. Could this be a stacking cable issue or a stacking port issue?

Here is the stacking-port utilization. It clearly shows very little traffic on ports 3:2 and 4:1.

Link Utilization Averages                            Fri Oct 13 12:12:03 2017
Port     Link    Rx              Peak Rx          Tx               Peak Tx
         State   pkts/sec        pkts/sec         pkts/sec         pkts/sec
================================================================================
1:1       A           395            395             5541            5541
1:2       A           815           1055             5591            5880
2:1       A          5293           6674              685            1132
2:2       A           311            460             5051            5985
3:1       A          5206           5974              379             379
3:2       A             1              4               21              28
4:1       A            21             77                1              43
4:2       A          5028           5267              113             240
5:1       A           113            240             5014            5267
5:2       A          5313           5452              337             390
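
The same check can be done programmatically when the table is longer. The sketch below parses the rows above and lists ports whose Rx and Tx rates are both under a threshold. The threshold of 100 pkts/sec and the function name `quiet_ports` are arbitrary choices for illustration, not anything defined by EXOS.

```python
# Data rows copied from the utilization table above.
TABLE = """\
1:1       A           395            395             5541            5541
1:2       A           815           1055             5591            5880
2:1       A          5293           6674              685            1132
2:2       A           311            460             5051            5985
3:1       A          5206           5974              379             379
3:2       A             1              4               21              28
4:1       A            21             77                1              43
4:2       A          5028           5267              113             240
5:1       A           113            240             5014            5267
5:2       A          5313           5452              337             390
"""


def quiet_ports(text, threshold=100):
    """Return ports whose current Rx and Tx rates are both below threshold."""
    quiet = []
    for line in text.splitlines():
        fields = line.split()
        # Expect: port, link state, Rx, peak Rx, Tx, peak Tx.
        if len(fields) != 6 or ":" not in fields[0]:
            continue
        port, _state, rx, _peak_rx, tx, _peak_tx = fields
        if max(int(rx), int(tx)) < threshold:
            quiet.append(port)
    return quiet


print(quiet_ports(TABLE))  # prints ['3:2', '4:1']
```

The counters on 3:2 and 4:1 mirror each other (Tx 21 on one side, Rx 21 on the other), which would be consistent with them being the two ends of the same stacking link, though the table alone does not show whether the cable, a port, or the switch is at fault.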

Drew C., Community Manager

Hi DH,
It might be best for you to open a case with GTAC so the team can dig a little deeper to see what might be happening to your stack.

Doug Hyde, Technical Support Manager

Solved via GTAC case. 

Yurij

How was the problem solved?