HELP! Stack unexpected reboot

  • Problem
  • Updated 2 years ago
  • Solved
XOS: ExtremeXOS version 15.3.3.5 v1533b5-patch1-2


One of our Extreme X460/X450 stacks rebooted unexpectedly this morning (at 04:52). The logs suggest the following:

2017-01-18 04:57:14.33 <Info:HAL.Port.Info> Stacking port 1:1 link up at 10Gbps.
2017-01-18 04:57:13.99 <Info:HAL.Sys.Info> Starting hal initialization ....
2017-01-18 04:57:12.29 <Info:telnetd.info> telnetd listening on port 23
                                           
2017-01-18 04:57:06.18 <Erro:HAL.Stacking.CfgStkMACAddrInv> The stack MAC address is not correctly configured on this node. The stack can not operate properly in this condition. Please correct and reboot.
2017-01-18 04:57:03.16 <Noti:DM.Notice> DM started
2017-01-18 04:57:02.95 <Noti:NM.StrtProc> The Node Manager (NM) has started processing.
2017-01-18 04:57:02.15 <Noti:EPM.start> EPM Started
2017-01-18 04:57:01.83 <Noti:EPM.wd_warm_reset> Changing to watchdog warm reset mode
2017-01-18 04:52:20.87 <Warn:DM.Warning> Slot-1 FAILED (1) Backup lost
2017-01-18 04:52:20.83 <Warn:EPM.all_shutdown> Shutting down all processes
2017-01-18 04:52:20.53 <Erro:DM.Error> Node State[4] = FAIL (Backup lost)
2017-01-18 04:52:20.53 <Warn:DM.Warning> MASTER decided that I am not BACKUP anymore
2017-01-18 04:52:20.53 <Warn:DM.Warning> BACKUP NODE (Slot-1) DOWN

Has anyone had a similar problem?
Thx,
Mykhaylo

Mykhaylo Skrypka

Posted 2 years ago
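
The switch prints its log newest-first, so the sequence of events in the post is easier to follow once it is sorted chronologically. Below is a minimal Python sketch of that, using three lines copied from the log above. The function name `parse` and the variable names are illustrative only, and the "gap" is simply the time between the shutdown message and the first message logged after the restart, not an official outage figure.

```python
from datetime import datetime

# Three lines copied from the log in the post (newest first, as the switch prints them).
LOG = """\
2017-01-18 04:57:01.83 <Noti:EPM.wd_warm_reset> Changing to watchdog warm reset mode
2017-01-18 04:52:20.83 <Warn:EPM.all_shutdown> Shutting down all processes
2017-01-18 04:52:20.53 <Warn:DM.Warning> BACKUP NODE (Slot-1) DOWN
"""


def parse(line):
    """Split one log line into (timestamp, tag, message)."""
    date, time, rest = line.split(" ", 2)
    tag, message = rest.split("> ", 1)
    stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S.%f")
    return stamp, tag.lstrip("<"), message


# Sort oldest-first so the events read in the order they happened.
events = sorted(parse(line) for line in LOG.splitlines() if line.strip())
for stamp, tag, message in events:
    print(stamp.time(), tag, message)

# Time between the shutdown message and the first message after the restart.
shutdown = next(s for s, t, _ in events if t == "Warn:EPM.all_shutdown")
restart = next(s for s, t, _ in events if t == "Noti:EPM.wd_warm_reset")
gap_seconds = (restart - shutdown).total_seconds()
print(f"gap: {gap_seconds:.0f} s")  # prints "gap: 281 s"
```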

Vellachery, Sumeesh, Employee
Mykhaylo,

Did you get a chance to review the logs from Slot-1 around the 04:52 timestamp?
Please paste the output of "show debug system dump" so we can see whether any process crashed.
Also, check whether there are any process crash files in internal memory: "ls internal-memory"

Mykhaylo Skrypka
Looks like no errors:

Slot-1 es-vil-vbwc-20.3 # show switch | i nt.T
Current Time:     Wed Jan 18 17:46:43 2017
Slot-1 es-vil-vbwc-20.4 # show stacking detail
Stacking Node 00:04:96:83:4c:a6 information:
   Current:
      Stacking                  : Enabled
      Role                      : Master
      Priority                  : Automatic
      Slot number               : 1
      Stack state               : Active
      Master capable?           : Yes
      Stacking protocol         : Standard
      License level restriction : <none>
      In active topology?       : Yes
      Factory MAC address       : 00:04:96:83:4c:a6
      Stack MAC address         : 02:04:96:83:4c:a6
      Alternate IP address      : <none>
      Alternate gateway         : <none>
      Stack Port 1:
         State                  : Operational
         Blocked?               : No
         Control path active?   : Yes
         Selection              : Native
      Stack Port 2:
         State                  : Operational
         Blocked?               : No
         Control path active?   : Yes
         Selection              : Native
   Configured:
      Stacking                  : Enabled
      Master capable?           : Yes
      Slot number               : 1
      Stack MAC address         : 02:04:96:83:4c:a6
      Stacking protocol         : Standard
      License level restriction : <none>
      Stack Port 1:
         Selection              : Native
      Stack Port 2:
         Selection              : Native

Stacking Node 00:04:96:83:4c:a8 information:
   Current:
      Stacking                  : Enabled
      Role                      : Backup
      Priority                  : Automatic
      Slot number               : 2
      Stack state               : Active
      Master capable?           : Yes
      Stacking protocol         : Standard
      License level restriction : <none>
      In active topology?       : Yes
      Factory MAC address       : 00:04:96:83:4c:a8
      Stack MAC address         : 02:04:96:83:4c:a6
      Alternate IP address      : <none>
      Alternate gateway         : <none>
      Stack Port 1:
         State                  : Operational
         Blocked?               : No
         Control path active?   : Yes
         Selection              : Native
      Stack Port 2:
         State                  : Operational
         Blocked?               : No
         Control path active?   : Yes
         Selection              : Native
   Configured:
      Stacking                  : Enabled
      Master capable?           : Yes
      Slot number               : 2
      Stack MAC address         : 02:04:96:83:4c:a6
      Stacking protocol         : Standard
      License level restriction : <none>
      Stack Port 1:
         Selection              : Native
      Stack Port 2:
         Selection              : Native

Stacking Node 00:04:96:36:a2:53 information:
   Current:
      Stacking                  : Enabled
      Role                      : Standby
      Priority                  : Automatic
      Slot number               : 3
      Stack state               : Active
      Master capable?           : Yes
      Stacking protocol         : Standard
      License level restriction : <none>
      In active topology?       : Yes
      Factory MAC address       : 00:04:96:36:a2:53
      Stack MAC address         : 02:04:96:83:4c:a6
      Alternate IP address      : <none>
      Alternate gateway         : <none>
      Stack Port 1:
         State                  : Operational
         Blocked?               : No
         Control path active?   : Yes
         Selection              : Native
      Stack Port 2:
         State                  : Operational
         Blocked?               : No
         Control path active?   : Yes
         Selection              : Native
   Configured:
      Stacking                  : Enabled
      Master capable?           : Yes
      Slot number               : 3
      Stack MAC address         : 02:04:96:83:4c:a6
      Stacking protocol         : Standard
      License level restriction : <none>
      Stack Port 1:
         Selection              : Native
      Stack Port 2:
         Selection              : Native

Stacking Node 00:04:96:83:4c:2e information:
   Current:
      Stacking                  : Enabled
      Role                      : Standby
      Priority                  : Automatic
      Slot number               : 4
      Stack state               : Active
      Master capable?           : No
      Stacking protocol         : Standard
      License level restriction : <none>
      In active topology?       : Yes
      Factory MAC address       : 00:04:96:83:4c:2e
      Stack MAC address         : 02:04:96:83:4c:a6
      Alternate IP address      : <none>
      Alternate gateway         : <none>
      Stack Port 1:
         State                  : Operational
         Blocked?               : No
         Control path active?   : Yes
         Selection              : Native
      Stack Port 2:
         State                  : Operational
         Blocked?               : Yes
         Control path active?   : Yes
         Selection              : Native
   Configured:
      Stacking                  : Enabled
      Master capable?           : No
      Slot number               : 4
      Stack MAC address         : 02:04:96:83:4c:a6
      Stacking protocol         : Standard
      License level restriction : <none>
      Stack Port 1:
         Selection              : Native
      Stack Port 2:
         Selection              : Native

Stacking Node 00:04:96:35:cf:25 information:
   Current:
      Stacking                  : Enabled
      Role                      : Standby
      Priority                  : Automatic
      Slot number               : 5
      Stack state               : Active
      Master capable?           : Yes
      Stacking protocol         : Standard
      License level restriction : <none>
      In active topology?       : Yes
      Factory MAC address       : 00:04:96:35:cf:25
      Stack MAC address         : <none>
      Alternate IP address      : <none>
      Alternate gateway         : <none>
      Stack Port 1:
         State                  : Operational
         Blocked?               : Yes
         Control path active?   : Yes
         Selection              : Native
      Stack Port 2:
         State                  : Operational
         Blocked?               : No
         Control path active?   : Yes
         Selection              : Native
   Configured:
      Stacking                  : Enabled
      Master capable?           : Yes
      Slot number               : 5
      Stack MAC address         : <none>
      Stacking protocol         : Standard
      License level restriction : <none>
      Stack Port 1:
         Selection              : Native
      Stack Port 2:
         Selection              : Native
Slot-1 es-vil-vbwc-20.5 # 
Slot-1 es-vil-vbwc-20.5 # 
Slot-1 es-vil-vbwc-20.5 # show ports stack-ports rxerrors no-refresh
Port Rx Error Monitor
Port      Link     Rx      Rx      Rx        Rx      Rx         Rx         Rx   
          State    Crc    Over    Under     Frag    Jabber      Align      Lost 
================================================================================
1:1       A        0       0        0        0        0          0          0
1:2       A       12       0        0        0        0          0          0
2:1       A        0       0        0        0        0          0          0
2:2       A        0       0        0        0        0          0          0
3:1       A        0       0        0        0        0          0          0
3:2       A        0       0        0        0        0          0          0
4:1       A        0       0        0        0        0          0          0
4:2       A        0       0        0        0        0          0          0
5:1       A        0       0        0        0        0          0          0
5:2       A        0       0        0        0        0          0          0
================================================================================
          Link State: A-Active, R-Ready, NP-Port Not Present L-Loopback
Slot-1 es-vil-vbwc-20.6 # show ports stack-ports txerrors no-refresh
Port Tx Error Monitor
Port      Link      Tx          Tx          Tx          Tx       Tx       Tx
          State     Coll        Late coll   Deferred    Errors   Lost     Parity
================================================================================
1:1       A         0            0           0         0         0         0
1:2       A         0            0           0         0         0         0
2:1       A         0            0           0         0         0         0
2:2       A         0            0           0         0         0         0
3:1       A         0            0           0         0         0         0
3:2       A         0            0           0         0         0         0
4:1       A         0            0           0         0         0         0
4:2       A         0            0           0         0         0         0
5:1       A         0            0           0         0         0         0
5:2       A         0            0           0         0         0         0
================================================================================
          > indicates Port Display Name truncated past 8 characters
          Link State: A-Active, R-Ready, NP-Port Not Present L-Loopback
Slot-1 es-vil-vbwc-20.7 # 
Slot-1 es-vil-vbwc-20.7 # show switch | i nt.T
Current Time:     Wed Jan 18 17:47:10 2017
Slot-1 es-vil-vbwc-20.8 #

Mykhaylo Skrypka
It would be nice to have an option to attach .txt files when writing a reply, as the output takes up a lot of screen space! Just a thought.

Vellachery, Sumeesh, Employee
Mykhaylo,

Keep an eye on the stack-ports rxerrors. As of now I only see CRC errors on stacking port 1:2 (12 Rx CRC errors in the output you posted).

Try clearing the counters and monitoring the rxerrors on the stacking ports. If you notice them incrementing frequently, swap the stacking cables and monitor them once again.
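
The "monitor and see whether they increment" step can be sketched in Python as a comparison of two readings of the rxerrors table. The first reading below is copied from the output posted earlier in the thread; the second reading is hypothetical, invented here only to demonstrate the comparison. The function names `parse_rx_errors` and `incrementing` are illustrative and not part of any ExtremeXOS tooling.

```python
# Rows copied from "show ports stack-ports rxerrors no-refresh" in the thread.
SNAPSHOT_1 = """\
1:1       A        0       0        0        0        0          0          0
1:2       A       12       0        0        0        0          0          0
2:1       A        0       0        0        0        0          0          0
"""

# HYPOTHETICAL later reading, made up for illustration only.
SNAPSHOT_2 = """\
1:1       A        0       0        0        0        0          0          0
1:2       A       30       0        0        0        0          0          0
2:1       A        0       0        0        0        0          0          0
"""

# Counter columns in the order the table prints them.
COLUMNS = ["crc", "over", "under", "frag", "jabber", "align", "lost"]


def parse_rx_errors(text):
    """Return {port: {counter: value}} for every data row in the table."""
    ports = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like "1:2 A 12 0 0 0 0 0 0"; skip headers and legends.
        if len(fields) != 2 + len(COLUMNS) or ":" not in fields[0]:
            continue
        ports[fields[0]] = dict(zip(COLUMNS, map(int, fields[2:])))
    return ports


def incrementing(before, after):
    """Return {port: {counter: delta}} for counters that grew between readings."""
    growth = {}
    for port, counters in after.items():
        old = before.get(port, {})
        deltas = {name: value - old.get(name, 0) for name, value in counters.items()}
        deltas = {name: delta for name, delta in deltas.items() if delta > 0}
        if deltas:
            growth[port] = deltas
    return growth


first = parse_rx_errors(SNAPSHOT_1)
second = parse_rx_errors(SNAPSHOT_2)
print(incrementing(first, second))
```

If the counters were cleared before the first reading, any port that shows up in the result has logged new errors since then, which would point at that port's cable or connection.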

Vellachery, Sumeesh, Employee
Mykhaylo,

In addition, did you get a chance to review the logs from Slot-1 from the time the issue occurred?

Also, make sure we rule out physical-layer issues (isolate the power cord/power source).

Mykhaylo Skrypka
Yeah, I have seen these 12 CRC errors and will do as advised. Thanks. The logs are actually already from slot 1: when I try to telnet to slot 1, I get the expected error "Error: Cannot establish connection to self."

Aleixo Gomes, Employee
show stacking detail will provide the info on the configured stack MAC address.
show ports stack-ports rxerrors no-refresh
show ports stack-ports txerrors no-refresh
The two commands above will show whether there are any CRC errors on the stack ports. If they are incrementing, consider swapping the stack cables or reseating the stack port connections.
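
As a rough illustration of checking the stack MAC address in "show stacking detail", the Python sketch below pulls the first "Stack MAC address" value listed for each node and flags any slot that differs from the most common value. The excerpt is abbreviated from the output posted earlier in the thread (slots 1, 2 and 5, "Current" section only). In that output, slot 5 lists `<none>`; whether that is related to the CfgStkMACAddrInv log message is not confirmed anywhere in this thread. The function names are illustrative only.

```python
from collections import Counter

# Abbreviated excerpt of "show stacking detail" from the thread
# (slots 1, 2 and 5; unrelated fields omitted for brevity).
OUTPUT = """\
Stacking Node 00:04:96:83:4c:a6 information:
   Current:
      Role                      : Master
      Slot number               : 1
      Stack MAC address         : 02:04:96:83:4c:a6
Stacking Node 00:04:96:83:4c:a8 information:
   Current:
      Role                      : Backup
      Slot number               : 2
      Stack MAC address         : 02:04:96:83:4c:a6
Stacking Node 00:04:96:35:cf:25 information:
   Current:
      Role                      : Standby
      Slot number               : 5
      Stack MAC address         : <none>
"""


def stack_macs(text):
    """Map slot number -> first 'Stack MAC address' value listed for that node."""
    macs = {}
    slot = None
    for line in text.splitlines():
        key, _, value = (part.strip() for part in line.partition(" : "))
        if line.startswith("Stacking Node"):
            slot = None  # a new node section starts here
        elif key == "Slot number" and slot is None:
            slot = int(value)
        elif key == "Stack MAC address" and slot is not None and slot not in macs:
            macs[slot] = value
    return macs


def mismatched(macs):
    """Slots whose stack MAC differs from the most common value in the stack."""
    expected, _ = Counter(macs.values()).most_common(1)[0]
    return {slot: mac for slot, mac in macs.items() if mac != expected}


macs = stack_macs(OUTPUT)
print(mismatched(macs))  # prints "{5: '<none>'}"
```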

Mykhaylo Skrypka
Swapping the stack cable and reseating the stack port (1:2) connection seems to have resolved the CRC errors.

Patrick Voss, Alum
If you could also provide a "show version" output.

Mykhaylo Skrypka
version 15.3.3.5 v1533b5-patch1-2

Patrick Voss, Alum
Can I get the full "show version" output?

Mykhaylo Skrypka
-> show version        
Slot-1      : 800324-00-10 1xxxN-4xx5 Rev 10.0 BootROM: 2.0.1.7    IMG: 15.3.3.5  
Slot-2      : 800324-00-10 xxxN-40xx4 Rev 10.0 BootROM: 2.0.1.7    IMG: 15.3.3.5  
Slot-3      : 800307-00-01 xxxx-80xx2 Rev 1.0 BootROM: 1.0.5.5    IMG: 15.3.3.5  
Slot-4      : 800324-00-10 xxxxxN-xx919 Rev 10.0 BootROM: 2.0.1.7    IMG: 15.3.3.5  
Slot-5      : 800190-00-07 xxxxx-xx140 Rev 7.0 BootROM: 1.0.5.5    IMG: 15.3.3.5  

Mykhaylo Skrypka
WOW! That is impressive community response :-)

So I need to telnet to slot 1 and get the logs from there? show debug system dump and internal-memory. What is the exact command, as I can only use show debug system?

Thx all,
Mykhaylo

Muhammad Tariq, Employee
"show debug system-dump slot <slot-number>"

Muhammad Tariq, Employee
"ls internal-memory"

Vellachery, Sumeesh, Employee
Mykhaylo,

Yes, telnet to Slot 1 and execute the following commands:
- show log
- show debug system-dump
- ls internal-memory (check this from the Master slot as well as from Slot-1)

Mykhaylo Skrypka
Hi,

Sorry, I didn't notice that the command is "ls". Will do.

sukwinder gill
I also have a problem with an unexpected reboot.
X440-G2-24t. 21.1.1.4 Patch 1-5.

Everything works, but the backup switch reboots when one of the stack cables is disconnected.

Aleixo Gomes, Employee
Is it logging any conduit errors? If so, it means the node lost communication with the master and assumes it is a single node in the stack. I suspect a dual-master situation in the stack.

sukwinder gill
the stack config is Master/Standby:

* Slot-1 Stack.9 # sh stacking
Stack Topology is a Daisy-Chain
Active Topology is a Daisy-Chain
Node MAC Address    Slot  Stack State  Role     Flags
------------------  ----  -----------  -------  ---
*00:04:96:9b:ba:06  1     Active       Master   CA-
 00:04:96:9b:ba:24  2     Active       Standby  CA-

Please advise.

Hernandez, Joshua, Employee
Sukwinder Gill, if the stack is still in a master/standby configuration and a daisy chain, the assumption is that there is only one stack cable connecting the nodes. This means that if the standby loses its connection to the master for any reason (in this case, disconnecting the stack cable), it will reboot, because it is not master capable and relies on communication with a master.
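
The reasoning above can be illustrated with a small Python sketch that treats the stack as a plain graph and asks which slots can still reach the master after one cable is pulled. This is only a model of the connectivity argument, not of ExtremeXOS behaviour; the function names and the two example topologies (a two-node daisy chain with one cable, and a two-node ring with two cables) are assumptions made for illustration.

```python
from collections import deque


def reachable_from(master, links):
    """Breadth-first search: the set of slots that can reach the master over the links."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {master}
    queue = deque([master])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def isolated_after_pull(slots, master, links, pulled):
    """Slots cut off from the master once the single cable `pulled` is removed."""
    remaining = list(links)
    remaining.remove(pulled)  # removes only one cable, even if a parallel one exists
    return set(slots) - reachable_from(master, remaining)


# Two-node daisy chain: a single cable between slot 1 (master) and slot 2.
daisy_chain = [(1, 2)]
# Two-node ring: a second cable closes the loop between the same two slots.
ring = [(1, 2), (2, 1)]

print(isolated_after_pull([1, 2], 1, daisy_chain, (1, 2)))  # prints "{2}"
print(isolated_after_pull([1, 2], 1, ring, (1, 2)))         # prints "set()"
```

In the daisy-chain case slot 2 has no remaining path to the master, which matches the situation described above; with a second cable the node stays reachable after one cable is pulled.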