Extreme Stacking Backup Node Reboot

  • Problem
  • Updated 3 weeks ago
  • Solved
Colleagues, good afternoon. Please advise on the following problem: if one of the switches in a stack (the master) is "overloaded" (rebooted or powered off), the second node, the backup, also reboots 10-15 seconds after it.
In my opinion this makes no sense. Have you encountered such behavior in a stack, and can it be fixed?

 

The stack consists of an X670G2-48x and an X670G2-72x. The license level on both switches is the same, and both have master capability enabled.

Slot-2 Stack.12 # show stacking detail
Stacking Node 00:04:96:a0:7f:9d information:
   Current:
      Stacking                  : Enabled
      Role                      : Master
      Priority                  : 100
      Slot number               : 2
      Stack state               : Active
      Master capable?           : Yes
      Stacking protocol         : Enhanced
      License level restriction : <none>
      In active topology?       : Yes
      Factory MAC address       : 00:04:96:a0:7f:9d
      Stack MAC address         : 02:04:96:98:79:f9
      Alternate IP address      : <none>
      Alternate gateway         : <none>
      Stack Port 1:
         State                  : Operational
         Blocked?               : Yes
         Control path active?   : Yes
         Selection              : Alternate (47)
      Stack Port 2:
         State                  : Operational
         Blocked?               : No
         Control path active?   : Yes
         Selection              : Alternate (48)
   Configured:
      Stacking                  : Enabled
      Master capable?           : Yes
      Slot number               : 2
      Stack MAC address         : 02:04:96:98:79:f9
      Stacking protocol         : Enhanced
      License level restriction : <none>
      Stack Port 1:
         Selection              : Alternate (47)
      Stack Port 2:
         Selection              : Alternate (48)

 

Stacking Node 00:04:96:98:79:f9 information:
   Current:
      Stacking                  : Enabled
      Role                      : Backup
      Priority                  : 50
      Slot number               : 1
      Stack state               : Active
      Master capable?           : Yes
      Stacking protocol         : Enhanced
      License level restriction : <none>
      In active topology?       : Yes
      Factory MAC address       : 00:04:96:98:79:f9
      Stack MAC address         : 02:04:96:98:79:f9
      Alternate IP address      : <none>
      Alternate gateway         : <none>
      Stack Port 1:
         State                  : Operational
         Blocked?               : Yes
         Control path active?   : Yes
         Selection              : Alternate (71)
      Stack Port 2:
         State                  : Operational
         Blocked?               : No
         Control path active?   : Yes
         Selection              : Alternate (72)
   Configured:
      Stacking                  : Enabled
      Master capable?           : Yes
      Slot number               : 1
      Stack MAC address         : 02:04:96:98:79:f9
      Stacking protocol         : Enhanced
      License level restriction : <none>
      Stack Port 1:
         Selection              : Alternate (71)
      Stack Port 2:
         Selection              : Alternate (72)

 

Slot-2 Stack.15 # show stacking configuration
Stack MAC in use: 02:04:96:98:79:f9
Node               Slot         Alternate          Alternate      
MAC Address        Cfg Cur Prio Mgmt IP / Mask     Gateway         Flags     Lic
------------------ --- --- ---- ------------------ --------------- --------- ---
*00:04:96:a0:7f:9d 2   2   100  <none>             <none>          CcEeMm-Nn --
 00:04:96:98:79:f9 1   1   50   <none>             <none>          CcEeMm-Nn --
* - Indicates this node
Flags:  (C) master-Capable in use, (c) master-capable is configured,
        (E) Stacking is currently Enabled, (e) Stacking is configured Enabled,
        (M) Stack MAC in use, (m) Stack MACs configured and in use are the same,
        (i) Stack MACs configured and in use are not the same or unknown,
        (N) Enhanced protocol is in use, (n) Enhanced protocol is configured,
        (-) Not in use or not configured
License level restrictions: (C) Core, (A) Advanced edge, or (E) Edge in use,
        (c) Core, (a) Advanced edge, or (e) Edge configured,
        (-) Not in use or not configured

 


Vladimir Monomah


Posted 3 weeks ago


Robert Cummins

What leads you to think that stacking these two switches is the source of your problem? What do you mean by the stack members being overloaded?

renevanegdom

I think the firmware on the two switches is not the same. Use the commands show switch and show version image to check this.
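
A minimal sketch of that comparison, assuming the `show switch` output has been saved as text and keeps the per-slot column layout seen on this stack (values separated by at least two spaces). The names `booted_versions` and `versions_match` are hypothetical helpers, not part of EXOS, and the parser ignores the patch line printed under the version.

```python
import re


def booted_versions(show_switch_text):
    """Return {slot_name: version_of_booted_image} parsed from `show switch` text."""
    fields = {}
    for line in show_switch_text.splitlines():
        match = re.match(r"^([A-Za-z ]+):\s+(\S.*)$", line)
        if match:
            label = match.group(1).strip()
            # Per-slot values are separated by runs of two or more spaces.
            fields[label] = re.split(r"\s{2,}", match.group(2).strip())

    # "Slot-2 *" marks the slot the command was run on; drop the marker.
    slots = [name.replace("*", "").strip() for name in fields["Slot"]]
    versions = {}
    for index, slot in enumerate(slots):
        image = fields["Image Booted"][index]  # "primary" or "secondary"
        key = "Primary ver" if image == "primary" else "Secondary ver"
        versions[slot] = fields[key][index]
    return versions


def versions_match(show_switch_text):
    """True when every slot booted the same version string."""
    return len(set(booted_versions(show_switch_text).values())) == 1
```

For the `show switch` output posted later in this thread, both slots booted the secondary image at 16.2.4.5, so this check reports a match.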

Alexandr P, Embassador

Hello!

1. Which EXOS version are you running?
2. Do any messages appear in the logs?
3. Which sys-recovery level is configured?
4. Are both nodes powered by the same UPS? Does the backup node reboot when you manually power off the master node?
5. As I understand Master node was changed from 00:04:96:98:79:f9 to 00:04:96:a0:7f:9d?

Thank you!

Vladimir Monomah

Hi Robert,

1. I do not think the problem is caused by stacking different switch models.

2. It means that I disconnect the power supply from the switch that is the master in this topology.

Robert Cummins

I am sorry, but I do not understand what problem you are trying to solve. Are you trying to force the backup master to take over by powering off the master slot?

Vladimir Monomah

Slot-2 Stack.3 # show switch

SysName:          Stack
SysLocation:     
SysContact:       support@extremenetworks.com, +1 888 257 3000
System MAC:       02:04:96:98:79:F9
System Type:      X670G2-48x-4q (Stack)

SysHealth check:  Enabled (Normal)
Recovery Mode:    All
System Watchdog:  Enabled

Current Time:     Wed Jul 25 23:24:37 2018
Timezone:         [Auto DST Disabled] GMT Offset: 0 minutes, name is UTC.
Boot Time:        Wed Jul 25 23:15:49 2018
Boot Count:       42
Next Reboot:      None scheduled
System UpTime:    8 minutes 47 seconds

Slot:             Slot-2 *                     Slot-1                 
                  ------------------------     ------------------------
Current State:    MASTER                       BACKUP (In Sync)       

Image Selected:   secondary                    secondary              
Image Booted:     secondary                    secondary              
Primary ver:      21.1.3.7                     15.6.0.15              
Secondary ver:    16.2.4.5                     16.2.4.5               
                  patch1-8                     patch1-8

Config Selected:  primary.cfg                                         
Config Booted:    primary.cfg                                         

primary.cfg       Created by ExtremeXOS version 16.2.4.5
                  721253 bytes saved on Wed Jul 25 23:14:56 2018

>>  As I understand Master node was changed from 00:04:96:98:79:f9 to 00:04:96:a0:7f:9d?

Yes

David Rahn


Please describe the scenario of what is happening. As it stands, it appears you are unplugging the master to test failover, but when you do this the standby node reboots. Is this correct?

Log in to the secondary switch and view the log messages. See if you can post them here; we need the switch to tell us why it felt the need to reboot.

I do know that it is a common issue for the whole stack to reboot if two switches are in the Master state (the log will state "dual Master" and the whole stack reboots).

I have seen this when the master's stacking modules fail: the switch stays running, but the links between the switches flicker or bounce, sending the stack into dual master, and the whole stack fails and reboots.

So I would check the cables between the switches to see whether either of them is having issues.

You can also look at the stacking port RX errors.
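
A small sketch of the log check suggested above, assuming the switch log has been saved to a text file or pasted into a string. It only searches for the phrase "dual master" quoted in this post, ignoring case; the helper name `find_dual_master_lines` is hypothetical, and the exact wording of the real log message is an assumption.

```python
import re


def find_dual_master_lines(log_text, phrase="dual master"):
    """Return (line_number, line) pairs that mention the phrase, ignoring case."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return [
        (number, line.rstrip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if pattern.search(line)
    ]
```

An empty result does not rule out a dual-master event; it only means the saved text contains no line with that phrase.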




Vladimir Monomah

Colleagues, thank you.

The problem was solved by running "config stacking redundancy maximal" on EACH chassis in the stack.

P.S.

Master capability was already enabled on each chassis in the stack.

Both chassis in the stack have the Core license level.

None of this stopped the backup chassis from rebooting along with the master when the power was turned off on the master chassis in the stack.

Alexandr P, Embassador

Hello, Vladimir!

It's a little strange, because the command "config stacking redundancy maximal" sets all stack nodes as master-capable, and:
- the default value should already be maximal
"Master capable?           : Yes"
"MAC Address        Cfg Cur Prio Mgmt IP / Mask     Gateway         Flags     Lic
------------------ --- --- ---- ------------------ --------------- --------- ---
*00:04:96:a0:7f:9d 2   2   100  <none>             <none>          CcEeMm-Nn --
 00:04:96:98:79:f9 1   1   50   <none>             <none>          CcEeMm-Nn --"

Thank you!
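
The flag check described above can be sketched as follows, assuming the `show stacking configuration` output has been saved as text in the layout shown in this thread, with the Flags column second to last and Lic last. The helper names are hypothetical. The sketch only reads the `C` (master-capable in use) and `c` (master-capable configured) flags from the legend; it does not explain why the redundancy command changed the behavior.

```python
import re

# A node row starts with an optional "*" marker followed by the node MAC address.
NODE_ROW = re.compile(r"^\s*(\*?)\s*((?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})\s+(.+)$")


def master_capable_flags(show_stacking_config_text):
    """Return {node_mac: {"in_use": bool, "configured": bool}} from the Flags column."""
    result = {}
    for line in show_stacking_config_text.splitlines():
        match = NODE_ROW.match(line)
        if not match:
            continue
        columns = match.group(3).split()
        flags = columns[-2]  # Flags column; the last column is Lic
        result[match.group(2)] = {
            "in_use": "C" in flags,      # (C) master-Capable in use
            "configured": "c" in flags,  # (c) master-capable is configured
        }
    return result


def all_master_capable(show_stacking_config_text):
    """True when every listed node has both the C and c flags set."""
    flags = master_capable_flags(show_stacking_config_text)
    return bool(flags) and all(f["in_use"] and f["configured"] for f in flags.values())
```

Applied to the two rows quoted above, both nodes show `C` and `c`, which matches the observation that master capability already appeared enabled before the fix.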

Vladimir Monomah

And yet that is how it is.