Extreme Stacking Backup Node Reboot


Colleagues, good afternoon. Please advise on the following problem:
if one of the switches in a stack (the master) is "overloaded" (rebooted), the second node, the backup, also reboots
10-15 seconds after it. In my opinion this makes no sense. Have you encountered this stack behavior, and can it be fixed?



The stack consists of an X670G2-48x and an X670G2-72x. The license level is the same on both switches, and both have master capability enabled.

Slot-2 Stack.12 # show stacking detail
Stacking Node 00:04:96:a0:7f:9d information:
Current:
    Stacking : Enabled
    Role : Master
    Priority : 100
    Slot number : 2
    Stack state : Active
    Master capable? : Yes
    Stacking protocol : Enhanced
    License level restriction :
    In active topology? : Yes
    Factory MAC address : 00:04:96:a0:7f:9d
    Stack MAC address : 02:04:96:98:79:f9
    Alternate IP address :
    Alternate gateway :
    Stack Port 1:
        State : Operational
        Blocked? : Yes
        Control path active? : Yes
        Selection : Alternate (47)
    Stack Port 2:
        State : Operational
        Blocked? : No
        Control path active? : Yes
        Selection : Alternate (48)
Configured:
    Stacking : Enabled
    Master capable? : Yes
    Slot number : 2
    Stack MAC address : 02:04:96:98:79:f9
    Stacking protocol : Enhanced
    License level restriction :
    Stack Port 1:
        Selection : Alternate (47)
    Stack Port 2:
        Selection : Alternate (48)



Stacking Node 00:04:96:98:79:f9 information:
Current:
    Stacking : Enabled
    Role : Backup
    Priority : 50
    Slot number : 1
    Stack state : Active
    Master capable? : Yes
    Stacking protocol : Enhanced
    License level restriction :
    In active topology? : Yes
    Factory MAC address : 00:04:96:98:79:f9
    Stack MAC address : 02:04:96:98:79:f9
    Alternate IP address :
    Alternate gateway :
    Stack Port 1:
        State : Operational
        Blocked? : Yes
        Control path active? : Yes
        Selection : Alternate (71)
    Stack Port 2:
        State : Operational
        Blocked? : No
        Control path active? : Yes
        Selection : Alternate (72)
Configured:
    Stacking : Enabled
    Master capable? : Yes
    Slot number : 1
    Stack MAC address : 02:04:96:98:79:f9
    Stacking protocol : Enhanced
    License level restriction :
    Stack Port 1:
        Selection : Alternate (71)
    Stack Port 2:
        Selection : Alternate (72)
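
For anyone who wants to compare the two nodes side by side, the "show stacking detail" output can be reduced to a few fields per node. The following is only a rough sketch: the function name parse_stacking_detail is made up for illustration, the excerpt is a shortened copy of the output above, and only the first occurrence of each key (the Current section) is kept.

```python
import re

def parse_stacking_detail(text):
    """Map node MAC -> first-seen 'Key : Value' pairs (the Current section)."""
    nodes, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        header = re.match(r"Stacking Node (\S+) information:", line)
        if header:
            # Start collecting fields for a new node.
            current = nodes.setdefault(header.group(1), {})
        elif current is not None and " : " in line:
            key, value = line.split(" : ", 1)
            # setdefault keeps the Current value and ignores the Configured repeat.
            current.setdefault(key.strip(), value.strip())
    return nodes

# Shortened excerpt of the output posted above.
excerpt = """
Stacking Node 00:04:96:a0:7f:9d information:
Current:
Stacking : Enabled
Role : Master
Priority : 100
Master capable? : Yes
Stacking Node 00:04:96:98:79:f9 information:
Current:
Stacking : Enabled
Role : Backup
Priority : 50
Master capable? : Yes
"""

nodes = parse_stacking_detail(excerpt)
for mac, fields in nodes.items():
    print(mac, fields["Role"], fields["Priority"], fields["Master capable?"])
```

On the posted output this shows one Master (priority 100) and one Backup (priority 50), both reporting "Master capable? : Yes".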



Slot-2 Stack.15 # show stacking configuration
Stack MAC in use: 02:04:96:98:79:f9
Node               Slot         Alternate          Alternate
MAC Address        Cfg Cur Prio Mgmt IP / Mask     Gateway         Flags     Lic
------------------ --- --- ---- ------------------ --------------- --------- ---
*00:04:96:a0:7f:9d 2   2   100                                     CcEeMm-Nn --
 00:04:96:98:79:f9 1   1   50                                      CcEeMm-Nn --
* - Indicates this node
Flags: (C) master-Capable in use, (c) master-capable is configured,
       (E) Stacking is currently Enabled, (e) Stacking is configured Enabled,
       (M) Stack MAC in use, (m) Stack MACs configured and in use are the same,
       (i) Stack MACs configured and in use are not the same or unknown,
       (N) Enhanced protocol is in use, (n) Enhanced protocol is configured,
       (-) Not in use or not configured
License level restrictions: (C) Core, (A) Advanced edge, or (E) Edge in use,
       (c) Core, (a) Advanced edge, or (e) Edge configured,
       (-) Not in use or not configured
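
The Flags column is compact, so here is a small sketch that expands a flag string using the legend printed above. The helper names decode_flags and is_master_capable are invented for this example; the legend text is copied from the output.

```python
# Legend copied from the "show stacking configuration" output above.
FLAG_MEANINGS = {
    "C": "master-capable in use",
    "c": "master-capable is configured",
    "E": "stacking is currently enabled",
    "e": "stacking is configured enabled",
    "M": "stack MAC in use",
    "m": "stack MACs configured and in use are the same",
    "i": "stack MACs configured and in use are not the same or unknown",
    "N": "enhanced protocol is in use",
    "n": "enhanced protocol is configured",
}

def decode_flags(flags):
    """Return the legend text for each set flag; '-' (not set) is skipped."""
    return [FLAG_MEANINGS[ch] for ch in flags if ch in FLAG_MEANINGS]

def is_master_capable(flags):
    """True when master capability is both configured (c) and in use (C)."""
    return "C" in flags and "c" in flags

for mac, flags in [("00:04:96:a0:7f:9d", "CcEeMm-Nn"),
                   ("00:04:96:98:79:f9", "CcEeMm-Nn")]:
    print(mac, is_master_capable(flags), decode_flags(flags))
```

Both nodes in this stack decode to the same set of flags, including master capability configured and in use.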

10 replies

Userlevel 2
What leads you to think that stacking these two switches is the source of your problem? What do you mean by the stack members being overloaded?
I think the firmware on the two switches is not the same. Use the commands show switch and show version image to check this.
Userlevel 6
Hello!

1. What EXOS version are you running?
2. Do any messages appear in the logs?
3. What sys-recovery level is configured?
4. Are both nodes powered by the same UPS? Does the backup node reboot when you manually power off the master node?
5. As I understand it, the master node changed from 00:04:96:98:79:f9 to 00:04:96:a0:7f:9d?

Thank you!
Alexandr P wrote:

Hello!

1. What EXOS version are you running?
2. Do any messages appear in the logs?
3. What sys-recovery level is configured?
4. Are both nodes powered by the same UPS? Does the backup node reboot when you manually power off the master node?
5. As I understand it, the master node changed from 00:04:96:98:79:f9 to 00:04:96:a0:7f:9d?

Thank you!

Slot-2 Stack.3 # show switch

SysName: Stack
SysLocation:
SysContact: support@extremenetworks.com, +1 888 257 3000
System MAC: 02:04:96:98:79:F9
System Type: X670G2-48x-4q (Stack)

SysHealth check: Enabled (Normal)
Recovery Mode: All
System Watchdog: Enabled

Current Time: Wed Jul 25 23:24:37 2018
Timezone: [Auto DST Disabled] GMT Offset: 0 minutes, name is UTC.
Boot Time: Wed Jul 25 23:15:49 2018
Boot Count: 42
Next Reboot: None scheduled
System UpTime: 8 minutes 47 seconds

Slot:             Slot-2 *                     Slot-1
                  ------------------------     ------------------------
Current State:    MASTER                       BACKUP (In Sync)

Image Selected:   secondary                    secondary
Image Booted:     secondary                    secondary
Primary ver:      21.1.3.7                     15.6.0.15
Secondary ver:    16.2.4.5                     16.2.4.5
                  patch1-8                     patch1-8

Config Selected: primary.cfg
Config Booted: primary.cfg

primary.cfg Created by ExtremeXOS version 16.2.4.5
721253 bytes saved on Wed Jul 25 23:14:56 2018

>> As I understand Master node was changed from 00:04:96:98:79:f9 to 00:04:96:a0:7f:9d?

Yes
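
On the firmware question raised earlier, the "show switch" output above can be checked mechanically. This is just a sketch with the version strings transcribed by hand from that output; the dictionary layout and function names are assumptions made for the example.

```python
# Values transcribed by hand from the "show switch" output above.
slots = {
    "Slot-2": {"booted": "secondary", "primary": "21.1.3.7", "secondary": "16.2.4.5"},
    "Slot-1": {"booted": "secondary", "primary": "15.6.0.15", "secondary": "16.2.4.5"},
}

def running_version(slot):
    """Version of the partition the slot actually booted from."""
    return slot[slot["booted"]]

def mismatched_partitions(slots):
    """Return the partitions whose version differs between slots."""
    result = []
    for part in ("primary", "secondary"):
        if len({s[part] for s in slots.values()}) > 1:
            result.append(part)
    return result

running = {name: running_version(s) for name, s in slots.items()}
print(running)
print(mismatched_partitions(slots))
```

With these values both slots are running 16.2.4.5 from the secondary partition, and only the inactive primary partition differs (21.1.3.7 on Slot-2, 15.6.0.15 on Slot-1).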
Hi Robert,

1. I do not think the problem is caused by stacking two different switch models.

2. It means that I disconnect the power supply from the switch that is the master in this topology.
Userlevel 2
I am sorry, but I do not understand what problem you are trying to solve. Are you trying to force the backup to take over as master by powering off the master slot?
Userlevel 3
Please describe the scenario of what is happening. As it stands, it appears you are unplugging the master to test failover, but when you do this the standby node reboots. Is this correct?

Log in to the secondary switch and view the log messages. See if you can post those here; we need the switch to tell us why it felt the need to reboot.

I do know that it is a common issue for the whole stack to reboot if two switches are in the Master state (the log will state "dual Master" and the whole stack reboots).

I have seen this when the master's stacking modules fail. The switch stays running, but the links between the switches flap, sending the stack into dual master; the whole stack then fails and reboots.

So I would check the cables between the switches to see if either of them has issues.

You can also look at the stacking port RX errors.
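
If the log is saved to a text file, a quick filter can pull out the lines worth posting. This is only a sketch: the search string "dual master" is taken from the description above and may not match the exact wording of the real log message, and the sample lines are hypothetical placeholders, not real EXOS log output.

```python
def find_dual_master_lines(log_lines):
    """Return log lines that mention 'dual master', ignoring case."""
    needle = "dual master"
    return [ln for ln in log_lines if needle in ln.lower()]

# Hypothetical placeholder lines, NOT real EXOS log output.
sample = [
    "placeholder: stack port link changed state",
    "placeholder: Dual Master detected",
    "placeholder: unrelated message",
]

print(find_dual_master_lines(sample))
```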
Colleagues, thank you.

The problem was solved by running "config stacking redundancy maximal" on EACH chassis in the stack.

P.S.

Master capability was already enabled on each chassis in the stack.

The license level is Core on both chassis in the stack.

None of this kept the backup chassis from rebooting along with the master when power was turned off on the master chassis in the stack.
Userlevel 6
Hello, Vladimir!

It's a little bit strange, because the command "config stacking redundancy maximal" sets all stack nodes as master-capable, and yet:
- the default value should already be maximal
- in your outputs both nodes have master capability enabled:
"Master capable? : Yes"
MAC Address        Cfg Cur Prio Mgmt IP / Mask     Gateway         Flags     Lic
------------------ --- --- ---- ------------------ --------------- --------- ---
*00:04:96:a0:7f:9d 2   2   100                                     CcEeMm-Nn --
 00:04:96:98:79:f9 1   1   50                                      CcEeMm-Nn --

Thank you!
Vladimir Monomah wrote:

Colleagues, thank you.

The problem was solved by running "config stacking redundancy maximal" on EACH chassis in the stack.

P.S.

Master capability was already enabled on each chassis in the stack.

The license level is Core on both chassis in the stack.

None of this kept the backup chassis from rebooting along with the master when power was turned off on the master chassis in the stack.

And yet it is so
