X440 stack occasionally breaks and ends up with two masters

  • Problem
  • Updated 2 years ago
  • Solved
I have several stacks, each composed of three X440 switches. All of them occasionally have this issue where the master fails and both of the other switches take the master role. After a reboot everything goes back to normal. Since this is happening on multiple instances of the same installation, I suspect a firmware issue. Does this sound familiar to anyone?

Firmware version is 15.2.1.5 v1521b5

The logs show this (newest entries first):
05/18/2016 08:07:20.89 <Noti:DM.Notice> Slot-3: Slot-2 being Powered ON
05/18/2016 08:07:20.89 <Noti:DM.Notice> Slot-3: Slot-1 being Powered ON
05/18/2016 08:07:00.86 <Info:HAL.Card.Info> Slot-3: Module in Slot-2 is inserted
05/18/2016 08:07:00.86 <Info:HAL.Card.Info> Slot-3: Module in Slot-1 is inserted
05/18/2016 08:06:48.48 <Info:HAL.Port.Info> Slot-3: Stacking port 2:2 link up at 10Gbps.
05/18/2016 08:06:48.48 <Info:HAL.Port.Info> Slot-3: Stacking port 2:1 link up at 10Gbps.
05/18/2016 08:06:39.29 <Info:HAL.Port.Info> Slot-3: Stacking port 1:2 link up at 10Gbps.
05/18/2016 08:06:39.29 <Info:HAL.Port.Info> Slot-3: Stacking port 1:1 link up at 10Gbps.
05/18/2016 08:06:32.29 <Info:HAL.Port.Info> Slot-3: Stacking port 1:2 link down.
05/18/2016 08:06:32.29 <Info:HAL.Port.Info> Slot-3: Stacking port 1:1 link down.
05/18/2016 08:06:30.86 <Info:HAL.Port.Info> Slot-3: Stacking port 3:2 link up at 10Gbps.
05/18/2016 08:06:30.86 <Info:HAL.Port.Info> Slot-3: Stacking port 3:1 link up at 10Gbps.
05/18/2016 08:06:30.86 <Info:HAL.Port.Info> Slot-3: Stacking port 1:2 link up at 10Gbps.
05/18/2016 08:06:30.86 <Info:HAL.Port.Info> Slot-3: Stacking port 1:1 link up at 10Gbps.
05/18/2016 08:05:55.58 <Warn:IPMC.Warning> Slot-3: Our own packet received. Mac address of the received packet is [2:4:96:90:0:e],there could be physical loop in the network
05/18/2016 08:05:06.11 <Warn:IPMC.Warning> Slot-3: Previous message repeated 3 additional times in the last 33 second(s)
05/18/2016 08:05:01.71 <Info:vlan.msgs.portLinkStateUp> Slot-3: Port 3:11 link UP at speed 100 Mbps and full-duplex
05/18/2016 08:05:00.03 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 3:11 link down
05/18/2016 08:04:59.43 <Info:HAL.Sys.Info> Slot-3: Internal Power Supply in slot 2 is disconnected.
05/18/2016 08:04:59.43 <Info:HAL.Sys.Info> Slot-3: Internal Power Supply in slot 1 is disconnected.
05/18/2016 08:04:59.40 <Info:HAL.Card.Info> Slot-3: Slot-2 down, resetting all TCP connections to it
05/18/2016 08:04:59.40 <Info:HAL.Card.Info> Slot-3: Module in Slot-2 is removed
05/18/2016 08:04:59.38 <Info:HAL.Card.Info> Slot-3: Slot-1 down, resetting all TCP connections to it
05/18/2016 08:04:59.38 <Info:HAL.Card.Info> Slot-3: Module in Slot-1 is removed
05/18/2016 08:04:59.37 <Info:HAL.Port.Info> Slot-3: Stacking port 3:1 link down.
05/18/2016 08:04:59.37 <Info:HAL.Port.Info> Slot-3: Stacking port 2:2 link down.
05/18/2016 08:04:59.37 <Info:HAL.Port.Info> Slot-3: Stacking port 2:1 link down.
05/18/2016 08:04:59.30 <Info:HAL.Card.Info> Slot-3: Slot-2 down, resetting all TCP connections to it
05/18/2016 08:04:59.30 <Info:HAL.Card.Info> Slot-3: Module in Slot-2 is removed
05/18/2016 08:04:59.30 <Info:HAL.Port.Info> Slot-3: Stacking port 1:2 link down.
05/18/2016 08:04:59.30 <Info:HAL.Port.Info> Slot-3: Stacking port 1:1 link down.
05/18/2016 08:04:59.27 <Info:HAL.Card.Info> Slot-3: Slot-1 down, resetting all TCP connections to it
05/18/2016 08:04:59.26 <Info:HAL.Card.Info> Slot-3: Module in Slot-1 is removed
05/18/2016 08:04:57.26 <Info:HAL.Port.Info> Slot-3: Stacking port 3:2 link down.
05/18/2016 08:04:50.18 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 2:1 link down
05/18/2016 08:04:48.33 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:48 link down
05/18/2016 08:04:48.33 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:46 link down
05/18/2016 08:04:48.33 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:43 link down
05/18/2016 08:04:48.32 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:41 link down
05/18/2016 08:04:48.31 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:39 link down
05/18/2016 08:04:48.31 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:38 link down
05/18/2016 08:04:48.30 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:37 link down
05/18/2016 08:04:48.24 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:35 link down
05/18/2016 08:04:48.24 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:34 link down
05/18/2016 08:04:48.24 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:33 link down
05/18/2016 08:04:48.24 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:32 link down
05/18/2016 08:04:48.23 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:31 link down
05/18/2016 08:04:48.23 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:30 link down
05/18/2016 08:04:48.23 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:29 link down
05/18/2016 08:04:48.21 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:28 link down
05/18/2016 08:04:48.07 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:27 link down
05/18/2016 08:04:48.05 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:26 link down
05/18/2016 08:04:48.05 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:23 link down
05/18/2016 08:04:48.05 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:21 link down
05/18/2016 08:04:48.05 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:20 link down
05/18/2016 08:04:48.05 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:18 link down
05/18/2016 08:04:48.05 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:16 link down
05/18/2016 08:04:48.04 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:15 link down
05/18/2016 08:04:48.04 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:14 link down
05/18/2016 08:04:48.00 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:13 link down
05/18/2016 08:04:48.00 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:12 link down
05/18/2016 08:04:48.00 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:11 link down
05/18/2016 08:04:48.00 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:10 link down
05/18/2016 08:04:48.00 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:9 link down
05/18/2016 08:04:48.00 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:8 link down
05/18/2016 08:04:47.99 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:7 link down
05/18/2016 08:04:47.99 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:6 link down
05/18/2016 08:04:47.99 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:4 link down
05/18/2016 08:04:47.99 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:3 link down
05/18/2016 08:04:47.99 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:2 link down
05/18/2016 08:04:47.98 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:1 link down
05/18/2016 08:04:47.98 <Warn:DM.Warning> Slot-3: Slot-1 FAILED (1) Dual Master
05/18/2016 08:04:47.98 <Warn:DM.Warning> Slot-3: BACKUP NODE (Slot-1) DOWN
05/18/2016 08:04:47.98 <Crit:NM.NodeStateFail> Slot-3: Slot-1 has failed for the reason of "Dual Master".
05/18/2016 08:04:46.57 <Warn:IPMC.Warning> Slot-3: Our own packet received. Mac address of the received packet is [2:4:96:90:0:e],there could be physical loop in the network
05/18/2016 08:04:44.52 <Warn:IPMC.Warning> Slot-3: Our own packet received. Mac address of the received packet is [2:4:96:90:0:e],there could be physical loop in the network
05/18/2016 08:04:43.70 <Info:vlan.msgs.portLinkStateUp> Slot-3: Port 3:24 link UP at speed 100 Mbps and full-duplex
05/18/2016 08:04:42.53 <Erro:cm.sys.msgDrop> Slot-2: Dropped CM_MSG_CHKP_STANDBY_BANNER_FROM_CFG: Length 16 Peer 63 (primary)
05/18/2016 08:04:42.53 <Erro:cm.sys.msgDrop> Slot-2: Dropped CM_MSG_CHKP_STANDBY_BANNER_ACK: Length 14 Peer 63 (primary)
05/18/2016 08:04:42.53 <Erro:cm.sys.msgDrop> Slot-2: Dropped CM_MSG_CHKP_STANDBY_BANNER: Length 14 Peer 63 (primary)
05/18/2016 08:04:42.53 <Erro:cm.sys.msgDrop> Slot-2: Dropped CM_MSG_CONFIG_COREDUMP_STANDBY: Length 16 Peer 63 (primary)
05/18/2016 08:04:42.53 <Erro:cm.sys.msgDrop> Slot-2: Dropped CM_MSG_CHKP_STANDBY_BANNER_FROM_CFG: Length 16 Peer 63 (primary)
05/18/2016 08:04:42.53 <Erro:cm.sys.msgDrop> Slot-2: Dropped CM_MSG_CHKP_STANDBY_BANNER_ACK: Length 14 Peer 63 (primary)
05/18/2016 08:04:42.53 <Erro:cm.sys.msgDrop> Slot-2: Dropped CM_MSG_CHKP_STANDBY_BANNER: Length 14 Peer 63 (primary)
05/18/2016 08:04:42.53 <Erro:cm.sys.msgDrop> Slot-2: Dropped CM_MSG_CONFIG_COREDUMP_STANDBY: Length 16 Peer 63 (primary)
05/18/2016 08:04:42.49 <Erro:cm.sys.msgDrop> Slot-2: Dropped CM_MSG_CONFIG_STANDBY: Length 164 Peer 63 (primary)
05/18/2016 08:04:42.49 <Erro:cm.sys.msgDrop> Slot-2: Dropped CM_MSG_CONFIG_STANDBY: Length 164 Peer 63 (primary)
05/18/2016 08:04:42.40 <Warn:IPMC.Warning> Slot-3: Our own packet received. Mac address of the received packet is [2:4:96:90:0:e],there could be physical loop in the network
05/18/2016 08:04:41.82 <Warn:IPMC.Warning> Slot-3: Previous message repeated 34 additional times in the last 1 second(s)
05/18/2016 08:04:41.82 <Info:vlan.msgs.portLinkStateUp> Slot-3: Port 2:1 link UP at speed 100 Mbps and full-duplex
05/18/2016 08:04:41.42 <Warn:DM.Warning> Slot-3: BACKUP NODE (Slot-1) DOWN
05/18/2016 08:04:41.42 <Warn:DM.Warning> Slot-3: BACKUP is NOT in SYNC
05/18/2016 08:04:41.38 <Warn:DM.Warning> Slot-1: PRIMARY NODE (Slot-3) DOWN
05/18/2016 08:04:41.00 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 3:24 link down
05/18/2016 08:04:33.05 <Warn:IPMC.Warning> Slot-3: Our own packet received. Mac address of the received packet is [2:4:96:90:0:e],there could be physical loop in the network
05/18/2016 08:04:30.99 <Warn:IPMC.Warning> Slot-3: Previous message repeated 4 additional times in the last 1 second(s)
05/18/2016 08:04:29.69 <Info:vlan.msgs.portLinkStateUp> Slot-3: Port 1:21 link UP at speed 100 Mbps and full-duplex
05/18/2016 08:04:27.63 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 1:21 link down
05/18/2016 08:04:25.05 <Info:vlan.msgs.portLinkStateUp> Slot-3: Port 3:6 link UP at speed 100 Mbps and full-duplex
05/18/2016 08:04:21.95 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 3:6 link down
05/18/2016 08:04:20.84 <Info:vlan.msgs.portLinkStateUp> Slot-3: Port 3:8 link UP at speed 100 Mbps and full-duplex
05/18/2016 08:04:17.41 <Info:vlan.msgs.portLinkStateDown> Slot-3: Port 3:8 link down
05/18/2016 08:04:06.31 <Warn:IPMC.Warning> Slot-3: Our own packet received. Mac address of the received packet is [2:4:96:90:0:e],there could be physical loop in the network
05/18/2016 08:04:05.79 <Warn:DM.Warning> Slot-1: Slot-2 FAILED (1) Error on Slot-2
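
Because the log is listed newest-first and the stacking messages are buried among port link events, it can help to pull out only the node-state entries and read them in chronological order. The following is a minimal Python sketch, not an Extreme Networks tool. It assumes the line layout seen above (timestamp, `<Severity:Component>`, slot, message), and the function names `parse_line` and `stack_events` are hypothetical helpers introduced here for illustration.

```python
import re
from datetime import datetime

# Assumed layout, taken from the log lines above:
# 05/18/2016 08:04:47.98 <Warn:DM.Warning> Slot-3: BACKUP NODE (Slot-1) DOWN
LINE_RE = re.compile(
    r"^\s*(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{2}) "
    r"<(\w+):([\w.]+)> (Slot-\d+): (.*)$"
)


def parse_line(line):
    """Split one log line into fields, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    stamp, severity, component, slot, message = m.groups()
    return {
        "time": datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S.%f"),
        "severity": severity,
        "component": component,
        "slot": slot,
        "message": message,
    }


def stack_events(text):
    """Return DM.* and NM.* entries, oldest first."""
    entries = [parse_line(line) for line in text.splitlines()]
    wanted = [
        e for e in entries
        if e is not None and e["component"].startswith(("DM.", "NM."))
    ]
    # The log is newest-first: reverse it, then rely on the stable sort so
    # entries with identical timestamps keep their oldest-first order.
    wanted.reverse()
    wanted.sort(key=lambda e: e["time"])
    return wanted


# A few lines copied from the log above.
SAMPLE = """\
05/18/2016 08:04:47.98 <Warn:DM.Warning> Slot-3: BACKUP NODE (Slot-1) DOWN
05/18/2016 08:04:47.98 <Crit:NM.NodeStateFail> Slot-3: Slot-1 has failed for the reason of "Dual Master".
05/18/2016 08:04:43.70 <Info:vlan.msgs.portLinkStateUp> Slot-3: Port 3:24 link UP at speed 100 Mbps and full-duplex
05/18/2016 08:04:41.38 <Warn:DM.Warning> Slot-1: PRIMARY NODE (Slot-3) DOWN
"""

for event in stack_events(SAMPLE):
    print(event["time"].time(), event["severity"], event["slot"], event["message"])
```

Run against the full log, this leaves only the lines where a slot reports a primary, backup, or node failure, which makes the order of events around the "Dual Master" failure easier to follow.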

Omar Valente

Posted 2 years ago

jackmikel

Maybe you should use the recommended EXOS version for the X440. Follow the link:

http://documentation.extremenetworks.com/hw_sw_compatibility/HardwareSoftwareCompatibility/r_exos-re...

It should be version 15.6.4.2-patch1-6.xos.

Also check whether there are any rx or tx errors on the stacking links:

sh port stack-ports rx

sh port stack-ports tx

Regards, jackmikel

Omar Valente

Thank you very much, I'll check it out

Ron Huygens, Employee

There is a known issue in your EXOS version. See the following article:
https://gtacknowledge.extremenetworks.com/articles/Solution/X440-X460-dual-stack-master