Switch hangs - all ports go down

  • Question
  • Updated 4 years ago
  • Answered
Hello,

My company uses Extreme switches in a stacked configuration, and we have run into a problem we cannot resolve. I hope you can help:
- Several switches are in use. From time to time, all LAN ports on one switch go down at once, while the switch itself still has power and its power LED stays lit. As a temporary fix we unplug and reconnect the power, and then everything is OK. This happens every so often.
(This time it happened to the switch in slot 2.)
The log around the time of the incident is as follows:

01/23/2015 13:43:04.83 <Warn:Kern.IPv4Mc.Warning> Slot-2: IPv4 multicast entry not added. Hardware L3 Table full. (Logged at most once per hour.)
01/23/2015 13:42:58.79 <Info:vlan.msgs.portLinkStateUp> Slot-1: Port 2:14 link UP at speed 10 Mbps and full-duplex
01/23/2015 13:42:56.24 <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 2:14 link down
01/23/2015 13:42:42.50 <Info:vlan.msgs.portLinkStateUp> Slot-1: Port 2:22 link UP at speed 100 Mbps and full-duplex
01/23/2015 13:42:39.89 <Info:vlan.msgs.portLinkStateUp> Slot-1: Port 2:11 link UP at speed 1 Gbps and full-duplex
01/23/2015 13:42:39.52 <Info:vlan.msgs.portLinkStateUp> Slot-1: Port 2:42 link UP at speed 1 Gbps and full-duplex
.....
01/23/2015 13:42:26.97 <Info:HAL.IPv4ACL.Info> Slot-1: Done synching ACLs to Slot-2
01/23/2015 13:42:25.77 <Info:HAL.IPv4ACL.Info> Slot-1: Synching ACLs to Slot-2
01/23/2015 13:42:03.88 <Noti:DM.Notice> Slot-1: Slot-2 being Powered ON
01/23/2015 13:42:00.88 <Info:HAL.Card.Info> Slot-1: Module in Slot-2 is inserted
01/23/2015 13:42:00.26 <Info:HAL.Port.Info> Slot-1: Stacking port 2:2 link up at 10Gbps.
01/23/2015 13:42:00.26 <Info:HAL.Port.Info> Slot-1: Stacking port 2:1 link up at 10Gbps.
01/23/2015 13:41:37.34 <Info:HAL.Port.Info> Slot-1: Stacking port 2:2 link down.
01/23/2015 13:41:37.34 <Info:HAL.Port.Info> Slot-1: Stacking port 2:1 link down.
...
01/23/2015 13:39:58.85 <Info:HAL.Card.Info> Slot-1: Slot-2 down, resetting all TCP connections to it
01/23/2015 13:39:58.85 <Info:HAL.Card.Info> Slot-1: Module in Slot-2 is removed
01/23/2015 13:39:58.06 <Info:HAL.Port.Info> Slot-1: Stacking port 3:1 link down.
01/23/2015 13:39:58.05 <Info:HAL.Card.Info> Slot-1: Slot-2 down, resetting all TCP connections to it
01/23/2015 13:39:58.05 <Info:HAL.Card.Info> Slot-1: Module in Slot-2 is removed
01/23/2015 13:39:58.05 <Info:HAL.Port.Info> Slot-1: Stacking port 1:2 link down.
01/23/2015 13:39:51.88 <Info:HAL.Card.Info> Slot-1: Module in Slot-2 is inserted
01/23/2015 13:39:51.44 <Info:HAL.Port.Info> Slot-1: Stacking port 2:2 link up at 10Gbps.
01/23/2015 13:39:51.44 <Info:HAL.Port.Info> Slot-1: Stacking port 2:1 link up at 10Gbps.
01/23/2015 13:39:47.25 <Warn:DM.Warning> Slot-1: Slot-2 FAILED (1) Not In Sync
01/23/2015 13:39:47.22 <Crit:NM.NodeStateFail> Slot-1: Slot-2 has failed for the reason of "Not In Sync".
01/23/2015 13:39:47.16 <Info:HAL.Port.Info> Slot-1: Stacking port 3:1 link up at 10Gbps.
01/23/2015 13:39:47.16 <Info:HAL.Port.Info> Slot-1: Stacking port 1:2 link up at 10Gbps.
01/23/2015 13:39:46.27 <Warn:DM.Warning> Slot-3: Slot-2 FAILED (1) Not In Sync
01/23/2015 13:39:40.31 <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 2:32 link down
...
01/23/2015 13:39:39.91 <Info:HAL.Card.Info> Slot-1: Slot-2 down, resetting all TCP connections to it
01/23/2015 13:39:39.91 <Info:HAL.Card.Info> Slot-1: Module in Slot-2 is removed
01/23/2015 13:39:39.89 <Erro:HAL.Sys.Error> Slot-1: pibPortPktBufLog: Failed to get buffer port range parameters for instance 10, rc = -3
01/23/2015 13:39:39.89 <Erro:HAL.Sys.Error> Slot-1: pibPortPktBufLog: Failed to get buffer port range parameters for instance 9, rc = -19
01/23/2015 13:39:39.89 <Info:HAL.Port.Info> Slot-1: Stacking port 3:1 link down.
01/23/2015 13:39:39.89 <Info:HAL.Port.Info> Slot-1: Stacking port 2:2 link down.
01/23/2015 13:39:39.89 <Info:HAL.Port.Info> Slot-1: Stacking port 2:1 link down.
01/23/2015 13:39:39.86 <Warn:DM.Warning> Slot-1: Slot-2 FAILED (1)
01/23/2015 13:39:39.84 <Warn:DM.Warning> Slot-1: BACKUP NODE (Slot-2) DOWN
01/23/2015 13:39:39.78 <Info:HAL.Card.Info> Slot-1: Slot-2 down, resetting all TCP connections to it
01/23/2015 13:39:39.78 <Info:HAL.Card.Info> Slot-1: Module in Slot-2 is removed
01/23/2015 13:39:39.78 <Info:HAL.Port.Info> Slot-1: Stacking port 1:2 link down.
01/23/2015 13:39:38.94 <Warn:DM.Warning> Slot-3: BACKUP NODE (Slot-2) DOWN
01/23/2015 13:39:33.74 <Info:vlan.msgs.portLinkStateUp> Slot-1: Port 2:32 link UP at speed 1 Gbps and full-duplex
...
01/23/2015 13:36:53.37 <Warn:DM.Warning> Slot-1: Slot-2 FAILED (1) Not In Sync
01/23/2015 13:36:53.35 <Info:HAL.Port.Info> Slot-1: Stacking port 3:1 link up at 10Gbps.
01/23/2015 13:36:53.35 <Info:HAL.Port.Info> Slot-1: Stacking port 1:2 link up at 10Gbps.
01/23/2015 13:36:53.35 <Crit:NM.NodeStateFail> Slot-1: Slot-2 has failed for the reason of "Not In Sync".
01/23/2015 13:36:52.42 <Warn:DM.Warning> Slot-3: Slot-2 FAILED (1) Not In Sync
01/23/2015 13:36:45.91 <Info:HAL.Card.Info> Slot-1: Slot-2 down, resetting all TCP connections to it


Some relevant information:
- The temperature at that time:
* Slot-1 Stack.2 # sho temperature
Field Replaceable Units    Temp (C)   Status   Min   Normal   Max
------------------------------------------------------------------
Slot-1 : X460-24t             30.00   Normal   -10   0-45     60
Slot-2 : X440-48t             32.00   Normal   -10   0-45     60
Slot-3 : X440-48t             32.50   Normal   -10   0-45     60

- Hardware and software versions:
* Slot-1 Stack.3 # sho version
Slot-1: 800321-00-09 1235G-80580 Rev 9.0 bootrom: 2.0.1.7 IMG: 15.1.3.4
Slot-2: 800473-00-05 1236G-00349 Rev 5.0 bootrom: 2.0.1.7 IMG: 15.1.3.4
Slot-3: 800473-00-09 1332N-40517 Rev 9.0 bootrom: 2.0.1.7 IMG: 15.1.3.4

* Slot-1 Stack.4 # sho switch

SysName:          Stack
SysLocation:     
SysContact:       support@extremenetworks.com, +1 888 257 3000
System MAC:       02:04:96:7E:2E:13
System Type:      X460-24t (Stack)

SysHealth check:  Enabled (Normal)
Recovery Mode:    All
System Watchdog:  Enabled

Current Time:     Fri Jan 23 16:13:17 2015
Timezone:         [Auto DST Disabled] GMT Offset: 0 minutes, name is UTC.
Boot Time:        Thu Jan 22 10:45:48 2015
Boot Count:       60
Next Reboot:      None scheduled
System UpTime:    1 day 5 hours 27 minutes 29 seconds

Slot:             Slot-1 *                     Slot-2                 
                  ------------------------     ------------------------
Current State:    MASTER                       BACKUP (In Sync)       

Image Selected:   secondary                    secondary              
Image Booted:     secondary                    secondary              
Primary ver:      12.5.4.5                     15.1.2.12              
Secondary ver:    15.1.3.4                     15.1.3.4               

Config Selected:  primary.cfg                                         
Config Booted:    primary.cfg                                         

primary.cfg       Created by ExtremeXOS version 15.1.3.4
                  488756 bytes saved on Tue Jan 13 10:49:41 2015
* Slot-1 Stack.5 #


Thanks!
Nucteiv

Posted 4 years ago

Grosjean, Stephane, Employee
Chao Anh,

I'm assuming you're Vietnamese based on some characters I see in your log; I hope I'm not totally wrong. If so, can you briefly describe your issue in Vietnamese?

I see you have a 3-member stack: one x460 and two x440s.

I assume the x460 is the master. Is that correct?
Do you have a backup member?

It looks like slot 2 is the backup. The logs show it failing with "Not In Sync". I'm not sure which entries are due to you powering down the x440, though.
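
To separate the Slot-2 failure events from the surrounding link up/down noise, the log can be filtered programmatically. The following is only a minimal Python sketch: the regular expression, the `parse_line` and `slot_timeline` helpers, and the choice of severities are assumptions made for illustration, not an Extreme tool. The sample lines are copied from the log in the question with whitespace normalized.

```python
import re
from datetime import datetime

# A few lines from the log in the question (whitespace normalized).
SAMPLE_LOG = """\
01/23/2015 13:43:04.83 <Warn:Kern.IPv4Mc.Warning> Slot-2: IPv4 multicast entry not added. Hardware L3 Table full.
01/23/2015 13:42:26.97 <Info:HAL.IPv4ACL.Info> Slot-1: Done synching ACLs to Slot-2
01/23/2015 13:42:03.88 <Noti:DM.Notice> Slot-1: Slot-2 being Powered ON
01/23/2015 13:39:47.22 <Crit:NM.NodeStateFail> Slot-1: Slot-2 has failed for the reason of "Not In Sync".
01/23/2015 13:39:39.84 <Warn:DM.Warning> Slot-1: BACKUP NODE (Slot-2) DOWN
01/23/2015 13:39:33.74 <Info:vlan.msgs.portLinkStateUp> Slot-1: Port 2:32 link UP at speed 1 Gbps and full-duplex
"""

# Assumed line layout: date, time, <Severity:Component>, reporting slot, message.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{2}) "
    r"<(?P<sev>\w+):\s*(?P<comp>[\w.]+)> "
    r"(?P<reporter>Slot-\d+): (?P<msg>.*)$"
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "time": datetime.strptime(m["ts"], "%m/%d/%Y %H:%M:%S.%f"),
        "severity": m["sev"],
        "component": m["comp"],
        "reporter": m["reporter"],
        "message": m["msg"],
    }


def slot_timeline(log_text, slot="Slot-2", severities=("Warn", "Erro", "Crit")):
    """Return events about `slot` at the given severities, oldest first."""
    events = []
    for line in log_text.splitlines():
        ev = parse_line(line)
        if ev is None or ev["severity"] not in severities:
            continue
        # Keep events reported by the slot itself or mentioning it in the message.
        if ev["reporter"] == slot or slot in ev["message"]:
            events.append(ev)
    return sorted(events, key=lambda ev: ev["time"])


if __name__ == "__main__":
    for ev in slot_timeline(SAMPLE_LOG):
        print(ev["time"].time(), ev["severity"], ev["reporter"], ev["message"])
```

Sorting oldest first makes it easier to see whether the "Not In Sync" failures come before or after a manual power cycle.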

Can you tell us if this stack is doing L3?
If yes, how many LPM routes and ARP entries do you have?
How many multicast entries do you have? Do you need L3 for multicast?

Cam on.
Nucteiv
Hi,
Yes, I am Vietnamese. Briefly, the problem I am facing is as follows:
- On the access switches, while in use, occasionally (fairly often) one slot (switch) in the stack ends up with all of its access ports down for no apparent reason. The switch then has to be restarted before it works again.
- In addition, sometimes a run of adjacent ports connected to user machines stops working: those machines cannot obtain an IP address from DHCP even though the port status is still up. Restarting the switch fixes it.
- My x460 switch is the master, and there is a backup switch.
- This stack currently runs Layer 2 only; all inter-VLAN routing is handed off to a Cisco Layer 3 switch.

This problem is giving me a real headache because I cannot find the cause. I contacted the supplier, and their support has not resolved it so far; even replacing the switch with a new one did not help. At first I thought it might be a loop on some access port, so I configured ELRP to disable looped ports, but that does not seem to be the cause.
Nucteiv
Hi,
I'm Vietnamese. Briefly, the problem I'm having is as follows:
- On the access switches, sometimes (quite often) one slot (switch) in the stack has all of its access ports go down for no known reason. Restarting the switch makes it OK again.
- Sometimes a contiguous range of ports connected to user machines is also affected: the machines cannot obtain an IP address from DHCP although the port status is still up. Restarting the switch makes it OK again.
- The x460 switch is the master, and there is a backup switch.
- This switch stack runs Layer 2 only; all inter-VLAN routing is pushed to a Cisco Layer 3 switch.

This issue is a real headache for me because I cannot find the cause. I contacted the supplier, but their support has not solved the problem yet, and it persists even after replacing the switch with a new one. At first I thought it might be a loop on some access port, so I configured ELRP to disable looped ports, but that does not seem to be the cause.
Grosjean, Stephane, Employee
Hi,

I understand you have a problem with one slot in the stack. You need to reboot it from time to time to solve a recurring issue:

- the whole stack (?) hangs and you have no connectivity. You restart it and it's fine.
- sometimes some users on contiguous ports can't obtain an IP address from the DHCP server. You restart it and it's fine.

The stack is only doing L2, you do L3 on a Cisco router.

You already tried to replace the switch in slot 1 (x440) with no improvement.

The master of the stack is the x460, the backup is the x440 in slot 1.

You first suspected a loop, but after configuring ELRP, you do not see any loop detected.

In your log, I can see you are filling up the IP Multicast table of slot-1. This is typical because EXOS by default performs an L3 lookup for multicast, which quickly fills the table of the x440. If this is related, we can hopefully solve that part of the issue easily by using the mac-vlan mode. This configuration requires 15.3 and cannot work with IGMPv3, MVR, PVLAN and PIM, because it uses the L2 table. If you need any of these protocols, you need to use mixed-mode instead, which gives you the benefit of the L2 table size for multicast traffic, except for the entries using these protocols, which would still use the L3 table.

configure forwarding ipmc lookup-key [group-vlan | source-group-vlan | mac-vlan | mixed-mode]
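
The choice between the two L2-based modes described above can be sketched as a small decision helper. This is only an illustration in Python, not an EXOS feature: `choose_ipmc_lookup_key` and `build_command` are hypothetical names, and the protocol list is taken from the reply above. Note that `show version` in the question reports 15.1.3.4, so given the 15.3 requirement stated above, an upgrade would have to come first.

```python
# Protocols that, according to the reply above, cannot be used with mac-vlan mode.
L3_ONLY_PROTOCOLS = {"IGMPV3", "MVR", "PVLAN", "PIM"}


def choose_ipmc_lookup_key(protocols_in_use):
    """Pick 'mac-vlan' unless a protocol that still needs the L3 table is in use.

    `protocols_in_use` is any iterable of protocol names (case-insensitive).
    """
    in_use = {name.upper() for name in protocols_in_use}
    if in_use & L3_ONLY_PROTOCOLS:
        # Entries for these protocols keep using the L3 table in mixed-mode.
        return "mixed-mode"
    return "mac-vlan"


def build_command(protocols_in_use):
    """Render the configure command quoted in the reply with the chosen mode."""
    mode = choose_ipmc_lookup_key(protocols_in_use)
    return f"configure forwarding ipmc lookup-key {mode}"


if __name__ == "__main__":
    # Hypothetical example: a pure L2 stack with no IGMPv3, MVR, PVLAN or PIM.
    print(build_command([]))
    # Hypothetical example: a stack that also needs PIM.
    print(build_command(["PIM"]))
```

For a Layer 2 only stack such as the one described in this thread, the helper would select `mac-vlan`, provided none of the listed protocols are actually in use.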

Do you have an issue only on slot1, or do you have issues on every x440? Is the x460 impacted or not?


Thanks.
Nucteiv
Hi,

- the whole stack (?) hangs and you have no connectivity. You restart it and it's fine.  => Yes, on slot 2.

- As you say, the problem may be related to filling up the IP Multicast table of slot-1?

To solve it, do I need to upgrade to EXOS 15.3 and configure mac-vlan mode with the command: configure forwarding ipmc lookup-key [group-vlan | source-group-vlan | mac-vlan | mixed-mode]?

Do you have an issue only on slot1, or do you have issues on every x440? Is the x460 impacted or not?  -> I have the issue on slot 2 (x440): computers connected to slot 2 lose network connectivity once slot 2 hangs (every port goes down). Some other switches in the company show the same behavior, and they are also x440s.

Thanks!
Grosjean, Stephane, Employee
Hi,

OK, so this is limited to the x440. You should validate that you are not exceeding some hardware limits of the x440 here: IP multicast, ARP...

I don't know if the Multicast table could be (part of) the issue. But this is something you need to address anyway.

When you reboot the x440, you reset every forwarding table. Only the x440s experience issues, and only over time, so I would start by looking at resource exhaustion.

Have you opened a case with our GTAC on that matter? They should be able to guide you through this troubleshooting.

Best Regards,
Stephane
Nucteiv
Sorry, what is GTAC? I don't understand.
Grosjean, Stephane, Employee
The GTAC is our Global TAC, our Technical Assistance Center. If you have an issue, hardware or software, you should open a case with them. This may be done by your reseller or yourself directly depending on the contract.
Nucteiv
I have opened a case with GTAC and am following it...