Troubleshooting 'STACK: communication timeout' errors on a SecureStack

  • Updated 5 years ago
Article ID: 5624 

Products
Matrix C2
SecureStack C2
SecureStack B2
SecureStack A2 

Protocols/Features
Stacking 

Goals
Troubleshooting a Failing Stack 

Symptoms
Unable to communicate across stack modules
Stacking connectivity issues
"ATP: TX timeout"
"communication timeout" 

Cause
Communication problems between stack units may show up as error messages printed directly in a CLI session, as in this example:
 C2(su)->ATP: TX timeout, seq 41315. cli 778. to 2 tx cnt 21.
STACK: communication timeout to d:1c
ATP: TX timeout, seq 25724. cli 8. to 2 tx cnt 21.
STACK: communication timeout to d:1c
ATP: TX timeout, seq 41316. cli 778. to 2 tx cnt 21.
STACK: communication timeout to d:1c
C2(su)->ATP: TX timeout, seq 41319. cli 778. to 2 tx cnt 21.
STACK: communication timeout to d:1c
C2(su)->ATP: TX timeout, seq 41320. cli 778. to 2 tx cnt 21.
STACK: communication timeout to d:1c
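
When a capture contains many of these messages, it can help to tally them per peer before deciding which units to examine. The Python sketch below is only an illustration: the function name `summarize` and the regular expressions are assumptions based solely on the message layout shown above. The meaning of the numeric fields (seq, cli, to, tx cnt) is not documented in this article, so they are kept as raw integers. Note that the log prints the peer's MAC suffix as "d:1c".

```python
import re
from collections import Counter

# Patterns inferred from the sample messages above (an assumption, not an
# official format definition). search() is used so that a leading CLI
# prompt such as "C2(su)->" is ignored.
ATP_RE = re.compile(
    r"ATP: TX timeout, seq (\d+)\. cli (\d+)\. to (\d+) tx cnt (\d+)\."
)
STACK_RE = re.compile(r"STACK: communication timeout to ([0-9a-fA-F:]+)")


def summarize(lines):
    """Return (timeouts per peer MAC suffix, list of raw ATP field tuples)."""
    peers = Counter()
    atp_events = []
    for line in lines:
        m = ATP_RE.search(line)
        if m:
            # Fields in order of appearance: seq, cli, to, tx cnt.
            atp_events.append(tuple(int(g) for g in m.groups()))
            continue
        m = STACK_RE.search(line)
        if m:
            peers[m.group(1)] += 1
    return peers, atp_events


SAMPLE = """\
 C2(su)->ATP: TX timeout, seq 41315. cli 778. to 2 tx cnt 21.
STACK: communication timeout to d:1c
ATP: TX timeout, seq 25724. cli 8. to 2 tx cnt 21.
STACK: communication timeout to d:1c
ATP: TX timeout, seq 41316. cli 778. to 2 tx cnt 21.
STACK: communication timeout to d:1c
C2(su)->ATP: TX timeout, seq 41319. cli 778. to 2 tx cnt 21.
STACK: communication timeout to d:1c
C2(su)->ATP: TX timeout, seq 41320. cli 778. to 2 tx cnt 21.
STACK: communication timeout to d:1c
"""

peers, atp_events = summarize(SAMPLE.splitlines())
for suffix, count in peers.items():
    print(f"peer ending in {suffix}: {count} communication timeouts")
```

If every timeout names the same peer, as in this sample, the fault domain is the manager, that peer, and the stack cable between them.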

Solution
The messages are written from the perspective of the stack manager, whose unit number can be determined via a 'show switch' command. 

In this example, the manager of a three-unit stack is unable to communicate with its immediate neighbor (either upstream or downstream), the switch whose base MAC address ends in "0d:1c".
  1. Identify both the stack manager unit and the peer unit in question. This establishes the fault domain of the communication problem. Either of the two units, or the stack cable between them, could be at fault.
  2. Extract a Fault Log (see article 5487) to see whether it sheds any light on the issue.
  3. Issue a 'show switch stack-ports' command on this stack to see whether any stack port has logged transmit or receive errors since boot. There may be enough information to point to just one of the three components.
  4. If further refinement is necessary, break up the fault domain so that the next time an event occurs, the responsible component can be identified with a fair degree of confidence.
     
    For example, assume the ring starts as Unit 1 (MAC 0d:1c), Cable 1-2, Unit 2, Cable 2-3, Unit 3 (the manager), Cable 3-1, and back to Unit 1.

    The suspect fault domain is Unit 3, Cable 3-1, and Unit 1, so these three components should be separated.

    Note: In the explanation below, each physical component keeps its designated name, and each switch unit keeps its allocated logical unit number and configuration.

    First, physically swap Units 1 and 2, giving Unit 2, Cable 1-2, Unit 1, Cable 2-3, Unit 3, Cable 3-1, and back again.

    Then, physically swap Cable 2-3 and Cable 3-1, giving Unit 2, Cable 1-2, Unit 1, Cable 3-1, Unit 3, Cable 2-3, and back again.

    Cable 3-1 is still connected to Unit 1, but to its other stack port.
    Unit 3 is still connected to Unit 1, but using their other stack ports.
         
    • If the issue recurs between Unit 3 and Unit 2, the problem is Unit 3.
    • If the issue recurs between Unit 3 and Unit 1, the problem is Cable 3-1.
    • If the issue recurs between Unit 1 and Unit 2, the problem is Unit 1.
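
The rearrangement can be checked by modeling the ring as a list and applying the two swaps mechanically. This is a minimal sketch for reasoning about the example only. The helper names `ring_links` and `swap` are hypothetical, and the model does not track which stack port each cable is plugged into.

```python
def ring_links(ring):
    """Split a ring given as [unit, cable, unit, cable, ...] into links.

    The last cable in the list closes the loop back to the first unit.
    """
    units = ring[0::2]
    cables = ring[1::2]
    n = len(units)
    return [(units[i], cables[i], units[(i + 1) % n]) for i in range(n)]


def swap(ring, a, b):
    """Return a copy of the ring with components a and b exchanged."""
    out = list(ring)
    i, j = out.index(a), out.index(b)
    out[i], out[j] = out[j], out[i]
    return out


# Components of the suspect fault domain from the example.
SUSPECTS = {"Unit 3", "Cable 3-1", "Unit 1"}

start = ["Unit 1", "Cable 1-2", "Unit 2", "Cable 2-3", "Unit 3", "Cable 3-1"]
step1 = swap(start, "Unit 1", "Unit 2")          # swap the two units
step2 = swap(step1, "Cable 2-3", "Cable 3-1")    # then swap the two cables

for left, cable, right in ring_links(step2):
    on_link = sorted(SUSPECTS & {left, cable, right})
    print(f"{left} <-> {right} via {cable}: suspect components {on_link}")
```

In this model, the link between Unit 3 and Unit 2 and the link between Unit 1 and Unit 2 each contain exactly one suspect component, which is why a recurrence there implicates that unit. The link between Unit 3 and Unit 1 still contains all three suspects. The conclusion for that link relies on both units now using their other stack ports, which this sketch does not represent.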

Stacking and cabling guidelines are explained in article 5668.
Contact Enterasys Networks Technical Services for further assistance, as necessary.
          FAQ User, Official Rep

          Posted 5 years ago
