Question

Error: MSM-A: Slot-1 FAILED (6) Conduit asynchronous transmit error encountered


Has anyone seen this log on the switch?

05/29/2014 22:01:30.37 [i] MSM-A: Login passed for user user001 through telnet (10.10.10.1)
05/29/2014 21:01:58.95 [i] MSM-A: Port 1:5 link down
05/29/2014 21:01:58.95 [i] MSM-A: Port 1:4 link down
05/29/2014 21:01:58.95 [i] MSM-A: Port 1:3 link down
05/29/2014 21:01:58.95 [i] MSM-A: Port 1:2 link down
05/29/2014 21:01:58.95 [i] MSM-A: Port 1:1 link down
05/29/2014 21:01:58.94 MSM-A: Slot-1 FAILED (6) Conduit asynchronous transmit error encountered
05/29/2014 21:01:58.94 MSM-A: System Error 0: Conduit asynchronous transmit error encountered
05/29/2014 21:01:11.92 MSM-B: Slot-1 FAILED (6) Error on Slot-1

05/29/2014 21:00:28.26 [i] MSM-A: Port 3:10 link UP at speed 1 Gbps and full-duplex
05/29/2014 21:00:28.08 [i] MSM-A: Port 3:10 link down
05/29/2014 21:00:16.06 [i] MSM-A: Port 3:10 link UP at speed 1 Gbps and full-duplex
05/29/2014 21:00:15.86 [i] MSM-A: Port 3:10 link down
05/29/2014 20:59:37.19 [i] MSM-A: Port 3:10 link UP at speed 1 Gbps and full-duplex
05/29/2014 20:59:37.06 [i] MSM-A: Port 3:10 link down
05/29/2014 20:59:10.71 MSM-A: Sys-Health-Check Card 1 not responding for 10 ticks over interface 2

05/29/2014 20:59:10.71 MSM-A: Sys-Health-Check Card 1 not responding for 10 ticks over interface 1

05/29/2014 20:58:44.48 MSM-A: Sys-Health-Check: Switch fabric port state is Down for Module in MSM-A (unit 2, internal port 1) connected to Module in slot 1 (unit 0, internal port 26)
05/29/2014 20:58:44.48 MSM-A: Sys-Health-Check: Switch fabric port state is Down for Module in MSM-B (unit 2, internal port 1) connected to Module in slot 1 (unit 1, internal port 24)
05/29/2014 20:58:44.48 MSM-A: Sys-Health-Check: Switch fabric port state is Down for Module in MSM-A (unit 0, internal port 6) connected to Module in slot 1 (unit 1, internal port 25)
05/29/2014 20:58:44.48 MSM-A: Sys-Health-Check: Switch fabric port state is Down for Module in MSM-B (unit 0, internal port 6) connected to Module in slot 1 (unit 0, internal port 27)
05/29/2014 20:58:23.47 MSM-B: Sys-Health-Check Card 1 not responding for 10 ticks over interface 2

05/29/2014 20:58:23.47 MSM-B: Sys-Health-Check Card 1 not responding for 10 ticks over interface 1
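
For anyone triaging a longer capture of the same kind, the sequence above (health-check timeouts, then fabric ports down, then the slot failure) can be tallied with a short script. This is only a sketch: the line pattern is inferred from the log lines pasted in this post, and the category names (`card-not-responding`, `fabric-port-down`, and so on) are made up for illustration, not ExtremeXOS terminology.

```python
import re
from collections import Counter

# Pattern inferred from the pasted log lines, e.g.
# "05/29/2014 21:01:58.94 MSM-A: Slot-1 FAILED (6) Conduit ..."
# The "[i]" severity tag is optional because some lines lack it.
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{2}) "
    r"(?:\[(?P<sev>\w)\] )?(?P<msm>MSM-[AB]): (?P<msg>.*)$"
)


def classify(msg):
    """Bucket a message into a coarse category (names are hypothetical)."""
    if "Sys-Health-Check" in msg and "not responding" in msg:
        return "card-not-responding"
    if "Switch fabric port state is Down" in msg:
        return "fabric-port-down"
    if "FAILED" in msg:
        return "slot-failed"
    if re.search(r"link (down|UP)", msg):
        return "link-change"
    return "other"


def summarize(log_text):
    """Count recognized log lines per category; unparsed lines are skipped."""
    counts = Counter()
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if m:
            counts[classify(m.group("msg"))] += 1
    return counts


# Four lines copied from the log in this post.
SAMPLE = """\
05/29/2014 21:01:58.95 [i] MSM-A: Port 1:1 link down
05/29/2014 21:01:58.94 MSM-A: Slot-1 FAILED (6) Conduit asynchronous transmit error encountered
05/29/2014 20:59:10.71 MSM-A: Sys-Health-Check Card 1 not responding for 10 ticks over interface 2
05/29/2014 20:58:44.48 MSM-A: Sys-Health-Check: Switch fabric port state is Down for Module in MSM-A (unit 2, internal port 1) connected to Module in slot 1 (unit 0, internal port 26)
"""

if __name__ == "__main__":
    for category, n in summarize(SAMPLE).most_common():
        print(category, n)
```

Feeding it the full saved log instead of `SAMPLE` gives a per-category count that makes it easier to compare one occurrence of the problem with the next.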

==============================================================

Chassis : 800392-00-03 1111A-11111 Rev 3.0
Slot-1 : 800225-00-05 1111A-11112 Rev 5.0 BootROM: 1.0.4.0 IMG: 12.6.3.2
Slot-2 : 800225-00-05 1111A-11113 Rev 5.0 BootROM: 1.0.4.0 IMG: 12.6.3.2
Slot-3 : 800226-00-05 1111A-11114 Rev 5.0 BootROM: 1.0.4.0 IMG: 12.6.3.2
Slot-4 : 800226-00-05 1111A-11115 Rev 5.0 BootROM: 1.0.4.0 IMG: 12.6.3.2
Slot-5 :
Slot-6 :
Slot-7 :
Slot-8 :
Slot-9 :
Slot-10 :
MSM-A : 800314-00-01 1111A-11116 Rev 1.0 BootROM: 1.0.4.4 IMG: 12.6.3.2
MSM-B : 800314-00-01 1111A-11117 Rev 1.0 BootROM: 1.0.4.4 IMG: 12.6.3.2
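
In the inventory above, every populated slot and both MSMs report the same image (12.6.3.2). If you want to confirm that mechanically on a saved copy of the output, a rough sketch follows. It assumes the line layout shown in this post; the function names and the choice of MSM-A as the reference are arbitrary.

```python
import re

# Matches populated inventory lines that carry an image version, e.g.
# "Slot-1 : 800225-00-05 1111A-11112 Rev 5.0 BootROM: 1.0.4.0 IMG: 12.6.3.2"
# Empty slots ("Slot-5 :") and lines without "IMG:" do not match.
IMG_RE = re.compile(r"^(?P<unit>[\w-]+)\s*:.*\bIMG:\s*(?P<img>[\d.]+)\s*$")


def images_by_unit(show_version_text):
    """Map each unit name (Slot-N, MSM-A, ...) to its reported image version."""
    images = {}
    for line in show_version_text.splitlines():
        m = IMG_RE.match(line.strip())
        if m:
            images[m.group("unit")] = m.group("img")
    return images


def mismatched_units(images, reference="MSM-A"):
    """Return units whose image differs from the reference unit's image.

    If the reference unit is missing, every unit is reported.
    """
    ref = images.get(reference)
    return {unit: img for unit, img in images.items() if img != ref}


# Lines copied from the inventory in this post.
SAMPLE = """\
Chassis : 800392-00-03 1111A-11111 Rev 3.0
Slot-1 : 800225-00-05 1111A-11112 Rev 5.0 BootROM: 1.0.4.0 IMG: 12.6.3.2
Slot-5 :
MSM-A : 800314-00-01 1111A-11116 Rev 1.0 BootROM: 1.0.4.4 IMG: 12.6.3.2
"""

if __name__ == "__main__":
    found = images_by_unit(SAMPLE)
    print(found)
    print("mismatches:", mismatched_units(found))
```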

7 replies

Userlevel 4
1. How long have you been facing this issue?
2. Try a hard reseat of slot 1.
3. Try connecting the same I/O module to another slot and observe the behaviour of slot 1.
4. Collect the "debug hal show sys-health-check" output a couple of times after performing step 2.
- About two months.
- Yes, it worked, but I have had to do this 3 times.
- The behavior occurs in slot 1 only; I have swapped the modules.
- I collected the "debug hal show sys-health-check" output, but only after the problem had been resolved. The output is below:

[System Info]-------------------------
System Time: Fri May 30 09:44:24 2014
Last Reboot: COLD Restart

[Power-Fan Controller]
-------------------------
PsuCtrl-1 OPERATIONAL
PsuCtrl-2 OPERATIONAL

[Card State (Mask = 7C0F)]
Slot           Hardware Abstraction Layer(HAL)  Boot  DataPath   AsyncQueue
No.  CardType  CardState                        Mode  MsmA MsmB  Curr(Max)
---------------------------------------------------------------------------
1    G48Te2    OPERATIONAL                      Cold             0(15)
2    G48Te2    OPERATIONAL                      Cold             0(18)
3    G48Xc     OPERATIONAL                      Cold             0(48)
4    G48Xc     OPERATIONAL                      Cold             0(48)
A    MSM-48c   OPERATIONAL                      Cold
B    MSM-48c   OPERATIONAL                      Cold

[Card Processes]

[Low Memory Alerts]

[ControlPath Status]

[DataPath Status]

[Fabric Port Events]

[CPU ECC Counters]

[Chip Stat/ECC Counters]

[Chip TCAM Counters]

[IMPORTANT NOTE]
This only represents HealthCheck results from current MSM's point of view.
You MUST view HealthCheck results on the other MSM for a complete picture.
Userlevel 4
Collect the above output a couple of times while the issue is occurring, to see how the output changes.
Did you see congestion on the slot before the issue occurred? (debug hal show congestion)
Did you see port utilization go high before the issue occurred?
What software version is running on the switch?
As you mentioned, the I/O module from slot 1 works fine when installed in another slot. I would suggest installing a different I/O module in slot 1 and observing the behaviour.
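
To see what changes between two captures of "debug hal show sys-health-check", one option is to save each capture to text and compare them section by section. The sketch below assumes the bracketed `[Section]` headers seen in the output earlier in this thread; the helper names are made up, and the event line in the example is a hypothetical placeholder, not real switch output.

```python
import difflib
import re

# Section headers look like "[Fabric Port Events]" in the pasted output.
SECTION_RE = re.compile(r"^\[(?P<name>[^\]]+)\]")


def split_sections(output):
    """Group non-empty lines under the bracketed header that precedes them."""
    sections = {}
    current = None
    for line in output.splitlines():
        stripped = line.strip()
        m = SECTION_RE.match(stripped)
        if m:
            current = m.group("name")
            sections[current] = []
        elif current is not None and stripped:
            sections[current].append(stripped)
    return sections


def changed_sections(before, after):
    """Return {section name: unified diff lines} for sections that differ."""
    a, b = split_sections(before), split_sections(after)
    changes = {}
    for name in sorted(set(a) | set(b)):
        old, new = a.get(name, []), b.get(name, [])
        if old != new:
            changes[name] = list(
                difflib.unified_diff(old, new, "before", "after", lineterm="")
            )
    return changes


BEFORE = """\
[System Info]
Last Reboot: COLD Restart
[Fabric Port Events]
"""

# The added line is a hypothetical placeholder, not real switch output.
AFTER = """\
[System Info]
Last Reboot: COLD Restart
[Fabric Port Events]
example event line (hypothetical)
"""

if __name__ == "__main__":
    for name, diff in changed_sections(BEFORE, AFTER).items():
        print(f"== {name} ==")
        print("\n".join(diff))
```

On real captures the System Info section will always show up as changed because of the System Time line, so that one can be ignored. Remember to capture from both MSMs, as the note at the end of the output says.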
- Right, I'll do that next time.
- I did not. I'll check next time.
- I did not.
- Show version:

show version
Chassis : 800392-00-03 1150A-11111 Rev 3.0
Slot-1 : 800225-00-05 1140A-11111 Rev 5.0 BootROM: 1.0.4.0 IMG: 12.6.3.2
Slot-2 : 800225-00-05 1130A-11111 Rev 5.0 BootROM: 1.0.4.0 IMG: 12.6.3.2
Slot-3 : 800226-00-05 1120A-11111 Rev 5.0 BootROM: 1.0.4.0 IMG: 12.6.3.2
Slot-4 : 800226-00-05 1119A-11111 Rev 5.0 BootROM: 1.0.4.0 IMG: 12.6.3.2
Slot-5 :
Slot-6 :
Slot-7 :
Slot-8 :
Slot-9 :
Slot-10 :
MSM-A : 800314-00-01 1118A-11111 Rev 1.0 BootROM: 1.0.4.4 IMG: 12.6.3.2
MSM-B : 800314-00-01 11117A-11111 Rev 1.0 BootROM: 1.0.4.4 IMG: 12.6.3.2
PSUCTRL-1 : 450306-00-03 1116A-11111 Rev 3.0 BootROM: 2.18
PSUCTRL-2 : 450306-00-03 1115A-11111 Rev 3.0 BootROM: 2.18
PSU-1 : PS 2350 4300-00146 1114A-11111 Rev 3.0
PSU-2 : PS 2350 4300-00146 1113A-11111 Rev 5.0
PSU-3 : PS 2350 4300-00146 1112A-11111 Rev 5.0
PSU-4 : PS 2350 4300-00146 1111A-11111 Rev 5.0
PSU-5 :
PSU-6 :

Image : ExtremeXOS version 12.6.3.2 v1263b2 by release-manager
on Thu Jun 21 22:46:37 EDT 2012
BootROM : 1.0.4.4
Diagnostics : 1.19

- I will do this test.
But why is this alarm being generated?
Hi,
I have been receiving the alarm below for a month. Does anyone have any idea why this alarm is being generated?
Userlevel 4
There is a communication problem between the MSM and the backplane/cards.
"debug hal show sys-health-check" can tell you whether there are errors in the fabric or backplane.
Do not leave this as it is; it will reboot your chassis one day.