BD-12804: Slot (GM-20XTR) turns off with strange Erro:HAL.Card.Error

ilyinilyas
New Contributor
Today, after 2 years of uptime, one of the slots on our BD-12804 turned off for no apparent reason. We got some strange logs just before the incident:
06/29/2017 20:06:29.22 MSM-A: Slot-1 FAILED (2) cartmanPollMBReady-594: cartman4 on slot 1 (1 errors):Mailbox Polling Timeour
06/29/2017 20:06:29.22 MSM-A: Slot-1, Error 12: cartmanPollMBReady-594: cartman4 on slot 1 (1 errors):Mailbox Polling Timeou)
06/29/2017 20:06:29.21 MSM-A: cartmanPollMBReady-594: cartman4 on slot 1 (1 errors):Mailbox Polling Timeout(reg 705=87)
After that, all the ports on the slot started to go down:
06/29/2017 20:06:29.64 MSM-A: Port 1:5 link down
06/29/2017 20:06:29.64 MSM-A: Port 1:4 link down
06/29/2017 20:06:29.23 MSM-A: Remove port 1:3 from aggregator
06/29/2017 20:06:29.23 MSM-A: Remove port 1:2 from aggregator
06/29/2017 20:06:29.23 MSM-A: Remove port 1:1 from aggregator
06/29/2017 20:06:29.22 MSM-A: Port 1:3 is Down, remove from aggregator 1:1
06/29/2017 20:06:29.22 MSM-A: Port 1:3 link down
06/29/2017 20:06:29.22 MSM-A: Port 1:2 is Down, remove from aggregator 1:1
06/29/2017 20:06:29.22 MSM-A: Port 1:2 link down
06/29/2017 20:06:29.22 MSM-A: Port 1:1 is Down, remove from aggregator 1:1
06/29/2017 20:06:29.22 MSM-A: Port 1:1 link down
I have not found any references to this problem on the Internet, and the logs look really strange to me. I have not found any mention of PollMBReady or Mailbox Polling Timeout in the documentation. We don't even have any mailboxes in the configuration of the BD-12804.
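The log excerpts above are printed newest-first, so the order of events is easier to see once they are sorted by timestamp. Below is a minimal sketch, assuming the `MM/DD/YYYY HH:MM:SS.hh MSM-A: message` layout seen in the pasted lines; the helper names `parse_line` and `chronological` are hypothetical and are not part of ExtremeXOS or any tool.

```python
from datetime import datetime

# A few lines copied from the post, in the newest-first order shown there.
LOG_LINES = [
    "06/29/2017 20:06:29.64 MSM-A: Port 1:5 link down",
    "06/29/2017 20:06:29.23 MSM-A: Remove port 1:3 from aggregator",
    "06/29/2017 20:06:29.22 MSM-A: Port 1:1 link down",
    "06/29/2017 20:06:29.21 MSM-A: cartmanPollMBReady-594: cartman4 on slot 1 "
    "(1 errors):Mailbox Polling Timeout(reg 705=87)",
]


def parse_line(line):
    """Split one log line into (timestamp, source, message).

    Assumes the layout '<date> <time> <source>: <message>' seen in the post.
    """
    date_part, time_part, rest = line.split(" ", 2)
    stamp = datetime.strptime(f"{date_part} {time_part}", "%m/%d/%Y %H:%M:%S.%f")
    source, message = rest.split(": ", 1)
    return stamp, source, message


def chronological(lines):
    """Return the parsed events sorted oldest-first by timestamp."""
    return sorted((parse_line(line) for line in lines), key=lambda event: event[0])


events = chronological(LOG_LINES)
for stamp, source, message in events:
    print(stamp.strftime("%H:%M:%S.%f")[:-4], source, message)
```

With the sample lines, the mailbox polling timeout sorts ahead of the link-down messages, which matches the sequence described in the post.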

Our equipment:
Chassis : 804023-00-09 06135-01409 Rev 9.0
Slot-1 : 804032-00-06 06284-00059 Rev 6.0
Slot-5 : 804032-00-06 0721F-00331 Rev 6.0
Slot-6 : 804032-00-06 0720F-00670 Rev 6.0
MSM-A : 804047-00-07 0711F-00084 Rev 7.0 BootROM: 1.0.0.3 IMG: 12.6.2.10
PSUCTRL-1 : 700087-00-07 06105-00862 Rev 7.0 BootROM: 2.13
PSUCTRL-2 : 700087-00-07 06105-00911 Rev 7.0 BootROM: 2.13
PSU-1 : PS 2336 4300-00145 0722K-30342 Rev 10.0
PSU-2 : PS 2336 4300-00137 0502J-03684 Rev 7.0
PSU-3 : PS 2336 4300-00137 0519J-05462 Rev 7.0
Image : ExtremeXOS version 12.6.2.10 v1262b10 by release-manager
on Thu Sep 29 17:48:22 EDT 2011
BootROM : 1.0.0.3
Any ideas? After a restart the chassis works as perfectly as ever, but I am afraid the problem will happen again, and I don't understand what went wrong with our Slot-1 (GM-20XTR).
3 REPLIES

EtherMAN
Contributor III
If this happens again, and since you are going to reboot to clear it anyway, you may want to run extended diagnostics on slot 1 and the MSM to see whether any issues show up. Be warned, though: if the diagnostics do find bad memory or other faulty hardware, they may take the bad card offline, so I would only do this if you have a spare. Also be sure to have a backup of the config if you run them on the MSM... We only had one 12k in our network, and if I recall correctly the diagnostics take about 5 or 6 minutes per card and you have to run them one card at a time.

Drew_C
Valued Contributor III
I did some searching and found a few instances of this that were resolved with software updates, but those were on 12.0 and 12.1 versions, so 12.6 should be okay. I also see a later instance where an RMA was requested for the blade and no trouble was found at the repair facility. It's hard for me to say with certainty what caused this, but you'll definitely want to keep monitoring it. Keep in mind that you're dealing with 11+ year old equipment 🙂

Nick_Yakimenko
New Contributor II
Looks like it could be a hardware issue, e.g. capacitors damaged by overheating.