Failing Address


Does anyone know what's causing this error?

05/22/2017 07:12:51.95 Failing Address: 0x000000006aadb730, Data: 0x0000000000000000
05/22/2017 07:12:51.95 LMC0 ECC: syndrome: 0x9b
05/22/2017 07:12:51.95 LMC0 ECC: Failing column: 0x36d6
05/22/2017 07:12:51.95 LMC0 ECC: Failing row: 0x6aad
05/22/2017 07:12:51.95 LMC0 ECC: Failing bank: 6
05/22/2017 07:12:51.95 LMC0 ECC: Failing rank: 0
05/22/2017 07:12:51.95 LMC0 ECC: Failing dimm: 0
05/22/2017 07:12:51.95 ERROR LMC0 ECC: sec_err:4 ded_err:0
05/22/2017 07:12:51.77 Failing Address: 0x000000006aadb730, Data: 0x0000000000000000
05/22/2017 07:12:51.77 LMC0 ECC: syndrome: 0x9b
05/22/2017 07:12:51.77 LMC0 ECC: Failing column: 0x36d6
05/22/2017 07:12:51.77 LMC0 ECC: Failing row: 0x6aad
05/22/2017 07:12:51.77 LMC0 ECC: Failing bank: 6
05/22/2017 07:12:51.77 LMC0 ECC: Failing rank: 0
05/22/2017 07:12:51.77 LMC0 ECC: Failing dimm: 0
05/22/2017 07:12:51.77 ERROR LMC0 ECC: sec_err:4 ded_err:0

What should I do?

Switch information:

System Type: X670G2-72x

Image : ExtremeXOS version 15.6.4.2 v1564b2-patch1-3 by release-manager
on Thu Jan 28 11:12:00 EST 2016
BootROM : 1.0.2.1
Diagnostics : 2.1

Thanks

Bruno L.

5 replies

Userlevel 1
Hi,

Did you try booting from a different partition?
Userlevel 5
Bruno, please also check the log for any "CPU/L2 Memory ECC Counters have incremented" messages. These messages may be related to Memory.

From within EXOS, please also collect the following output:

# debug hal show sys-health-check

It would also be recommended to run extended diagnostics.
Ty Izzet wrote:

Bruno, please also check the log for any "CPU/L2 Memory ECC Counters have incremented" messages. These messages may be related to Memory.

From within EXOS, please also collect the following output:

# debug hal show sys-health-check

It would also be recommended to run extended diagnostics.

Hi, thanks for the reply.

I have no match in my logs for "CPU/L2 Memory ECC Counters have incremented" or similar.

Here is the output of "debug hal show sys-health-check":

==================================================
# debug hal show sys-health-check

[System Info]
-------------------------
System Time: Mon Jun 12 09:28:39 2017

[Conduit Retry Stats]
Retry Value = 15 Action on Error = 0

[Low Memory Alerts]

[CPU ECC Counters]

[BCM Counters]

[Chip TCAM Counters]

==================================================

I had the same error again yesterday:

06/11/2017 09:25:27.71 Failing Address: 0x000000006aadb730, Data: 0x0000000000000000
06/11/2017 09:25:27.71 LMC0 ECC: syndrome: 0x9b
06/11/2017 09:25:27.71 LMC0 ECC: Failing column: 0x36d6
06/11/2017 09:25:27.71 LMC0 ECC: Failing row: 0x6aad
06/11/2017 09:25:27.71 LMC0 ECC: Failing bank: 6
06/11/2017 09:25:27.71 LMC0 ECC: Failing rank: 0
06/11/2017 09:25:27.71 LMC0 ECC: Failing dimm: 0
06/11/2017 09:25:27.71 ERROR LMC0 ECC: sec_err:4 ded_err:0

Should I be worried?

Regards
Userlevel 7
Hi Bruno,

I'd suggest opening up a case with GTAC. I believe this may require an RMA.
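
If it helps when putting the case together, here is a minimal Python sketch (not an Extreme Networks tool) for summarizing the ECC messages from a saved copy of the log. It groups lines by timestamp and counts how many events hit each failing address. The function name `summarize_ecc` is made up for illustration. It assumes the log format shown in this thread, that `sec_err` and `ded_err` are printed as decimal numbers, and that they carry the usual SECDED meaning (corrected single-bit error versus detected uncorrectable error), so a non-zero `ded_err` is treated as uncorrectable.

```python
import re
from collections import defaultdict

# Timestamp prefix as it appears in the log excerpts in this thread,
# e.g. "05/22/2017 07:12:51.95".
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d+)\s+(.*)$")
ADDR_RE = re.compile(r"Failing Address: (0x[0-9a-fA-F]+)")
ERR_RE = re.compile(r"sec_err:(\d+) ded_err:(\d+)")


def summarize_ecc(log_text):
    """Group ECC lines by timestamp and summarize them per failing address."""
    events = defaultdict(dict)  # timestamp -> fields seen at that timestamp
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # not a timestamped log line
        stamp, rest = match.groups()
        addr = ADDR_RE.search(rest)
        if addr:
            events[stamp]["address"] = addr.group(1)
        err = ERR_RE.search(rest)
        if err:
            events[stamp]["sec_err"] = int(err.group(1))
            events[stamp]["ded_err"] = int(err.group(2))

    summary = defaultdict(lambda: {"events": 0, "uncorrectable": False})
    for fields in events.values():
        if "address" not in fields:
            continue
        entry = summary[fields["address"]]
        entry["events"] += 1
        # Assumption: a non-zero ded_err means an uncorrectable error was detected.
        if fields.get("ded_err", 0) != 0:
            entry["uncorrectable"] = True
    return dict(summary)
```

Passing the saved log text to `summarize_ecc` returns a dictionary keyed by failing address. The same address showing up in events weeks apart, as in the excerpts above, is the kind of detail worth including in the case.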