HELP! Stack unexpected reboot

Mykhaylo_Skrypk
New Contributor III
XOS: ExtremeXOS version 15.3.3.5 v1533b5-patch1-2

One of our Extreme X460/X450 stacks rebooted unexpectedly this morning (at 04:52). The logs show the following (newest entries first):

2017-01-18 04:57:14.33 Stacking port 1:1 link up at 10Gbps.
2017-01-18 04:57:13.99 Starting hal initialization ....
2017-01-18 04:57:12.29 telnetd listening on port 23

2017-01-18 04:57:06.18 The stack MAC address is not correctly configured on this node. The stack can not operate properly in this condition. Please correct and reboot.
2017-01-18 04:57:03.16 DM started
2017-01-18 04:57:02.95 The Node Manager (NM) has started processing.
2017-01-18 04:57:02.15 EPM Started
2017-01-18 04:57:01.83 Changing to watchdog warm reset mode
2017-01-18 04:52:20.87 Slot-1 FAILED (1) Backup lost
2017-01-18 04:52:20.83 Shutting down all processes
2017-01-18 04:52:20.53 Node State[4] = FAIL (Backup lost)
2017-01-18 04:52:20.53 MASTER decided that I am not BACKUP anymore
2017-01-18 04:52:20.53 BACKUP NODE (Slot-1) DOWN
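
To put a number on the outage, the gap between the shutdown message and the first boot message can be computed from the timestamps above. This is only a minimal sketch in Python: it assumes the "YYYY-MM-DD HH:MM:SS.ff" timestamp layout seen in this excerpt, and the helper names (parse_timestamp, gap_seconds) are made up for illustration.

```python
from datetime import datetime

# Two lines copied from the log excerpt above (newest first, as pasted).
LOG = """\
2017-01-18 04:57:01.83 Changing to watchdog warm reset mode
2017-01-18 04:52:20.83 Shutting down all processes
"""


def parse_timestamp(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.ff' stamp of one log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")


def gap_seconds(log_text: str) -> float:
    """Seconds between the oldest and the newest timestamp in the text."""
    stamps = [parse_timestamp(l) for l in log_text.splitlines() if l.strip()]
    return (max(stamps) - min(stamps)).total_seconds()


print(gap_seconds(LOG))  # 281.0, i.e. about 4 min 41 s between shutdown and restart
```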

Has anyone had a similar problem?
Thx,
Mykhaylo

-> show debug system-dump slot 1
===============================================
Slot-1 system dump information
===============================================
core_dump_info storage: 8/3072 used [empty]
failure: kernel oops
reason: NMI1
time: Wed Apr 22 05:00:13 2015
where: extraps_handle_nmi:286
$0 : z0=00000000 at=ffefffff v0=00100000 v1=40100028
$4 : a0=00000001 a1=1000dc01 a2=00000001 a3=00000000
$8 : t0=80000380 t1=00080000 t2=80890a50 t3=2adff000
$12: t4=80000000 t5=00000000 t6=c2006448 t7=00440000
$16: s0=00001d50 s1=808b6ef0 s2=817fc900 s3=00002000
$20: s4=00000001 s5=00100000 s6=808b0000 s7=ffffffbf
$24: t8=004444a4 t9=2ad32f04
$28: gp=86b20000 sp=86b21b60 s8=808c0000 ra=80299c5c
Hi : 00000357
Lo : 00000000
epc : 80299d00 flush_all_zero_pkmaps+0x1d4/0x1ec Tainted: P
Status: 504800c0
Cause : 90800000
86b21b60: 86b19b60 00000000 001200d2 00000000 80630000 808b6048 817f3360 808b0000
86b21b80: 86b21ba8 8060baa0 80634d00 00000002 808b0000 80299ec8 00000a38 80292404
86b21ba0: 883bc624 000000d8 000000d8 ffffffef 00000003 8028824c 817f3360 883bc624
86b21bc0: ffffffff 000000d8 817f3360 8f3b2e00 883bc54c 000d8000 883bc624 86b21cf0
86b21be0: 00001000 00000000 836b3e00 802249fc 00000000 001fffff 00000000 80288474
86b21c00: 883bc54c 000d7000 8f3b2e00 80346ffc 83580718 8034d5e8 000d8000 883bc57c
86b21c20: 00000000 000000d8 883bc57c 86b21cf0 00001000 817f3360 883bc57c 803471e8
86b21c40: fe3ff000 00000000 80630000 817fada0 883bc57c 80299a70 883bc57c 00000000
86b21c60: 00000000 00001000 83580718 817fada0 883bc57c 80346f68 00000000 80564378
86b21c80: 883bc624 86b21cf0 000d7000 00001000 86b21c98 00000003 00001000 86b21cf0
86b21ca0: 000d7000 00001000 00000000 000d8000 00001000 00000000 80564378 883bc624
86b21cc0: 86b21cf0 802892a8 807c37e0 81a0a7e0 86b21e20 86b21e18 00001000 00000000
86b21ce0: 86b21ce8 86b21cec 817f3360 802873bc 86b21e18 00000001 00000000 00001000
86b21d00: 00000000 00000000 883bc624 883bc57c 000d8000 00000000 80564378 802c7b0c
86b21d20: 883bc57c 836b3e00 883bc624 86b21e70 86b21e20 86b21e18 86b21dc8 00000000
86b21d40: 000d8000 80289b98 86b21e20 80213920 00000000 86d96000 00000000 000d8000
log: ... PC 80299d10(flush_all_zero_pkmaps+0x1e4/0x1ec) at 63 seconds.
log: <0>
log: <0>CPU1: NMI at 0x80299d00, liveness (1763835910, 263766010)

Version #0 SMP Tue Jan 7 13:12:45 EST 20 by release-manager@biltmore.extrem Release 2.6.28.9cougar
Call Trace:
@[<80299d00>] flush_all_zero_pkmaps+0x1d4/0x1ec
@[<80299ec8>] kmap_high+0x1b0/0x25c
@[<802249fc>] __kmap+0x60/0x84
@[<80346ffc>] jffs2_do_readpage_nolock+0x3c/0xf4
@[<803471e8>] jffs2_write_begin+0x134/0x334
@[<802892a8>] generic_file_buffered_write+0x124/0x38c
@[<80289b98>] __generic_file_aio_write_nolock+0x2fc/0x60c
@[<8028a35c>] generic_file_aio_write+0x78/0x12c
@[<802b12b4>] do_sync_write+0xe0/0x124
@[<802b1e00>] vfs_write+0xb4/0x158
@[<802b1f9c>] sys_write+0x4c/0xa4
@[<80221b58>] stack_done+0x20/0x3c

[<80292404>] ____pagevec_lru_add+0x1d4/0x210
[<8028824c>] add_to_page_cache_locked+0x7c/0x10c
[<80288474>] grab_cache_page_write_begin+0xe0/0x10c
[<8034d5e8>] jffs2_write_inode_range+0x260/0x37c
[<80299a70>] kunmap_high+0x28/0xe4
[<80346f68>] jffs2_write_end+0x248/0x2a0
[<802873bc>] file_remove_suid+0x1c/0x9c
[<802c7b0c>] file_update_time+0x58/0x140
[<80213920>] ret_from_irq+0x0/0x4
Build directory: /data2/release-manager/v15_3_3_5-patch1-2/summit_rmi

Stacktrace:
reason: NMI0
id: 1429675213
$0 : z0=00000000 at=1000dc00 v0=805f2000 v1=8055bdd0
$4 : a0=00000000 a1=86b18920 a2=00000000 a3=86b18958
$8 : t0=80000380 t1=00080000 t2=00000000 t3=0000657f
$12: t4=9ae88a24 t5=00000001 t6=00000000 t7=2c652000
$16: s0=86b18920 s1=81a06920 s2=00000001 s3=00000000
$20: s4=0000000f s5=807c3920 s6=807c5e98 s7=00000000
$24: t8=01243000 t9=81a06920
$28: gp=807c4000 sp=807c5e88 s8=3030249e ra=80237fb0
Hi : 00095eff
Lo : dc887e00
epc : 80232778 resched_task+0x4/0xb0 Tainted: P
Status: 504800c0
Cause : 10800000
807c5e88: 002840cb 302ea2d4 807c5eb8 fffffbff 1000dc00 807c5ef0 1f72a3ea 86b25a60
807c5ea8: 81a06318 81a06350 86b25a60 81a06350 002840cb 3030249e 00000001 8025ccc8
807c5ec8: 00000000 81a06350 00000002 00000001 86b25a60 8025cd98 00000001 c4959b18
807c5ee8: 00000000 fffffbff 002840cb 3030249e 00000000 81a06318 3b9aca00 8025e250
807c5f08: ffffffff 86c12000 00000000 c4a4c454 002840cb 3030249e 7fffffff ffffffff
807c5f28: 7fffffff ffffffff 002840cb 81a06320 80844014 00000000 00000007 00000001
807c5f48: 00000000 807c2000 80844014 00001da0 00000000 8021b870 1000dc03 00000008
807c5f68: c26cd594 2ad83894 80608e70 80271b40 806299c0 00000022 806299f4 886348a0
807c5f88: 80628fa0 00000007 80628fd4 81a0d000 80878b00 80271c68 00000000 00000000
807c5fa8: 00000000 00000000 00000000 807c5fd0 00000000 00004000 81a0d000 80878b00
807c5fc8: 807c2000 80213b44 00000000 00000000 00000000 00000000 00000000 00000000
807c5fe8: 00000000 00000000 00000000 807c4000 805f3de8 802156e0 1bad2bad 1bad2bad
807c6008: 1bad2bad 1bad2bad 1bad2bad 1bad2bad 1bad2bad 1bad2bad 1bad2bad 1bad2bad
807c6028: 1bad2bad 1bad2bad 1bad2bad 1bad2bad 1bad2bad 1bad2bad 1bad2bad 1bad2bad
807c6048: 1bad2bad 1bad2bad 1bad2bad 1bad2bad 1bad2bad 1bad2bad 1bad2bad 1bad2bad
807c6068: 1bad2bad 1bad2bad 1bad2bad 1bad2bad 1bad2bad 1bad2bad 1bad2bad 1bad2bad
Call Trace:
@[<80232778>] resched_task+0x4/0xb0
@[<80237fb0>] try_to_wake_up+0x12c/0x2d0
@[<8025ccc8>] hrtimer_wakeup+0x1c/0x2c
@[<8025cd98>] __run_hrtimer+0xc0/0xd4
@[<8025e250>] hrtimer_interrupt+0x258/0x384
@[<8021b870>] c0_compare_interrupt+0x50/0xa0
@[<80271b40>] handle_IRQ_event+0x78/0xe4
@[<80271c68>] __do_IRQ+0xbc/0x264
@[<80213b44>] call_handle_irq+0x20/0x3c
@[<802156e0>] do_IRQ+0x9c/0x138
@[<80213920>] ret_from_irq+0x0/0x4
@[<80275bc0>] rcu_pending+0xb8/0x144
@[<80216510>] cpu_idle+0x30/0x90

Aleixo_Gomes
Extreme Employee
"show stacking detail" will show the configured stack MAC address.
show ports stack-ports rxerrors no-refresh
show ports stack-ports txerrors no-refresh
The two commands above show whether there are any CRC errors on the stack ports. If the counters are incrementing, consider swapping the stack cables or reseating the stack port connections.
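
"Incrementing" here means the counter grows between two runs of the same command. A minimal sketch in Python of that comparison, assuming you have already copied the CRC counts per stack port into two dictionaries by hand. The function name and the sample numbers are hypothetical, and the sketch does not parse the switch output itself.

```python
from typing import Dict


def incrementing_ports(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return each port whose error counter grew, mapped to the increase."""
    return {
        port: count - before.get(port, 0)
        for port, count in after.items()
        if count > before.get(port, 0)
    }


# Hypothetical CRC counts noted from two runs taken a few minutes apart.
first_run = {"1:1": 0, "1:2": 5}
second_run = {"1:1": 0, "1:2": 9}

print(incrementing_ports(first_run, second_run))  # {'1:2': 4}
```

A port that shows up in the result is the one whose cable or connection is worth swapping or reseating first.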

Swapping the stack cable and reseating the stack port (1:2) connection seems to have resolved the CRC errors.