2-node X460-48p stack failed to connect to the collector

  • Question
  • Updated 2 years ago
  • Answered
We had an issue this past weekend that caused our stack to reboot node 1. When I looked in the logs, I saw the entries below repeated a few times. What would cause the nodes to go out of sync?

XOS is 15.5.4.2 patch1-5

10/08/2016 19:33:02.41 Slot-1: Booting after System Failure.
10/08/2016 19:33:02.05 Slot-1: Changing to watchdog warm reset mode
10/08/2016 19:16:46.86 Slot-1: Failed to connect to the collector 12.38.14.200:800 with SSL disabled (VLAN Mgmt does not have an IP address.)
10/08/2016 19:09:48.39 Slot-1: Failed to connect to the collector 12.38.14.200:800 with SSL disabled (VLAN Mgmt does not have an IP address.)
10/08/2016 19:07:40.66 Slot-1: BACKUP NODE (Slot-2) DOWN
10/08/2016 19:07:39.97 Slot-1: BACKUP is NOT in SYNC
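
To put a number on the gap between the sync loss and the reboot in the Slot-1 excerpt above, here is a minimal Python sketch. It assumes the timestamps are in MM/DD/YYYY HH:MM:SS.ss form as shown; the helper names `parse_ts` and `elapsed_seconds` are hypothetical and not part of any EXOS tooling.

```python
from datetime import datetime

# Two timestamps copied from the Slot-1 excerpt above.
SYNC_LOST = "10/08/2016 19:07:39.97"  # "BACKUP is NOT in SYNC"
REBOOTED = "10/08/2016 19:33:02.41"   # "Booting after System Failure."


def parse_ts(text):
    """Parse a log timestamp, assuming MM/DD/YYYY HH:MM:SS.ss."""
    return datetime.strptime(text, "%m/%d/%Y %H:%M:%S.%f")


def elapsed_seconds(start, end):
    """Seconds between two log timestamps (end minus start)."""
    return (parse_ts(end) - parse_ts(start)).total_seconds()


gap = elapsed_seconds(SYNC_LOST, REBOOTED)
print(f"{gap:.2f} s between sync loss and reboot")
```

This only measures the interval on Slot-1's own clock; it says nothing about why the node rebooted.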

DH


Posted 2 years ago


Prashanth KG, Employee

Hi,

The logs above are fairly generic, so it is difficult to identify the exact trigger for the reboot from this limited information.

It would help to collect the logs from slot 2 as well, to see why it rebooted.
You can collect logs from the other nodes by telnetting to their slot numbers (for example, telnet slot 2).

If the slot 2 logs do not provide any leads, we may need to open a GTAC case with the show tech output from the stack.

Hope this helps!

DH

Thanks for the reply. Here is a larger log sample from Slot 2:


10/08/2016 19:34:47.10 <Info:HAL.Card.Info> Slot-2: Module in Slot-1 is operational
10/08/2016 19:34:41.12 <Info:HAL.Card.Info> Slot-2: Module in Slot-2 is operational
10/08/2016 19:34:35.89 <Noti:DM.Notice> Slot-2: Slot-1 being Powered ON
10/08/2016 19:34:31.42 <Noti:EPM.system_stable> Slot-2: System is stable. Change to warm reset mode
10/08/2016 19:34:19.47 <Info:EPM.wdg_enable> Slot-2: Watchdog enabled
10/08/2016 19:33:43.50 <Noti:DM.Notice> Slot-2: Setting time to Sun Oct  9 02:53:43 2016
10/08/2016 19:33:39.38 <Noti:DM.Notice> Slot-2: Node State[3] = BACKUP
10/08/2016 19:33:24.91 <Info:DOSProt.Init> Slot-2: DOS protect application started successfully
10/08/2016 19:33:23.35 <Noti:SNMP.Subagent.MstrRestrt> Slot-2: snmpMaster process has been restarted.
10/08/2016 19:33:23.25 <Info:tftpd.info> Slot-2: **** tftpd started *****
10/08/2016 19:33:22.62 <Info:HAL.Card.Info> Slot-2: Module in Slot-1 is inserted
10/08/2016 19:33:22.17 <Info:SNMP.Master.InitDone> Slot-2: snmpMaster initialization complete
10/08/2016 19:33:22.04 <Noti:DM.Notice> Slot-2: Node State[2] = STANDBY
10/08/2016 19:33:22.04 <Info:DM.Info> Slot-2: Node INIT DONE ....
10/08/2016 19:33:21.92 <Info:HAL.Port.Info> Slot-2: Stacking port 1:2 link up at 10Gbps.
10/08/2016 19:33:21.92 <Info:HAL.Port.Info> Slot-2: Stacking port 1:1 link up at 10Gbps.
10/08/2016 19:33:20.96 <Info:telnetd.info> Slot-2: **** telnetd started *****
10/08/2016 19:33:19.30 <Noti:DM.Notice> Slot-2: Slot-2 being Powered ON
10/08/2016 19:33:18.99 <Noti:DM.Notice> Slot-2: Node State[1] = INIT
10/08/2016 19:33:17.59 <Info:HAL.Sys.Info> Slot-2: Hal initialization done.
10/08/2016 19:33:17.13 <Info:HAL.Card.Info> Slot-2: Module in Slot-2 is inserted
10/08/2016 19:33:16.94 <Noti:HAL.Sys.Notice> Slot-2: Module in fan slot 8 is removed
10/08/2016 19:33:16.94 <Noti:HAL.Sys.Notice> Slot-2: Module in fan slot 7 is removed
10/08/2016 19:33:16.94 <Noti:HAL.Sys.Notice> Slot-2: Module in fan slot 6 is removed
10/08/2016 19:33:16.94 <Noti:HAL.Sys.Notice> Slot-2: Module in fan slot 5 is removed
10/08/2016 19:33:16.94 <Noti:HAL.Sys.Notice> Slot-2: Module in fan slot 4 is removed
10/08/2016 19:33:16.94 <Noti:HAL.Sys.Notice> Slot-2: Module in fan slot 3 is removed
10/08/2016 19:33:16.63 <Noti:HAL.Sys.Notice> Slot-2: Module in fan slot 2 is inserted
10/08/2016 19:33:16.54 <Noti:HAL.Sys.Notice> Slot-2: Module in fan slot 1 is removed
10/08/2016 19:33:16.35 <Info:HAL.Port.Info> Slot-2: Stacking port 2:2 link up at 10Gbps.
10/08/2016 19:33:16.35 <Info:HAL.Port.Info> Slot-2: Stacking port 2:1 link up at 10Gbps.
10/08/2016 19:33:15.98 <Info:HAL.Sys.Info> Slot-2: Starting hal initialization ....
10/08/2016 19:33:14.29 <Info:SNMP.Subagent.InitDone> Slot-2: snmpSubagent initialization complete
10/08/2016 19:33:14.18 <Info:nl.init> Slot-2: Network Login framework has been initialized
10/08/2016 19:33:06.78 <Info:telnetd.info> Slot-2: telnetd listening on port 23

10/08/2016 19:32:59.65 <Noti:DM.Notice> Slot-2: DM started
10/08/2016 19:32:59.56 <Noti:NM.StrtProc> Slot-2: The Node Manager (NM) has started processing.
10/08/2016 19:32:58.95 <Noti:EPM.start> Slot-2: EPM Started
10/08/2016 19:32:58.94 <Noti:EPM.UnexpctRebootDtect> Slot-2: Booting after System Failure.
10/08/2016 19:32:58.58 <Noti:EPM.wd_warm_reset> Slot-2: Changing to watchdog warm reset mode
10/08/2016 19:13:12.49 <Warn:EPM.all_shutdown> Slot-2: Shutting down all processes
10/08/2016 19:13:12.49 <Warn:DM.Warning> Slot-2: Slot-2 FAILED (1) Not In Sync
10/08/2016 19:13:12.14 <Erro:DM.Error> Slot-2: Node State[4] = FAIL (Not In Sync)
10/08/2016 19:13:12.14 <Warn:DM.Warning> Slot-2: NM: Old Primary's state is
10/08/2016 19:13:12.12 <Warn:DM.Warning> Slot-2: PRIMARY NODE (Slot-1) DOWN
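
The excerpt above is listed newest-first, so the failure sequence reads bottom-up. A minimal Python sketch for pulling out only the Warn and Erro entries and printing them oldest-first is below. It assumes the line layout shown above (timestamp, `<Severity:Component>`, slot, message); the regular expression and the names `parse_line` and `problems_oldest_first` are hypothetical, not an official log parser.

```python
import re
from datetime import datetime

# Assumed layout: "MM/DD/YYYY HH:MM:SS.ss <Sev:Component> Slot-N: message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{2}) "
    r"<(?P<sev>\w+):(?P<comp>[\w.]+)> "
    r"(?P<slot>Slot-\d+): (?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["ts"], "%m/%d/%Y %H:%M:%S.%f")
    return rec


def problems_oldest_first(text, severities=("Warn", "Erro")):
    """Keep entries with the given severities, sorted by timestamp."""
    records = [parse_line(line) for line in text.splitlines()]
    hits = [r for r in records if r and r["sev"] in severities]
    return sorted(hits, key=lambda r: r["time"])


# A few lines copied from the Slot-2 excerpt above.
SAMPLE = """\
10/08/2016 19:32:58.94 <Noti:EPM.UnexpctRebootDtect> Slot-2: Booting after System Failure.
10/08/2016 19:13:12.49 <Warn:EPM.all_shutdown> Slot-2: Shutting down all processes
10/08/2016 19:13:12.14 <Erro:DM.Error> Slot-2: Node State[4] = FAIL (Not In Sync)
10/08/2016 19:13:12.12 <Warn:DM.Warning> Slot-2: PRIMARY NODE (Slot-1) DOWN
"""

for rec in problems_oldest_first(SAMPLE):
    print(rec["ts"], rec["sev"], rec["msg"])
```

Blank lines and any lines in a different layout (such as the shorter Slot-1 format in the first post) are skipped rather than raising an error.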

Jeremy, Embassador
