<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic CPU Health Check has failed in ExtremeSwitching (Other)</title>
    <link>https://community.extremenetworks.com/t5/extremeswitching-other/cpu-health-check-has-failed/m-p/9104#M132</link>
    <description>Hello, a couple of days ago we lost communication with an 8-slot X460-48p switch stack and were alerted to it by a NetSight alert:&lt;BR /&gt;
&lt;BR /&gt;
    Cpu HealthCheck has failed. Slot ExtremeXOS (Stack)  version 15.3.1.4 v1531b4-patch1-44 by release-manager on Fri Sep 5 16:29:36 EDT  2014 Error Type 7 Action hardwareFail(4) Retries autoRecovery(5)&lt;BR /&gt;
&lt;BR /&gt;
I was able to log in to the stack of eight X460-48p switches. However, only slot 1 had a role (Master); slots 2 through 8 had a role of (None). A reboot cleared the issue up. I booted into the other partition, which has a 16 code (I had that planned already). Looking back at our records, we had the same message roughly a year ago and things went down then too. Is there something I can tweak so that a slot that has a problem recovers by itself?&lt;BR /&gt;
&lt;BR /&gt;
Or is this maybe a known issue with the 15.3.1.4 patch1-44 code? Or maybe it's a hardware issue?&lt;BR /&gt;
&lt;BR /&gt;
Thank you&lt;BR /&gt;
&lt;BR /&gt;
Sarah&lt;BR /&gt;
&lt;BR /&gt;
configure slot 1 module X460-48p&lt;BR /&gt;
configure sys-recovery-level slot 1 reset&lt;BR /&gt;
configure slot 2 module X460-48p&lt;BR /&gt;
configure sys-recovery-level slot 2 reset&lt;BR /&gt;
configure slot 3 module X460-48p&lt;BR /&gt;
configure sys-recovery-level slot 3 reset&lt;BR /&gt;
configure slot 4 module X460-48p&lt;BR /&gt;
configure sys-recovery-level slot 4 reset&lt;BR /&gt;
configure slot 5 module X460-48p&lt;BR /&gt;
configure sys-recovery-level slot 5 reset&lt;BR /&gt;
configure slot 6 module X460-48p&lt;BR /&gt;
configure sys-recovery-level slot 6 reset&lt;BR /&gt;
configure slot 7 module X460-48p&lt;BR /&gt;
configure sys-recovery-level slot 7 reset&lt;BR /&gt;
configure slot 8 module X460-48p&lt;BR /&gt;
configure sys-recovery-level slot 8 reset&lt;BR /&gt;
</description>
    <pubDate>Mon, 19 Jun 2017 16:32:00 GMT</pubDate>
    <dc:creator>Sarah_Seidl</dc:creator>
    <dc:date>2017-06-19T16:32:00Z</dc:date>
    <item>
      <title>CPU Health Check has failed</title>
      <link>https://community.extremenetworks.com/t5/extremeswitching-other/cpu-health-check-has-failed/m-p/9104#M132</link>
      <description>Hello, a couple of days ago we lost communication with an 8-slot X460-48p switch stack and were alerted to it by a NetSight alert:&lt;BR /&gt;
&lt;BR /&gt;
    Cpu HealthCheck has failed. Slot ExtremeXOS (Stack)  version 15.3.1.4 v1531b4-patch1-44 by release-manager on Fri Sep 5 16:29:36 EDT  2014 Error Type 7 Action hardwareFail(4) Retries autoRecovery(5)&lt;BR /&gt;
&lt;BR /&gt;
I was able to log in to the stack of eight X460-48p switches. However, only slot 1 had a role (Master); slots 2 through 8 had a role of (None). A reboot cleared the issue up. I booted into the other partition, which has a 16 code (I had that planned already). Looking back at our records, we had the same message roughly a year ago and things went down then too. Is there something I can tweak so that a slot that has a problem recovers by itself?&lt;BR /&gt;
&lt;BR /&gt;
Or is this maybe a known issue with the 15.3.1.4 patch1-44 code? Or maybe it's a hardware issue?&lt;BR /&gt;
&lt;BR /&gt;
Thank you&lt;BR /&gt;
&lt;BR /&gt;
Sarah&lt;BR /&gt;
&lt;BR /&gt;
configure slot 1 module X460-48p&lt;BR /&gt;
configure sys-recovery-level slot 1 reset&lt;BR /&gt;
configure slot 2 module X460-48p&lt;BR /&gt;
configure sys-recovery-level slot 2 reset&lt;BR /&gt;
configure slot 3 module X460-48p&lt;BR /&gt;
configure sys-recovery-level slot 3 reset&lt;BR /&gt;
configure slot 4 module X460-48p&lt;BR /&gt;
configure sys-recovery-level slot 4 reset&lt;BR /&gt;
configure slot 5 module X460-48p&lt;BR /&gt;
configure sys-recovery-level slot 5 reset&lt;BR /&gt;
configure slot 6 module X460-48p&lt;BR /&gt;
configure sys-recovery-level slot 6 reset&lt;BR /&gt;
configure slot 7 module X460-48p&lt;BR /&gt;
configure sys-recovery-level slot 7 reset&lt;BR /&gt;
configure slot 8 module X460-48p&lt;BR /&gt;
configure sys-recovery-level slot 8 reset&lt;BR /&gt;
</description>
      <pubDate>Mon, 19 Jun 2017 16:32:00 GMT</pubDate>
      <guid>https://community.extremenetworks.com/t5/extremeswitching-other/cpu-health-check-has-failed/m-p/9104#M132</guid>
      <dc:creator>Sarah_Seidl</dc:creator>
      <dc:date>2017-06-19T16:32:00Z</dc:date>
    </item>
    <item>
      <title>RE: CPU Health Check has failed</title>
      <link>https://community.extremenetworks.com/t5/extremeswitching-other/cpu-health-check-has-failed/m-p/9105#M133</link>
      <description>Hi Sarah,&lt;BR /&gt;
&lt;BR /&gt;
Did you happen to try logging into one of the non-master nodes during the failure? I'm curious what role they reported for themselves at the time.&lt;BR /&gt;
&lt;BR /&gt;
Also, did you check 'show slot'? I'd like to know what the status of the non-master nodes was.&lt;BR /&gt;
&lt;BR /&gt;
'show log' output from both the master and one of the failed nodes may be helpful as well, but since the stack was rebooted and the issue occurred a few days ago, it's possible the logs were lost.</description>
      <pubDate>Mon, 19 Jun 2017 20:05:00 GMT</pubDate>
      <guid>https://community.extremenetworks.com/t5/extremeswitching-other/cpu-health-check-has-failed/m-p/9105#M133</guid>
      <dc:creator>BrandonC</dc:creator>
      <dc:date>2017-06-19T20:05:00Z</dc:date>
    </item>
    <item>
      <title>RE: CPU Health Check has failed</title>
      <link>https://community.extremenetworks.com/t5/extremeswitching-other/cpu-health-check-has-failed/m-p/9106#M134</link>
      <description>Hi Brandon,&lt;BR /&gt;
&lt;BR /&gt;
Thanks for the reply. I only ran the show stacking command, not show slot (slot 1 was active and master; the rest had number assignments but no role). I didn't think to try to telnet into the other slots to see.&lt;BR /&gt;
&lt;BR /&gt;
There are still some messages in NVRAM. For example, these are from slot 2; the messages on all slots indicate no master:&lt;BR /&gt;
&lt;BR /&gt;
06/18/2017 08:08:19.03 &amp;lt;DM.WARNING&amp;gt; Slot-2: Slot-3 FAILED (1) No Master&lt;BR /&gt;
06/18/2017 08:08:19.03 &amp;lt;DM.WARNING&amp;gt; Slot-2: Slot-5 FAILED (1) No Master&lt;BR /&gt;
06/18/2017 08:08:19.03 &amp;lt;DM.WARNING&amp;gt; Slot-2: Slot-7 FAILED (1) No Master&lt;BR /&gt;
06/18/2017 08:08:19.02 &amp;lt;DM.WARNING&amp;gt; Slot-2: Slot-4 FAILED (1) No Master&lt;BR /&gt;
06/18/2017 08:08:19.02 &amp;lt;DM.WARNING&amp;gt; Slot-2: Slot-2 FAILED (1) No Master&lt;BR /&gt;
06/18/2017 08:08:18.49 &amp;lt;DM.WARNING&amp;gt; Slot-2: Slot-1 FAILED (1)&lt;BR /&gt;
06/18/2017 08:08:18.48 &amp;lt;DM.WARNING&amp;gt; Slot-2: Slot-6 FAILED (1) No Master&lt;BR /&gt;
06/18/2017 08:08:18.46 &amp;lt;DM.ERROR&amp;gt; Slot-2: Node State[3] = FAIL (No Master)&lt;BR /&gt;
06/18/2017 08:08:18.46 &amp;lt;DM.WARNING&amp;gt; Slot-2: PRIMARY NODE (Slot-1) DOWN&lt;BR /&gt;
&lt;BR /&gt;</description>
      <pubDate>Mon, 19 Jun 2017 20:21:00 GMT</pubDate>
      <guid>https://community.extremenetworks.com/t5/extremeswitching-other/cpu-health-check-has-failed/m-p/9106#M134</guid>
      <dc:creator>Sarah_Seidl</dc:creator>
      <dc:date>2017-06-19T20:21:00Z</dc:date>
    </item>
  </channel>
</rss>

