What is the fix for X440-24t Temperature Very Hot Warning Error

  • Question
  • Updated 3 years ago
  • Answered
What is the fix for X440-24t  Temperature Very Hot Warning Error???

This is a newly purchased X440-24t and this temperature error is occurring. Please advise.

I have seen a lot of posts regarding this X440-24t temperature error.

I have upgraded to Recommended Release 15.6.3.1 patch1-5, but I still encounter the issue.

Current State:    OPERATIONAL
Image Selected:   secondary
Image Booted:     secondary
Primary ver:      15.3.1.4
Secondary ver:    15.6.3.1
                  patch1-5


10/22/2015 05:14:19.58 <Warn:DM.Warning> : Environment warning reported.  Setting Environment LED color to Amber
10/22/2015 05:14:19.58 <Erro:DM.Error> : Switch: Temperature (69 C) is reaching maximum limit (70 Celsius). (X440-24t, P/N: 800471-00-14, S/N: 1519N-41648, Rev: 14.0)

* X440-24t.2 # sh temperature 
Field Replaceable Units               Temp (C)   Status   Min  Normal   Max
---------------------------------------------------------------------------
Switch         : X440-24t               38.00    Normal   -10    0-48  55
* X440-24t.3 # 

Paul

Posted 3 years ago

Bharathiraja, Suresh, Employee

Hi Paul,

Just to confirm, what is the actual temperature in the data center? Is this the only device reporting this temperature?

Thanks,
Suresh.B

Paul

The room temperature is not more than 30 C.
It is probably around 20 to 25 C.

There are other X250 switches in the same room, but they have no temperature issue.
Only the X440-24t switches are affected.

thanks.

PARTHIBAN CHINNAYA, Alum

According to the logs, the hardware revision is Rev 14.
I don't think this is a software bug.

We only had this problem up to Rev 12.

PARTHIBAN CHINNAYA, Alum

The hardware design is as follows.
The switch displays temperature = room temp + 16.

For example, if the room temperature is 30,
the switch will display 30 + 16 = 46.
The fan starts rotating when this value hits 44.5.
The fan stops rotating when the value falls below 44.5.
This is done for hardware fan reliability.

But this is when we started seeing the bug.
This behaviour should have been changed in hardware Rev > 12.

The software bug was fixed in the latest patch of 15.3.2,
the latest patch of 15.3.1.4, and the latest 15.4.

You can confirm by downgrading one of the switches to one of these versions.
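
The display offset and fan threshold described in this reply can be written out as a short Python sketch. The constant and function names are made up for illustration, and the values (+16 offset, 44.5 threshold) are simply the ones quoted above, not figures from Extreme Networks documentation.

```python
# Illustrative only: both values are the ones quoted in this reply.
DISPLAY_OFFSET_C = 16.0   # displayed value = room temperature + 16
FAN_THRESHOLD_C = 44.5    # fan runs at or above this displayed value


def displayed_temperature(room_temp_c: float) -> float:
    """Return the value the switch would display for a given room temperature."""
    return room_temp_c + DISPLAY_OFFSET_C


def fan_should_run(displayed_c: float) -> bool:
    """Fan starts when the displayed value hits 44.5 and stops below it."""
    return displayed_c >= FAN_THRESHOLD_C


for room in (25.0, 28.5, 30.0):
    shown = displayed_temperature(room)
    print(f"room {room:.1f} C -> displayed {shown:.1f} C, "
          f"fan running: {fan_should_run(shown)}")
```

Under these assumptions, a 25 C room would display about 41 C with the fan idle, and the fan would start once the room reaches about 28.5 C.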

Paul

Thanks Parthiban,

So you want me to downgrade the switch from Recommended Release 15.6.3.1 patch1-5 to one of the firmware versions that you suggest?

http://documentation.extremenetworks.com/hw_sw_compatibility/HardwareSoftwareCompatibility/r_recomme...


The built-in firmware version 15.3.1.4 has the issue. That is why I upgraded to 15.6.3.1 patch 1-5.

--------------------------

Based on your experience, this Rev 14 hardware should not have the software bug, so what might the error be? The room temperature is not more than 30 C; it is around 20-25 C.

Why does the temperature hit 69 C?
Other X250 switches are in the same place and their temperature is normal. It only happens to the brand new X440-24t switches.

-Thanks.



Prashanth KG, Employee

Hi Paul,

It seems like the switch is still displaying the hotspot sensor temperature instead of the actual switch temperature. 

Symptoms look similar to this article: 

X440 FAN does not work, node will shutdown in overheat condition 

I would like you to confirm whether the fans start rotating when the warning occurs. If they do not, then the warning displayed may be for the temperature of the hotspot sensor, as explained in the article. (The switch is not overheated; only the hotspot sensor is reporting a high value.)

I am a little surprised that the warning still says the switch temperature rather than the hotspot sensor in version 15.6.3.1 patch 1-5.

10/22/2015 05:14:19.58 <Warn:DM.Warning> : Environment warning reported.  Setting Environment LED color to Amber
10/22/2015 05:14:19.58 <Erro:DM.Error> : Switch: Temperature (69 C) is reaching maximum limit (70 Celsius). (X440-24t, P/N: 800471-00-14, S/N: 1519N-41648, Rev: 14.0)

Because, as per the CR xos0055347 of the article, 

Incorrect temperature warnings in summit switches 

the warning message should indicate the hotspot sensor in the message instead of the switch like in the log below. 

11/06/2014 09:59:50.09 <Erro:DM.Error> : Switch: mainboard hot spot temperature (2 C) is reaching minimum limit (0 Celsius). (X440-48t, P/N: 800473-00-01, S/N: 1150G-00105, Rev: 1.0) 
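
To check a saved log for which sensor each warning names, something like the following Python sketch may help. The regular expression is inferred only from the log lines quoted in this thread, so other EXOS releases may word the message differently; the function name and the "unspecified" label are hypothetical.

```python
import re

# Pattern inferred from the log lines quoted in this thread only.
TEMP_EVENT_RE = re.compile(
    r"Switch: (?P<sensor>[A-Za-z ]*?)\s*[Tt]emperature \((?P<temp>-?\d+) C\)"
)


def parse_temperature_event(line):
    """Return (sensor_name, temperature_c), or None if the line does not match."""
    match = TEMP_EVENT_RE.search(line)
    if match is None:
        return None
    # Older-style messages name no sensor at all ("Switch: Temperature ...").
    sensor = match.group("sensor").strip() or "unspecified"
    return sensor, int(match.group("temp"))


old_style = ("10/22/2015 05:14:19.58 <Erro:DM.Error> : Switch: Temperature (69 C) "
             "is reaching maximum limit (70 Celsius).")
new_style = ("11/06/2014 09:59:50.09 <Erro:DM.Error> : Switch: mainboard hot spot "
             "temperature (2 C) is reaching minimum limit (0 Celsius).")

print(parse_temperature_event(old_style))
print(parse_temperature_event(new_style))
```

A line that names no sensor comes back as "unspecified", which corresponds to the older message format discussed above.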

If possible, could you test with version 15.5.3.4 and check how the temperature warning is displayed? This additional information might be helpful for our investigation.

Irrespective of the result, I would suggest opening a GTAC case to check the reason behind the error message. 

Hope this helps! 

Paul

Hi Prashanth, thank you for the reply. I will downgrade to 15.5.3.4 and check the switch temperature status error. Is there a patch version of 15.5.3.4 that you prefer?

If I still encounter the issue, I will contact GTAC. Thanks.

-Paul

Prashanth KG, Employee

Hi Paul,

Thanks for agreeing to test further. 

I did a quick test with 15.6.3.1 patch 1-5 on an X440-24t lab switch.

I see that it is reporting the hotspot temperature separately in the log. 

Current State:    OPERATIONAL
Image Selected:   primary
Image Booted:     primary
Primary ver:      15.6.3.1
                  patch1-5
Secondary ver:    15.6.3.1 

(debug) X440-24t.6 # 10/24/2015 04:09:00.93 <Erro:DM.Error> Switch: mainboard temperature (-2 C) is reaching minimum limit (-10 Celsius). (X440-24t, P/N: 800471-00-11, S/N: 1336N-45908, Rev: 11.0)
10/24/2015 04:09:00.93 <Warn:DM.Warning> Environment warning reported.  Setting Environment LED color to Amber
10/24/2015 04:09:00.94 <Erro:DM.Error> Switch: mainboard hot spot temperature (-2 C) below operating range (0 to 70 Celsius) (X440-24t, P/N: 800471-00-11, S/N: 1336N-45908, Rev: 11.0)

Can you please verify the log in 15.6.3 patch 1-5 again and let me know if the log reports switch temperature or the hotspot sensor?

Test with 15.5.3.4 patch 1-6 as well to be sure. 
Looking forward to the results! 

Paul

Hi Prashanth, 

I am still using 15.6.3.1 patch1-5 and I am seeing the mainboard hot spot temperature.

What is the difference between the switch temperature and the hotspot sensor?

What is the error? Is it a software bug or a hardware fault?

What is the solution?


10/24/2015 04:16:52.44 <Noti:DM.Notice> : Switch: mainboard hot spot temperature (66 C) is back to operating range (10 to 69 Celsius). (X440-24t, P/N: 800471-00-14, S/N: 1521N-41334, Rev: 14.0)
10/24/2015 04:12:37.48 <Warn:DM.Warning> : Environment warning reported.  Setting Environment LED color to Amber
10/24/2015 04:12:37.48 <Erro:DM.Error> : Switch: mainboard hot spot temperature (69 C) is reaching maximum limit (70 Celsius). (X440-24t, P/N: 800471-00-14, S/N: 1521N-41334, Rev: 14.0)



* X440-24t.7 # sh fans
FanTray information:
 State:                  Operational
 NumFan:                 2
 Fan-1:                  Operational at 11000 RPM
 Fan-2:                  Operational at 11000 RPM

* X440-24t.8 # sh temp
Field Replaceable Units               Temp (C)   Status   Min  Normal   Max
---------------------------------------------------------------------------
Switch         : X440-24t               45.50    Normal   -10    0-48  55
* X440-24t.9 #

Current State:    OPERATIONAL
Image Selected:   secondary
Image Booted:     secondary
Primary ver:      15.3.1.4
Secondary ver:    15.6.3.1
                           patch1-5


Thank you for the support and reply.

Paul

Here is the temperature status of the other X250e and X440 switches in the same room.

Why is the Normal temperature range different?

Why is the X440 temperature so hot? The room temperature is around 25.

What is watchdog warm reset?

10/23/2015 00:39:21.93 <Warn:EPM.UnexpctRebootDtect> : Booting after System Failure.
10/23/2015 00:39:21.53 <Noti:EPM.wd_warm_reset> : Changing to watchdog warm reset mode


X250 is working fine with no issue.

X440-24t.2 # sh temperature
Field Replaceable Units               Temp (C)   Status   Min  Normal   Max
---------------------------------------------------------------------------
Switch         : X440-24t               46.00    Normal   -10    0-48  55



X250e-48p-SW # sh tem
Field Replaceable Units           Temp (C)   Status
----------------------------------------------------
Switch     : X250e-48p              39.50    Normal

Temp Range: -10.00 (Min), 0.00-60.50 (Normal), 62.90 (Max)


X250e-48p firmware version 12.1.3.14

#######################

X250e-48t-SW # sh temp
Field Replaceable Units               Temp (C)   Status   Min  Normal   Max
---------------------------------------------------------------------------
Switch         : X250e-48t              36.50    Normal   -10    0-54  59


X250e-48t running with 15.3.3.5 patch1-6

Prashanth KG, Employee

Hi Paul,

Did you experience any reboot of the X440 switch, or are we seeing only the temperature warning? I am gathering information to answer your questions and will respond shortly.

Paul

Thanks for the fast response and reply.

What is watchdog warm reset?

10/23/2015 00:39:21.93 <Warn:EPM.UnexpctRebootDtect> : Booting after System Failure.
10/23/2015 00:39:21.53 <Noti:EPM.wd_warm_reset> : Changing to watchdog warm reset mode

Prashanth KG, Employee

The watchdog runs at the kernel level, waiting to reset the switch in case of a SW/HW failure. "Changing to watchdog warm reset mode" is just an informational message that is logged prior to the reboot of the switch.

I got a similar log when I manually powered the switch down and back up.

10/26/2015 07:44:14.75 <Noti:EPM.start> EPM Started
10/26/2015 07:44:14.74 <Warn:EPM.UnexpctRebootDtect> Booting after System Failure.
10/26/2015 07:44:14.34 <Noti:EPM.wd_warm_reset> Changing to watchdog warm reset mode

Can you please clarify whether you experienced any switch reboot due to the overheat condition?

Prashanth KG, Employee

What is the difference between the switch temperature and the hotspot sensor?

Below is the behaviour of the X440-24t switch:
The switch operates with two sensors: one to trigger the fans (sensor 0) and one to raise software alarms (hotspot sensor).

The fans run only when the switch temperature reading reaches 48 C.
  • Fans will start spinning when temperature sensor (0) reaches 48 degrees (as measured on sensor (0))
  • Expected outcome: fans will run when the temperature reaches 48 degrees and bring the temperature down.

However, in the X250, the fans are always rotating. So it may be expected that the temperature on the X440 switch is slightly higher.
Also, the hotspot sensor reading is always higher than the temperature sensor reading.
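
The two-sensor behaviour described above can be summarised in a small Python sketch. The 48 C fan trigger comes from this reply and the 70 C hot spot limit from the log messages earlier in the thread; the function and field names are hypothetical, and the real firmware logic may well differ.

```python
from dataclasses import dataclass

FAN_TRIGGER_C = 48.0      # sensor 0 value at which the fans start (from this reply)
HOTSPOT_LIMIT_C = 70.0    # hot spot maximum limit quoted in the log messages


@dataclass
class SwitchStatus:
    fans_running: bool
    hotspot_headroom_c: float  # degrees remaining before the hot spot limit


def evaluate(sensor0_c: float, hotspot_c: float) -> SwitchStatus:
    """Sensor 0 drives the fans; the hot spot sensor drives the log alarms."""
    return SwitchStatus(
        fans_running=sensor0_c >= FAN_TRIGGER_C,
        hotspot_headroom_c=HOTSPOT_LIMIT_C - hotspot_c,
    )


# Example values that appear in the thread (not necessarily measured at the
# same moment): `sh temp` showed 45.50 C and the log reported 69 C.
print(evaluate(45.5, 69.0))
```

The point of the sketch is only that the two readings are evaluated independently: the hot spot sensor can approach its limit while sensor 0 is still below the fan trigger.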

Prashanth KG, Employee

Looking at the temperature output, the room temperature seems to be on the higher side, because even with the fans running, the X250e temperature is around 35 to 40.

Looking at the output of X440-24t, it looks like the fans are also working fine. 

X440-24t.7 # sh fans
FanTray information:
 State:                  Operational
 NumFan:                 2
 Fan-1:                  Operational at 11000 RPM
 Fan-2:                  Operational at 11000 RPM

* X440-24t.8 # sh temp
Field Replaceable Units               Temp (C)   Status   Min  Normal   Max
---------------------------------------------------------------------------
Switch         : X440-24t               45.50    Normal   -10    0-48  55

This does not appear to be a hardware issue. 
If there are no reboots of X440-24t switch, the warning that you notice could be expected considering the switch architecture and the temperature in the environment. 

If there are reboots noticed, we may need to investigate this as it is not expected for the switch to get into the overheat condition if the fans are working. 

Hope this helps! 

Paul

Thanks,

I didn't see any reboots on the switches. But the temperature is always around 44 C, which is quite hot compared to the others, and the switches are still in the testing stage. I am worried that when I place them in the production network, the X440-24t switches will give me problems because of that temperature. I am not happy with the heat issue on these brand new X440-24t switches.

What is the difference between the X440-p and X440-t fans?

I noticed the X440-p has 4 fans and the fans are always running?
Its switch temperature is around 30 C.

-Paul

Prashanth KG, Employee

Hi Paul,

We certainly understand the frustration.
Unfortunately, the behaviour explained above is exclusive to X440-24t switches. 

Check this article out: 
https://gtacknowledge.extremenetworks.com/articles/Q_A/Is-it-normal-for

For the other switches, the behavior is different: the fans are always running and show 11000 RPM.
https://gtacknowledge.extremenetworks.com/articles/Q_A/On-a-X440-the-fan-speed-is-always-listed-as-11000-RPM-even-if-the-temperature-is-below-30C

It is good that the switches are not rebooting, and it should not make a difference if more traffic is pumped in, because the fans will rotate when the temperature approaches 48 C and cool the sensors.

However, it would be best practice to position these switches where the temperature does not go so high. In particular, it would be good not to position these devices on top of other switches.

Hope this helps. Let me know if you need any further clarification. 

Paul

Hi Prashanth,

Thank you for your time and informative reply. I very much appreciate your help.
Anyway, we have to accept this behavior of the X440-24t fans and temperature.

Please advise what the suggested firmware for this X440-24t is.



Thanks.

Prashanth KG, Employee

Hi Paul,

The current recommended version for X440 switches is 15.6.3.1 patch 1-5, which you have already installed.

Apart from the temperature warning in the logs, if you experience any reboots or any fan issues, please feel free to report them to us. That might need a closer look, as we expect the fan behaviour of the X440-24t not to impact switch performance.

Thanks!