Extreme 3825i APs keep "downgrading" to older firmware

  • Problem
  • Updated 1 year ago
  • Solved
We have a mixture of 3825 and 3715 APs. After upgrading to v10 software, we updated the AP firmware as usual. The trouble is that we can get all 1,037 APs up to v10 firmware, wait a few days, and about 4 APs will show up with v9 firmware. Wait another day and the number jumps to 7. The same 7. I can set a Controlled Upgrade, upgrade these 7, SSH into each AP, see that it has v10 software, reboot it, and log back in to verify it is still on v10. I can set the controller to "Always Upgrade APs to default image", which is v10. No matter what, within a couple of days an AP or two shows up with the old v9 firmware, including APs that I have SSH'd into and seen running v10, and that the controller also listed as v10. This cycle has repeated 4 times in the last 10 days.

It is as if they are factory resetting themselves for some reason.  Does anyone know what is wrong?  Are they just bad APs that need to be RMA'd?
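Because the same handful of APs keeps reappearing on v9, it can help to diff firmware inventories taken on different days. The sketch below assumes you have already exported each snapshot as a plain dict of AP serial to firmware version string; how you export it from the controller is not covered here, and the serials and version strings are hypothetical placeholders.

```python
"""Sketch: spot APs whose reported firmware fell back between two snapshots.

Assumption: each snapshot is a dict of AP serial -> firmware version string.
The serials and version strings used below are hypothetical placeholders.
"""


def major_version(version: str) -> int:
    # "10.41.02.0016" -> 10; assumes the major number precedes the first dot
    return int(version.split(".", 1)[0])


def find_downgrades(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return serials whose major firmware version dropped between snapshots."""
    downgraded = []
    for serial, old_version in before.items():
        new_version = after.get(serial)
        if new_version is None:
            continue  # AP absent from the later snapshot (offline, removed, ...)
        if major_version(new_version) < major_version(old_version):
            downgraded.append(serial)
    return sorted(downgraded)


if __name__ == "__main__":
    day1 = {"AP-0001": "10.41.02.0016", "AP-0002": "10.41.02.0016", "AP-0003": "10.41.02.0016"}
    day3 = {"AP-0001": "10.41.02.0016", "AP-0002": "9.21.12.0004", "AP-0003": "10.41.02.0016"}
    print(find_downgrades(day1, day3))  # ['AP-0002']
```

Running it against each day's export and intersecting the results would show whether it really is "the same 7" every time.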

Joshua Beddingfield

Posted 1 year ago

Hawkins, Bruce, Employee
Are any (or many) of those APs connecting to/registering with their controller(s) over a WAN/VPN?

Joshua Beddingfield
They are all at schools, and the ISP routes their traffic back to the controller site as "local" traffic, but it does have to leave the school and be routed. The problem APs are at 3 different locations. The two controllers are also at two geographically separated locations. We are not having this problem with any of the APs homed to the second controller.

Hawkins, Bruce, Employee
I would just question whether the upgrade is ever truly finishing "successfully", even though you report that you SSH to the affected APs and see v10 reported at the CLI. In my experience, leaving the MTU at the default of 1500 for APs that connect back across a WAN/VPN/MetroE/Dark Fiber/Extended LAN sort of link leads to a lot of packet fragmentation, which can cause many things: instability in the tunnels the AP uses back to both local and foreign controllers, unreliable passing of the configuration from controller to AP, and AP firmware upgrades that take too long to complete, complete with a corrupted image, or never finish at all. This is usually because fragmented packets have to be resent, and when resent they sometimes arrive out of order. Can you please tell me whether the APs in question are still using the default MTU of 1500? If so, please resize them appropriately, using the following KCS article as a guideline: https://gtacknowledge.extremenetworks.com/articles/Solution/IdentiFi-Wireless-AP-s-do-not-have-backu...
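To make the fragmentation point concrete, here is a small arithmetic sketch of standard IPv4 fragmentation: every fragment repeats the 20-byte IP header and, except for the last one, carries a multiple of 8 data bytes. The tunnel overhead value is a hypothetical assumption (the thread does not say how many bytes the AP-to-controller tunnel adds), and IPv4 without header options is assumed.

```python
"""Sketch: how many IPv4 fragments one encapsulated AP packet becomes.

Assumptions: IPv4, 20-byte outer header with no options, and a hypothetical
TUNNEL_OVERHEAD (outer IP header plus tunnel headers) added by the AP.
"""
import math

IP_HEADER = 20        # bytes, IPv4 header without options
TUNNEL_OVERHEAD = 60  # bytes, hypothetical; depends on the real tunnel type


def fragment_count(inner_packet: int, path_mtu: int, overhead: int = TUNNEL_OVERHEAD) -> int:
    """Number of IPv4 fragments needed to carry one encapsulated packet."""
    outer_packet = inner_packet + overhead  # total outer IP packet size
    if outer_packet <= path_mtu:
        return 1  # fits, no fragmentation
    # Each fragment carries at most (MTU - header) data bytes, rounded down
    # to a multiple of 8 as IPv4 fragment offsets require.
    per_fragment = (path_mtu - IP_HEADER) // 8 * 8
    data = outer_packet - IP_HEADER
    return math.ceil(data / per_fragment)


if __name__ == "__main__":
    print(fragment_count(1500, 1500))  # 2: a full-size packet plus overhead must split
    print(fragment_count(1400, 1500))  # 1: a lower AP MTU leaves room for the overhead
```

Under these assumptions, every full-size packet crossing the link becomes two fragments, and losing either one means the whole packet is lost.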

Joshua Beddingfield
I have now set the MTU on the problem APs to 1400. The trouble is, the primary and backup tunnels were already fine. There are over two dozen APs at each of these locations, and no more than 2 are failing at any location.

I have changed the MTU settings and I am going through the upgrade process again.  It will be a few days before the problem creeps back up.  Fingers crossed.

Hawkins, Bruce, Employee
Did you regressively ping test using the -f (Don't Fragment) and -l (size) parameters until you reached a threshold where the pings went through without being fragmented? Or did you just take the 1400 value given as an example in the KCS article and apply it without testing? Your network may require a lower MTU, or may even allow a higher one, than the example given in the article. Just trying to make sure you get the best result possible. There may be something else afoot here, but getting the MTU set correctly can only improve controller <> AP communication and increase the chances that the upgrade goes smoothly and correctly.
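The regressive test described above can be scripted. This is a minimal sketch, assuming a Windows host (where `ping -f -l SIZE -n 1 HOST` sends one echo with Don't Fragment set) and assuming that a successful reply line contains "TTL=", which is a common heuristic rather than a documented contract. The start size, floor, and step are hypothetical choices.

```python
"""Sketch: step a don't-fragment ping down until it succeeds.

Assumptions: Windows ping syntax (-f, -l, -n) and the "TTL=" success
heuristic; start/floor/step defaults are illustrative, not prescribed.
"""
import subprocess
from typing import Callable, Optional


def windows_df_ping(host: str, payload: int) -> bool:
    """Send one Windows ping with Don't Fragment set and the given payload size."""
    result = subprocess.run(
        ["ping", "-f", "-l", str(payload), "-n", "1", host],
        capture_output=True,
        text=True,
    )
    # Heuristic: echo replies contain "TTL=", while the
    # "needs to be fragmented but DF set" message does not.
    return "TTL=" in result.stdout


def largest_passing_payload(probe: Callable[[int], bool], start: int = 1472,
                            floor: int = 500, step: int = 10) -> Optional[int]:
    """Walk the payload size down from `start` until `probe` reports success."""
    size = start
    while size >= floor:
        if probe(size):
            return size
        size -= step
    return None  # nothing passed down to the floor; investigate the path
```

Usage, with a placeholder address standing in for an AP: `largest_passing_payload(lambda n: windows_df_ping("192.0.2.10", n))`. With `step=10` the answer can be up to 9 bytes below the true maximum, which fits the "no need to cut it too fine" approach.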

Joshua Beddingfield
1500 failed. 1400 passed. (Actually, 1472 is the magic number here.)

Erik Auerswald, Embassador
If you used Windows, specifying a size of 1472 bytes for the ping payload actually results in a 1500-byte IP packet, because the 20-byte IP header and the 8-byte ICMP header are not counted by the -l option. Thus the MTU does not seem to be a problem in this case.

Whether the IP and ICMP headers are excluded from the size option depends on the ping implementation.
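The header arithmetic behind "1472 is really 1500" is easy to check. This sketch assumes IPv4 without header options (20 bytes) plus the 8-byte ICMP echo header, and a ping whose size option counts payload only, as Windows `-l` does (Linux iputils `ping -M do -s SIZE` also counts data bytes only).

```python
"""Sketch: convert between ping payload size and resulting IPv4 packet size.

Assumptions: IPv4 with no header options, ICMP echo, and a ping whose size
option excludes the headers (Windows -l, Linux iputils -s).
"""
IPV4_HEADER = 20  # bytes
ICMP_HEADER = 8   # bytes


def ip_packet_size(payload: int) -> int:
    """Total IP packet size produced by a ping with this payload size."""
    return payload + IPV4_HEADER + ICMP_HEADER


def max_payload_for_mtu(mtu: int) -> int:
    """Largest payload that fits in one unfragmented packet at this MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


if __name__ == "__main__":
    print(ip_packet_size(1472))       # 1500
    print(max_payload_for_mtu(1400))  # 1372
```

So a successful 1472-byte don't-fragment ping shows the path already carries full 1500-byte packets, which is the point made above.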

Hawkins, Bruce, Employee
Awesome. I usually test 1450 if 1400 works, then 1475 if 1450 works, and if 1475 fails, I just go with 1450. No need to cut it too fine. I have seen certain environments that needed the MTU set below 1000 to work properly, though. Glad you tested for your specific environment, from the controller's location to the AP's location or vice versa. After you've performed the Controlled Upgrade on those 7 APs, please let us know whether the firmware version "sticks" better or not.
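The coarse stepping rule just described can be written as a tiny helper. This is a sketch: `probe` is any function you supply that returns True when a test at that size gets through unfragmented (for example a don't-fragment ping), and the candidate list simply mirrors the 1400/1450/1475 sequence above.

```python
"""Sketch: the 1400 -> 1450 -> 1475 stepping rule as a helper.

Assumption: `probe(size)` is caller-supplied and returns True when a test
at that size passes without fragmentation.
"""
from typing import Callable, Optional, Sequence


def pick_mtu(probe: Callable[[int], bool],
             candidates: Sequence[int] = (1400, 1450, 1475)) -> Optional[int]:
    """Return the last candidate that passed before the first failure."""
    chosen = None
    for size in candidates:
        if not probe(size):
            break  # stop at the first failure and keep the previous size
        chosen = size
    # None means even the first candidate failed, so test lower values
    return chosen


if __name__ == "__main__":
    print(pick_mtu(lambda n: n <= 1472))  # 1450
```

A `None` result corresponds to the environments mentioned above where the value had to go much lower.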