C5210 HA pair: APs disassociate from one controller and randomly reattach to the backup wireless controller

  • Problem
  • Updated 2 years ago
  • Solved
C5210: We recently upgraded to 09.21.11.0004 code, which we hoped would resolve this issue.

This system has nearly 1000 APs spread across the two controllers.
We are seeing APs swap from their primary controller to the backup. This is totally random and unpredictable, so we have no packet capture to analyse so far (180/500 swap).

So far we have been advised to increase the poll timers for the APs (WASSP/CAPWAP): AP > Global Settings > AP Registration > discovery timers.

There do not seem to be any underlying networking issues, as we have no other reported issues or concerns.

Is there a known issue?
Has anybody else seen this issue, and how was it resolved?
Can I prioritise the WASSP traffic through the network (DSCP?)

Regards
Rod Robertson

Posted 2 years ago

Ronald Dvorak, Embassador

Hi Rod,

I have no answers for you, but a general question: why use fast failover at all?

Does the network design require fast failover instead of legacy failover?
I'm a big fan of legacy failover. I use it for all my customer installations and don't see a problem with it.
How often does it happen that a controller is defective and no longer reachable? In that rare case I assume it doesn't matter whether you lose one ping or two until the APs switch to the second controller.

Laura

I also have a C5210 controller and my APs fail over to the other controller. What is the difference between fast and legacy failover? Is there a way I can turn fast off?

Rod Robertson

Hi
We have inherited this installation, so we are reluctant to make many changes. I used to install the previous version of this, before Enterasys bought it from Siemens.

Ronald Dvorak, Embassador

To enable legacy failover, just remove the checkmark for fast failover.

Legacy failover is slower because the AP doesn't already have a tunnel established to the second controller. "Slow" means that you'd lose 1-2 pings during failover, in my experience.

The difference is that legacy failover has two requirements that MUST be fulfilled to allow the AP to authenticate/switch to the second controller:
1) the AP loses connection to the home controller
2) the controllers lose the connection to each other (= availability tunnel down)

Let's talk about a case in which you shouldn't use legacy failover.
Say the APs connect via e.g. ESA0 and the availability tunnel is configured on e.g. ESA1.
If ESA0 is down on the home controller (e.g. a broken cable), the AP is no longer able to communicate with the controller, but as ESA1 is still up (= availability tunnel is still up), the AP is not allowed to authenticate/switch to the second controller.

If you use legacy failover, it's very important to use the same interface for AP registration and for the availability tunnel configuration.
In a "normal" setup, with both controllers in the same room and set up for the same subnets, that shouldn't be a problem and you are able to use legacy failover.

So the one thing you need to make sure of in the network design is that there is no case where the AP cannot reach the AP registration interface but the controllers can still reach each other via the availability interface.
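
The legacy failover rule described above can be sketched as a simple two-condition check. This is only an illustration of the logic as stated in this post, not controller code; the function and parameter names are made up for the example.

```python
def legacy_failover_allowed(ap_reaches_home: bool, availability_tunnel_up: bool) -> bool:
    """Legacy failover rule as described in the post (illustrative only).

    The AP may authenticate to the second controller only when BOTH hold:
      1) the AP has lost its connection to the home controller
      2) the availability tunnel between the controllers is down
    """
    return (not ap_reaches_home) and (not availability_tunnel_up)


# Broken cable on ESA0 while the availability tunnel on ESA1 stays up:
# the AP has lost its home controller but is not allowed to switch.
print(legacy_failover_allowed(ap_reaches_home=False, availability_tunnel_up=True))

# Home controller completely down: both conditions are met, the AP may switch.
print(legacy_failover_allowed(ap_reaches_home=False, availability_tunnel_up=False))
```

The first call is the problem case: registration and availability run over different interfaces, so the AP is stranded.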

Rod Robertson

Thanks for the reply. I'm not sure that my customer will accept that as a "solution"; as a workaround, yes.

I have been looking at changing the AP timers. Is there a difference between version 9 and 10?

I am also looking at an ACL policy to put the UDP AP WASSP traffic into QP8.

I will talk to my customer about removing the "fast failover" option.

Rod Robertson

Hi

We are on version 09.21. Are the various timers different in version 10?

James A, Embassador

What are your timers set to currently? I have had this problem in the past, but it doesn't happen any more. My AP poll timeout is set to 4 seconds, discovery timeout is 3 seconds, detect link failure is 2 seconds.

Just to be sure, the APs aren't rebooting, are they? What topology are your clients in: B@AP, B@EWC or routed?

Rod Robertson

We are using bridge at EWC (B@EWC) for all APs (approx. 1000). I have a meeting with the customer next week to come up with a plan for how we are going to try to resolve the issue.

Rod Robertson

Hi

Our timers were set to default. We had been advised by GTAC to extend the timer to 60; we have done this for a group of APs and are now waiting to see what happens.

Rich Pacheco

Hi Rod, I ran into the same situation a few weeks ago. I have a pair of C5210s in HA with 1200+ APs on them. We broadcast a few SSIDs via both B@AP and B@EWC. Things were stable for a very long time. A few weeks ago we started seeing the APs bouncing between controllers. After spending some time looking at and adjusting the timers, we contacted the GTAC and were instructed to upgrade from 09.21.07 to 09.21.12. That seems to have resolved the issue.

We are still not sure why it started happening. We were on the 09.21.07 code for a very long time without issue.

Good Luck

Rod Robertson

Many thanks. We recently upgraded to 09.21.11.0004, which was an Extreme recommendation. Going back to the customer and arranging another upgrade is something I do not look forward to without an explicit statement from Extreme.

Can somebody from Extreme comment on this? Does upgrading the controllers to 09.21.12.X resolve this issue?

Scott Whall, Employee

@Rich - If your Wireless network was running on 9.21.07 for a long time, then recently Access Points started moving, was there some other change in the network that could have altered the traffic dynamics?

@Rod - We are actively working on all reported cases of APs timing out or moving between their respective controllers.  If you haven't already looked in the Knowledgebase for Poll Timeout articles, you can try this article:

https://gtacknowledge.extremenetworks.com/articles/Solution/IdentiFi-Access-Points-reboot-due-to-Pol...

However, as you noted above, if there are no outstanding problems in the network and it is a random AP move, then getting a good packet capture from either the AP Ethernet port or the controller port where the APs register could be difficult, but it is a necessary piece to help us understand why the APs are moving.

There are no differences in the timers between version 9 and version 10 firmware.  

WASSP packets are already sent with a high priority.
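
For anyone who wants to check the marking in a packet capture, here is a minimal sketch of reading the DSCP out of the IPv4 ToS/DS byte (the DSCP is the top six bits, the ECN field the bottom two). This thread does not state which DSCP value WASSP uses; the byte 0xB8 below is only an example input, and the function names are illustrative.

```python
def dscp_from_tos(tos_byte: int) -> int:
    """Return the DSCP value (upper six bits) of an IPv4 ToS/DS byte."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("ToS byte must be in the range 0..255")
    return tos_byte >> 2


def ecn_from_tos(tos_byte: int) -> int:
    """Return the ECN field (lower two bits) of an IPv4 ToS/DS byte."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("ToS byte must be in the range 0..255")
    return tos_byte & 0b11


# Example only: a ToS byte of 0xB8 as it might appear in a capture.
print(dscp_from_tos(0xB8))  # 46
print(ecn_from_tos(0xB8))   # 0
```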

Rich Pacheco

Hi Scott,

Working at a university, all of our upgrades/changes were completed before the start of the semester (9/1).  We have been in monitor/fix mode since then without any major issues.  We really try not to make any significant changes (wired or wireless) during the semester unless it's absolutely necessary.

Rod Robertson

Hi
Thanks for this info. How is the WASSP traffic prioritized? Is it DSCP, and if so, what value?

Joshua Puusep

After one of our minor controller upgrades on 10.x, we were seeing about 10% of our APs continuously move between controllers. I worked with GTAC and the only fix we could come up with was to factory reset each AP using the 'cset factory' command via SSH. I monitored the failover events using tunnel activation messages from syslog and reset each AP that generated an alarm over the course of a week, after which there were no further failovers. I did not get an explanation as to why this behavior occurs. We are upgrading our controllers again later this week and I will post if the issue recurs.
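
The monitoring step could be sketched roughly as below: count tunnel activation messages per AP in a syslog export to get a list of APs to reset. The actual message format is not shown in this thread, so the regular expression and the sample lines are purely hypothetical placeholders; substitute the pattern that matches your controller's real syslog output.

```python
import re
from collections import Counter

# HYPOTHETICAL pattern: the real tunnel activation message format is not
# given in this thread. Replace it with one matching your own syslog lines.
HYPOTHETICAL_PATTERN = re.compile(r"tunnel activated for AP (?P<ap>\S+)")


def count_failovers(lines, pattern=HYPOTHETICAL_PATTERN):
    """Count matching tunnel activation messages per AP identifier."""
    counts = Counter()
    for line in lines:
        match = pattern.search(line)
        if match:
            counts[match.group("ap")] += 1
    return counts


# Made-up sample lines, only to exercise the function.
sample = [
    "Oct 12 10:00:01 ewc: tunnel activated for AP AP-0001",
    "Oct 12 10:05:17 ewc: tunnel activated for AP AP-0002",
    "Oct 12 10:09:42 ewc: tunnel activated for AP AP-0001",
    "Oct 12 10:10:00 ewc: unrelated message",
]

# APs with the most failover events first: candidates for a reset.
for ap, events in count_failovers(sample).most_common():
    print(ap, events)
```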

Rich Pacheco

Thanks for the update.  I'm curious to see if it starts happening again.

Joshua Puusep

We rolled out 10.11.04.0008 this morning and the APs have remained stable so far.