Weird Availability Behaviour

  • Problem
  • Updated 2 years ago
  • Solved
Here is my scenario,

2 x C5210 controllers, both physical appliances, running the latest code, 10.11.01.0210.

Availability is configured using the Esa2 ports on each controller, with 10Gb SFP+ modules plugged directly into the respective core switches in their respective DCs. Fast failover is set to 2 seconds, with AP poll time set to 3 seconds (1.5 times, as recommended in the documentation). System config and captive portal info are checked for sync.

Same subnet, 172.25.7.0/28, controller 1 is 172.25.7.1, controller 2 is 172.25.7.2. GW is 172.25.7.14.
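As a quick sanity check of the addressing above, here is a minimal Python sketch using the standard `ipaddress` module. It only confirms that the three addresses quoted in the post fall inside 172.25.7.0/28; the `hosts` dictionary and its labels are just illustrative names, and nothing here talks to the controllers.

```python
import ipaddress

# Values taken from the post; the dictionary keys are illustrative labels only.
subnet = ipaddress.ip_network("172.25.7.0/28")
hosts = {
    "controller 1": ipaddress.ip_address("172.25.7.1"),
    "controller 2": ipaddress.ip_address("172.25.7.2"),
    "gateway": ipaddress.ip_address("172.25.7.14"),
}

# Report whether each address belongs to the /28.
for name, addr in hosts.items():
    print(f"{name}: {addr} in {subnet} -> {addr in subnet}")

# A /28 has 14 usable host addresses between network and broadcast.
usable = list(subnet.hosts())
print(f"usable range: {usable[0]} - {usable[-1]}")
```

All three addresses sit in the same /28, which matches the statement that the controllers share a broadcast domain with the gateway.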

These are the IPs used in DHCP option 78.

Topology is configured to allow AP registration to these addresses.

We do use the "admin" ports on the controllers, but solely for web access.

The AP discovery works perfectly; the APs appear on both controllers. Currently one is set to allow any APs to connect, and the other only allows approved APs to connect. This way we can configure the APs on one controller. This seems to work OK: all APs automatically become local to the "allow all" controller and foreign to the "only approved" controller.

Our ideal solution is to split the APs between the controllers at some point. We did actually start doing this, and this is where we found an issue.

With only one controller serving all APs at first, you can configure an AP (change name, location etc.) and the change then propagates to the other controller; the name, location etc. are sync'd as we would expect. You can also delete an AP and it will disappear from the other.

This works when controller 2 is the "allow all" controller and controller 1 is the "only approved" controller, so for changes made to the APs on controller 2, the info is sync'd to controller 1.

If we reverse this process, it does not work. With AP discovery, for example, controller 1 has the local APs but on controller 2 the info is not updated. If you make a change to the name, it does not sync the other way, and when you try to delete the AP it won't delete off the other controller! We get issues with AP authorisation failures and sometimes we get the CM component complaining about not being able to process a request.

"Config Manager has experienced an error which has prevented it from properly processing a request. CM will continue running, however this error may be an indicator of a larger system problem. Error Details:[ERR ] Can't connect to remote controller at 172.25.7.2"

This must be a process issue. The devices are part of the same broadcast domain (VLAN). They can ping each other, ping the gateway, and ping the APs.

It's also fully routable: the APs belong to different VLANs and can obviously route, as proven by the discovery working.

Right now we're having to doctor the way it works, if you like, but it seems like controller 1 has an issue talking to controller 2, yet the other way around is fine!

I hope you can understand this scenario, please ask any questions if something is confusing.

I do have some further questions as well, like: do I need mobility enabled with this setup for roaming?

Ian Broadway

Posted 2 years ago

Gareth Mitchell, Extreme Escalation Support Engineer

Ian

This sounds complex, I don't fully understand so I think it would be best if you opened a case with GTAC so we can check logs etc.

Only the local controller can configure the AP (radio/name etc.); the foreign controller cannot.

-Gareth

Ronald Dvorak, Embassador

1+ like for the detailed description

During a GTAC call last week I was told 2 things that I wasn't aware of...

1) never use the admin/ETH0 port (only for temporary connections by a service engineer, e.g. for an upgrade) as it could mess up the routing of the EWC
2) you should only have one port configured for mode "physical"

As you have access to the controllers via 172.25.7.1 and 172.25.7.2, you could disable/disconnect the admin port.
I don't understand the real reason for having only one port in mode physical, but let's call it the magic of the internal processing in the controller.

Is there a special reason you'd like to split the APs across both EWCs, e.g. exceeding the max APs for EWC#1?
I prefer to deploy them to only one and have the 2nd controller as standby.

If you look at the dashboard (main page) of both, does it show "Availability: FF" and a green up arrow on the left?

Regarding mobility: you'd need to provide more details of the topologies used for the WLAN service, but in short, if you have routed or bridge@EWC I'd say mobility should be enabled.
If you only have bridge@AP I don't think you'd need mobility.

Ian Broadway

Hi Gareth,

It is complex, and I'm pretty sure it's not normal either. Let me see if I can simplify.

If I make APs on controller 1 local and then foreign on 2, any changes made to the APs on controller 1 do not sync with controller 2.

If I remove all the APs, then reverse the process by making APs on controller 2 local and then foreign on 1, changes will sync to controller 1. With me?

Ronald,

If I'm honest, we have worked alongside a UK partner and they have configured it this way. I'd like to think that GTAC would make partners aware of info like that, i.e. the admin port. Makes no sense to me. There is only one port in physical mode, the interface with the 172 address.
The AP split design is again the partner's recommendation. When you say exceed the maximum APs for an EWC, they can support 1000 APs, no?

The dashboard states Availability as "FFO"

We're running services in both B@AP and B@EWC, we're not using any routed services.

We're also getting weird errors at random: syntax errors when accessing random tabs in the GUI via a browser, and then it boots you out and you can't log back in for a period of time. It makes no difference whether you access via the admin port interface or the physical interface. If I access via the CLI it's very slow, and then I get messages like the below when running commands:

Error: invalid socket number
Error: cannot send message
Error: in receiving message.

Ronald Dvorak, Embassador

Hey Ian,

I'd advise opening a GTAC ticket and working with Gareth (GTAC UK) on the issues. He could take a closer look remotely to check the overall "health" of the controller.

There is no general rule on whether to use a split AP design or, if possible, home all APs to one controller. It's the preference of the engineer/partner. I like it simple, so I try to have all on one if the network requirements allow it.

Ian Broadway

Is there anything I can do to check the health, any commands from the CLI?

Gareth Mitchell, Extreme Escalation Support Engineer

Ian

As stated previously we really need to see the logs, you could engage your partner to see if they can help, and/or raise a case with GTAC.

On the admin port question, please see this article: https://gtacknowledge.extremenetworks.com/articles/Q_A/Should-the-Admin-port-on-an-IdentiFi-wireless-controller-be-used-during-normal-operation

-Gareth

Ian Broadway

Hi Gareth,

I will try this first then, and see if the behaviour changes.

Ian Broadway

Hmm, still no joy. It was looking good when I made the changes, and deleting the AP from the local controller then also removed it from the foreign one. However, when adding it back, the info of the AP does not sync to the other controller :(. I will raise the issue with my partner and subsequently GTAC.

Ian Broadway

So after a few days of head scratching and investigation, the issue turned out to be that "jumbo frames support" was enabled on the controllers, yet the physical ports linking the controllers together through the network did not all have jumbo frames enabled. This caused management traffic issues. We didn't have a requirement to use jumbo frames, and it was recommended that this be disabled.

All good now.
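
For anyone chasing a similar symptom: a default ping uses small packets, so it can succeed even when larger frames are silently dropped along the path. Below is a rough Python sketch that works out the largest ICMP payload for a given MTU and builds a Linux `ping` command with the don't-fragment flag set (`-M do` in iputils ping). The 9000-byte jumbo MTU is an assumption (the thread does not state the value the controllers used), the helper names are made up for illustration, and the command is meant to be run from a Linux host on the same VLAN, not from the controller CLI.

```python
import shlex

IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def df_ping_command(host: str, mtu: int, count: int = 3) -> str:
    """Build a Linux iputils ping command with fragmentation prohibited."""
    args = ["ping", "-M", "do", "-s", str(icmp_payload_for_mtu(mtu)),
            "-c", str(count), host]
    return shlex.join(args)


# 1500 is the standard Ethernet MTU; 9000 is an assumed jumbo MTU.
for mtu in (1500, 9000):
    print(df_ping_command("172.25.7.2", mtu))
```

If the 1472-byte probe gets replies but the 8972-byte one does not, something on the path is not passing jumbo frames, which would be consistent with small pings working while larger management traffic between the controllers fails.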

JK
