
WiNG APs keep losing adopted status


cosmingrosu
New Contributor

Hello,

 

Recently, I’ve noticed that some of our APs (2 or 3 out of 41) are losing their “adopted” status and then restarting the whole adoption process, which then completes with no errors. I’ve searched the logs and the event history, but I found nothing that could help except for the following entries:

  • 2021-04-27 09:40:31:Received OK from cfgd, adoption complete to 19.F7.E4.0D
  • 2021-04-27 09:40:31:Waiting for cfgd OK, adopter should be 19.F7.E4.0D
  • 2021-04-27 09:40:31:Adoption state change: 'Connecting to adopter' to 'Waiting for Adoption OK'
  • 2021-04-27 09:40:28:Adoption state change: 'No adopters found' to 'Connecting to adopter'
  • 2021-04-27 09:40:28:Try to adopt to 19.F7.E4.0D (cluster master 19.F7.E4.0D in adopters)
  • 2021-04-27 09:40:28:MLCP created VLAN link on VLAN 99, offer from B4-C7-99-F7-E4-0D
  • 2021-04-27 09:40:28:Sending MLCP Request to B4-C7-99-F7-E4-0D vlan 99
  • 2021-04-27 09:40:04:Adoption state change: 'Waiting to retry' to 'No adopters found'
  • 2021-04-27 09:39:54:cfgd notified dpd2 of unadoption, restart adoption after 10 seconds
  • 2021-04-27 09:39:54:Adoption state change: 'Adopted' to 'Waiting to retry'
  • 2021-04-27 09:39:54:Adopter 19.F7.E4.0D is no longer reachable, cfgd notified
  • 2021-04-27 09:39:54:All adopters lost, restarting MLCP
  • 2021-04-27 09:39:53:MLCP link vlan-99 offerer 19.F7.E4.0D lost, restarting discovery

VLAN 99 is the virtual interface we use on the APs and the controller for management purposes. If I understand correctly, the connection between the AP and the controller on this VLAN keeps failing for some reason. All networking devices are permanently monitored (ICMP ping every 5 seconds) and I could find no downtime on either the AP or the controller, so as far as I know, the connection shouldn’t have failed at any moment. There are no differences in configuration between these APs and all the others, so that is a dead end too.

Could you please advise me regarding why these APs keep losing their adopted status?

 

The AP models are AP7522 and AP7532 running firmware 5.9.1.2-006R, and the controller is an RFS4000.

 


5 REPLIES

ckelly
Extreme Employee

In this case, there’s likely an issue with MINT traffic, since MINT runs at a default size of 1500 bytes. If MINT traffic can’t pass reliably, you WILL have adoption issues, along with other management-related problems. MINT traffic is what carries the management plane between the controller and the APs.

Try setting the MINT MTU size to 1464 (1472 minus 8, just to give it a little wiggle room).

You can do this by logging in to the controller, then:

enable

#config

(config)#mint-policy global-default

(config-mint-policy-global-default)#mtu 1464

(config-mint-policy-global-default)#commit write

 

You can then test this by performing a MINT ping.

From either the AP or the controller (any MINT-capable device), run the mint ping command and specify the MINT ID of the destination device. Instead of using ICMP, this ping tests connectivity using the MINT protocol itself.

To get a device’s MINT ID, run this command on that device: show mint id

The ID is in the format AA.BB.CC.DD

 

#mint ping <MINT ID>  (There are options after the MINT ID, but not needed for this)
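For example, from one of the affected APs, using the controller’s MINT ID that shows up in the adoption logs above (19.F7.E4.0D):

#mint ping 19.F7.E4.0D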

If the MINT ping fails, lower the MINT MTU value by another 8 bytes and re-run the MINT ping test.
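If the manual lower-by-8-and-retest loop gets tedious, the equivalent search can be scripted from any host on the management VLAN. Below is a minimal sketch in Python; it assumes a Linux host whose ping supports -M do (don’t-fragment) and a hypothetical controller address of 10.0.99.1, and it probes with plain ICMP rather than MINT, so the final value should still be confirmed with mint ping:

import subprocess

def df_ping_ok(host: str, payload: int) -> bool:
    # One ICMP echo with the don't-fragment bit set (Linux: ping -M do).
    result = subprocess.run(
        ["ping", "-M", "do", "-c", "1", "-W", "2", "-s", str(payload), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def max_payload(host: str, lo: int = 0, hi: int = 1472) -> int:
    # Binary search for the largest ICMP payload that passes unfragmented.
    # Returns 0 if even the smallest probe never succeeds.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if df_ping_ok(host, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

controller = "10.0.99.1"  # hypothetical management address on VLAN 99
payload = max_payload(controller)
print("max payload:", payload, "-> path IP MTU:", payload + 28)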

cosmingrosu
New Contributor


Sorry for the late reply.

I monitored the link between one of the APs where this problem occurred and the controller, and there is no congestion on the VLAN access ports or the trunk port. The monitoring history shows the same thing.

However, the ping test failed. The maximum size I can set for the ping to succeed is 1472 bytes, which amounts to a total IP packet size of 1500 bytes. I see there is an overhead of 28 bytes, and I don’t understand why it’s that much, since an Ethernet frame should have a maximum size of 1522 bytes considering the 802.1Q tag. I believe this is why the ping fails. I also ran this test from two Cisco switches using the command “ping <ip> size 1500 df-bit” and the test was successful.
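For the size arithmetic in question: the 28 bytes are the 20-byte IPv4 header plus the 8-byte ICMP header, since the ping size argument appears to count only the ICMP payload; the 1500-byte limit is the IP MTU, and Ethernet framing (including the 802.1Q tag) is added on top of it rather than counted against it. A quick check in plain Python:

# The ping "size" argument sets the ICMP payload only.
ICMP_PAYLOAD = 1472  # largest value that succeeded in the test above
ICMP_HEADER = 8
IPV4_HEADER = 20     # no IP options

ip_packet = ICMP_PAYLOAD + ICMP_HEADER + IPV4_HEADER
print(ip_packet)  # 1500 -> exactly the standard IP MTU

# Ethernet framing sits outside the IP MTU:
ETH_HEADER, DOT1Q_TAG, FCS = 14, 4, 4
print(ip_packet + ETH_HEADER + DOT1Q_TAG + FCS)  # 1522-byte tagged frame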

ckelly
Extreme Employee (Accepted Solution)

Possibly a congestion issue then on that VLAN? Are you able to prioritize the MINT traffic? (EF, DSCP 46)

 

Is it possible that there’s a network MTU limitation between just these 2-3 APs and the controller (and the limitation doesn’t come into play for the rest of the APs)?

To test MINT MTU, run this:

#ping <controller IP> size 1500 dont-fragment

If the replies fail, then so will the MINT traffic, which is 1500 bytes by default and can be adjusted as needed. The MTU value is changed within the same mint-policy section shown below; the command is just: mtu <value>

 

You can also change the MINT priority for MINT devices (controllers and APs).

To change it, go into the mint-policy on the controller:

#config

(config)#mint-policy global-default

(config-mint-policy-global-default)#router packet priority 5
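If you do prioritize MINT on the intermediate switches, the EF / DSCP 46 value suggested above has a couple of equivalent on-the-wire representations that are handy when writing match rules or reading captures. A small check in plain Python (the DSCP-to-CoS mapping shown is the conventional one; treating it as lining up with the router packet priority 5 value above is an assumption):

DSCP_EF = 46  # Expedited Forwarding (RFC 3246)

# DSCP occupies the upper 6 bits of the IP ToS/DS byte.
tos_byte = DSCP_EF << 2
print(hex(tos_byte))  # 0xb8 -> what a packet capture shows in the ToS field

# Conventional DSCP -> 802.1p CoS mapping uses the top 3 bits.
cos = DSCP_EF >> 3
print(cos)  # 5 -> matches the priority value configured above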

 

There’s also a way to relax the timeout settings that determine when an AP decides it has lost its adoption. Check these other things first, though.

 

 

 

cosmingrosu
New Contributor


Nope, they are exactly 4 switches apart from each other.
