Recently, I’ve noticed that some of our APs (2 or 3 out of 41) lose their “adopted” status and then restart the whole adoption process, which completes with no errors. I’ve searched the logs and the event history but found nothing helpful except the following entries (newest first):
- 2021-04-27 09:40:31:Received OK from cfgd, adoption complete to 19.F7.E4.0D
- 2021-04-27 09:40:31:Waiting for cfgd OK, adopter should be 19.F7.E4.0D
- 2021-04-27 09:40:31:Adoption state change: 'Connecting to adopter' to 'Waiting for Adoption OK'
- 2021-04-27 09:40:28:Adoption state change: 'No adopters found' to 'Connecting to adopter'
- 2021-04-27 09:40:28:Try to adopt to 19.F7.E4.0D (cluster master 19.F7.E4.0D in adopters)
- 2021-04-27 09:40:28:MLCP created VLAN link on VLAN 99, offer from B4-C7-99-F7-E4-0D
- 2021-04-27 09:40:28:Sending MLCP Request to B4-C7-99-F7-E4-0D vlan 99
- 2021-04-27 09:40:04:Adoption state change: 'Waiting to retry' to 'No adopters found'
- 2021-04-27 09:39:54:cfgd notified dpd2 of unadoption, restart adoption after 10 seconds
- 2021-04-27 09:39:54:Adoption state change: 'Adopted' to 'Waiting to retry'
- 2021-04-27 09:39:54:Adopter 19.F7.E4.0D is no longer reachable, cfgd notified
- 2021-04-27 09:39:54:All adopters lost, restarting MLCP
- 2021-04-27 09:39:53:MLCP link vlan-99 offerer 19.F7.E4.0D lost, restarting discovery
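To measure how long each drop lasts, entries like the ones above can be parsed with a few lines of Python. This is only a rough sketch: it assumes the exact line format shown here (timestamp, colon, message) and matches on the "'Adopted' to" and "adoption complete" strings taken from these logs. The function names are made up for illustration.

```python
import re
from datetime import datetime

# Matches "- 2021-04-27 09:40:31:message" (leading "- " optional).
LINE_RE = re.compile(r"^-?\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}):(.*)$")

def parse_events(lines):
    """Return (timestamp, message) tuples, oldest first."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((stamp, match.group(2).strip()))
    # The AP prints newest first, so reverse before the stable sort to keep
    # same-second entries in their original chronological order.
    return sorted(reversed(events), key=lambda event: event[0])

def adoption_gaps(events):
    """Return (lost_at, restored_at, seconds) for each loss/re-adoption pair."""
    gaps = []
    lost_at = None
    for stamp, message in events:
        if "'Adopted' to" in message:
            lost_at = stamp
        elif "adoption complete" in message and lost_at is not None:
            gaps.append((lost_at, stamp, (stamp - lost_at).total_seconds()))
            lost_at = None
    return gaps

# A few of the lines from this post, copied verbatim.
SAMPLE = [
    "- 2021-04-27 09:40:31:Received OK from cfgd, adoption complete to 19.F7.E4.0D",
    "- 2021-04-27 09:40:28:Adoption state change: 'No adopters found' to 'Connecting to adopter'",
    "- 2021-04-27 09:39:54:Adoption state change: 'Adopted' to 'Waiting to retry'",
    "- 2021-04-27 09:39:53:MLCP link vlan-99 offerer 19.F7.E4.0D lost, restarting discovery",
]

for lost_at, restored_at, seconds in adoption_gaps(parse_events(SAMPLE)):
    print(f"unadopted from {lost_at} to {restored_at} ({seconds:.0f} s)")
```

For the entries in this post, that gives a gap of 37 seconds between leaving the 'Adopted' state and the next "adoption complete" message.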
VLAN 99 is the virtual interface we use on the APs and the controller for management purposes. If I understand correctly, the connection between the AP and the controller on this VLAN keeps failing for some reason. All networking devices are monitored continuously (ICMP ping every 5 seconds), and I found no downtime on either the APs or the controller, so as far as I can tell, the connection shouldn’t have failed at any point. There are no differences in configuration between these APs and all the others, so this is a dead end too.
Could you please advise me regarding why these APs keep losing their adopted status?
The AP models are AP7522 and AP7532 running firmware 126.96.36.199-006R, and the controller model is RFS4000.
Best answer by Chris Kelly
Possibly a congestion issue then on that VLAN? Are you able to prioritize the MINT traffic? (EF, DSCP 46)
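For reference, EF is DSCP 46, and the DSCP value occupies the upper six bits of the IP ToS / Traffic Class byte. The arithmetic below is a small sketch for cases where a switch or packet capture shows the raw ToS byte instead of the DSCP value. The helper names are illustrative only.

```python
def dscp_to_tos(dscp):
    """Return the ToS / Traffic Class byte for a DSCP value (ECN bits zero)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2  # DSCP sits above the two ECN bits

def dscp_to_precedence(dscp):
    """Return the legacy IP precedence, i.e. the top three bits of the DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp >> 3

EF = 46
print(f"DSCP {EF} -> ToS byte {dscp_to_tos(EF)} (0x{dscp_to_tos(EF):02X}), "
      f"precedence {dscp_to_precedence(EF)}")
```

So traffic marked EF shows up as ToS 184 (0xB8), with precedence 5 in the top three bits.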
Is it possible that there’s a network MTU limitation between just these 2-3 APs and the controller? (And the limitation doesn’t come into play for the rest of the APs?)
To test MINT MTU, run this:
#ping <controller IP> size 1500 dont-fragment
If the replies fail, then so will the MINT traffic, which is 1500 bytes by default and can be adjusted as needed. The MTU is set in the same mint-policy section shown below, with the command: mtu <value>
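If the 1500-byte test fails and you want to know where the limit actually sits, the same don't-fragment test can be repeated from a Linux host on the AP side as a binary search. This is a sketch only. It assumes iputils ping (the -M do and -s flags) and that every payload at or below the path limit gets through. It is not a WiNG CLI feature, and 192.0.2.10 is a placeholder address. Note that -s sets the ICMP payload size, so add 28 bytes (20 IP + 8 ICMP) for the IPv4 packet size.

```python
import subprocess

IP_ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP header

def linux_ping_df(host, payload):
    """Send one ICMP echo with DF set; True if a reply came back (Linux iputils)."""
    cmd = ["ping", "-M", "do", "-c", "1", "-W", "1", "-s", str(payload), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

def largest_payload(probe, low=0, high=1472):
    """Binary-search the largest payload in [low, high] for which probe() is True.

    Returns None if even the smallest payload fails (host unreachable).
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low

# Real use (placeholder address):
#   best = largest_payload(lambda size: linux_ping_df("192.0.2.10", size))

# Offline demonstration with a simulated path that passes payloads up to 1372 bytes.
best = largest_payload(lambda size: size <= 1372)
print(f"largest payload: {best}, IPv4 packet size: {best + IP_ICMP_OVERHEAD}")
```

The probe is passed in as a function so the search can be checked without touching the network. A single lost reply would be read as "too big", so on a lossy link it is worth repeating each probe a few times.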
You can also change the MINT priority for MINT devices (controllers and APs).
To change it, enter the mint-policy on the controller:
(config)#mint-policy global-default
(config-mint-policy-global-default)#router packet priority 5
There’s also a way to relax the timeout settings that determine when an AP decides it has lost its adoption. Check these other things first though.