adoption lost after 20 seconds. Layer 3 connectivity normal

RWCampbell
New Contributor
Seemingly at random, about a third of our APs have become un-adopted and no longer function. We have found that when we restart the controller, all of the APs reconnect, but a subset of them un-adopt again after 20 seconds or so. Perhaps notably, the controller is running v5.5. We want to upgrade it to the latest release possible, but we're not sure how to get the software.

As far as we can tell, the main reason they are un-adopted is that we cannot MiNT ping the other device, and presumably can't reach it at the MAC level. We've compared the configs of working and non-working APs, and they're identical apart from the usual variables like names and IPs (minor variations). To our knowledge nothing changed to precipitate this. The system was used normally over the weekend, and these specific APs were not working this morning.

Any idea what would make the layer 2/MiNT communication not work?

-----

Below is a CLI walkthrough of the main points we're seeing with one of the affected APs, followed by that AP's config. Any help would be greatly appreciated.


Controller: RFS-6010-1000-WR
Base ethernet MAC address is B4-C7-99-6D-B7-76
Mint ID: 19.6D.B7.76
IP Address: 10.200.17.10

AP: AP-6532-66040-US
Base ethernet MAC address is 84-24-8D-81-9C-88
Mint ID: 4D.81.9C.88
IP Address: 10.200.17.33

# debugs (from controller)

RFS-SW01# sh mint mlcp his

2018-10-25 11:54:15:cfgd unadopted 4D.81.9C.88
2018-10-25 11:54:15:Unadopted 84-24-8D-81-9C-88 (4D.81.9C.88), cfgd not notified
2018-10-25 11:54:15:Unadopting 84-24-8D-81-9C-88 (4D.81.9C.88) because it is unreachable
2018-10-25 11:53:59:Adopted 84-24-8D-81-9C-88 (4D.81.9C.88), cfgd notified

RFS-SW01#ping 10.200.17.33
PING 10.200.17.33 (10.200.17.33) 100(128) bytes of data.
108 bytes from 10.200.17.33: icmp_seq=1 ttl=64 time=3.99 ms
108 bytes from 10.200.17.33: icmp_seq=2 ttl=64 time=0.410 ms
108 bytes from 10.200.17.33: icmp_seq=3 ttl=64 time=0.359 ms
108 bytes from 10.200.17.33: icmp_seq=4 ttl=64 time=0.363 ms

--- 10.200.17.33 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 0.359/1.281/3.995/1.567 ms
RFS-SW01#mint ping 4D.81.9C.88
MiNT ping 4D.81.9C.88 with 64 bytes of data.
Ping request 1 timed out. No response from 4D.81.9C.88
Ping request 2 timed out. No response from 4D.81.9C.88
Ping request 3 timed out. No response from 4D.81.9C.88

--- 4D.81.9C.88 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
RFS-SW01#

RFS-SW01#show adoption offline
-----------------------------------------------------------------------------------------------------------------------------
MAC HOST-NAME TYPE RF-DOMAIN TIME OFFLINE CONNECTED-TO
-----------------------------------------------------------------------------------------------------------------------------
84-24-8D-81-9C-88 AP23 ap6532 TEMP DC 0:05:27
-----------------------------------------------------------------------------------------------------------------------------

# debugs (from ap)

AP23#show adoption status
Adopted by:
Type : RFS6000
System Name : RFS-SW01
MAC address : B4-C7-99-6D-B7-76
MiNT address : 19.6D.B7.76
Time : 0 days 00:03:07 ago

AP23#show mint mlcp history
2018-10-25 11:53:58:Received 0 hostnames through option 191
2018-10-25 11:53:57:Received OK from cfgd, adoption complete to 19.6D.B7.76
2018-10-25 11:53:56:Waiting for cfgd OK, adopter should be 19.6D.B7.76
2018-10-25 11:53:56:Adoption state change: 'Connecting to adopter' to 'Waiting for Adoption OK'
2018-10-25 11:53:53:Adoption state change: 'No adopters found' to 'Connecting to adopter'
2018-10-25 11:53:53:Try to adopt to 19.6D.B7.76 (cluster master 00.00.00.00 in adopters)
2018-10-25 11:53:52:Received 0 hostnames through option 191
2018-10-25 11:53:52:Adoption state change: 'Disabled' to 'No adopters found'
2018-10-25 11:53:52:DNS resolution completed, starting MLCP
2018-10-25 11:53:52:Adoption enabled due to configuration

AP23#ping 10.200.17.10
PING 10.200.17.10 (10.200.17.10) 100(128) bytes of data.
108 bytes from 10.200.17.10: icmp_seq=1 ttl=64 time=4.53 ms
108 bytes from 10.200.17.10: icmp_seq=2 ttl=64 time=0.355 ms
^C
--- 10.200.17.10 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.355/2.443/4.531/2.088 ms
AP23#mint ping 19.6D.B7.76
MiNT ping 19.6D.B7.76 with 64 bytes of data.
Ping request 1 timed out. No response from 19.6D.B7.76
Ping request 2 timed out. No response from 19.6D.B7.76
Ping request 3 timed out. No response from 19.6D.B7.76

--- 19.6D.B7.76 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
AP23#

-----
code:
version 2.3
!
!
ip snmp-access-list default
permit any
!
firewall-policy default
no ip dos tcp-sequence-past-window
alg sip
!
!
mint-policy global-default
!
wlan-qos-policy default
qos trust dscp
qos trust wmm
!
radio-qos-policy default
!
wlan "WMS SSID"
description WMS RF Environment
ssid TEMP-WMS-RF
vlan 1
bridging-mode tunnel
encryption-type tkip-ccmp
authentication-type none
wpa-wpa2 psk 0 XXXXXXXXXX
service wpa-wpa2 exclude-ccmp
!
smart-rf-policy "TEMP DC Smart RF"
sensitivity custom
assignable-power 2.4GHz max 14
assignable-power 2.4GHz min 11
smart-ocs-monitoring client-aware 2.4GHz 1
!
!
management-policy default
no http server
https server
ssh
user admin password 1 XXXXXX role superuser access all
snmp-server community 0 private rw
snmp-server community 0 public ro
snmp-server user snmptrap v3 encrypted des auth md5 0 motorola
snmp-server user snmpmanager v3 encrypted des auth md5 0 motorola
!
profile ap6532 default-ap6532
ip name-server 10.200.16.12
ip name-server 10.200.16.11
ip domain-name TEMP.com
autoinstall configuration
autoinstall firmware
crypto ikev1 policy ikev1-default
isakmp-proposal default encryption aes-256 group 2 hash sha
crypto ikev2 policy ikev2-default
isakmp-proposal default encryption aes-256 group 2 hash sha
crypto ipsec transform-set default esp-aes-256 esp-sha-hmac
crypto ikev1 remote-vpn
crypto ikev2 remote-vpn
crypto auto-ipsec-secure
crypto load-management
crypto remote-vpn-client
interface radio1
wlan "WMS SSID" bss 1 primary
interface radio2
shutdown
interface ge1
ip dhcp trust
qos trust dscp
qos trust 802.1p
interface vlan1
ip address dhcp
ip address zeroconf secondary
ip dhcp client request options all
interface pppoe1
use firewall-policy default
rf-domain-manager capable
logging on
service pm sys-restart
router ospf
!
rf-domain "TEMP DC"
location "TEMP DC"
contact "Velociti Inc."
timezone America/Chicago
country-code us
use smart-rf-policy "TEMP DC Smart RF"
channel-list dynamic
channel-list 2.4GHz 1,6,11
control-vlan 1
!
ap6532 84-24-8D-81-9C-88
use profile default-ap6532
use rf-domain "TEMP DC"
hostname AP23
interface radio1
power 8
interface vlan1
ip address 10.200.17.33/21
!
!
end

ckelly
Extreme Employee
Robert,
Quickly, regarding the EOSL: there is a phased approach to the products and how engineering resources are applied. Some number of years after a new product is released, firmware stops getting new FEATURES, but firmware continues to be released for it as needed. That includes things like major bugs discovered or security-related issues found.
The last phase is where all engineering development stops. At that point bugs are no longer fixed, and as far as I know, even security-related issues are not addressed.

Regarding what sounds like a MiNT traffic loop, it would seem that there's a misconfiguration of the MiNT levels relative to the network topology in place.
We don't know what your physical/logical deployment looks like, but Chris Frazee made some excellent suggestions about how to possibly set things up correctly. It just depends on how the controller and APs are actually deployed on the network.

To maybe help you see how your system might be mis-configured and causing loops, here are some quick details of the three ways that systems should be set up.
1) Distributed
2) Centralized
3) Centralized with controller managed RF-Domains

In your config, we see a control-vlan defined in the RF-Domain that the AP is assigned to use. As Chris F. alluded to, this would normally only be defined in a Distributed-style deployment (#1): a WiNG controller sitting in the NOC somewhere, assigned to its own RF-Domain all by itself, with the APs then placed into their own RF-Domains representing remote stores/sites reached over a WAN connection.

The control-vlan you assign in an RF-Domain is the VLAN that the APs use specifically for their MiNT traffic, so they can talk to each other and pass information between themselves. It's not uncommon, nor technically disallowed, for customers to use a regular user data VLAN for the control-vlan (as long as that data VLAN is not a common broadcast domain across multiple sites). But if you want to do things 'right', you create a separate VLAN at a site and make that your control-vlan in the RF-Domain config where those APs exist. And to be clear, you can reuse the same VLAN ID as the control-vlan at multiple remote sites, as long as that VLAN cannot reach between the remote sites (is not common to multiple sites). The whole idea is to keep MiNT traffic from one remote site from intermingling with another remote site's MiNT traffic (which ends up creating very large LSP-DB tables!).

This remote-site MiNT traffic that the APs use to talk to each other is referred to as MiNT level-1 traffic. It's VERY chatty - LSPs are exchanged between the WiNG devices. One of the APs at the remote site will be elected the RF-Domain manager and will be the one single AP that forms a connection back to the NOC controller, over a single level-2 MiNT link. None of that chatty level-1 MiNT traffic can pass beyond that RF-Domain/site over the level-2 MiNT connection back to the NOC controller; the site's LSPs cannot cross it. So you end up with just a SINGLE level-2 MiNT connection from each remote site back to the controller. That's it. All of the other APs at a site get their instructions, and pass their data back to the controller, via the elected RF-Domain manager AP, which acts as a sort of proxy for each site. This all creates MiNT isolation between the remote sites.
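
For illustration only, here is a minimal sketch of what that Distributed layout can look like in WiNG config. The site name, VLAN ID, and profile name are placeholders (not taken from your config), and exact syntax can vary by release:

code:
! NOC controller kept in its own RF-Domain, by itself
rf-domain "NOC"
 country-code us
!
! Each remote site gets its own RF-Domain with a site-local control VLAN
! carrying the chatty level-1 MiNT between that site's APs
rf-domain "Store-01"
 country-code us
 control-vlan 2100
!
! AP profile at the remote site: IP-based adoption back to the NOC controller
profile ap6532 store-ap6532
 controller host 10.200.17.10 level 2

The elected RF-Domain manager AP at "Store-01" would then be the one device holding the single level-2 MiNT link back to the NOC controller.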

Now....all of this is describing how things work for a Distributed deployment.

If you have a Centralized deployment (#2), that's for situations like a campus or a single building (no remote over-the-WAN AP connections). In this setup you have just a single RF-Domain defined, and it contains the controller(s) and the APs. That's it. In that case there's no need for a control-vlan. Why? Because each AP forms its own individual MiNT level-1 connection back to the controller. This also means the controller becomes the RF-Domain manager, versus an AP winning the election and being the 'site's RF-Domain manager'. There's no level-2 MiNT involved in this scenario. In this case, though, it's strongly recommended that if there are more than 100 APs involved you use IP-based level-1 MiNT links rather than VLAN-based level-1 MiNT.
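
Purely as a sketch of that Centralized layout (the "Campus" name is a placeholder; the only device identifiers reused here are the ones already in this thread), the key points are a single shared RF-Domain, no control-vlan, and optionally IP-based level-1 links for larger AP counts:

code:
! One RF-Domain shared by the controller(s) and all APs - no control-vlan
! (the controller's own device entry would reference this same RF-Domain)
rf-domain "Campus"
 country-code us
!
ap6532 84-24-8D-81-9C-88
 use profile default-ap6532
 use rf-domain "Campus"
 hostname AP23
!
! With 100+ APs, IP-based level-1 MiNT links are recommended over VLAN-based
profile ap6532 default-ap6532
 controller host 10.200.17.10 level 1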

You can also have a campus-style deployment where you NEED multiple RF-Domains representing different buildings on the campus (#3). This is possible too. In this scenario, each AP once again has its own level-1 MiNT link back to the controller, but it's always IP-based. Also, unlike the Distributed architecture (#1) where each remote site has an elected RF-Domain manager AP, in this scenario you configure the controller to be the RF-Domain manager for the different RF-Domains you've created to represent the buildings. This puts more burden on the controller, though, because it now has to do all the work/calculations for the different RF-Domains (which is normally done by the elected RF-Domain manager AP at each site), so there is a limit to the number of RF-Domains a controller can 'manage', and it depends on the controller's hardware level. And again, no control-vlan is used in this scenario either, even though the APs are operating in their own separate RF-Domains (like #1), because those RF-Domains have been configured to be 'controller-managed'.
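
And a rough sketch of option #3. The building names are placeholders, and I'm assuming the controller-managed keyword under rf-domain, which may differ slightly on your firmware:

code:
! Each building gets its own RF-Domain, but the controller acts as its manager
rf-domain "Building-A"
 country-code us
 controller-managed
!
rf-domain "Building-B"
 country-code us
 controller-managed
!
! APs adopt over IP-based level-1 MiNT back to the controller
profile ap6532 default-ap6532
 controller host 10.200.17.10 level 1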

So based on all of this, which one seems to fit your actual deployment?
The fact that the RF-Domain in the config you supplied has a control-vlan defined insinuates that you have a Distributed architecture (#1). Does that sound correct based on the descriptions above, though?

RWCampbell
New Contributor
It's defined, but it's defined as VLAN 1, so it's not logically separated. To my understanding it's adopting over L2, and that's the entire problem (so far as we can tell...). The APs are statically assigned L3 addresses, and I assume the host entry.

The APs are local and we can do that, but the working APs have that defined as well, and VLAN 1 is the default VLAN. So I'm not sure that will have an effect, but we can try it.

Velociti was the original installer, and looking over how this was set up, I do think there were a lot of problems with how they configured these things. Recently we discovered that two loops had been introduced to this client's network, which the switches dealt with, BUT it seemed the switches couldn't cope when what was coming in was the MiNT broadcast traffic. For the last month we've been dealing with periodic episodes of the network being saturated with broadcast packets from this proprietary protocol. I'm not sure of the specific circumstances that cause this intermittent event, but this is why we wanted to get the latest firmware we could for these things.

I'm not a fan of the idea that I need to purchase support to get firmware updates - maybe for major firmware updates with major feature adds. It doesn't seem appropriate to pay for Windows updates, and this seems the same to me. I'm not trying to argue, but it strikes me as off, especially in a day and age when wireless compromises are found/developed so often. Not really relevant to this inquiry, but it is something I noted.

Finally, what are the implications of EOSL? Presumably the firmware is no longer under development even if vulnerabilities are discovered? Is it still supportable under entitlement at all? The end user doesn't have a relationship with Velociti anymore; is this something that I can resell to them as their current support consultant? I recently registered with the Extreme partner portal.

~Robert

Christopher_Fra
Extreme Employee
Since the rf-domain has control-vlan defined, I assume that the APs are adopting over L3? If adopting over L3, are you using MiNT level 2 with DHCP option 191, or a statically assigned controller host entry?
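
For reference, a static controller host entry in the AP profile would look roughly like the sketch below. This is illustrative only - it reuses the controller IP from this thread, and whether you'd use level 1 or level 2 depends on whether the APs are local or remote:

code:
profile ap6532 default-ap6532
 ! static adopter entry - the AP builds an IP-based MiNT link to this controller
 controller host 10.200.17.10 level 2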

If the APs are local to the RFS and adopting over L2, I would remove the control-vlan parameter from the rf-domain.
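
As a sketch of what that change might look like from the controller CLI (using the rf-domain name from the config above; exact prompts and commit behavior can vary by release):

code:
RFS-SW01# configure terminal
RFS-SW01(config)# rf-domain "TEMP DC"
RFS-SW01(config-rf-domain-TEMP DC)# no control-vlan
RFS-SW01(config-rf-domain-TEMP DC)# commit write memory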

If the APs are remote to the RFS and adopting over L3, I would have the RFS in its own/separate rf-domain and leave the APs in the current rf-domain with the control vlan (a remote VLAN that the APs use to get out). Remote-site APs should be using MiNT level 2.
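
Conceptually, that split could be expressed with device-level entries like the sketch below ("NOC" is a placeholder name, and I'm assuming the usual rfs6000/ap6532 device-block syntax here):

code:
! Move the controller into its own RF-Domain
rfs6000 B4-C7-99-6D-B7-76
 use rf-domain "NOC"
!
! APs stay in "TEMP DC", which keeps its control-vlan for site-local level-1 MiNT
ap6532 84-24-8D-81-9C-88
 use profile default-ap6532
 use rf-domain "TEMP DC"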

We have many useful documents for different deployments and you may want to get a support case generated if under entitlement. If not currently under entitlement, please reach out to your re-seller for entitlement options.

The RFS6000/AP6532 have been EOSL for some time, and the last supported firmware release for these models is v5.9.1.5. You must be under entitlement to have access to the firmware.