Extreme Networks

RWCampbell · ‎09-16-2019

It seems that about a third of our APs have randomly become un-adopted and no longer function. When we restart the controller, all the APs reconnect, but a subset of them un-adopt after 20 seconds or so. Perhaps notably, the controller is running v5.5. We want to upgrade it to the latest possible version, but we're not sure how to get the software.

As far as we can tell, the main reason they are un-adopted is that we cannot MiNT ping the other device, and presumably we can't reach it by MAC. We've compared the configs of working and non-working APs, and they're identical save for the normal variables like names and IPs (minor variations). To our knowledge, nothing changed to precipitate this. The system was used normally over the weekend, and the specific APs were not working this morning.

Any idea what would make the layer 2/MiNT communication not work?

-----

Below is a CLI story of the main points that seem to be occurring with one of the APs. Below that is one of the AP configs. Any help would be greatly appreciated.

Controller: RFS-6010-1000-WR
Base ethernet MAC address is B4-C7-99-6D-B7-76
Mint ID: 19.6D.B7.76
IP Address: 10.200.17.10

AP: AP-6532-66040-US
Base ethernet MAC address is 84-24-8D-81-9C-88
Mint ID: 4D.81.9C.88
IP Address: 10.200.17.33

# debugs (from controller)

RFS-SW01# sh mint mlcp his

2018-10-25 11:54:15:cfgd unadopted 4D.81.9C.88
2018-10-25 11:54:15:Unadopted 84-24-8D-81-9C-88 (4D.81.9C.88), cfgd not notified
2018-10-25 11:54:15:Unadopting 84-24-8D-81-9C-88 (4D.81.9C.88) because it is unreachable
2018-10-25 11:53:59:Adopted 84-24-8D-81-9C-88 (4D.81.9C.88), cfgd notified

RFS-SW01#ping 10.200.17.33
PING 10.200.17.33 (10.200.17.33) 100(128) bytes of data.
108 bytes from 10.200.17.33: icmp_seq=1 ttl=64 time=3.99 ms
108 bytes from 10.200.17.33: icmp_seq=2 ttl=64 time=0.410 ms
108 bytes from 10.200.17.33: icmp_seq=3 ttl=64 time=0.359 ms
108 bytes from 10.200.17.33: icmp_seq=4 ttl=64 time=0.363 ms

--- 10.200.17.33 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 0.359/1.281/3.995/1.567 ms
RFS-SW01#mint ping 4D.81.9C.88
MiNT ping 4D.81.9C.88 with 64 bytes of data.
Ping request 1 timed out. No response from 4D.81.9C.88
Ping request 2 timed out. No response from 4D.81.9C.88
Ping request 3 timed out. No response from 4D.81.9C.88

--- 4D.81.9C.88 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
RFS-SW01#

RFS-SW01#show adoption offline
-----------------------------------------------------------------------------------------------------------------------------
MAC HOST-NAME TYPE RF-DOMAIN TIME OFFLINE CONNECTED-TO
-----------------------------------------------------------------------------------------------------------------------------
84-24-8D-81-9C-88 AP23 ap6532 TEMP DC 0:05:27
-----------------------------------------------------------------------------------------------------------------------------

# debugs (from ap)

AP23#show adoption status
Adopted by:
Type : RFS6000
System Name : RFS-SW01
MAC address : B4-C7-99-6D-B7-76
MiNT address : 19.6D.B7.76
Time : 0 days 00:03:07 ago

AP23#show mint mlcp history
2018-10-25 11:53:58:Received 0 hostnames through option 191
2018-10-25 11:53:57:Received OK from cfgd, adoption complete to 19.6D.B7.76
2018-10-25 11:53:56:Waiting for cfgd OK, adopter should be 19.6D.B7.76
2018-10-25 11:53:56:Adoption state change: 'Connecting to adopter' to 'Waiting for Adoption OK'
2018-10-25 11:53:53:Adoption state change: 'No adopters found' to 'Connecting to adopter'
2018-10-25 11:53:53:Try to adopt to 19.6D.B7.76 (cluster master 00.00.00.00 in adopters)
2018-10-25 11:53:52:Received 0 hostnames through option 191
2018-10-25 11:53:52:Adoption state change: 'Disabled' to 'No adopters found'
2018-10-25 11:53:52:DNS resolution completed, starting MLCP
2018-10-25 11:53:52:Adoption enabled due to configuration

AP23#ping 10.200.17.10
PING 10.200.17.10 (10.200.17.10) 100(128) bytes of data.
108 bytes from 10.200.17.10: icmp_seq=1 ttl=64 time=4.53 ms
108 bytes from 10.200.17.10: icmp_seq=2 ttl=64 time=0.355 ms
^C
--- 10.200.17.10 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.355/2.443/4.531/2.088 ms
AP23#mint ping 19.6D.B7.76
MiNT ping 19.6D.B7.76 with 64 bytes of data.
Ping request 1 timed out. No response from 19.6D.B7.76
Ping request 2 timed out. No response from 19.6D.B7.76
Ping request 3 timed out. No response from 19.6D.B7.76

--- 19.6D.B7.76 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
AP23#

-----

code:

version 2.3
!
!
ip snmp-access-list default
 permit any
!
firewall-policy default
 no ip dos tcp-sequence-past-window
 alg sip
!
!
mint-policy global-default
!
wlan-qos-policy default
 qos trust dscp
 qos trust wmm
!
radio-qos-policy default
!
wlan "WMS SSID"
 description WMS RF Environment
 ssid TEMP-WMS-RF
 vlan 1
 bridging-mode tunnel
 encryption-type tkip-ccmp
 authentication-type none
 wpa-wpa2 psk 0 XXXXXXXXXX
 service wpa-wpa2 exclude-ccmp
!
smart-rf-policy "TEMP DC Smart RF"
 sensitivity custom
 assignable-power 2.4GHz max 14
 assignable-power 2.4GHz min 11
 smart-ocs-monitoring client-aware 2.4GHz 1
!
!
management-policy default
 no http server
 https server
 ssh
 user admin password 1 XXXXXX role superuser access all
 snmp-server community 0 private rw
 snmp-server community 0 public ro
 snmp-server user snmptrap v3 encrypted des auth md5 0 motorola
 snmp-server user snmpmanager v3 encrypted des auth md5 0 motorola
!
profile ap6532 default-ap6532
 ip name-server 10.200.16.12
 ip name-server 10.200.16.11
 ip domain-name TEMP.com
 autoinstall configuration
 autoinstall firmware
 crypto ikev1 policy ikev1-default 
  isakmp-proposal default encryption aes-256 group 2 hash sha 
 crypto ikev2 policy ikev2-default 
  isakmp-proposal default encryption aes-256 group 2 hash sha 
 crypto ipsec transform-set default esp-aes-256 esp-sha-hmac
 crypto ikev1 remote-vpn
 crypto ikev2 remote-vpn
 crypto auto-ipsec-secure
 crypto load-management
 crypto remote-vpn-client
 interface radio1
  wlan "WMS SSID" bss 1 primary
 interface radio2
  shutdown
 interface ge1
  ip dhcp trust
  qos trust dscp
  qos trust 802.1p
 interface vlan1
  ip address dhcp
  ip address zeroconf secondary
  ip dhcp client request options all
 interface pppoe1
 use firewall-policy default
 rf-domain-manager capable
 logging on
 service pm sys-restart
 router ospf
!
rf-domain "TEMP DC"
 location "TEMP DC"
 contact "Velociti Inc."
 timezone America/Chicago
 country-code us
 use smart-rf-policy "TEMP DC Smart RF"
 channel-list dynamic
 channel-list 2.4GHz 1,6,11
 control-vlan 1
!
ap6532 84-24-8D-81-9C-88
 use profile default-ap6532
 use rf-domain "TEMP DC"
 hostname AP23
 interface radio1
  power 8
 interface vlan1
  ip address 10.200.17.33/21
!
!
end

RWCampbell · ‎09-17-2019

Gentlemen,

Thank you much for engaging so thoroughly on this issue. We were able to get the profile changed and reset the devices so that they'd be adopted via layer 3 communication and they came back online and started functioning for the client.

They're quite thankful that they will not have to use the slow processes in the freezer anymore.

We're going to continue the conversation with the sales support people about the possibility of getting a support entitlement set up for this system. We shall see. Thanks again!

~Robert

ckelly · ‎09-17-2019

The adoption MINT level is controlled primarily by the 'controller host' statement.
So that statement would look like this for the level-1 or 2 setup:

controller host 10.200.17.10 pool 1 level 1 (if you omit the "pool 1 level 1" those values are assumed)
commit write

It's that "level 1" that indicates the MINT level that should be used and how the AP is going to adopt*. This entry could be placed into the AP's Profile or in the AP's 'override' section. Either is fine. Just make sure you understand the difference.

Simply having this controller host statement automatically means that you're indicating IP-based adoption (this could also be the case if you've set up DHCP Option 191 or DNS-based adoption). If it were VLAN-based adoption, you wouldn't even need to include the controller host statement. The APs would simply locate the controller on the same management VLAN on their own and attempt to adopt.

*To disable layer-2 discovery for the APs (because in order of preference, the APs will 1st look for a controller using layer-2...so if they can find one - even though they have a layer-3 controller host entry, they'll still go ahead and adopt via layer-2) and force them to only adopt via layer-3, go into the APs Profile or its override section and issue the command:
no mint mlcp vlan (mlcp is 'mint link creation protocol')
commit write

So the two things you need are the controller host statement and the negation of mint mlcp vlan.
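
As a rough sketch of how those two statements might sit together in this thread's AP profile: the profile name is taken from the config posted at the top of the thread, the controller IP is the one quoted above, and the lines starting with `!` are annotations for the reader rather than commands to type. Existing profile lines are omitted, and where the new lines land in the saved config is an assumption.

```
! Inside the AP profile from the original post (other profile lines omitted)
profile ap6532 default-ap6532
 ! Point the APs at the controller over IP, MINT level 1
 controller host 10.200.17.10 pool 1 level 1
 ! Stop layer-2 (VLAN) discovery so the APs only adopt via layer 3
 no mint mlcp vlan
! Then save, as noted above
commit write
```

The same two lines could instead go under an individual AP's override section (for example the `ap6532 84-24-8D-81-9C-88` block), which would affect only that AP rather than every AP using the profile.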

...and before I forget, make sure that if you have a controller cluster that it's also set up using MINT level-1. In addition, you don't want to have any sort of mixture of AP or controller cluster MINT levels (it's not supported). If you have APs adopted at MINT level-1, then EVERYTHING everywhere should be using MINT level-1. Same with MINT level-2.

(If you do have a controller cluster, you can verify the MINT level used for the cluster by running the command:
show cluster status
Look at the first output labeled: "Protocol Version". It should be "1", meaning the cluster is formed using MINT level-1.)

Also, ensure that the "controller vlan" option is not being used. This is NOT the same thing as the "control vlan" setting. The "controller vlan" setting is only used in certain situations (APs and controllers share multiple common VLANs and the APs are adopted using layer 2).

Also, with doing layer-3 adoption (should've specified this bit in the earlier post, sorry) make sure that there are no ACLs between the APs and controller that would block UDP port 24576. This is what MINT will be using. If it's blocked, the APs won't be able to adopt.
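
One way to check the result after the change is to rerun the same commands from the original post. This is only a suggested sequence, not an official procedure; the host names and MiNT IDs are the ones from the thread, and the `!` lines are annotations for the reader.

```
! On the controller (RFS-SW01)
show adoption offline
show mint mlcp history
mint ping 4D.81.9C.88

! On the AP (AP23)
show adoption status
mint ping 19.6D.B7.76
```

If the AP no longer shows up under `show adoption offline` and the MiNT pings get replies in both directions, adoption is probably holding. If the MiNT pings still time out while a normal IP ping works, an ACL blocking UDP port 24576 would be one of the first things to rule out.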

RWCampbell · ‎09-17-2019

So when you say " Better option though is IP-based adoption", are you saying we should stick with our current setup of #1 distribution? Right now all devices have a static IP, so if that is okay, we are fine to communicate that way. We want the easiest path to get these adopted.
1) Under the default profile would we add

code:

controller host 10.200.17.10

(the IP is the controller IP)
2) What is the CLI to manually configure each un-adopted AP with the controller's IP address? A reset might work, as it seems the cfg is being pushed successfully
3) What is the CLI to disable MINT MLCP VLAN?

ckelly · ‎09-17-2019

Okay...so it does sound like #2 is where you want to be.

So at the very least, this is what you want to ensure is setup:
1) Have only one RF-Domain created
2) The controller(s) and APs are all assigned to this one RF-Domain
3) The setup for the RF-Domain does NOT have a defined control-vlan.
4) APs and controller need a common management VLAN. This will allow the APs to automatically discover the controller on the VLAN and adopt (this would be VLAN-based layer-2 discovery and adoption). If you cannot have a common management VLAN, then you have no choice but to implement IP-based adoption (see #5 below).
5) Based on the number of APs (40), it's okay to simply have the APs adopted via layer 2 (VLAN) vs IP-based. (don't need controller managed setup either) Better option though is IP-based adoption (APs will need IP addresses obviously) and you'd need to either manually configure each AP with the controller's IP address so it knows where to go to adopt...or setup DHCP Option 191...or if the APs are already adopted now, you can simply modify the AP Profile to include the controller's IP address and let that config get pushed out to the APs as part of a regular AP config update. At this point, also disable 'MINT MLCP VLAN' on the AP Profiles as well since they won't be using it any longer with IP-based adoption.
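
To illustrate item 3 against the config posted at the top of the thread, here is a sketch of the same RF-Domain with the control-vlan line taken out. Every other line is copied unchanged from the original post, and the `!` line is an annotation for the reader.

```
rf-domain "TEMP DC"
 location "TEMP DC"
 contact "Velociti Inc."
 timezone America/Chicago
 country-code us
 use smart-rf-policy "TEMP DC Smart RF"
 channel-list dynamic
 channel-list 2.4GHz 1,6,11
! "control-vlan 1" from the original config is no longer present
```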

Nothing needed on the switches other than the management VLAN and any data VLANs that are required by the WLANs that are operating on the APs.

With this Option #2 Centralized setup, the MINT traffic issue goes away. Each AP is reporting back directly to the controller itself...so it's no longer needed.

RWCampbell · ‎09-17-2019

From what you are saying, this is set up as #1 distribution when it should be set up as #2 Centralized. There is 1 controller in a single building with no remote over-the-WAN APs, about 30 APs in total, and 4 to 5 switches connected with fiber. As I mentioned, we did not set this up originally and kind of took it over. The issue could for sure be related to a switch loop, as we have found a few network loops already. If we were to switch from #1 to #2 Centralized, would we only need to remove control-vlan 1 (maybe replacing it with controller-managed) from all the APs (default profile) and the controller? What else needs to change on the controller or APs? Are there any specific switch configurations that would need to be made? Should we still add a different VLAN on the switches to remove the noisy MiNT traffic? Thanks for all your help!

adoption lost after 20 seconds. Layer 3 connectivity normal
