Header Only - DO NOT REMOVE - Extreme Networks

WiNG 5.8.5 reports APs as down but they are not


Userlevel 1
Hi All
We had a power outage last week. Some parts of the network stayed up on UPS; elsewhere the UPSs failed after 20 minutes. Since then the RFS shows some of the APs as down, but they still have clients connected, and I can get onto the "down" APs via their IP addresses. I have reset the units, but the RFS (WiNG 5.8.5) still shows them as down.
I have also restarted the RFS units as well.
Now I'm confused (oh no, I'm always confused, that's normal).

27 replies

Userlevel 3
Hi Phil, the RFS showing some APs as down points to an adoption issue. If you log into the controller's CLI and run "show adoption offline", do the down APs show up? You can also log directly into an AP and enter "show adoption status"; it will tell you whether the AP is adopted.

The AP will still be able to handle users if all the VLANs are locally bridged. In this configuration, the WLAN does not require any of the controller's resources.

The adoption issue will require some investigation. If the AP is on a different VLAN from the controller, it will need a "controller host" entry to point back to the controller.
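Roughly like this, for example (the controller IP below is just a placeholder for your own, and check syntax with '?' on your version):

On the controller:
rfs7000# show adoption offline

On the AP:
ap7131# show adoption status

And if the AP needs a static pointer back to the controller (Layer 3 adoption), on the AP:
ap7131# configure terminal
ap7131(config)# self
ap7131(config-device-XX-XX-XX-XX-XX-XX)# controller host 192.168.10.5
ap7131(config-device-XX-XX-XX-XX-XX-XX)# commit write memory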
Userlevel 5
Phil, once you're in via PuTTY (CLI) on the APs/RFS, also share the output of the command 'show version' from both.
Userlevel 3
Hi Phil,

As Andy expressed and Rob alluded to, please verify the AP adoption status and that the APs show as adopted and configured.
Userlevel 4
Additionally, you can check whether you see MINT neighbors:

show mint neighbors

MINT is used for communication between all WiNG devices. Are all devices using the same VLAN?
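If neighbors are missing, it may also be worth running the following on both the controller and an AP to see which VLAN/IP links MINT has actually formed (standard WiNG CLI; output format varies a bit by version):

show mint links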
Userlevel 1
Since you had a power outage, I would start by ensuring uplinks/VLANs/trunking are all still properly configured from the switches back to the RFS.
Userlevel 3
Good point Justin; it's also possible that uncommitted configuration changes were lost with the power outage.

Phil, hopefully you have a backup of the configuration and can verify the settings are the same.
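If I remember the WiNG CLI correctly, a quick way to check for lost changes from the controller is to compare the running config against what's saved, and 'commit write memory' will persist anything you re-apply:

rfs7000# show running-config
rfs7000# show startup-config
rfs7000# commit write memory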
Userlevel 1
Hi
This is very strange. We have a mixture of AP7532s and AP7131s, split across two server rooms.
The AP7532s have all come back, but none of the AP7131s.
The APs connect to a Nortel 5520 in either server room, and the switches are part of a stack. The APs sit on VLAN 1 (flat network). The two RFS units show up in the GUI.
I have restarted the AP7131s and the Nortel 5520s.

This is the show mint neighbors output:
rfs7000-Backup(config)*#sh mint neighbors
5 mint neighbors of 70.38.0A.F9:
4D.80.C3.AC (ap7532-MO-Nr-HR) at level 1, best adjacency vlan-1
4D.80.C5.F4 (AP7532-ICT-B4a) at level 1, best adjacency vlan-1
4D.80.C6.24 (ap7532-B4-Stores) at level 1, best adjacency vlan-1
4D.82.BD.80 (ap7532-B4-CommsRoom) at level 1, best adjacency vlan-1
70.81.BE.8E (rfs7000-Primary) at level 1
Even the primary RFS is now showing as down in the GUI from the backup unit.

The AP7532s seem fine; it's just the AP7131 units, although one of them is showing. Both RFS units are working.
Userlevel 3
Sounds like your cluster could be at issue. Make sure both members of the cluster are present and it is working as expected.

From the cli:

show cluster members
Userlevel 1
Hi Andrew
The cluster is there:
rfs7000-Backup* 70.38.0A.F9 00-15-70-38-0A-F9 False standby 00:01:30 ago
rfs7000-Primary 70.81.BE.8E 00-15-70-81-BE-8E True active self

But I might have a bigger problem. I set the syslog running and this is what is coming in:

Jun 30 16:02:46 172.17.146.105 2017-06-30T16:02:46.139223+01:00 rfs7000-Primary %DATAPLANE-4-DOSATTACK: IPSPOOF ATTACK: Source IP is Spoofed : Src IP : 10.0.0.138, Dst IP: 224.0.0.22, Src Mac: 58-98-35-9D-7A-44, Dst Mac: 01-00-5E-00-00-16, Proto = 2.
Jun 30 16:02:52 172.17.146.105 2017-06-30T16:02:52.698984+01:00 rfs7000-Primary %DEVICE-4-OFFLINE: Device 00-15-70-EB-7D-00(ap7131-2) is offline, last seen:10 minutes ago on switchport -
Jun 30 16:02:52 172.17.146.105 2017-06-30T16:02:52.704019+01:00 rfs7000-Primary %DEVICE-4-OFFLINE: Device 00-24-38-F3-72-00(ap7131-5) is offline, last seen:10 minutes ago on switchport -
Jun 30 16:02:52 172.17.146.105 2017-06-30T16:02:52.710750+01:00 rfs7000-Primary %DEVICE-4-OFFLINE: Device 00-15-70-EB-96-CC(ap7131-4-PC02) is offline, last seen:10 minutes ago on switchport -
Jun 30 16:03:02 172.17.146.105 2017-06-30T16:03:02.784410+01:00 rfs7000-Primary %DATAPLANE-4-ARPPOISON: ARP CACHE POISONING: Conflicting ethernet header and inner arp header :Ethernet Src Mac: 00-11-25-8E-2E-5E, Ethernet Dst Mac: FF-FF-FF-FF-FF-FF, ARP Src Mac: 00-03-FF-09-64-DE, ARP Dst Mac: 00-00-00-00-00-00, ARP Src IP: 172.17.152.4, ARP Target IP: 172.17.144.81 .
Jun 30 16:03:05 172.17.146.105 2017-06-30T16:03:05.921494+01:00 rfs7000-Primary %DATAPLANE-4-ARPPOISON: ARP CACHE POISONING: Conflicting ethernet header and inner arp header :Ethernet Src Mac: 00-11-25-8E-2E-5E, Ethernet Dst Mac: FF-FF-FF-FF-FF-FF, ARP Src Mac: 00-03-FF-9A-2E-5E, ARP Dst Mac: 00-00-00-00-00-00, ARP Src IP: 172.17.148.19, ARP Target IP: 172.17.144.71 .
Jun 30 16:03:07 172.17.146.105 2017-06-30T16:03:07.760199+01:00 rfs7000-Primary %DATAPLANE-4-ARPPOISON: ARP CACHE POISONING: Conflicting ethernet header and inner arp header :Ethernet Src Mac: 00-14-5E-A4-7D-04, Ethernet Dst Mac: FF-FF-FF-FF-FF-FF, ARP Src Mac: 00-03-FF-9D-4A-76, ARP Dst Mac: 00-00-00-00-00-00, ARP Src IP: 172.17.151.26, ARP Target IP: 172.17.144.71 .
Jun 30 16:03:24 172.17.146.105 2017-06-30T16:03:24.183555+01:00 rfs7000-Primary %DATAPLANE-4-ARPPOISON: ARP CACHE POISONING: Conflicting ethernet header and inner arp header :Ethernet Src Mac: 00-11-25-8E-2E-5E, Ethernet Dst Mac: FF-FF-FF-FF-FF-FF, ARP Src Mac: 00-03-FF-09-64-DE, ARP Dst Mac: 00-00-00-00-00-00, ARP Src IP: 172.17.152.4, ARP Target IP: 172.17.144.59 .
Jun 30 16:03:26 172.17.146.105 2017-06-30T16:03:26.885160+01:00 rfs7000-Primary %DATAPLANE-4-ARPPOISON: ARP CACHE POISONING: Conflicting ethernet header and inner arp header :Ethernet Src Mac: 00-11-25-8E-2E-5E, Ethernet Dst Mac: FF-FF-FF-FF-FF-FF, ARP Src Mac: 00-03-FF-9A-2E-5E, ARP Dst Mac: 00-00-00-00-00-00, ARP Src IP: 172.17.148.19, ARP Target IP: 172.17.144.71 .
Jun 30 16:03:28 172.17.146.105 2017-06-30T16:03:28.723856+01:00 rfs7000-Primary %DATAPLANE-4-ARPPOISON: ARP CACHE POISONING: Conflicting ethernet header and inner arp header :Ethernet Src Mac: 00-14-5E-A4-7D-04, Ethernet Dst Mac: FF-FF-FF-FF-FF-FF, ARP Src Mac: 00-03-FF-9D-4A-76, ARP Dst Mac: 00-00-00-00-00-00, ARP Src IP: 172.17.151.26, ARP Target IP: 172.17.144.71 .

I have enabled ARP trust on ge1, and set "no stateful-packet-inspection-l2" in the firewall policy.

Not sure if this is correct.

This is where I'm clueless, or should that be more clueless than usual 😞
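For reference, this is roughly where I entered those changes (our APs use the firewall policy named "default", so I'm assuming that's the right place):

configure terminal
  self
    interface ge1
      ip arp trust
  firewall-policy default
    no stateful-packet-inspection-l2
commit write memory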
Userlevel 1
Looking at the RFS events, the APs seem to be available and then drop off.

Userlevel 3
Phil,

Are there any other RFS devices connected anywhere on the network besides the two RFS7000s ?

As you said, the 7131 seem to be dropping off the network, so you'd need to track down why this is happening.

I find that the show mint mlcp history command is the most useful for troubleshooting adoption issues, but it takes some time to figure out what all the messages indicate.

You can try the following from the CLI of the Primary RFS:

show mint mlcp history

This command will give you an output of all the MINT-level handshaking going on between devices.

Here's what it looks like on the RFS when an AP is adopted (time goes backwards):

2017-06-14 09:37:10:Adopted 5C-0E-8B-34-E3-28 (0B.34.E3.28), cfgd notified
2017-06-14 09:37:10:Sending MLCP Offer to 0B.34.E3.28 (link_level=1, preferred=0, capacity=144)
2017-06-14 09:37:10:Sending MLCP Offer to 0B.34.E3.28 (link_level=1, preferred=0, capacity=144)
2017-06-14 09:37:10:Sending MLCP Offer to 0B.34.E3.28 (link_level=1, preferred=0, capacity=144)
2017-06-14 09:37:10:Sending MLCP Reply to (00.00.00.00,34,1,227.40.0.0:23566/5C-0E-8B-34-E3-28)
2017-06-14 09:37:10:Sending MLCP Offer to 0B.34.E3.28 (link_level=1, preferred=0, capacity=144)

To get the view from the 7131, you will also need to issue the same command on one of the 7131s that is not adopting. You should be able to SSH to it.

And this is what it looks like from the AP's perspective:

2017-06-14 09:37:10:Received OK from cfgd, adoption complete to 0B.1B.2A.E2
2017-06-14 09:37:10:Waiting for cfgd OK, adopter should be 0B.1B.2A.E2
2017-06-14 09:37:10:Adoption state change: 'Connecting to adopter' to 'Waiting for Adoption OK'
2017-06-14 09:37:10:Adoption state change: 'No adopters found' to 'Connecting to adopter'
2017-06-14 09:37:10:Try to adopt to 0B.1B.2A.E2 (cluster master 0B.1B.2A.E2 in adopters)

Andrew Webster wrote:
"Are there any other RFS devices connected anywhere on the network besides the two RFS7000s?"

On this note... at one time another provider installed a controller that started adopting our APs because we had a part of their network accessible. That's a good angle to search on.
Userlevel 1
Hi, this is the output from one of the AP7131s. It's very strange that the AP7532s seem unaffected. If any APs were set to controller-capable, would that affect anything?

I will post the output from the RFS later, as they too have now dropped off the network.
It goes from bad to worse 😞

ap7131-7-PC01(config)#show mint mlcp history
2017-07-02 20:39:37:Adoption state change: 'Waiting to retry' to 'No adopters found'
2017-07-02 20:39:26:cfgd notified dpd2 of unadoption, restart adoption after 11 seconds
2017-07-02 20:39:26:Adoption state change: 'Adopted' to 'Waiting to retry'
2017-07-02 20:39:26:Adopter 70.38.0A.F9 is no longer reachable, cfgd notified
2017-07-02 20:39:26:All adopters lost, restarting MLCP
2017-07-02 20:39:26:MLCP link vlan-1 offerer 70.38.0A.F9 lost, restarting discovery
2017-07-02 19:52:15:Received OK from cfgd, adoption complete to 70.38.0A.F9
2017-07-02 19:52:15:Waiting for cfgd OK, adopter should be 70.38.0A.F9
2017-07-02 19:52:15:Adoption state change: 'Connecting to adopter' to 'Waiting for Adoption OK'
2017-07-02 19:52:13:Adoption state change: 'No adopters found' to 'Connecting to adopter'
2017-07-02 19:52:13:Try to adopt to 70.38.0A.F9 (cluster master 70.38.0A.F9 in adopters)
2017-07-02 19:51:14:Adoption state change: 'Waiting to retry' to 'No adopters found'
2017-07-02 19:51:05:cfgd notified dpd2 of unadoption, restart adoption after 9 seconds
2017-07-02 19:51:05:Adoption state change: 'Adopted' to 'Waiting to retry'
2017-07-02 19:51:05:Adopter 70.38.0A.F9 is no longer reachable, cfgd notified
2017-07-02 19:51:05:MLCP created VLAN link on VLAN 1, offer from 00-15-70-38-0A-F9
2017-07-02 19:51:05:MLCP VLAN link already exists
2017-07-02 19:51:05:Sending MLCP Request to 00-15-70-38-0A-F9 vlan 1
2017-07-02 19:51:05:All adopters lost, restarting MLCP
2017-07-02 19:42:42:Received OK from cfgd, adoption complete to 70.38.0A.F9
2017-07-02 19:42:42:Waiting for cfgd OK, adopter should be 70.38.0A.F9
2017-07-02 19:42:42:Adoption state change: 'Connecting to adopter' to 'Waiting for Adoption OK'
2017-07-02 19:42:42:Adoption state change: 'Adoption failed' to 'Connecting to adopter'
2017-07-02 19:42:42:Try to adopt to 70.38.0A.F9 (cluster master 00.00.00.00 in adopters)
2017-07-02 19:42:12:Adoption state change: 'Connecting to adopter' to 'Adoption failed': Connection error 145
2017-07-02 19:41:47:Adoption state change: 'Waiting to retry' to 'Connecting to adopter'
2017-07-02 19:41:47:Try to adopt to 70.38.0A.F9 (cluster master 70.38.0A.F9 in adopters)
2017-07-02 19:41:40:MLCP created VLAN link on VLAN 1, offer from 00-15-70-38-0A-F9
2017-07-02 19:41:40:MLCP VLAN link already exists
2017-07-02 19:41:40:Sending MLCP Request to 00-15-70-38-0A-F9 vlan 1
2017-07-02 19:41:40:All adopters lost, restarting MLCP
2017-07-02 19:41:38:MLCP created VLAN link on VLAN 1, offer from 00-15-70-38-0A-F9
2017-07-02 19:41:38:Sending MLCP Request to 00-15-70-38-0A-F9 vlan 1
2017-07-02 19:41:38:MLCP link vlan-1 offerer 70.38.0A.F9 lost, restarting discovery
2017-07-02 19:41:38:cfgd notified dpd2 of unadoption, restart adoption after 9 seconds
2017-07-02 19:41:38:Adoption state change: 'Adopted' to 'Waiting to retry'
2017-07-02 19:40:50:Received OK from cfgd, adoption complete to 70.38.0A.F9
2017-07-02 19:40:50:Waiting for cfgd OK, adopter should be 70.38.0A.F9
2017-07-02 19:40:50:Adoption state change: 'Connecting to adopter' to 'Waiting for Adoption OK'
2017-07-02 19:40:50:Adoption state change: 'Waiting to retry' to 'Connecting to adopter'
2017-07-02 19:40:50:Try to adopt to 70.38.0A.F9 (cluster master 70.38.0A.F9 in adopters)
2017-07-02 19:40:38:MLCP created VLAN link on VLAN 1, offer from 00-15-70-38-0A-F9
2017-07-02 19:40:38:MLCP VLAN link already exists
2017-07-02 19:40:38:Sending MLCP Request to 00-15-70-38-0A-F9 vlan 1
2017-07-02 19:40:38:All adopters lost, restarting MLCP
2017-07-02 19:40:38:MLCP created VLAN link on VLAN 1, offer from 00-15-70-38-0A-F9
2017-07-02 19:40:38:Sending MLCP Request to 00-15-70-38-0A-F9 vlan 1
2017-07-02 19:40:38:MLCP link vlan-1 offerer 70.38.0A.F9 lost, restarting discovery
2017-07-02 19:40:38:cfgd notified dpd2 of unadoption, restart adoption after 12 seconds
2017-07-02 19:40:38:Adoption state change: 'Adopted' to 'Waiting to retry'
2017-07-02 19:39:43:Received OK from cfgd, adoption complete to 70.38.0A.F9
2017-07-02 19:39:42:Waiting for cfgd OK, adopter should be 70.38.0A.F9
2017-07-02 19:39:42:Adoption state change: 'Connecting to adopter' to 'Waiting for Adoption OK'
2017-07-02 19:39:39:Adoption state change: 'No adopters found' to 'Connecting to adopter'
2017-07-02 19:39:39:Try to adopt to 70.38.0A.F9 (cluster master 70.38.0A.F9 in adopters)
2017-07-02 19:39:34:Adoption state change: 'Waiting to retry' to 'No adopters found'
2017-07-02 19:39:29:cfgd notified dpd2 of unadoption, restart adoption after 5 seconds
2017-07-02 19:39:29:Adoption state change: 'Adopted' to 'Waiting to retry'
2017-07-02 19:39:29:Adopter 70.38.0A.F9 is no longer reachable, cfgd notified
2017-07-02 19:39:29:MLCP created VLAN link on VLAN 1, offer from 00-15-70-38-0A-F9
2017-07-02 19:39:29:MLCP VLAN link already exists
2017-07-02 19:39:29:Sending MLCP Request to 00-15-70-38-0A-F9 vlan 1
2017-07-02 19:39:29:All adopters lost, restarting MLCP
2017-07-02 19:38:48:Received OK from cfgd, adoption complete to 70.38.0A.F9
2017-07-02 19:38:48:Waiting for cfgd OK, adopter should be 70.38.0A.F9
2017-07-02 19:38:48:Adoption state change: 'Connecting to adopter' to 'Waiting for Adoption OK'
2017-07-02 19:38:48:Adoption state change: 'Waiting to retry' to 'Connecting to adopter'
2017-07-02 19:38:48:Try to adopt to 70.38.0A.F9 (cluster master 70.38.0A.F9 in adopters)
2017-07-02 19:38:38:MLCP created VLAN link on VLAN 1, offer from 00-15-70-38-0A-F9
2017-07-02 19:38:38:Sending MLCP Request to 00-15-70-38-0A-F9 vlan 1
2017-07-02 19:38:38:MLCP link vlan-1 offerer 70.38.0A.F9 lost, restarting discovery
2017-07-02 19:38:38:cfgd notified dpd2 of unadoption, restart adoption after 10 seconds
2017-07-02 19:38:38:Adoption state change: 'Adopted' to 'Waiting to retry'
2017-07-02 19:38:23:Received OK from cfgd, adoption complete to 70.38.0A.F9
2017-07-02 19:38:23:Waiting for cfgd OK, adopter should be 70.38.0A.F9
2017-07-02 19:38:23:Adoption state change: 'Connecting to adopter' to 'Waiting for Adoption OK'
2017-07-02 19:38:23:Adoption state change: 'Adoption failed' to 'Connecting to adopter'
2017-07-02 19:38:23:Try to adopt to 70.38.0A.F9 (cluster master 00.00.00.00 in adopters)
2017-07-02 19:37:53:Adoption state change: 'Connecting to adopter' to 'Adoption failed': Connection error 145
2017-07-02 19:37:27:Adoption state change: 'Adoption failed' to 'Connecting to adopter'
2017-07-02 19:37:27:Try to adopt to 70.38.0A.F9 (cluster master 00.00.00.00 in adopters)
2017-07-02 19:37:24:Adoption state change: 'Waiting for Adoption OK' to 'Adoption failed': Cluster master is unknown
2017-07-02 19:37:24:Adoption state change: 'Connecting to adopter' to 'Waiting for Adoption OK'
2017-07-02 19:37:24:Adoption state change: 'Adoption failed' to 'Connecting to adopter'
2017-07-02 19:37:24:Try to adopt to 70.38.0A.F9 (cluster master 00.00.00.00 in adopters)
2017-07-02 19:37:21:Adoption state change: 'Waiting for Adoption OK' to 'Adoption failed': Cluster master is unknown
2017-07-02 19:37:21:Adoption state change: 'Connecting to adopter' to 'Waiting for Adoption OK'
2017-07-02 19:37:21:Adoption state change: 'Adoption failed' to 'Connecting to adopter'
2017-07-02 19:37:21:Try to adopt to 70.38.0A.F9 (cluster master 00.00.00.00 in adopters)
2017-07-02 19:37:18:Adoption state change: 'Waiting for Adoption OK' to 'Adoption failed': Cluster master is unknown
2017-07-02 19:37:18:Adoption state change: 'Connecting to adopter' to 'Waiting for Adoption OK'
2017-07-02 19:37:18:Adoption state change: 'Adoption failed' to 'Connecting to adopter'
2017-07-02 19:37:18:Try to adopt to 70.38.0A.F9 (cluster master 00.00.00.00 in adopters)
2017-07-02 19:37:15:Adoption state change: 'Waiting for Adoption OK' to 'Adoption failed': Cluster master is unknown
2017-07-02 19:37:15:Adoption state change: 'Connecting to adopter' to 'Waiting for Adoption OK'
2017-07-02 19:37:12:Adoption state change: 'No adopters found' to 'Connecting to adopter'
2017-07-02 19:37:12:Try to adopt to 70.38.0A.F9 (cluster master 70.38.0A.F9 in adopters)
2017-07-02 19:37:11:Adoption state change: 'Adoption failed' to 'No adopters found'
2017-07-02 19:37:00:MLCP created VLAN link on VLAN 1, offer from 00-15-70-38-0A-F9
2017-07-02 19:37:00:MLCP VLAN link already exists
2017-07-02 19:37:00:Sending MLCP Request to 00-15-70-38-0A-F9 vlan 1
2017-07-02 19:37:00:All adopters lost, restarting MLCP
2017-07-02 19:36:41:Adoption state change: 'Connecting to adopter' to 'Adoption failed': Connection error 145
2017-07-02 19:36:16:Adoption state change: 'Waiting to retry' to 'Connecting to adopter'
2017-07-02 19:36:16:Try to adopt to 70.38.0A.F9 (cluster master 70.38.0A.F9 in adopters)
2017-07-02 19:36:10:MLCP created VLAN link on VLAN 1, offer from 00-15-70-38-0A-F9
2017-07-02 19:36:10:Sending MLCP Request to 00-15-70-38-0A-F9 vlan 1
2017-07-02 19:36:10:MLCP link vlan-1 offerer 70.38.0A.F9 lost, restarting discovery
2017-07-02 19:36:10:cfgd notified dpd2 of unadoption, restart adoption after 6 seconds
2017-07-02 19:36:10:Adoption state change: 'Adopted' to 'Waiting to retry'
2017-07-02 19:34:11:Received OK from cfgd, adoption complete to 70.38.0A.F9
2017-07-02 19:34:10:Waiting for cfgd OK, adopter should be 70.38.0A.F9
2017-07-02 19:34:10:Adoption state change: 'Connecting to adopter' to 'Waiting for Adoption OK'
2017-07-02 19:34:10:Adoption state change: 'Adoption failed' to 'Connecting to adopter'
2017-07-02 19:34:10:Try to adopt to 70.38.0A.F9 (cluster master 00.00.00.00 in adopters)
2017-07-02 19:33:40:Adoption state change: 'Connecting to adopter' to 'Adoption failed': Connection error 145
2017-07-02 19:33:15:Adoption state change: 'Waiting to retry' to 'Connecting to adopter'
2017-07-02 19:33:15:Try to adopt to 70.38.0A.F9 (cluster master 70.38.0A.F9 in adopters)
2017-07-02 19:33:08:MLCP created VLAN link on VLAN 1, offer from 00-15-70-38-0A-F9
2017-07-02 19:33:08:Sending MLCP Request to 00-15-70-38-0A-F9 vlan 1
2017-07-02 19:33:08:MLCP link vlan-1 offerer 70.38.0A.F9 lost, restarting discovery
2017-07-02 19:33:08:cfgd notified dpd2 of unadoption, restart adoption after 7 seconds
2017-07-02 19:33:08:Adoption state change: 'Adopted' to 'Waiting to retry'
2017-07-02 19:29:51:Received OK from cfgd, adoption complete to 70.38.0A.F9
2017-07-02 19:29:51:Waiting for cfgd OK, adopter should be 70.38.0A.F9
2017-07-02 19:29:51:Adoption state change: 'Connecting to adopter' to 'Waiting for Adoption OK'
2017-07-02 19:29:51:Adoption state change: 'Adoption failed' to 'Connecting to adopter'
2017-07-02 19:29:51:Try to adopt to 70.38.0A.F9 (cluster master 00.00.00.00 in adopters)
ap7131-7-PC01(config)#
Userlevel 3
Phil,

Looking at your dump, it appears that the config being pushed out to the APs once they are adopted is "breaking" the connectivity with the RFS. I can clearly see the AP is actually adopting, but then going into the 'Waiting to retry' state because it lost contact with the RFS.

The Backup RFS is also doing the adoption, which seems to differ from your show cluster members output of the other day.

Have a close look at the configuration going into the APs with the following command from the RFS:

show run device <device_name>

This will show you the final, complete config that will be sent to the device when it adopts, with the profiles folded down, so you can see exactly what the AP is going to get.

Pay close attention to the last block of the configuration, which is the device itself, particularly the interface ge1 and vlan 1 configuration, then compare it with the configuration going out to the 7532s using the same command. I suspect that something may have been modified in the 7131s' profile, hence the breakage you're experiencing.
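For example, using a couple of the device names from your earlier outputs (substitute your own):

rfs7000-Primary# show running-config device ap7131-2
rfs7000-Primary# show running-config device ap7532-B4-Stores

Then compare the two, focusing on the ge1 and vlan1 blocks.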
Userlevel 1
Hi, this is the output from the RFS (show run device), the last block.
I'm not sure which bit I should be looking at.

interface ge1
speed 1000
switchport mode trunk
switchport trunk native vlan 1
no switchport trunk native tagged
switchport trunk allowed vlan 1-10
no cdp receive
no cdp transmit
no lldp receive
no lldp transmit
interface ge2
shutdown
interface vlan1
ip address dhcp
ip address zeroconf secondary
ip dhcp client request options all
interface wwan1
interface pppoe1
use event-system-policy AP-Down
use firewall-policy default
ntp server 172.17.144.150 prefer version 3
ntp server 172.17.144.151 version 3
use role-policy RBFW
email-notification host
email-notification recipient
logging on
controller hello-interval 60 adjacency-hold-time 180
service pm sys-restart
no upgrade opcode auto
no upgrade opcode path
no upgrade opcode reload
traffic-shape enable

It looks OK
Userlevel 1
Looking at the syslog I also see this:
%DATAPLANE-4-RAGUARD: RA-GUARD: router advertisement/redirect from/to untrusted port/wlan 0, vlan 1 : Src IP : fe80:0:0:0:217:c5ff:fe99:67a0, Dst IP: ff02:0:0:0:0:0:0:1, Src Mac: 00-17-C5-99-67-A0, Dst Mac: 33-33-00-00-00-01, ICMP type = 134, ICMP code = 0, Proto = 58.

Here it's saying untrusted port/wlan 0; I believe there is no wlan 0 or port 0.

It looks like IPv6, which we do not use. Is it worth turning the IPv6 stuff off?
Userlevel 3
Phil,

Don't worry about IPv6.

The config looks OK, except the port speed setting. As a general rule, gigabit networks should not use forced speed/duplex settings; in fact the standard mandates auto-negotiation.
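If that speed line is coming from the 7131s' profile, something like this should put ge1 back to auto-negotiation. The profile name below is an assumption (WiNG's default naming), so substitute whatever your 7131s actually use:

rfs7000-Primary# configure terminal
rfs7000-Primary(config)# profile ap71xx default-ap71xx
rfs7000-Primary(config-profile-default-ap71xx)# interface ge1
rfs7000-Primary(config-profile-default-ap71xx-if-ge1)# speed auto
rfs7000-Primary(config-profile-default-ap71xx-if-ge1)# duplex auto
rfs7000-Primary(config-profile-default-ap71xx-if-ge1)# commit write memory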

Your original post mentioned that all this started after a power failure, so I'm guessing some unsaved config changes got lost.

Check AP 7131 vs AP 7532 config differences, as well as the switch-port config of the respective switches they are connected to. The MINT MLCP output clearly shows APs getting adopted then dropping immediately afterward, indicating something about the config is breaking the connectivity.

Beyond this, I think some one-on-one troubleshooting and/or opening a case with GTAC is in order.
Userlevel 5
Can you log in to one of the APs and provide the output of the CLI command 'show adoption history'?
Userlevel 1
Hi, this is the output from show adoption history.

The RFS units are up and can see each other. They have ge1 set as a trunk port with VLAN 1 as the native VLAN (currently not tagged) and allowed VLANs 1 and 10.
The network switches have only two VLANs, 1 and 10, and the ports allow both (tagall).

The APs have ge1 set as a trunk port with allowed VLANs 1 and 10 (native VLAN untagged), and there is a WLAN-to-VLAN map in the wireless configuration for the two WiFi networks.

The network switches are currently Nortel (hoping to move to Extreme in the very near future), but for now it's Nortel. This has all become a mystery as to what has gone wrong. Personally I think it's the network switches, but proving it is the problem. I suppose the other thing I could do is take a spare switch at factory defaults, then connect the RFS and some APs to it and see what happens?

-------------------------------------------------------------------------------- --------------------
MAC TYPE EVENT TIME-STAMP REASON
-------------------------------------------------------------------------------- --------------------
00-15-70-81-BE-8E RFS7000 un-adopted 2017-07-04 06:25:17 Adopter 70.81.BE.8E is no longer reachable
00-15-70-81-BE-8E RFS7000 adopted 2017-07-04 06:24:53 N.A.
00-15-70-81-BE-8E RFS7000 un-adopted 2017-07-04 06:24:42 Adopter 70.81.BE.8E is no longer reachable
00-15-70-81-BE-8E RFS7000 adopted 2017-07-04 06:23:50 N.A.
00-15-70-81-BE-8E RFS7000 un-adopted 2017-07-04 06:17:36 Adopter 70.81.BE.8E is no longer reachable
00-15-70-81-BE-8E RFS7000 adopted 2017-07-04 06:17:15 N.A.
00-15-70-81-BE-8E RFS7000 un-adopted 2017-07-04 06:17:10 Received reset from switch 70.81.BE.8E, {'reason': 'controller cfgd is not your adopter due to misadoption'}
00-15-70-81-BE-8E RFS7000 adopted 2017-07-04 06:15:58 N.A.
00-15-70-81-BE-8E RFS7000 un-adopted 2017-07-04 06:09:36 Adopter 70.81.BE.8E is no longer reachable
00-15-70-81-BE-8E RFS7000 adopted 2017-07-04 06:09:20 N.A.
00-15-70-81-BE-8E RFS7000 un-adopted 2017-07-04 06:09:07 Adopter 70.81.BE.8E is no longer reachable
00-15-70-81-BE-8E RFS7000 adopted 2017-07-04 06:08:15 N.A.
00-15-70-81-BE-8E RFS7000 un-adopted 2017-07-04 06:01:58 Adopter 70.81.BE.8E is no longer reachable
00-15-70-81-BE-8E RFS7000 adopted 2017-07-04 06:01:36 N.A.
00-15-70-81-BE-8E RFS7000 un-adopted 2017-07-04 06:01:24 Adopter 70.81.BE.8E is no longer reachable
00-15-70-81-BE-8E RFS7000 adopted 2017-07-04 06:00:33 N.A.
--------------------------------------------------------------------------------
Userlevel 1
This gets stranger by the second. When I look at the rfdomain it indicates there are 8 devices online, and when you select the pie chart it shows 8, but when you select the rfdomain from the tree view it shows 6, and two of those are the RFS7Ks.
If I go to statistics and offline devices, it shows the offline units, with one AP connected to a device that I believe is a wired Polycom IP phone. I connected to the AP with a serial cable and logged in; it then started scrolling messages about IPSPOOF, showing IPs that are in our range:
"st Mac: 01-00-5E-00-00-FB, Proto = 17.
Jul 04 13:41:17 2017: %DATAPLANE-4-DOSATTACK: IPSPOOF ATTACK: Source IP is Spoo fed : Src IP : 172.17.152.31, Dst IP: 224.0.0.251, Src Mac: F4-F5-D8-AA-DB-66, D st Mac: 01-00-5E-00-00-FB, Proto = 17.
Jul 04 13:41:17 2017: %DATAPLANE-4-DOSATTACK: IPSPOOF ATTACK: Source IP is Spoo fed : Src IP : 172.17.150.53, Dst IP: 224.0.0.251, Src Mac: 50-65-F3-46-48-62, D st Mac: 01-00-5E-00-00-FB, Proto = 17.
Jul 04 13:41:17 2017: %DATAPLANE-4-DOSATTACK: IPSPOOF ATTACK: Source IP is Spoo fed : Src IP : 172.17.146.137, Dst IP: 224.0.0.252, Src Mac: 00-15-5D-90-CA-61,"

The AP is now powered off.
The syslog is still showing %DATAPLANE-4-DOSATTACK: IPSPOOF ATTACK: Source IP is Spoofed: 10.0.0.138, then the MAC etc.

I have a known working config from the RFS taken on 26/5/17, when the WiFi bridge was set up and working, although the bridge is not in place at present.

I'm not sure what to do now. Default the primary and backup, fire the config back in, then set the cluster back up?
Userlevel 3
Phil,

If you shut off the backup RFS, does the system start working properly again?

If it does, then it's pretty simple to factory default the backup unit and have it rejoin the cluster.
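If you do end up defaulting it, re-joining the cluster is a single exec command once the unit has basic IP connectivity again. Syntax here is from memory, so verify with '?' first; the IP and credentials are placeholders:

rfs7000-Backup# join-cluster 172.17.146.105 user admin password <password> mode standby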
Userlevel 1
Hi Andrew

I will shut the backup unit down and see what happens

I have run show mint mlcp history again and it shows the following. Where it says link_level=1, preferred=0, what does this relate to?

2017-07-04 15:54:15:Sending MLCP Offer to 19.6B.76.C0 (link_level=1, preferred=0, capacity=1024)
2017-07-04 15:53:34:Sending MLCP Offer to 4D.80.C5.F4 (link_level=1, preferred=0, capacity=1024)
Userlevel 3
What you are seeing is adoption offers to the APs.

link_level=1 refers to the MINT level; in this case, think of level 1 = Layer 2.

preferred=0 means no specific preference; I'm not even sure that flag is used.

capacity=1024 is the maximum (not licensed) capacity that the appliance will support.
Userlevel 1
Still confused as to what has happened. Looking at the online devices, I have two that report they are connected to SEP:64167f829866: port1, which appears to be the MAC address of a Polycom VVX 201 VoIP phone. If I restart the IP phone and then refresh the online devices view, it shows the two APs as connected to the primary RFS ge1; then, as soon as the phone has booted, the APs show as connected to the phone again. None of the other online devices show as connected to anything.
Userlevel 3
Phil,

By what means are you "looking at online devices"? It appears as if some LLDP or CDP packets are perhaps clouding the issue.

I think this issue is going beyond the abilities of troubleshooting in this forum. I think you're going to need some hands-on assistance to get this resolved.

If you have software support on your APs, please reach out to GTAC; they can assist. If you don't have support on your APs, software support is not as expensive as you might think, and in addition to GTAC access it also entitles you to the latest software versions.
