Apple Devices Causing Intermittent Network Outages

Thomas_Randolph
New Contributor
We have recently started rolling out Apple devices at one of our locations, approximately 1200 iPads. They connect via an Aerohive wireless solution. That solution has one point of entry to my network, at a B5 switch that is also doing layer 3 for that site to our WAN.

Since the iPads started ramping up, we have seen frequent intermittent network outages. Devices on the rest of the LAN, in different VLANs, stop communicating and then come back within 5-10 seconds. This happens every 1 to 50 minutes, at every hour of the day, including overnight when no users are onsite.

After extensive troubleshooting we still have not identified the cause, but we believe it may be something with the B5G124-48P2. When the problem happens, I cannot reach my default gateway from one of my switches at the local site. That gateway is the IP address of the VLAN interface configured on the B5. Since the problem affects all VLANs, we suspect the B5. The routing is all static and RIP is disabled.

While this is happening, the system CPU is not excessive. It is usually around 26-45%, but I have seen it spike very briefly to 60-80%. I have debug logging enabled and see no logs indicating that system resources are taxed. The only recurring log entry I get is "DHCPRELAY[265701448]: relay_main.c(315) 568089 This is from manager 1 %% Request could not be relayed to Server". I have also checked my DHCP server, but it is supporting all 14 of my sites with no issues anywhere else. I checked the MAC tables and we usually sit at just under 4000 entries; according to the specs, the B5 can support up to 24,000.

I appreciate any help or suggestions.
18 REPLIES

Thomas_Randolph
New Contributor
We have run multiple packet captures with nothing standing out. Port utilization is very low. We were told that the Bonjour Gateway was disabled on the Aerohive side. The thing is that I get reports of their switches going down too. "Down" is relative: they basically cannot reach their default gateway for a short period of time.
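Since each outage only lasts a few seconds, it may help to timestamp them precisely so they can be lined up against the packet captures and switch logs. The following is a minimal Python sketch, not a tested tool: it assumes you already have a time-ordered series of gateway probe results (for example, one ping per second), and the function name `outage_windows` and the sample format are made up for illustration.

```python
def outage_windows(samples):
    """Collapse gateway probe results into outage windows.

    samples: iterable of (timestamp_seconds, reachable) tuples, sorted by
    time. This input format is an assumption for illustration.
    Returns a list of (start, end) tuples, where start is the time of the
    first failed probe and end is the time of the first successful probe
    after it. end is None if the series finishes while still unreachable.
    """
    windows = []
    start = None
    for ts, reachable in samples:
        if not reachable and start is None:
            # First failed probe: an outage begins here.
            start = ts
        elif reachable and start is not None:
            # First successful probe after a failure: the outage ends.
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, None))
    return windows
```

The resulting start and end times can then be compared with the capture timestamps to see which frames were on the wire just before the gateway stopped answering.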

I have recently seen high switch CPU usage (100% for about half an hour) on EXOS-based switches after rebooting connected Aerohive APs. Blocking mDNS from reaching the switch CPU dropped the CPU usage below 100%, but it was still quite high.

Packet captures showed a significant increase in the following three frame / packet types, all sent by the Aerohive access points and all affecting switch CPUs:
  1. mDNS requests
  2. Some layer 2 broadcasts probably used for Aerohive AP discovery
  3. Gratuitous ARP replies
Strangely the access points rejected every received mDNS answer with an ICMP Port Unreachable message, but continued sending requests.
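To check whether a capture from another site shows the same three traffic types, the frames can be bucketed with a small script. This is only a sketch under assumptions: it expects raw, untagged Ethernet II frames as bytes (for example, exported from a capture tool), it only looks at IPv4 mDNS (UDP port 5353 to 224.0.0.251), and it treats any ARP packet whose sender and target IP are equal as gratuitous. The function names `classify_frame` and `count_frame_types` are hypothetical.

```python
import struct
from collections import Counter

MDNS_IPV4 = bytes([224, 0, 0, 251])   # IPv4 mDNS multicast group
MDNS_PORT = 5353                      # UDP port used by mDNS
BROADCAST_MAC = b"\xff" * 6


def classify_frame(frame: bytes) -> str:
    """Label one untagged Ethernet II frame.

    Returns 'gratuitous_arp', 'mdns', 'l2_broadcast', or 'other'.
    VLAN-tagged frames and IPv6 mDNS are not handled (assumption).
    """
    if len(frame) < 14:
        return "other"
    dst_mac = frame[0:6]
    ethertype = struct.unpack("!H", frame[12:14])[0]
    payload = frame[14:]

    # ARP: sender IP is at bytes 14-17, target IP at bytes 24-27.
    if ethertype == 0x0806 and len(payload) >= 28:
        if payload[14:18] == payload[24:28]:
            return "gratuitous_arp"

    # IPv4 + UDP to 224.0.0.251:5353 is treated as mDNS.
    if ethertype == 0x0800 and len(payload) >= 20:
        ihl = (payload[0] & 0x0F) * 4          # IP header length in bytes
        is_udp = payload[9] == 17
        dst_ip = payload[16:20]
        if is_udp and ihl >= 20 and len(payload) >= ihl + 8:
            dst_port = struct.unpack("!H", payload[ihl + 2:ihl + 4])[0]
            if dst_ip == MDNS_IPV4 and dst_port == MDNS_PORT:
                return "mdns"

    if dst_mac == BROADCAST_MAC:
        return "l2_broadcast"
    return "other"


def count_frame_types(frames) -> Counter:
    """Count how many frames fall into each category."""
    return Counter(classify_frame(f) for f in frames)
```

Comparing the counts from a capture taken during steady state with one taken right after an AP reboot should show whether the same increase is present.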

The Bonjour gateway was disabled on the Aerohive APs, but the access points generate their own mDNS requests.

It usually takes hours for the switch CPU usage to drop to the normal values observed in the steady state network.

See the GTAC Knowledge article "How can I block mDNS with an ACL using MAC addresses" for info on an ACL to mitigate mDNS impacts on EXOS switches.

Christoph
Contributor
How about port utilisation?
Maybe it's a broadcast or multicast issue?

William_Aguilar
Extreme Employee
It must be AeroHive ... it wouldn't happen with ExtremeWireless. Just kidding. Ultimately the best thing to do is to work with GTAC, but one area you could look at is the effect that Apple Bonjour is having on the network. Bonjour is a zero-touch discovery protocol used by Apple devices, which is great for the home but doesn't scale in enterprises. It uses multicast as a discovery mechanism, which is very expensive on a wireless network and also puts strain on switches because it has to be processed in software on the CPU (= high CPU utilization). AH has some controls for Bonjour, and one thing you could try is to block all Bonjour at the APs on one of the sites to see if it addresses the issue. Or you can try to drop the multicast at the switch before it is processed by the CPU to see if that helps. Again, the best thing to do is to work with GTAC to isolate the issue, but it is something to consider.

Good luck with AH, and keep an eye on what we are doing with ExtremeWireless next time you are looking to upgrade your Wi-Fi network.

Thanks,

Will
