I'm writing from a College environment where we have just over 850 WiNG APs. We are currently running WiNG 18.104.22.168-007R. We are ~2 weeks into our semester and we have just recently started receiving many complaints about network speeds/reliability/etc...
While digging into this, I stumbled on some extremely high retry percentages (see two examples below):
My questions are:
- What is an acceptable retry percentage?
- Is it reasonable to expect a retry percentage of close to 0 on most (all) devices?
- What could be causing this? Admittedly, I don't actively look at retry percentages, so I don't know if these are "normal" numbers on my network
- Could this be a problem with version 22.214.171.124-007R? These complaints are new (we didn't have them last school year or during the summer), and we did upgrade to 126.96.36.199-007R in mid-July.
Best answer by Chris Kelly
(long response...because there's no easy/short answer to this)
First answer is that 'acceptable' retry rates are generally defined as under 20% for non-critical applications. There is no official value though, just industry-accepted norms.
As far as retries being zero, pretty much the ONLY time you'd ever see this is in a thought experiment or a lab environment. In the real world, you're always going to see some number of retries. The goal is simply to minimize them to acceptable values for the applications in use (guest traffic vs VoWLAN, for example).
In any case, anything over 25% seen on a consistent or sustained basis raises a red flag for me (but this value will naturally fluctuate, and it's not unusual for it to spike periodically). Sometimes there are things you can do about it, sometimes not. Some things will simply be out of your control. All you can do is 'fix' the things that you ARE able to control.
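To make the "spike vs. sustained" distinction concrete, here is a minimal Python sketch. The 20% and 25% figures are the rule-of-thumb values above; the function name, the labels, and the "at least half of the samples" test for "sustained" are my own assumptions for illustration, not anything WiNG provides.

```python
from statistics import mean

# Rule-of-thumb thresholds from the discussion above; not an official standard.
ACCEPTABLE_PCT = 20.0
RED_FLAG_PCT = 25.0
# Assumption: "sustained" means at least half of the readings exceed the red-flag line.
SUSTAINED_FRACTION = 0.5


def classify_retry_samples(samples_pct):
    """Classify a series of periodic retry-percentage readings for one client."""
    if not samples_pct:
        return "no data"
    avg = mean(samples_pct)
    # Fraction of readings above the red-flag line; separates a brief spike
    # from a consistently high retry rate.
    over = sum(1 for s in samples_pct if s > RED_FLAG_PCT) / len(samples_pct)
    if avg > RED_FLAG_PCT and over >= SUSTAINED_FRACTION:
        return "sustained red flag"
    if avg >= ACCEPTABLE_PCT:
        return "elevated"
    return "acceptable"


print(classify_retry_samples([8, 12, 40, 9, 11]))    # one spike, low average
print(classify_retry_samples([35, 42, 38, 29, 44]))  # consistently high
```

The point is only that a single reading tells you little; collect several readings per client before deciding anything is wrong.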
Analyzing this and determining the root cause is likely not trivial - assuming it's not WiNG code related, which I doubt it is. A retry simply occurs when the sender is not able to detect/see an ACK for the frame that it transmitted (for those frame transmissions that actually require an ACK). This failure to see an ACK can be caused by many things - and that is where the challenge lies.
Looking at these two examples does help to start ruling out some potential causes, though.
In both cases, you have extremely high SNR and RSSI, and a very low noise floor. Fantastic. These three things alone would normally dictate a very healthy environment that should allow for maximum data rates to be used.
BUT.....let's take a look at things that can cause retries:
1) Non-802.11 interference (occurring on the same frequencies as your devices, of course) leading to frame corruption which the receiver recognizes. In this case, either the original transmitter's frame ends up not being received, so the receiver never sends an ACK, or the transmitted ACK is corrupted and is therefore treated as never having been received. In both cases, retries occur.
2) Frame collisions (Typically caused by a hidden node, OBSS situation, or adjacent channel traffic)
3) Extremely high airtime contention - causing the transmitter of the ACK to have to wait SO long to transmit the ACK that the other device finally gives up, interprets the situation as the frame never having been received, and so it 'retries'.
4) Low transmit power devices - or devices too distant from the AP. In this case, a client device may be operating on the fringe edge of the AP's ability to decode the client's preamble. This would likely lead to many of the client's ACK frames arriving at the AP too weak to be decodable, and therefore the AP thinks the client never ACK'd. This situation is usually in this direction (client to AP) because client devices are MUCH lower powered and have almost no antenna gain - whereas APs can transmit at much higher power levels and have better antennas - overall, much higher EIRP capable. Bottom line, the clients are the weak link in the chain when it comes to Wi-Fi.
- This is also related to the sticky-client problem where a client associates to an AP initially but then as it moves, it doesn't roam properly to the next AP that is much closer. In that scenario, the client is then having to communicate with an AP that is very distant - leading to the problems just described.
6) Client side 802.11 driver issues. It happens.
There are other causes, but they get more corner-case related; the ones listed above are really the most common and likely.
In your examples, the Error rates are zero. This indicates to me that there are no issues with corrupted CRC values. If there were, the frames would be discarded and would be considered an error.
The SNR, RSSI, and noise levels seen on the AP are so good that you're not dealing with a 'distance' related issue (are there APs located in the same room as the devices??) At these levels, the devices shouldn't have any issues with even the highest BASIC rates being set.
What can't be accounted for here is CCC (co-channel contention) related problems - basically meaning high channel utilization - airtime is too busy to allow devices to communicate in a timely manner.
- Something I cannot tell from the screenshots is whether this is for devices operating on 2.4GHz or 5GHz. If it's 2.4GHz, there's a much higher likelihood that high channel utilization is the problem. And unfortunately, there's not much you can do about it, especially if the clients are not 5GHz capable. At the very least, if your deployment is set up well, ensure that you disable all non-OFDM data rates. Do not allow the 1, 2, 5.5, or 11 Mbps data rates at all. If you have devices that need them, those users will be affected, but that also means those devices are VERY old (pre-11n). Doing this will help ensure that traffic is moving more quickly and will help free up airtime.
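To put rough numbers on why the legacy rates eat airtime, here is a small Python sketch. It only counts the time to send the payload bits at each rate and ignores preambles, headers, interframe spacing, and ACKs, so real airtime is higher across the board. The 1500-byte frame size and the 54 Mbps comparison rate are assumptions picked for illustration.

```python
FRAME_BYTES = 1500  # assumed payload size, for illustration only


def payload_airtime_us(frame_bytes, rate_mbps):
    """Microseconds needed to send the payload bits alone at a given data rate."""
    # bits divided by megabits-per-second gives microseconds
    return frame_bytes * 8 / rate_mbps


# The four non-OFDM rates mentioned above, plus one OFDM rate for comparison.
for rate in (1, 2, 5.5, 11, 54):
    print(f"{rate:>4} Mbps: {payload_airtime_us(FRAME_BYTES, rate):8.0f} us")
```

Under these assumptions, one frame at 1 Mbps holds the channel for about 12 ms of payload time, versus roughly 0.22 ms at 54 Mbps, and every retry at a low rate costs that time again.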
Are the complaints (or the very high retry rates) seen everywhere, or are they limited to certain areas?
Are you seeing these very high retry rates on any of the devices in the **same** area that are using 5GHz? (THIS would be an interesting answer to see)
Forgot to mention one thing - the retry rates can sometimes be skewed. WiNG reports the values as a percentage, but you can have cases where there is VERY little traffic for a device and a normal number of retries...which, because of the simple math, ends up looking like there's an issue because the percentage is so high. Where the values are legitimate is where there's a normal/decent amount of traffic associated with the device. To see if this is maybe the culprit, you'll have to look at the traffic stats for these devices.
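Here is a quick Python sketch of that "simple math" problem. It assumes the percentage is retries divided by transmitted frames, which may not be exactly how WiNG computes it, and the `MIN_TX_FRAMES` cutoff is a hypothetical value you would tune for your own environment.

```python
MIN_TX_FRAMES = 1000  # hypothetical cutoff; tune to your environment


def retry_pct(tx_frames, retries):
    """Retries as a percentage of transmitted frames (one plausible definition)."""
    if tx_frames == 0:
        return 0.0
    return 100.0 * retries / tx_frames


def is_meaningful(tx_frames, min_frames=MIN_TX_FRAMES):
    """True when there is enough traffic for the percentage to mean something."""
    return tx_frames >= min_frames


# Made-up example counters: (client label, frames transmitted, retries)
clients = [
    ("nearly idle client", 20, 9),
    ("busy client", 50000, 4000),
]

for label, tx, retries in clients:
    note = "" if is_meaningful(tx) else "  <- too little traffic to trust"
    print(f"{label}: {retry_pct(tx, retries):.1f}% retries over {tx} frames{note}")
```

The nearly idle client shows 45% and the busy one shows 8%, yet the busy client had far more actual retries. Filter on frame counts first, then look at percentages.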