Max,
(long response...because there's no easy/short answer to this)
First answer is that 'acceptable' retry rates are generally defined as under 20% for non-critical applications. There is no official value though, just industry accepted norms.
As far as retries being zero, pretty much the ONLY time you'd ever see this is in a thought experiment or a lab environment. In the real world, you're always going to see some number of retries. The goal is simply to minimize them to acceptable values for the applications in use (guest traffic vs VoWLAN for example).
In any case though, anything over 25% seen on a consistent or sustained basis raises a red flag for me (this value will naturally fluctuate, and periodic spikes are not unusual). Sometimes there are things you can do about it, sometimes not. Some things will simply be out of your control. All you can do is 'fix' the things that you ARE able to control.
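As a rough illustration of "sustained vs. spike", here is a minimal Python sketch that classifies a series of retry-rate samples by the average of the most recent few. The function name, the window size, and the "borderline" band between 20% and 25% are my own assumptions for illustration; the 20% and 25% figures are just the rule-of-thumb values above, not official limits.

```python
from statistics import mean

ACCEPTABLE_PCT = 20.0  # rule of thumb, not an official value
RED_FLAG_PCT = 25.0    # sustained values above this deserve a closer look


def classify_retry_rate(samples_pct, window=5):
    """Classify retry-rate samples (in percent) by the mean of the last `window` samples.

    Averaging over a window keeps a single short spike from raising a flag.
    """
    if not samples_pct:
        raise ValueError("need at least one sample")
    sustained = mean(samples_pct[-window:])
    if sustained > RED_FLAG_PCT:
        return "red flag"
    if sustained < ACCEPTABLE_PCT:
        return "acceptable"
    return "borderline"  # hypothetical middle band between the two thresholds


if __name__ == "__main__":
    print(classify_retry_rate([10, 12, 45, 11, 9]))    # one spike, low average
    print(classify_retry_rate([30, 35, 28, 40, 33]))   # consistently high
```

How you sample the values (per client, per radio, how often) is up to you; the point is only that one spike should not be treated the same as a consistently high rate.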
Analyzing this and determining root cause is likely not trivial, assuming that it's not WiNG code related (which I doubt it is). Retries simply occur when the sender is not able to detect/see an ACK for the frame that it transmitted (for those frame transmissions that actually require an ACK). This failure to see an ACK can be caused by many things, and that is where the challenge lies.
Looking at these two examples does help to start ruling out some potential causes, though.
In both cases, you have extremely high SNR and RSSI, and a very low noise floor. Fantastic. These three things alone would normally dictate a very healthy environment that should allow for maximum data rates to be used.
BUT.....let's take a look at things that can cause retries:
1) Non-802.11 interference (occurring on the same frequencies as your devices, of course) leading to frame corruption which the receiver recognizes. In this case, either the original transmitter's frame is never received, so the receiver never sends an ACK...or the transmitted ACK itself is corrupted and is therefore treated as never having been received. In both cases, retries occur.
2) Frame collisions (Typically caused by a hidden node, OBSS situation, or adjacent channel traffic)
3) Extremely high airtime contention - causing the transmitter of the ACK to have to wait SO long to transmit the ACK that the other device finally gives up, assumes the frame was never received, and so 'retries'.
4) Low transmit power devices - or devices too distant from the AP. In this case, a client device may be operating on the fringe edge of the AP's ability to decode the client's preamble. This would likely lead to many of the ACK frames being received by the AP being too weak and thus not decode-able and therefore the AP thinks the client never ACK'd. This situation is usually in this direction (client to AP) because client devices are MUCH lower powered and have almost no antenna gain - whereas APs can transmit at much higher power levels and have better antennas - overall, much higher EIRP capable. Bottom line, the clients are the weak-link in the chain when it comes to wifi.
- This is also related to the sticky-client problem where a client associates to an AP initially but then as it moves, it doesn't roam properly to the next AP that is much closer. In that scenario, the client is then having to communicate with an AP that is very distant - leading to the problems just described.
5) This one is user-induced. If you configure a setup such that the BASIC rates are unrealistically high, the ACK frames (which are control frames) will be sent at the configured BASIC rate(s). So if those BASIC rates aren't achievable, or just barely are, then the ACK success rates will be low.
6) Client side 802.11 driver issues. It happens.
There are other causes, but they get more corner-case related; these 6 are really the most common and likely.
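To make cause 5 a bit more concrete, here is a simplified Python sketch of how the ACK rate follows from the BASIC rate set: the ACK goes out at the highest BASIC rate that is less than or equal to the rate of the frame being acknowledged. This is a simplification for legacy (non-HT) rates only, the function name is made up for illustration, and the fallback the standard uses when no BASIC rate qualifies is deliberately not modeled.

```python
def ack_rate_mbps(data_rate_mbps, basic_rates_mbps):
    """Return the rate (Mbps) an ACK would use for a frame received at data_rate_mbps.

    Simplified rule: highest BASIC rate that does not exceed the data frame's rate.
    Returns None when no BASIC rate qualifies (the real fallback is not modeled here).
    """
    candidates = [r for r in basic_rates_mbps if r <= data_rate_mbps]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    # Conservative BASIC set: ACKs for a 54 Mbps frame go out at 24 Mbps.
    print(ack_rate_mbps(54, [6, 12, 24]))
    # Aggressive BASIC set: the same ACK now has to succeed at 36 Mbps.
    print(ack_rate_mbps(54, [24, 36]))
```

The higher the BASIC rates, the higher the rate every ACK has to survive at, which is why an overly aggressive BASIC set can show up as retries even when the data frames themselves were received fine.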
In your examples, the Error rates are zero. This indicates to me that there are no issues with corrupted CRC values. If there were, the frames would be discarded and would be considered an error.
The SNR, RSSI, and noise levels seen on the AP are so good that you're not dealing with a 'distance' related issue (are there APs located in the same room as the devices??) At these levels, the devices shouldn't have any issues with even the highest BASIC rates being set.
What can't be accounted for here is CCC (co-channel contention) related problems - basically meaning high channel utilization - airtime is too busy to allow devices to communicate in a timely manner.
- Something I cannot tell from the screenshots is whether this is for devices operating on 2.4GHz or 5GHz. If it's 2.4GHz, there's a much higher likelihood that high channel utilization is the problem. And unfortunately, there's not much you can do about it, especially if the clients are not 5GHz capable. At the very least, if your deployment is set up well, disable all non-OFDM data rates: do not allow the 1, 2, 5.5, or 11 Mbps data rates at all. Doing this will help ensure that traffic is moving more quickly and will help free up airtime. If you have devices that need those rates, those users will be affected, but that also means those devices are VERY old (802.11b-only, i.e. pre-11g).
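To put a number on why the low rates eat airtime, here is a back-of-the-envelope Python sketch. It only counts the time to send the payload bits and ignores preambles, headers, inter-frame spacing, and ACKs (all of which make the slow rates look even worse), so treat it as an illustration rather than a real airtime calculator. The function name is made up.

```python
def payload_airtime_us(payload_bytes, rate_mbps):
    """Microseconds needed to send the payload bits alone at the given rate.

    1 Mbps is 1 bit per microsecond, so bits / Mbps gives microseconds.
    Preamble, headers, inter-frame spacing and ACKs are ignored.
    """
    return payload_bytes * 8 / rate_mbps


if __name__ == "__main__":
    for rate in (1, 2, 5.5, 11, 6, 24, 54):
        print(f"{rate:>4} Mbps: {payload_airtime_us(1500, rate):8.0f} us")
```

A 1500-byte payload needs about 12 ms at 1 Mbps versus 2 ms at 6 Mbps, so every frame sent at the lowest rate holds the channel roughly six times longer than the same frame at the lowest OFDM rate.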
There's also the potential for an OBSS situation, which can't be seen here (or hidden node). Both cases will lead to two or more devices thinking that the airwaves are clear for them to transmit...at the same time. With this, you end up with collisions in the air...leading to frame corruption. But, back to zero Error rate, that wouldn't seem to be the case here.
Are the complaints (and the very high retry rates) seen everywhere, or are they limited to just certain areas?
Are you seeing these very high retry rates on any of the devices in the **same** area that are using 5GHz? (THIS would be an interesting answer to see)
Forgot to mention one thing - the retry rates can sometimes be skewed. WiNG reports the values as a percentage, so a device with VERY little traffic and a perfectly normal number of retries can, because of the simple math, look like it has an issue because the percentage is so high. Where the values are legitimate is where there's a normal/decent amount of traffic associated with the device. To see if this is maybe the culprit, you'll have to look at the traffic stats for these devices.
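Here is a small Python sketch of that "simple math" problem. I'm assuming the percentage is retries divided by transmitted frames (I don't know exactly how WiNG computes it), and the 500-frame minimum is an arbitrary cutoff I picked for illustration, not a WiNG setting.

```python
def assess_retries(tx_frames, retries, min_frames=500):
    """Return (retry_pct, trustworthy) for one device.

    retry_pct is retries as a percentage of transmitted frames (assumed formula).
    trustworthy is False when there is too little traffic for the percentage
    to mean much; min_frames is an arbitrary illustrative cutoff.
    """
    if tx_frames <= 0:
        return None, False
    pct = 100.0 * retries / tx_frames
    return pct, tx_frames >= min_frames


if __name__ == "__main__":
    # Nearly idle device: 3 retries out of 10 frames looks alarming.
    print(assess_retries(10, 3))
    # Busy device: 300 retries out of 10000 frames is a healthy rate.
    print(assess_retries(10000, 300))
```

The idle device shows 30% from only three retries, while the busy device with a hundred times as many retries sits at 3%, which is why the traffic counters need to be read alongside the percentage.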