I currently have a very strange problem.
We use EAP-TLS 802.1X authentication on an internal SSID for notebooks. The EWC is installed at headquarters. Two AP 3705 are installed at the affected branch, running V9.21.07. The NAC Gateway 6.2.0.x is also installed at headquarters and acts as RADIUS proxy to the NPS on the Windows AD 2008 server. This has worked well for the last few years.
We have now changed the WAN connection of this branch from MPLS to an IPsec VPN. Since this change, many internal WLAN clients that previously connected without problems are rejected by the NAC Gateway. All other branches are working well. On the wired switches we use only MAC authentication, which is not affected.
802.1x (identify) - Authentication became stale
After some troubleshooting I realized that when a client roams within the AP to its preferred radio, that roaming event triggers a RADIUS request. The first request (on the first radio) is always accepted; the AP-internal switch to the preferred radio then triggers a second RADIUS request, which is always rejected with the error message above.
As a temporary solution I disabled radio 1, and now all clients can log in without problems!
This is very strange.
Why does a switch from radio 2 to radio 1 trigger a RADIUS event? Can I disable this new login request in the AP / EWC config?
If this request is needed, why does it become stale and get rejected?
I am back from my holiday and, surprise, surprise, some (not very many) problems are solved!
My WLAN Problem is solved!!
The root cause was a bug in FortiOS on the FortiGate. It seems that CAPWAP/WASSP traffic was not handled correctly by the firewall. My co-worker saw this during a firewall debug. As a workaround he disabled ASIC offloading for CAPWAP packets, and at once all wireless clients were working without problems!
After an update to the latest FortiOS 5.4.1 everything is running well.
I do not quite understand the MTU problem. You wrote that you can ping the AP with a 1428B IP packet (1400B ICMP Echo Request data + ICMP header + IP header), and that a Framed-MTU of 1400 is used. That seems to fit.
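The size arithmetic above can be checked with a small Python sketch. It assumes plain IPv4 without options (20-byte header) and the standard 8-byte ICMP echo header; the function names are only for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request/reply header


def ip_packet_size(icmp_payload: int) -> int:
    """Total IPv4 packet size for a ping with the given payload size."""
    return icmp_payload + ICMP_HEADER + IPV4_HEADER


def max_ping_payload(mtu: int) -> int:
    """Largest ping payload that fits into one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


print(ip_packet_size(1400))    # 1428
print(max_ping_payload(1428))  # 1400
```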
Additionally you write that authentication works fine with one radio disabled. That suggests that the network is able to transport the certificate.
But then you write that the server certificate cannot be transported to the client.
I would guess that one packet containing part of the certificate is lost on its way from the server (NAC) to the client (AP), ultimately resulting in a reject.
It is interesting that there seems to be reliable packet loss with two back-to-back authentication attempts. As if that crossed some rate limiting threshold.
OK, so the presumed workaround did not actually work around the problem. 😞
I would suggest you use Wireshark (or similar) to check the actual size of the RADIUS packets, because your test with ping and the Framed-MTU value suggest that the MTU size is not the problem.
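Wireshark shows these sizes directly, but as a rough sketch of which field matters: the RADIUS header (RFC 2865) carries a 1-byte code, a 1-byte identifier and a 2-byte total length, followed by a 16-byte authenticator. The Python below reads that header from a UDP payload and adds the UDP and IPv4 overhead. The function name and the sample bytes are made up for illustration, and IPv4 without options is assumed.

```python
import struct

RADIUS_CODES = {1: "Access-Request", 2: "Access-Accept",
                3: "Access-Reject", 11: "Access-Challenge"}
UDP_HEADER = 8    # bytes
IPV4_HEADER = 20  # bytes, assuming no IP options


def radius_summary(udp_payload: bytes) -> tuple[str, int, int]:
    """Return (message type, RADIUS length, resulting IPv4 packet size)."""
    if len(udp_payload) < 20:
        raise ValueError("shorter than the 20-byte RADIUS header")
    # Code (1 byte), Identifier (1 byte), Length (2 bytes, network byte order)
    code, _identifier, length = struct.unpack("!BBH", udp_payload[:4])
    name = RADIUS_CODES.get(code, f"code {code}")
    return name, length, length + UDP_HEADER + IPV4_HEADER


# Fabricated example: an Access-Challenge header claiming 1390 bytes in total.
sample = struct.pack("!BBH", 11, 42, 1390) + bytes(16)
print(radius_summary(sample))  # ('Access-Challenge', 1390, 1418)
```

If the resulting IP packet size comes out larger than what the VPN can carry, that would point towards the MTU after all.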
If the VPN MTU is the problem, you should be able to see RADIUS packets on the interface leading to the VPN, but not on the other end exiting the VPN. There might even be an ICMP "Fragmentation Needed" message (the IPv4 counterpart of Packet Too Big) visible in a packet trace. If the MTU is the problem, no packet larger than the VPN MTU can cross the VPN at all. You can verify the actual VPN MTU using "ping -f -l SIZE" (on Windows). The generated IP packet will be 28 bytes bigger than SIZE; you can see this in Wireshark.
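Rather than guessing SIZE values by hand, the search can be done as a binary search. The sketch below is only an illustration: the probe is passed in as a function (in practice it would wrap one "ping -f -l SIZE" run and report whether a reply came back), and the 1438-byte path MTU in the simulated probe is an invented number.

```python
from typing import Callable


def largest_passing_payload(probe: Callable[[int], bool],
                            low: int = 0, high: int = 1472) -> int:
    """Binary-search the largest payload size for which probe(size) is True.

    Assumes probe is monotonic: if a size passes, every smaller size passes.
    Returns -1 if even `low` fails.
    """
    if not probe(low):
        return -1
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always progresses
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


# Simulated link with a hypothetical 1438-byte path MTU, standing in for a real ping.
def fake_probe(size: int) -> bool:
    return size + 28 <= 1438


payload = largest_passing_payload(fake_probe)
print(payload, payload + 28)  # 1410 1438
```

The default upper bound of 1472 corresponds to a 1500-byte Ethernet MTU minus the 28 bytes of IP and ICMP headers.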
If a packet from RADIUS server to the AP containing part of a certificate is lost, this authentication session can only succeed if the re-transmit timer and count of the RADIUS client match the RADIUS server settings for re-transmits. If a re-transmitted packet arrives after the time allowed by the server, the server will answer with an Access-Reject.
Extreme Control accepts an answer inside a 5s window after sending a packet. If it takes the client longer to request a re-transmit for a lost packet from the server, the authentication session will fail (Access-Reject). This problem exists only with bigger RADIUS messages needing more than one packet (e.g. a certificate), because in most other cases the RADIUS server will treat the re-transmit request as a completely new session.
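To make the timing argument concrete, here is a deliberately simplified Python model. It assumes the server's packet is lost, the client waits for its re-transmit timer and then re-sends, and that re-transmit must reach the server inside the 5 s window. The function name, the delay parameter and the example timer values are assumptions for illustration, not actual AP or controller settings.

```python
SERVER_WINDOW_S = 5.0  # answer window mentioned above


def retransmit_in_time(client_timeout_s: float,
                       one_way_delay_s: float = 0.0,
                       server_window_s: float = SERVER_WINDOW_S) -> bool:
    """True if the client's first re-transmit can reach the server in time.

    Simplified model: time is measured from the moment the server sent the
    lost packet; the client re-sends after client_timeout_s and the packet
    needs one_way_delay_s to arrive.
    """
    return client_timeout_s + one_way_delay_s < server_window_s


print(retransmit_in_time(3.0, 0.05))  # True
print(retransmit_in_time(5.0, 0.05))  # False
```

In this model a client re-transmit timer at or above the server window can never recover from a lost certificate fragment, which would match the Access-Reject described above.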
Anyway, if the problem really is an MTU problem because of the VPN, you might be able to fragment inside the VPN despite the don't-fragment (DF) bit. This of course depends on the VPN solution used.