Client roaming to preferred radio triggers a RADIUS authentication event, which fails

  • Question
  • Updated 2 years ago
  • Answered
Currently I have a very strange problem.
We use EAP-TLS 802.1X authentication for an internal SSID for notebooks. The EWC is installed at the headquarters. Two AP 3705 units are installed at the affected branch, running V9.21.07. A NAC Gateway 6.2.0.x is also installed at the headquarters and acts as the RADIUS proxy to the NPS on the Windows AD 2008 server. This has worked well over the last years.
We have now changed the WAN connection of this branch from MPLS to VPN with IPsec. Since this change, many internal WLAN clients that previously connected without problems are rejected by the NAC Gateway. All other branches are working well. On the wired switches we use only MAC authentication, which is not affected.

Error:
802.1x (identify) - Authentication became stale

After some troubleshooting I realized that when the client roams within the AP to its preferred radio, that roaming event triggers a RADIUS request. The first request (to the first radio) is always positive (accepted), and then the AP-internal switch to the preferred radio triggers a RADIUS request which is always rejected, with the above error message.

As a temporary solution I disabled radio 1, and then all clients can log in without problems!

This is very strange.

First question:
Why does a switch from radio 2 to radio 1 trigger a RADIUS event? Can I disable this new login request in the AP / EWC config?
Second question:
If this request is needed, why does it become stale and get rejected?

Thanks for any advice.
Regards
M.Nees, Embassador


Posted 2 years ago


Pala, Zdenek, Employee

Hi

I would guess the issue is with the MTU: check the configuration of your APs and your VPN.

If I remember correctly, MAC authentication on the EWC has an option to configure whether reauthentication should happen or not. Go to the WLAN service => Authentication.

Regards

M.Nees, Embassador

Hi Zdenek,

I checked the MTU from headquarters to the AP with "ping -f -l 1400 IP-of-the-AP", which works fine with an MTU of 1400. I also tested with lower MTU values, which had no positive effect.

Within the internal SSID we use 802.1X privacy, no MAC authentication.

I cannot understand why roaming between the radios of one AP triggers a completely new authentication request. And why does the request fail on the second run, when the first run to the first radio is always accepted?

Regards

Ronald Dvorak, Embassador

I assume radio preference is enabled and that is the reason the client is switching between radio 1 and 2, correct?

I also vote for an MTU problem.

M.Nees, Embassador

Yes, radio preference is enabled on the client.

But the fact that the problem is solved after disabling radio 1 (to avoid the roaming within the AP) speaks against an MTU problem!
I also checked the possible MTU size with different "ping -f -l max-packet-size" values.

Are there any suggestions on how to find the root cause?

Pala, Zdenek, Employee

I would take a packet capture of the authentication packets to see why the authentication became stale. I expect that some packets are being lost. The question is where: client to AP, AP to controller, or controller to RADIUS server. (It can also be configured as a SITE, where the AP talks to the RADIUS server directly.)

Pala, Zdenek, Employee

If your MTU is 1400, what value do you have configured on your AP?

M.Nees, Embassador

Hi Zdenek,

I tested with "ping -f -l 1400". So an MTU of 1400 bytes passes through the network, and I configured the AP with MTU = 1400 as well.

Am I doing something wrong?

Pala, Zdenek, Employee

Regarding the reauthentication, I believe it is part of the standard that authenticating and associating to a new BSSID means generating new encryption keys. If your client supports OKC, you can enable it.

M.Nees, Embassador

Opportunistic Keying is already enabled on this WLAN service.

Ronald Dvorak, Embassador

In my opinion it makes sense to see a second 802.1X authentication if radio preference is enabled, as the client doesn't roam between the radios: it's a new connection.

As a workaround you could also disable radio preference and enable radio 1 again. I'm pretty sure that will work.
Then enable it on only one AP so you can troubleshoot the issue with GTAC.

M.Nees, Embassador

Hi Ronald,

We configured the preferred radio on the client devices (Windows driver settings).

I see this is also possible via AP "Load groups", but that is not configured.

Regards

Frank Veen

Did you try enabling fast roaming?

Regards

Gareth Mitchell, Extreme Escalation Support Engineer

Matthias

Do you have AP secure tunnel enabled, and is NAT involved?

-Gareth

M.Nees, Embassador

Hi Gareth,

Secure Tunnel is completely disabled. NAT is not involved!
The customer's network is divided into subnets in the 10.x.x.x IP range. HQ and branch are connected via an IPsec tunnel without any kind of NAT.

Regards

M.Nees, Embassador

Because this issue is causing downtime, I opened a GTAC case to solve it: 01232203.

M.Nees, Embassador

Hi Folks,

This problem is still unresolved!!

GTAC pointed me to this solution:
https://gtacknowledge.extremenetworks.com/articles/Solution/Apple-clients-take-very-long-time-to-get...

I got a Wireshark trace of a rejected end system which supports this guess:
NPS is not able to deliver the server certificate to the client (and then the request is rejected)!

The problem with the above solution is that it only works if NPS accepts the RADIUS request. So clients are still rejected (because of the too-big MTU). The reduced Framed-MTU never reaches the problematic clients!

When I debug the RADIUS request on the NAC Gateway, I see that the Framed-MTU value is set to 1400 (request from the EWC).

Can I change this value on the EWC?

My first guess was that this is calculated from the configured AP MTU size. But after I changed the AP MTU to 1300, I saw that the Framed-MTU did not change (still 1400). So this seems to be fixed in the EWC config. From my point of view it should be calculated in conjunction with the configured AP MTU!
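
For illustration, the relationship between a Framed-MTU value and the size of the resulting IP packet can be estimated roughly as below. This is a minimal Python sketch, assuming the RADIUS server fills each EAP fragment up to the Framed-MTU value and adds only a Message-Authenticator plus an optional allowance for other attributes. The function name and the `other_attrs` parameter are made up for this example. The fixed sizes (20-byte RADIUS header, at most 253 value bytes per EAP-Message attribute with a 2-byte attribute header, 18-byte Message-Authenticator, 8-byte UDP header, 20-byte IPv4 header without options) are the standard ones.

```python
import math

RADIUS_HEADER = 20          # code, identifier, length, 16-byte authenticator
ATTR_HEADER = 2             # type + length of every RADIUS attribute
ATTR_MAX_VALUE = 253        # maximum value bytes in one attribute
MESSAGE_AUTHENTICATOR = 18  # 2-byte header + 16-byte HMAC-MD5
UDP_HEADER = 8
IPV4_HEADER = 20            # assumption: no IPv4 options


def radius_ip_packet_size(eap_len: int, other_attrs: int = 0) -> int:
    """Estimate the IPv4 packet size of a RADIUS message carrying eap_len bytes of EAP.

    other_attrs is a hypothetical allowance (in bytes) for attributes such as
    State or Proxy-State that are not modelled here.
    """
    # An EAP packet longer than 253 bytes is split over several EAP-Message attributes.
    eap_attr_count = math.ceil(eap_len / ATTR_MAX_VALUE)
    radius_len = (RADIUS_HEADER + eap_len + ATTR_HEADER * eap_attr_count
                  + MESSAGE_AUTHENTICATOR + other_attrs)
    return radius_len + UDP_HEADER + IPV4_HEADER


print(radius_ip_packet_size(1400))  # 1478
```

Under these assumptions, a Framed-MTU of 1400 leads to an IP packet of about 1478 bytes before any further attributes, so the Framed-MTU and the path MTU across the tunnel are two different numbers.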

Regards

Erik Auerswald, Embassador

Hi Matthias,

I do not quite understand the MTU problem. You wrote that you can ping the AP with a 1428B IP packet (1400B ICMP Echo Request data + ICMP header + IP header), and that a Framed-MTU of 1400 is used. That seems to fit.
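
The arithmetic behind that 1428B figure, as a small Python sketch. It assumes IPv4 without header options, and the function names are only for illustration.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP Echo Request header


def ip_packet_size(ping_payload: int) -> int:
    """Size of the IPv4 packet generated by 'ping -f -l <ping_payload>' on Windows."""
    return ping_payload + ICMP_HEADER + IPV4_HEADER


def largest_ping_payload(path_mtu: int) -> int:
    """Largest '-l' value that still fits into one unfragmented packet of path_mtu bytes."""
    return path_mtu - IPV4_HEADER - ICMP_HEADER


print(ip_packet_size(1400))        # 1428
print(largest_ping_payload(1400))  # 1372
```

So a successful "ping -f -l 1400" shows that the path carries IP packets of at least 1428 bytes.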

Additionally you write that authentication works fine with one radio disabled. That suggests that the network is able to transport the certificate.

But then you write that the server certificate cannot be transported to the client.

I would guess that one packet containing part of the certificate is lost on its way from the server (NAC) to the client (AP), ultimately resulting in a reject.

It is interesting that there seems to be reliable packet loss with two back-to-back authentication attempts. As if that crossed some rate limiting threshold.

HTH,
Erik

M.Nees, Embassador

"Additionally you write that authentication works fine with one radio disabled."
--> I believed that because there was an accept in the NAC Manager GUI.
That was my mistake.

Erik Auerswald, Embassador

OK, so the presumed workaround did not actually work around the problem. :-(

I would suggest you use wireshark (or similar) to check the actual size of the RADIUS communication packets, because your test with ping and the Framed-MTU value suggest that the MTU size is not the problem.

If the VPN MTU is the problem, you should be able to see RADIUS packets on the interface leading to the VPN, but not on the other end exiting the VPN. There might even be an ICMP Packet Too Big message (for IPv4, a Destination Unreachable with "fragmentation needed") visible in a packet trace. If the MTU is the problem, no larger packet at all can cross the VPN. You can verify the actual VPN MTU using "ping -f -l SIZE" (on Windows). The generated IP packet will be 28 bytes bigger than SIZE; you can see this in Wireshark.
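
One way to do that comparison of both ends, sketched in Python. It assumes the RADIUS packets seen on each side of the VPN have been written down by hand as simple records. The record layout (`id`, `size`), the function names, and the sample values are hypothetical and are not a Wireshark export format.

```python
def lost_packets(before_vpn, after_vpn):
    """Return the packets captured before the VPN that never show up after it."""
    seen_after = {pkt["id"] for pkt in after_vpn}
    return [pkt for pkt in before_vpn if pkt["id"] not in seen_after]


def loss_looks_size_related(before_vpn, after_vpn):
    """True if every lost packet is larger than every delivered packet."""
    seen_after = {pkt["id"] for pkt in after_vpn}
    lost = [pkt for pkt in before_vpn if pkt["id"] not in seen_after]
    delivered = [pkt for pkt in before_vpn if pkt["id"] in seen_after]
    if not lost or not delivered:
        return False  # nothing lost, or nothing delivered to compare against
    return min(p["size"] for p in lost) > max(p["size"] for p in delivered)


# Made-up sample data: IP packet sizes in bytes, keyed by an arbitrary packet id.
before = [{"id": 1, "size": 310}, {"id": 2, "size": 1478}, {"id": 3, "size": 420}]
after = [{"id": 1, "size": 310}, {"id": 3, "size": 420}]

print(lost_packets(before, after))
print(loss_looks_size_related(before, after))
```

If small packets go missing as well, the MTU is unlikely to be the only cause.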

If a packet from RADIUS server to the AP containing part of a certificate is lost, this authentication session can only succeed if the re-transmit timer and count of the RADIUS client match the RADIUS server settings for re-transmits. If a re-transmitted packet arrives after the time allowed by the server, the server will answer with an Access-Reject.

Extreme Control accepts an answer inside a 5s window after sending a packet. If it takes the client longer to request a re-transmit for a lost packet from the server, the authentication session will fail (Access-Reject). This problem exists only with bigger RADIUS messages needing more than one packet (e.g. a certificate), because in most other cases the RADIUS server will treat the re-transmit request as a completely new session.
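
The timing condition can be sketched in Python like this. It is a simplified model: it assumes a fixed re-transmit timer on the RADIUS client (real clients may back off), takes the 5s window from the description above as the default, and uses function and parameter names invented for this example.

```python
def retransmit_arrivals(timeout_s, retries, delay_s=0.0):
    """Times (seconds after the lost packet) at which re-transmissions reach the server.

    Assumption: fixed timer, so the n-th re-transmission is sent after n * timeout_s.
    delay_s is a hypothetical one-way network delay.
    """
    return [n * timeout_s + delay_s for n in range(1, retries + 1)]


def session_survives(timeout_s, retries, window_s=5.0, delay_s=0.0):
    """True if at least one re-transmission arrives inside the server's window."""
    return any(t < window_s for t in retransmit_arrivals(timeout_s, retries, delay_s))


print(session_survives(timeout_s=3.0, retries=2))  # True
print(session_survives(timeout_s=6.0, retries=3))  # False
```

With a 3s timer the first re-transmission arrives inside the window. With a 6s timer none does, and in this model the session ends in an Access-Reject.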

Anyway, if the problem really is an MTU problem because of the VPN, you might be able to fragment inside the VPN despite the don't-fragment bit. This of course depends on the VPN solution used.

HTH,
Erik

M.Nees, Embassador

Hi Folks,

I am back from my holiday and, surprise, surprise, some (not very many) problems are solved!

My WLAN problem is solved!!

The root cause was a bug in the FortiGate OS. It seems that all CAPWAP/WASSP traffic was not handled correctly by the firewall. My co-worker saw this during a firewall debug. As a workaround he disabled ASIC offloading for CAPWAP packets, and at once all wireless clients were running without problems!

After an update to the latest FortiOS 5.4.1, everything is running well.

Thanks to all who helped me with ideas!

Regards

Ryan Mathews, Alum

That's a great way to come back from holiday, Matthias.

Thanks for updating the Hub with your good news and with valuable information for those who read this in the future.