Load Balancing 802.1x RADIUS traffic to NAC.

  • Problem
  • Updated 3 years ago
  • Solved
Hi there,

I'm having some issues using LSNAT load balancing with 802.1x RADIUS requests on the S Series or N Series to some NAC appliances at the back end.

With my client switch configured to send RADIUS requests to the VIP address on the S Series, 802.1x auth fails, but MAC auth is fine. The LSNAT load balancing is configured with four NAC appliances as real servers, though only one is "in service" to aid troubleshooting at the moment.

The VIP addresses of the load balancers are configured as load balancers in NAC Manager.

With my client switch configured to send RADIUS requests directly to the real IP address of the single NAC appliance the load balancer was configured to use, both 802.1x and MAC auth succeed.

I've tried this using B Series and D Series as client switches, and tried the same LSNAT configuration on the S Series and N Series with identical results. When using the VIP address, 802.1x fails but MAC auth is fine.

NAC Manager shows the following error message when 802.1x auth fails:
“Authentication request became stale, challenge sent, no response received from client (switch 192.168.132.115/end-system).”

Wireshark proves no packets are being dropped between NAC and switch. The final challenge (before the failure) that is sent out by NAC reaches the uplink port on the switch.

It appears that the EAP-TLS communication between the client PC and NAC is breaking down somehow.

Has anyone seen similar issues?

Thanks,
Mark.
Mark Lamond

Posted 4 years ago

Jeremy, Embassador
I see the same issue.  Anyone? 

Joseph Burnsworth
What version of NAC are you running? Also, what version of OS is on the client?

Jeremy, Embassador
NAC is 6.3.0.174. The client is the latest version of Mac OS X.

When I move the switch back to the NAC group not using LSNAT, 802.1x auth works fine.

Joseph Burnsworth
Can you display your LSNAT config?

Jeremy, Embassador

!
configure terminal
!
 ip slb real-server access unrestricted
!
 ip slb serverfarm DNS
  real 10.3.10.10 port 53
   faildetect probe one ping
   inservice
   exit
  real 10.3.10.11 port 53
   faildetect probe one ping
   inservice
   exit
  exit
 ip slb serverfarm NAC_Pool
  real 10.3.10.147 port 1812
   faildetect probe one check_nac
   inservice
   exit
  real 10.3.10.147 port 1813
   faildetect probe one check_nac
   inservice
   exit
  real 10.3.10.148 port 1812
   faildetect probe one check_nac
   inservice
   exit
  real 10.3.10.148 port 1813
   faildetect probe one check_nac
   inservice
   exit
  exit
 ip slb serverfarm WindowsAuth
  real 10.3.10.10 port 636
   faildetect probe one ping
   inservice
   exit
  real 10.3.10.11 port 636
   faildetect probe one ping
   inservice
   exit
  real 10.3.10.12 port 636
   faildetect probe one ping
   inservice
   exit
  exit
!
 ip slb vserver vDNS
  virtual 192.168.20.20 udp 53
  serverfarm DNS
  udp-one-shot
  inservice
  exit
 ip slb vserver NAC_vIP
  virtual 192.168.20.10 udp 1812
  sticky timeout 30
  serverfarm NAC_Pool
  udp-one-shot
  inservice
  exit
 ip slb vserver WindowsAuthVIP
  virtual 192.168.20.30 tcp 636
  sticky type sip
  serverfarm WindowsAuth
  udp-one-shot
  inservice
  exit
 ip slb vserver WindowsAuthVPI
  exit
!
exit
!
end

You can ignore the WindowsAuth and DNS stuff.

Doug Hyde, Technical Support Manager

Mark Lamond
We ended up getting professional services involved to take a look at the issue.

Looking back through my notes, the cause was that our client certificate could not fit in one packet, so it was being fragmented across multiple packets. LSNAT couldn't deal with this at the time, so fragments were being dropped, causing the TLS conversation to fail.
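
To illustrate why fragmentation can trip up a NAT or load balancer in general: when a UDP datagram is larger than the link MTU, IPv4 splits it into fragments, and only the first fragment carries the UDP header with the port numbers. The sketch below is generic IPv4 arithmetic, not a description of how LSNAT is implemented. The 1500-byte MTU, the 20-byte IP header (no options), the 1800-byte payload, and the function name `ipv4_fragments` are all assumptions chosen for illustration.

```python
IP_HEADER = 20   # assumed: IPv4 header without options
UDP_HEADER = 8   # fixed UDP header size


def ipv4_fragments(udp_payload_len, mtu=1500):
    """Return (offset_bytes, data_len, has_udp_header) for each IPv4 fragment.

    Hypothetical helper for illustration only.
    """
    total = UDP_HEADER + udp_payload_len        # bytes carried above the IP header
    max_data = (mtu - IP_HEADER) // 8 * 8       # fragment data is sized in 8-byte units
    frags = []
    offset = 0
    while offset < total:
        data_len = min(max_data, total - offset)
        # Only the fragment at offset 0 contains the UDP header (and ports).
        frags.append((offset, data_len, offset == 0))
        offset += data_len
    return frags


# Assumed example: an 1800-byte RADIUS payload on a 1500-byte MTU link.
for frag in ipv4_fragments(1800):
    print(frag)
# (0, 1480, True)
# (1480, 328, False)
```

The second fragment has no port information of its own, so a device that matches traffic on UDP port 1812 has to track fragments by IP ID to forward it correctly.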

There was a bug fix in the S series firmware v8.31.01.005 which sounds like a similar issue:
"Fragmented packets are not allowed to traverse across an LSNAT6/4 or LSNAT4/6 vserver, the packets will be dropped"

In our case we were using straight LSNAT IPv4 to IPv4 with no IPv4/6 or IPv6/4 conversion.
I've never tried it again since the fix, might give it a go if I have time.

What hardware are you running and what firmware version are you on?

We did try a few things to reduce the size of the EAP packets, but from what I remember our client certificate was just too big.

Here are a few tips on that which were relevant when we had the issue with our NAC version. I would advise treading carefully, lots of potential to break stuff :). Use Wireshark/tcpdump on both the client side and the NAC side to monitor how the packets appear before and after LSNAT.

I. To reduce fragmentation on NAC:

Add the following appliance or appliance group properties and then enforce the appliance:

RADIUS_EAP_TLS_FRAGMENT_SIZE=1200
RADIUS_INNER_EAP_TLS_FRAGMENT_SIZE=1024

This will reconfigure the eap.conf and inner EAP configuration files.

II. To reduce packet size from client:

Microsoft’s KB on the subject (http://support.microsoft.com/kb/883389):

The Extensible Authentication Protocol (EAP) packets of the RADIUS server are large when some firewall programs drop the UDP fragments to help protect the network. Framed MTU is used with EAP authentication to notify the RADIUS server about the Maximum Transmission Unit (MTU) negotiation with the client. The RADIUS server communicates with the client, so that the RADIUS server does not send EAP messages that cannot be delivered over the network. The default attribute value of the framed MTU for the IAS server is 1,500. You can set the attribute to a minimum of 64 and a maximum of 1,500. To avoid the fragment issues, you can set the attribute value to 1,344.
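
As a rough way to reason about the fragment-size and Framed-MTU values above, the sketch below estimates the on-the-wire size of a RADIUS packet carrying one EAP packet. It uses the standard RADIUS layout (20-byte header, attributes limited to 253 value bytes each, 18-byte Message-Authenticator). How the NAC properties map onto the EAP packet length, and the size of the other attributes in the packet, are assumptions: pass your own measured values from a capture. The function names are hypothetical.

```python
import math

RADIUS_HEADER = 20          # code, identifier, length, 16-byte authenticator
ATTR_HEADER = 2             # attribute type + length bytes
ATTR_MAX_VALUE = 253        # 255-byte attribute limit minus the 2-byte header
MESSAGE_AUTHENTICATOR = 18  # 2-byte header + 16-byte value
IP_UDP_HEADERS = 28         # assumed: IPv4 without options (20) + UDP (8)


def radius_ip_packet_len(eap_packet_len, other_attrs_len=0):
    """Estimate the IP packet size for a RADIUS message carrying one EAP packet.

    other_attrs_len is an assumed total for State, User-Name, NAS attributes, etc.
    """
    # The EAP packet is split across as many EAP-Message attributes as needed.
    n_attrs = math.ceil(eap_packet_len / ATTR_MAX_VALUE)
    eap_attrs = eap_packet_len + n_attrs * ATTR_HEADER
    return (IP_UDP_HEADERS + RADIUS_HEADER + eap_attrs
            + MESSAGE_AUTHENTICATOR + other_attrs_len)


def fits_in_mtu(eap_packet_len, other_attrs_len=0, mtu=1500):
    """True if the estimated packet would not need IP fragmentation."""
    return radius_ip_packet_len(eap_packet_len, other_attrs_len) <= mtu


print(radius_ip_packet_len(1200))   # 1276
print(fits_in_mtu(1200, 100))       # True
print(fits_in_mtu(1500))            # False
```

This is only an estimate; comparing it against the packet lengths seen in Wireshark on both sides of the load balancer is the real check.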

Thanks,
Mark

Jeremy, Embassador
We are running 8.32.2.0008 on the S4. I haven't looked into fragment size, but I will take a look at that. We use NAC as a RADIUS server.

Joseph Burnsworth
Yes, the config looks good. I'm sorry I could not help on this. The GTAC gods have spoken, and I would follow the firmware path as they said.

Hope it gets resolved for you.

Jeremy, Embassador
Yeah, we are running the latest 8.32 firmware the S4 can take for our 180 cards.  Using LSNAT makes the response time a LOT slower as well.  I also notice when I authenticate, different parts of the communication will hit different NAC appliances.  When that happens, it doesn't work.  When all the auth stuff hits the one NAC, it works.  Tried sticky sip but that didn't help. 

Joseph Burnsworth
Have you tried the leastConnections command in the LSNAT config?

I think it should give more of a one-to-one mapping rather than multiple hits or round robin.

Jeremy, Embassador
Not yet.  I will test it more tomorrow when I get back in.

Mike D, Alum
Hello Jeremy,
Why one-shot? Can you test without it?

I think you guys are on the right track regarding tying the reals to a specific client. There is no distributed database, so while it's UDP, the real server that initially authenticates the client needs to stay with it. Auth has never been my sharpest skill, but it seems the auth process needs to hold at least some state for updates, challenges, or other auth magic.

Regards,
Mike

Mike D, Alum
After boning up a bit on the LSNAT app:
One-shot deletes the binding after 1 second, which is perfect for access-level DNS, for example. Normally, tearing down a NAT binding would not be an issue for RADIUS, but in the RADIUS load-balancing application a client needs to stick with the real server used for the initial RADIUS auth; there is no sharing of info from one real to another.
So one-shot operation is the opposite of what is called for here.

Try sticky type sip with a timeout of 65k.
(Not tested)
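
The stickiness point can be pictured with a toy model. The sketch below is not LSNAT code and ignores binding timeouts; it only contrasts per-packet round robin (roughly what you get if a binding is torn down between packets) with a source-IP sticky table. The function name `assign_reals` is hypothetical, and the addresses are borrowed from earlier in this thread purely as sample data.

```python
from itertools import cycle


def assign_reals(packets, reals, sticky):
    """Pick a real server for each packet, given its source IP.

    packets: source IPs in arrival order.
    sticky:  if True, remember the first real chosen for each source IP.
    Toy model only: no timeouts, no health probes.
    """
    rr = cycle(reals)          # simple round-robin predictor
    bindings = {}              # source IP -> real server
    chosen = []
    for src in packets:
        if sticky and src in bindings:
            real = bindings[src]
        else:
            real = next(rr)
            if sticky:
                bindings[src] = real
        chosen.append(real)
    return chosen


reals = ["10.3.10.147", "10.3.10.148"]
# One switch sending four RADIUS packets of a single EAP-TLS exchange.
exchange = ["192.168.132.115"] * 4

print(assign_reals(exchange, reals, sticky=False))
# ['10.3.10.147', '10.3.10.148', '10.3.10.147', '10.3.10.148']
print(assign_reals(exchange, reals, sticky=True))
# ['10.3.10.147', '10.3.10.147', '10.3.10.147', '10.3.10.147']
```

In the non-sticky case the challenge and response of one exchange land on different NAC appliances, which matches the failure described earlier in the thread.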


Mike