
Switch Feature: RADIUS Dead


During a current project I learned that Cisco supports a great feature that I currently miss (and have missed for a long time) on all EOS and EXOS switches!

Marking an unreachable RADIUS server as DEAD and skipping it in all following authentication sessions. Once the RADIUS server is reachable again, it is used again.

EXOS and EOS have to work through the configured timeout and retries in every authentication session before they use the failover server. If the primary RADIUS server has failed, the complete authentication process takes longer because of waiting for timeouts.
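To put a rough number on that waiting time, here is a minimal sketch. The timeout of 3 seconds, the 3 attempts per server and the 100 sessions are made-up example values, not EXOS or EOS defaults, and the function names are hypothetical. It also assumes the sessions run one after another and that a server marked dead stays dead for the whole period.

```python
def wait_per_session(timeout_s: float, attempts: int, dead_before_live: int) -> float:
    """Seconds spent waiting on unreachable servers before a live one is tried."""
    return timeout_s * attempts * dead_before_live


def total_wait(sessions: int, timeout_s: float, attempts: int, dead_marking: bool) -> float:
    """Total waiting time over several sessions while the primary server is down."""
    if sessions == 0:
        return 0.0
    per_session = wait_per_session(timeout_s, attempts, dead_before_live=1)
    # With dead marking only the first session pays the penalty;
    # all following sessions skip the dead server.
    return per_session if dead_marking else per_session * sessions


if __name__ == "__main__":
    # Example values only (assumptions, not vendor defaults).
    print(total_wait(100, 3.0, 3, dead_marking=False))  # 900.0
    print(total_wait(100, 3.0, 3, dead_marking=True))   # 9.0
```

Without dead marking every session pays the full timeout and retry cost again, which is exactly the behaviour described above.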

Cisco achieves this with a "RADIUS server dead" status.

http://www.cisco.com/c/en/us/products/collateral/ios-nx-os-software/identity-based-networking-servic...

I think EXOS (and maybe EOS) should be enhanced with such a feature.

Regards

Matthias,

if you change the RADIUS algorithm so that both RADIUS servers are used, this will reduce the RADIUS requests sent to DOWN RADIUS servers:

Slot-1 Stack.1 # configure radius algorithm round-robin

I know we worked on a function to reduce such dead RADIUS calls and switch the main priority to the working RADIUS server. Let me check where we are with that, as this would provide the function you are asking for.

regards
Bastian
-
Hi Bastian,

thanks for clarifying and (hopefully) helping to get this useful feature ...

Unfortunately, the RADIUS round-robin feature is only available on recent EXOS and EOS (S/K-Series). SecureStack (B5/C5) does not provide this feature.

Regards
Hi,

having suffered (as a user) through a 4h network outage caused by using Cisco's RADIUS Dead setting, I'd say this is a problematic feature. In this case all RADIUS servers were unavailable for a few minutes due to a network outage, resulting in the switches marking them as Dead. The switches then did not try to reach any of the configured RADIUS servers for the configured dead time of 4h. Since the network used dot1X, re-authentication timers expired during that time and network access was lost for basically everybody.

To avoid this failure case, an implementation of a RADIUS Dead feature should ignore the Dead status if all RADIUS servers are lost.
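As a minimal sketch of what I mean (plain Python, not switch code): the class `RadiusServerPool`, its methods and the `send` callback are all hypothetical names for illustration, and `send` is assumed to return `None` on timeout and `True`/`False` for accept/reject. Dead servers are skipped, but if every server is marked Dead, the Dead status is ignored and all servers are tried again.

```python
import time
from typing import Callable, Dict, List, Optional


class RadiusServerPool:
    """Tracks Dead status per server; ignores it when all servers are Dead."""

    def __init__(self, servers: List[str], dead_time_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.servers = list(servers)
        self.dead_time_s = dead_time_s
        self.clock = clock
        self.dead_until: Dict[str, float] = {}

    def mark_dead(self, server: str) -> None:
        self.dead_until[server] = self.clock() + self.dead_time_s

    def mark_alive(self, server: str) -> None:
        self.dead_until.pop(server, None)

    def is_dead(self, server: str) -> bool:
        return server in self.dead_until and self.dead_until[server] > self.clock()

    def candidates(self) -> List[str]:
        alive = [s for s in self.servers if not self.is_dead(s)]
        # Key point: if nothing is left, fall back to the full list
        # instead of refusing to try any server for the whole dead time.
        return alive if alive else list(self.servers)

    def authenticate(self, send: Callable[[str], Optional[bool]]) -> Optional[bool]:
        """Try candidates in order; None means no server answered."""
        for server in self.candidates():
            result = send(server)
            if result is None:       # timeout: mark Dead, try the next server
                self.mark_dead(server)
            else:                    # any answer proves the server is reachable
                self.mark_alive(server)
                return result
        return None
```

With this fallback, a short outage of all servers costs one round of timeouts per session at worst, instead of locking out authentication until the dead time expires.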

Best regards,
Erik
Erik Auerswald wrote:

Hi,

having suffered (as a user) through a 4h network outage caused by using Cisco's RADIUS Dead setting, I'd say this is a problematic feature. In this case all RADIUS servers were unavailable for a few minutes due to a network outage, resulting in the switches marking them as Dead. The switches then did not try to reach any of the configured RADIUS servers for the configured dead time of 4h. Since the network used dot1X, re-authentication timers expired during that time and network access was lost for basically everybody.

To avoid this failure case, an implementation of a RADIUS Dead feature should ignore the Dead status if all RADIUS servers are lost.

Best regards,
Erik

That's an interesting scenario Erik pointed out, and it should be considered. In a Cisco environment you would most likely use multiple RADIUS server groups in the AAA config to avoid this problem: one with dead timers and a fallback group without dead timers. But that's more or less only theory. In XOS we have neither dead timers nor server groups (nor a few other neat features).
