Extreme Networks

Keith9 · 5 hours ago

Hello,

We have 802.1X set up against three Windows NPS servers that verify computer certificate authentication and then send the RADIUS Access-Accept to the switch, which allows the machine onto the network. We have three for redundancy, or what we thought was redundancy. They are all in different facilities connected by a 10 Gb ring, with another provider as backup.

One of the servers' RAID controllers died yesterday. Unfortunately, this is server 1 in the radius configuration. Nobody thought anything of it because the network was just fine. Then, 28 hours later, machines across the org fell off the network. Switch logs in various locations across the org were spamming "Authentication failed for Network Login 802.1x user host/computername.domain.name.com"

So we got it back by overwriting server 1 on each switch with one of the IP addresses we already used for server 2 or 3... but why does EXOS accept multiple servers if it's not smart enough to cycle through them when one doesn't respond?

example:
sh configuration | i netlogin
configure radius netlogin 1 server 10.1.1.1 1812 client-ip 10.70.0.100 vr VR-Default
configure radius netlogin 2 server 10.10.1.1 1812 client-ip 10.70.0.100 vr VR-Default
configure radius netlogin 3 server 10.100.1.1 1812 client-ip 10.70.0.100 vr VR-Default

overwrite server 1's entry:
configure radius netlogin 1 server 10.100.1.1 1812 client-ip 10.70.0.100 vr VR-Default

Within seconds, PCs were reconnecting to the network.

Keith9 · 3 hours ago

The EXOS switches are working just fine. When server 1 is not responsive, they move on to server 2, and finally to server 3 if both 1 and 2 are not responsive.

What actually happened is that we powered on the defunct server today. It had been removed from the domain late last night to prevent domain corruption (RAID controller issues on 1 of 4 domain controllers).

Well, it may have been removed from the domain and just a "workstation" now, but the Windows NPS role was still alive and well. No longer joined to the domain, it started responding to the RADIUS requests from the EXOS switches, tried to proxy them to the domain it no longer had a trust relationship with, and was denied. The switches were just doing what the server told them. We disconnected the NIC on the defunct server and disabled the NPS service for now while we take notes on its config before rebuilding it. We did correlate that our other two servers had been servicing RADIUS messages from the switches for the 30 hours or so that "server 1" was offline.

We verified everything with support here. No issue on the Extreme side.

Moral of the story: if you have a RADIUS 802.1X server configured and it responds, the switch will listen to it. It doesn't know any better.
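
For anyone who wants the difference spelled out, here is a rough Python sketch of the behaviour described above. It is only a model, not how EXOS is actually implemented: the names (`authenticate`, `dead`, `healthy`, `off_domain`) are made up for illustration, a timeout is modelled as `None`, and it assumes the off-domain NPS box answered with an Access-Reject.

```python
from enum import Enum
from typing import Callable, Optional, Sequence


class Reply(Enum):
    ACCEPT = "Access-Accept"
    REJECT = "Access-Reject"


# Hypothetical model: a "server" is any callable that returns a Reply,
# or None when it never answers (timeout).
Server = Callable[[str], Optional[Reply]]


def authenticate(user: str, servers: Sequence[Server]) -> Optional[Reply]:
    """Try servers in configured order; the first one that answers wins."""
    for query in servers:
        reply = query(user)
        if reply is not None:
            # Any answer, Accept or Reject, ends the search.
            return reply
        # No answer (timeout): move on to the next configured server.
    return None  # nobody answered at all


def dead(user: str) -> Optional[Reply]:
    return None  # powered off, never replies


def healthy(user: str) -> Optional[Reply]:
    return Reply.ACCEPT  # domain-joined NPS that can validate the cert


def off_domain(user: str) -> Optional[Reply]:
    return Reply.REJECT  # NPS still running, but no longer trusted by the domain


host = "host/computername.domain.name.com"
outage = authenticate(host, [dead, healthy, healthy])
zombie = authenticate(host, [off_domain, healthy, healthy])
print("server 1 powered off:  ", outage.value)
print("server 1 up, off domain:", zombie.value)
```

With server 1 powered off, the request falls through to server 2 and is accepted. With server 1 back up but off the domain, its reject is taken as the final answer and servers 2 and 3 are never asked.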

multiple radius netlogin servers - switches did not fail over to secondary or tertiary servers
