Timeout during mgmt authentication

Yoann_Jonard
New Contributor III

Hello everyone,

I'm facing an issue during management authentication on X435 devices: logging in takes a long time (between 10 and 50 s).

I looked at the radius.log file and saw that, when I initiate the authentication process, the servers sometimes show the following:

2025-11-11 15:46:07,210: Error: (90) [etsnac connection_mgr] Unable to read data from auth server socket: 50149 timeout: 10 with error: Resource temporarily unavailable(11), attempting reconnect...
2025-11-11 15:46:07,210: Error: [etsnac connection_mgr] Closing authentication connection: 49, sockfd: 23
2025-11-11 15:46:07,210: Error: [etsnac connection_mgr] Opened authentication connection: 49, sockfd:23, server: 127.0.0.1:1300
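For anyone who wants to see how often these timeouts occur, here is a minimal Python sketch that counts them per minute. The regular expression is modeled only on the lines quoted above, so it is an assumption that other builds log the same format; `count_timeouts_per_minute` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern modeled on the radius.log lines quoted in this post (assumed format).
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+: Error: .*"
    r"Unable to read data from auth server socket: (?P<port>\d+) "
    r"timeout: (?P<timeout>\d+)"
)


def count_timeouts_per_minute(lines):
    """Return a Counter mapping 'YYYY-MM-DD HH:MM' to the number of timeout errors."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            # Keep only date, hour and minute so errors are bucketed per minute.
            counts[match.group("ts")[:16]] += 1
    return counts


# Two of the lines quoted above; only the first one is a timeout.
sample = [
    "2025-11-11 15:46:07,210: Error: (90) [etsnac connection_mgr] Unable to read data "
    "from auth server socket: 50149 timeout: 10 with error: Resource temporarily "
    "unavailable(11), attempting reconnect...",
    "2025-11-11 15:46:07,210: Error: [etsnac connection_mgr] Closing authentication "
    "connection: 49, sockfd: 23",
]
print(dict(count_timeouts_per_minute(sample)))
```

To run it against a real file, pass an open file handle for a copy of radius.log instead of `sample`.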

Sometimes it times out once on NAC-1, then the switch falls back to NAC-2 and the authentication proceeds. Sometimes it times out multiple times on NAC-1 before falling back to NAC-2. Sometimes it works instantly...

There are 2 NAC engines, they both show the same output when the issue appears.

There is no firewall between the NAC engines and the LDAP servers. The LDAPS connection is working fine (the "test" button in the LDAP servers menu does not time out and always shows a successful connection). I have found no logs indicating an issue on the LDAP servers.

There are a few KB articles with similar output:

  1. https://extreme-networks.my.site.com/ExtrArticleDetail?an=000068260
  2. https://extreme-networks.my.site.com/ExtrArticleDetail?an=000129170
  3. https://extreme-networks.my.site.com/ExtrArticleDetail?an=000061677

We are running version 25.08.13, so we should not be affected by the bug from the 2nd KB.

We have removed, then re-added, the LDAP servers (as in KB 3).

Regarding KB 1, we have not tuned this parameter yet.

 

Thank you for your help!


Yoann Jonard
SIER SARL
Switzerland
4 REPLIES

Ryan_Yacobucci
Extreme Employee

Poor SNMP contact can result in authentication slowness. Authentications can require SNMP queries to the device for port information gathering. SNMP contact issues are known to have significant performance impacts to Control processes.

Clearing up SNMP contact issues is typically the first thing we ask in the case where there are performance issues with Control.

I'm glad you were able to find a solution!

Yoann_Jonard
New Contributor III

Hello @Ryan_Yacobucci ,

Here is a follow-up on this matter, and thank you: you pointed me in the right direction.

The /var/log/tag.log you mentioned displayed something interesting: "Duplicate SNMP engine ID" for all the switches affected by this issue. It appears that the customer copy-pasted the configuration from one switch to the others without changing this parameter.

So, before setting the logs to verbose as you suggested, I took care of this issue and changed the SNMP engine ID to a unique value across all switches (I wanted the cleanest log files possible before enabling verbose mode).

Believe it or not, I was no longer able to reproduce the initial issue after this change... 2 hours in verbose mode showed nothing abnormal, and login time to the switches improved drastically (2-3 s now, which is normal for remote sites).

I don't know whether it's a coincidence, and I'll keep monitoring, but for now it looks solved. I still have a hard time understanding why a duplicate SNMP engine ID could cause an authentication timeout...
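If it helps anyone auditing a similar setup, here is a small Python sketch that spots engine IDs shared by several switches. It assumes you have already collected each switch's engine ID by whatever means you prefer; the switch names and ID values below are made up for illustration, and `find_duplicate_engine_ids` is a hypothetical helper.

```python
from collections import defaultdict


def find_duplicate_engine_ids(engine_ids):
    """Group switches by engine ID and return only IDs used by more than one switch."""
    by_id = defaultdict(list)
    for switch, engine_id in engine_ids.items():
        # Normalize case and separators so "80:00:..." and "8000..." compare equal.
        normalized = engine_id.lower().replace(":", "").replace(" ", "")
        by_id[normalized].append(switch)
    return {eid: sorted(names) for eid, names in by_id.items() if len(names) > 1}


# Hypothetical inventory: names and engine IDs are invented for this example.
inventory = {
    "sw-a": "80:00:07:7c:03:00:11:22:33:44:55",
    "sw-b": "8000077C03001122334455",
    "sw-c": "80:00:07:7c:03:66:77:88:99:aa:bb",
}
for engine_id, switches in find_duplicate_engine_ids(inventory).items():
    print(f"{engine_id} is shared by: {', '.join(switches)}")
```

Here sw-a and sw-b are reported because their IDs match once the formatting is normalized.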

I do not have the lab infrastructure to reproduce the issue, but if someone is interested, the setup is:

  1. X435 running 33.1.1.31
  2. Control engine running 25.08.13
  3. SNMPv3 being used between X435 and XIQ-SE
  4. LDAP authentication

TL;DR: Duplicate SNMP engine ID caused authentication slowness/timeout

Best regards,


Yoann Jonard
SIER SARL
Switzerland

Ryan_Yacobucci
Extreme Employee

The error message in the tag.log may be indicative of a processing issue within the NAC portion of the product.
1. Right-click NAC 1 --> Webview --> Status --> Details
Check to see if there are any significant queueing issues.

2. Right-click NAC 1 --> Webview --> Diagnostics --> Appliance/Server Diagnostics

Set the following to Verbose: 

Authentication Request processing - RADIUS
Authentication Request Processing - EAC
LDAP
Port Info
Rules Engine Criteria
Rules Engine Authentication
Rules Engine Authorization


Click "OK"

Run your test to reproduce a delayed authentication and, once it completes, disable diagnostics.

The quicker you run this test, the fewer diagnostics you will have to look through. These diagnostics are very chatty. 

Once completed remove the /var/log/tag.log and /var/log/radius/radius.log from the appliance.

When looking through the diagnostics to determine where the slowdown occurs, your goal is to track the socket and ID numbers within the debug output.

Start with the radius.log to see the request come in, then follow it between the tag.log and radius.log to identify where and why it is getting hung up.
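As a rough illustration of that tracking step, the Python sketch below merges lines from several logs into one timeline and keeps only those containing a given token (a connection ID, a sockfd, and so on). It assumes every relevant line starts with the same `YYYY-MM-DD HH:MM:SS,mmm` timestamp seen in the radius.log excerpt; whether tag.log uses that exact prefix is an assumption. `merged_timeline` is a hypothetical helper, and the demo uses only the radius.log lines quoted in this thread.

```python
import re
from datetime import datetime

# Timestamp prefix as seen in the radius.log excerpt (assumed for other logs too).
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def parse_ts(line):
    """Return the line's leading timestamp as a datetime, or None if absent."""
    match = TS_RE.match(line)
    if not match:
        return None
    base = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return base.replace(microsecond=int(match.group(2)) * 1000)


def merged_timeline(sources, needle):
    """Merge lines from several logs, keep those containing needle, sort by time.

    sources maps a log name to an iterable of lines.
    Returns a list of (timestamp, log name, line) tuples.
    """
    events = []
    for name, lines in sources.items():
        for line in lines:
            ts = parse_ts(line)
            if ts is not None and needle in line:
                events.append((ts, name, line.rstrip("\n")))
    # The sort is stable, so lines with equal timestamps keep their input order.
    events.sort(key=lambda event: event[0])
    return events


RADIUS_SAMPLE = [
    "2025-11-11 15:46:07,210: Error: (90) [etsnac connection_mgr] Unable to read data "
    "from auth server socket: 50149 timeout: 10 with error: Resource temporarily "
    "unavailable(11), attempting reconnect...",
    "2025-11-11 15:46:07,210: Error: [etsnac connection_mgr] Closing authentication "
    "connection: 49, sockfd: 23",
    "2025-11-11 15:46:07,210: Error: [etsnac connection_mgr] Opened authentication "
    "connection: 49, sockfd:23, server: 127.0.0.1:1300",
]

for ts, name, line in merged_timeline({"radius.log": RADIUS_SAMPLE}, "connection: 49"):
    print(name, line)
```

With real data you would add a second entry such as `"tag.log": open(...)` to the dictionary, pointing at a copy of the file, and choose the token from the request you are following.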

I would highly recommend you submit this to GTAC for analysis. 

Thanks
-Ryan


 

Hello Ryan,

I will try this and keep you posted, thank you.

The customer is opening a case with their partner in the meantime.

Regards,


Yoann Jonard
SIER SARL
Switzerland