07-16-2021 09:51 PM
This is an issue we have been fighting for almost 3 years now (since we migrated to X460-G2 and X440-G2 switches).
All devices are running EXOS 30.7.2.1, and we have two sites that are configured identically (shown below):
At one site, we have no issues when we upgrade the Desktop stack (although that site has only 2 switches in the stack). However, when we upgrade the Desktop stack at the location shown above (or when there is an extended power outage), PCs lose their IP address and fall back to a 169.254.x.x address. The only way to resolve it is to manually reboot the PC. If that fails, we connect a dumb switch between the PC and the switch and reboot the PC again, after which it regains an IP.
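The 169.254.x.x addresses are the IPv4 link-local range (RFC 3927) that Windows self-assigns (APIPA) when no DHCP server answers. A minimal sketch for picking the affected PCs out of a host-to-IP export follows; the `inventory` dictionary and hostnames are hypothetical, and how you pull that list (XMC, the DHCP server, etc.) is not shown.

```python
from __future__ import annotations

import ipaddress

# 169.254.0.0/16 is the IPv4 link-local block (RFC 3927); Windows assigns an
# address from it (APIPA) when no DHCP server answers.
APIPA_NET = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address: str) -> bool:
    """Return True if the IPv4 address lies in the link-local (APIPA) range."""
    return ipaddress.ip_address(address) in APIPA_NET


def find_apipa_hosts(inventory: dict[str, str]) -> list[str]:
    """Return, sorted, the hostnames whose reported address is APIPA."""
    return sorted(host for host, addr in inventory.items() if is_apipa(addr))


if __name__ == "__main__":
    # Hypothetical host -> IP export; the hostnames and addresses are made up.
    sample = {
        "pc-001": "10.10.20.15",
        "pc-002": "169.254.33.7",
        "pc-003": "169.254.200.1",
    }
    print(find_apipa_hosts(sample))  # ['pc-002', 'pc-003']
```

Counting the hits before and after an outage gives a quick, repeatable measure of how many PCs were affected.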
During the most recent power outage, the desktop switches and the PCs stayed powered on, but the Core switch was down for 42 minutes until power was restored, which resulted in 88 PCs losing their addresses. Our DHCP leases are set to 20 days, so it seems unlikely that 88 PCs all happened to need to renew their IP during that 42-minute window. This was also the first time that a power outage of the Core switch alone caused the issue; the previous occurrences happened when the Desktop stack was rebooted, or with some combination of power loss to the Desktop stack and the core.
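The "somewhat unlikely" intuition can be made concrete. Under RFC 2131 a client first tries to renew at T1 (50% of the lease by default) and to rebind at T2 (87.5%), and a client whose renewal fails keeps using its address until the lease actually expires. The sketch below assumes the server does not override T1/T2 (DHCP options 58/59) and that each PC's renewal phase is uniformly random relative to the outage; both are assumptions, not measurements from this network.

```python
from __future__ import annotations

# RFC 2131 defaults: renew (T1) at 50% of the lease, rebind (T2) at 87.5%.
# A server can override these with options 58/59; this sketch assumes it does not.
T1_FRACTION = 0.5
T2_FRACTION = 0.875
MINUTES_PER_DAY = 24 * 60


def renewal_timers_minutes(lease_days: float) -> tuple[float, float]:
    """Return the default (T1, T2) timers, in minutes, for a lease length."""
    lease_min = lease_days * MINUTES_PER_DAY
    return lease_min * T1_FRACTION, lease_min * T2_FRACTION


def window_probability(lease_days: float, window_min: float) -> float:
    """Chance that one client's T1 renewal falls inside the window.

    Assumes a client renews successfully every T1 minutes and that its
    renewal phase is uniformly random relative to the outage.
    """
    t1, _ = renewal_timers_minutes(lease_days)
    return min(window_min / t1, 1.0)


if __name__ == "__main__":
    lease_days, outage_min, pcs = 20, 42, 88
    t1, t2 = renewal_timers_minutes(lease_days)
    p = window_probability(lease_days, outage_min)
    print(f"T1 = {t1 / MINUTES_PER_DAY:.1f} days, T2 = {t2 / MINUTES_PER_DAY:.1f} days")
    print(f"Expected PCs reaching T1 during the outage: {pcs * p:.2f}")
    print(f"Chance that all {pcs} did: {p ** pcs:.1e}")
```

With a 20-day lease, T1 is 10 days and T2 is 17.5 days, and only about 0.26 of the 88 PCs would be expected to reach T1 during a 42-minute outage. Since a failed renewal alone should not cost a PC its address, this suggests, though it does not prove, that something else made the PCs start DHCP over.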
The first two times this happened were due to extended power outages, and at that point we were not running ExtremeControl. Now we are running ExtremeControl, and I am not sure whether we are seeing the same behavior from two different causes. We also have some DHCP snooping settings configured, and I’m not sure whether that is somehow involved.
One strange thing I did notice is that XMC shows events during the outage, even though there was no connectivity out of the site because the core was down. In the image below, the red line shows when the power went out (10:20 AM); power wasn’t restored until 11:03 AM, yet XMC recorded events during that window (including the IP change).
I know I said we had this issue before implementing ExtremeControl, but is it possible that we need to increase the RADIUS retries or timeouts, because the PCs are attempting to authenticate during the outage, failing, and then being dropped from the network? It is really weird that they wouldn’t just try to renew an address once connectivity to the core (and the connected DHCP server) was restored, and weirder still that losing connectivity between the core and the switch they are connected to (which stayed powered on during the event) would cause the issue for 88 PCs.
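One way to sanity-check the retries/timeouts idea is to compare the longest time an authenticator would keep waiting on RADIUS with the length of the outage. The sketch below uses a generic model (each request waits `timeout_s`, each server gets `1 + retries` attempts, servers are tried in turn); that model and the example values are assumptions for illustration, not documented EXOS or ExtremeControl behavior.

```python
from __future__ import annotations

import math


def worst_case_wait_s(timeout_s: float, retries: int, servers: int = 1) -> float:
    """Seconds until the authenticator has given up on every RADIUS server.

    Generic model (an assumption, not EXOS-specific behavior): each request
    waits timeout_s for a reply, each server gets 1 + retries attempts, and
    servers are tried one after another.
    """
    return servers * timeout_s * (retries + 1)


def retries_needed(outage_s: float, timeout_s: float, servers: int = 1) -> int:
    """Smallest retry count whose worst-case wait spans the whole outage."""
    attempts = math.ceil(outage_s / (servers * timeout_s))
    return max(attempts - 1, 0)


if __name__ == "__main__":
    # Hypothetical settings: 3 s timeout, 3 retries, primary + secondary server.
    print(worst_case_wait_s(3, 3, servers=2))      # 24
    print(retries_needed(42 * 60, 3, servers=2))   # 419
```

Under this model, typical settings give up within seconds, and riding out a 42-minute outage would take hundreds of retries, so tuning timeouts alone seems unlikely to explain or prevent the behavior.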
Any help would be greatly appreciated.
07-19-2021 08:44 PM
Hi Stephen,
Really weird problem. Have you already been in contact with GTAC?
“It is really weird that they wouldn’t just try to renew an address when connectivity to the core (and connected DHCP server) was restored”
The clients don’t notice that the connectivity between the core switch and the access switch has been restored. I’m not sure how Windows behaves when it can’t reach a DHCP server and falls back to an APIPA address. It might only try to renew its IP address after a link-down/link-up event.
Most of the ideas I had were already discussed in the other thread, so my only remaining idea is Wireshark: capture on the client, on the DHCP server, and on a mirror port of the link between the core and access switches.
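To compare the three captures, it helps to track which DHCP message types each PC sends and receives. In Wireshark the display filter is `dhcp` (`bootp` on releases before 3.0), and a capture filter of `udp port 67 or udp port 68` limits the trace to DHCP. The helper below is a hypothetical sketch: it maps option-53 codes (RFC 2132) to names and gives a rough verdict for one client's sequence.

```python
from __future__ import annotations

# DHCP message types carried in option 53 (RFC 2132, section 9.6).
DHCP_MESSAGE_TYPES = {
    1: "DISCOVER",
    2: "OFFER",
    3: "REQUEST",
    4: "DECLINE",
    5: "ACK",
    6: "NAK",
    7: "RELEASE",
    8: "INFORM",
}

# BPF capture filter for DHCP/BOOTP (server port 67, client port 68).
CAPTURE_FILTER = "udp port 67 or udp port 68"


def summarize_exchange(codes: list[int]) -> tuple[list[str], str]:
    """Name each option-53 code seen for one client and give a rough verdict."""
    names = [DHCP_MESSAGE_TYPES.get(code, f"UNKNOWN({code})") for code in codes]
    if "ACK" in names:
        verdict = "client obtained or renewed a lease"
    elif "NAK" in names:
        verdict = "server refused the requested address"
    elif "OFFER" in names:
        verdict = "server offered an address but no ACK was seen"
    elif "DISCOVER" in names:
        verdict = "client broadcast DISCOVER but no server answered"
    elif "REQUEST" in names:
        verdict = "client tried to renew or rebind but got no reply"
    else:
        verdict = "no DHCP exchange seen"
    return names, verdict


if __name__ == "__main__":
    # Hypothetical client-side capture: three DISCOVERs, nothing back.
    print(summarize_exchange([1, 1, 1]))
```

If the client capture shows repeated DISCOVERs that never appear on the mirror port or at the server, the requests are being lost between the PC and the core (for example in the access switch), which narrows down where to look next.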
Best regards
Stefan