Yesterday, we moved from a stack of X440-48t switches to a stack of X440G2-48t-10G4 switches, and a large number of systems are now unable to connect to the network. The affected systems are a mix: some have addresses fixed via DHCP reservations, and others just use whatever address they pull. They cannot be woken up via a WoL broadcast. The systems that don't wake up can be manually powered on, but then need to have the IP address set statically in Windows to match the DHCP reservation, be rebooted, and then be set back to DHCP before they can connect to the network. The switches are running 184.108.40.206 patch1-2. All systems with issues are running Windows 10 and are on a mix of hardware.
We also replaced the core switch that this stack is connected to with an X460G2-24t-10G4 running 220.127.116.11 patch1-2. A number of months ago, we attempted to replace just the core switch and saw this same behavior, with systems not being able to connect, so we went back to the old hardware. We hoped that replacing both the core and the desktop switch would avoid the issue, but it did not.
Has anyone heard of this? Is there some setting that we are missing? We do have a policy in place to send traffic on port 4000 (used for WoL) to the correct VLAN, and it is working, since most systems wake.
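For anyone who wants to reproduce the wake-up attempt from a test machine, here is a minimal Python sketch of a WoL sender. It assumes the magic packet is sent as a UDP broadcast to port 4000, as described above; the function names, the example MAC, and the default broadcast address are placeholders, not anything taken from the actual setup.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a WoL magic packet: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "").replace(".", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 4000) -> int:
    """Send the magic packet as a UDP broadcast and return the bytes sent.

    Port 4000 matches the port mentioned in this thread; the broadcast
    address is an assumption and may need to be the subnet broadcast.
    """
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        return sock.sendto(packet, (broadcast, port))


# Example (hypothetical MAC address):
# send_wol("00:11:22:33:44:55")
```

Sending to one failing system at a time while capturing on its switch port should show whether the broadcast actually reaches that port.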
I was weighing just unplugging a cable vs. running "disable load sharing x" on each switch (I wasn't sure whether the ports would then be removed from their VLANs and need to be re-added).
I ended up just unplugging one of the cables and the problem still persists, so it isn't an issue with that. There is no difference in the packet captures between the attempts that work with the mini switch in place and the capture from the attempts that fail.
Thinking about the load sharing group...if I just unplug one of the two cables, that will then just limit the traffic to that one port without the need to down the interface while doing the reconfiguring of the group, correct?
I'm not sure what that "mini" switch is doing, but it's doing something. I understand that you have disabled everything (policy, netlogin, etc.) and are still seeing the same issue. This makes me think it's a bootp relay issue. From your packet captures, it looks like the DHCP offers are not being received by the client. Is this true? In the captures, did you see anything different in the discover/offer between the working and non-working attempts?
The first thing I would do is make sure the switch is receiving the DHCP offer (it probably is). Then capture and document which port of the LAG it is received on with and without the "mini" switch in place (they could be different ports). Make sure the packets look good (an ingress port mirror would work).
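To make the discover/offer comparison less error-prone, here is a rough Python sketch that summarizes the fields most relevant to a relay problem from a raw DHCP payload (the UDP payload bytes, e.g. exported from a capture tool). The field offsets follow the standard BOOTP/DHCP layout; the function name and the choice of fields are only a suggestion.

```python
import struct

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
MESSAGE_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
                 5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}


def _ip(raw: bytes) -> str:
    """Format 4 raw bytes as a dotted-quad IPv4 address."""
    return ".".join(str(b) for b in raw)


def summarize_dhcp(payload: bytes) -> dict:
    """Summarize relay-relevant fields of a DHCP message (UDP payload only)."""
    if len(payload) < 240 or payload[236:240] != DHCP_MAGIC_COOKIE:
        raise ValueError("not a DHCP payload (too short or bad magic cookie)")
    op, _htype, hlen, hops, xid, _secs, flags = struct.unpack("!BBBBIHH", payload[:12])

    # Walk the options area looking for option 53 (DHCP message type).
    msg_type = None
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:          # end option
            break
        if code == 0:            # pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break                # truncated option
        length = payload[i + 1]
        if code == 53 and length == 1 and i + 2 < len(payload):
            msg_type = payload[i + 2]
        i += 2 + length

    return {
        "op": "BOOTREQUEST" if op == 1 else "BOOTREPLY" if op == 2 else op,
        "type": MESSAGE_TYPES.get(msg_type, msg_type),
        "xid": f"0x{xid:08x}",
        "hops": hops,
        "broadcast_flag": bool(flags & 0x8000),
        "yiaddr": _ip(payload[16:20]),
        "giaddr": _ip(payload[24:28]),   # relay agent address
        "chaddr": payload[28:28 + min(hlen, 16)].hex(":"),
    }
```

Running this on the same message from a working and a non-working attempt and comparing the two dictionaries should make any difference in the broadcast flag, hop count, or relay address (giaddr) stand out.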
You can also check the "show bootprelay" command to see if the switch CPU is getting the offers. This is not so easy because it's a global counter, but if you can quiet the switch down to just that PC it would give you good data.
Gather these things for now and let us know what you find. You can also restart the netTools process, which might help.