Intermittent trouble - presents itself as DHCP issue

  • Problem
  • Updated 3 years ago
  • Solved
Hello,
We have a stack of 670-g2's running 16.1.2.14 patch1-1. The problem presents itself on one particular VLAN: users report not being able to connect (they connect using DHCP). The issue is reproducible. When I connect a computer to a copper SFP in the 670-g2 stack that has been set up on the same subnet, DHCP does NOT work. When I give my computer a static address on the subnet, it DOES work. The only fix I have found is to completely remove the subnet and re-add it from scratch on the 670-g2 stack, after which it immediately starts working again. It seems to happen roughly every couple of weeks, though I can't identify an exact pattern. Other subnets on the stack work just fine.

(We have had this issue before on different hardware, 670s (non-g2) running older 15 code. In that case a different VLAN was affected, but the symptoms were the same.)

Setup
Users --> 670-g2 stack --> WAN firewall --> Windows DHCP server

For additional information, I have access to the WAN firewall logs. During normal, working communication, the firewall logs many entries with the subnet's IP interface as the source, talking to the Windows DHCP server on service dhcp-relay.
When the subnet is "broken", I instead see the Windows DHCP server talking to the subnet's IP interface on service bootp (just one entry and nothing else). So the source is the DHCP server, not the interface as normal.

The Windows server admin has looked at the DHCP server and can't find anything wrong there (we're not running out of leases, the server can communicate with the 670-g2 stack, etc.).

From an OSPF perspective the subnet looks fine: the network knows about it.

Instead of doing the band-aid fix (remove and re-add the subnet) I'd like to try and get to the bottom of what this could be...

I have opened a case with TAC but figured it would be great to get additional tips / tricks / ideas here on the community.

Thanks in advance,
Sarah

Sarah Seidl

Posted 3 years ago

Mike D, Alum

Hello Sarah,

re: presents itself as DHCP issue

It sounds like you have done a pretty thorough debug from each of the major players in the DHCP dialogue. All of those data points will serve you well as you work through the more confusing facts.

I have no magic bullet for you - but in my experience the methodical approach you're using for analysis will eventually uncover the missing puzzle piece.

A DHCP protocol symptom is a made-to-order target for a protocol analyzer. If it's feasible, get a trace of the DHCP traffic. Server side at least - but both client side and server side would improve your odds. It may not crack the case, but it will add more solid data to your cause either way.
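
If it helps when reading such a trace, the fields worth comparing between a working and a broken exchange are the BOOTP op code, the hop count, giaddr (the relay's address) and the DHCP message type. Below is a minimal Python sketch, not a replacement for a real protocol analyzer. It assumes you have already extracted the UDP payload of a single DHCP packet from the capture; `summarize_dhcp` is a hypothetical helper name, and the header layout follows RFC 2131.

```python
import socket
import struct

# Fixed BOOTP/DHCP header (RFC 2131): 236 bytes, network byte order.
BOOTP_HEADER = struct.Struct("!BBBBIHH4s4s4s4s16s64s128s")
MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options
MESSAGE_TYPES = {
    1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
    5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM",
}


def summarize_dhcp(payload: bytes) -> dict:
    """Summarize the UDP payload of one DHCP packet (hypothetical helper)."""
    options_start = BOOTP_HEADER.size + len(MAGIC_COOKIE)
    if len(payload) < options_start:
        raise ValueError("payload too short for a DHCP message")
    (op, _htype, hlen, hops, xid, _secs, _flags, _ciaddr, yiaddr,
     _siaddr, giaddr, chaddr, _sname, _file) = BOOTP_HEADER.unpack_from(payload)
    if payload[BOOTP_HEADER.size:options_start] != MAGIC_COOKIE:
        raise ValueError("missing DHCP magic cookie")

    # Walk the options (code, length, value) looking for option 53.
    message_type = None
    i = options_start
    while i < len(payload):
        code = payload[i]
        if code == 255:          # end option
            break
        if code == 0:            # pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break                # truncated option, stop parsing
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if code == 53 and len(value) == 1:
            message_type = MESSAGE_TYPES.get(value[0], f"unknown({value[0]})")
        i += 2 + length

    if op == 1:
        op_name = "BOOTREQUEST"
    elif op == 2:
        op_name = "BOOTREPLY"
    else:
        op_name = f"unknown({op})"

    return {
        "op": op_name,
        "message_type": message_type,
        "xid": f"0x{xid:08x}",
        "hops": hops,
        "giaddr": socket.inet_ntoa(giaddr),   # relay address, 0.0.0.0 if not relayed
        "yiaddr": socket.inet_ntoa(yiaddr),   # address offered to the client
        "client_mac": ":".join(f"{b:02x}" for b in chaddr[:min(hlen, 16)]),
    }
```

On the server side of a working exchange, you would expect a DISCOVER whose giaddr is the VLAN's IP interface. Whether that packet still leaves the stack when the subnet is broken is the kind of thing the trace should show.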

Regards,

Mike

PS: Glad to see you use the hub community as a resource - tons of bright, experienced folks here. Also - you're in good hands with our TAC. It's a curious problem description - I hope you'll share the resolution when you get there.

Keith Obermeier

DHCP discovery relies on layer 2 broadcasts, while firewalls generally operate at layer 3. When DHCP has to cross a layer 3 hop, you usually need an ip-helper address defined so the layer 3 device can relay the client's broadcast to the DHCP server. Check whether your firewall is missing an ip helper specifically for your broken VLAN.
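
For reference, what an ip-helper / DHCP relay does to a client's broadcast can be sketched in a few lines of Python. This is a simplified illustration of relay behaviour as described in RFC 1542, not the 670-g2 or firewall implementation. `relay_request` and the example address are hypothetical, and hop-limit checks and relay options are left out.

```python
import socket

HOPS_OFFSET = 3      # op(1) + htype(1) + hlen(1) precede the hops byte
GIADDR_OFFSET = 24   # giaddr follows xid, secs, flags, ciaddr, yiaddr, siaddr
BOOTP_HEADER_LEN = 236
BOOTREQUEST = 1


def relay_request(payload: bytes, ingress_ip: str) -> bytes:
    """Return the copy of a client request that a relay would unicast to the server.

    ingress_ip is the relay's address on the interface where the client's
    broadcast arrived (for example the VLAN's IP interface).
    """
    if len(payload) < BOOTP_HEADER_LEN or payload[0] != BOOTREQUEST:
        raise ValueError("not a BOOTREQUEST")
    packet = bytearray(payload)
    # Each relay increments the hop count (capped here to fit one byte).
    packet[HOPS_OFFSET] = min(packet[HOPS_OFFSET] + 1, 255)
    # Only the first relay fills in giaddr; an existing value is left alone.
    if packet[GIADDR_OFFSET:GIADDR_OFFSET + 4] == b"\x00\x00\x00\x00":
        packet[GIADDR_OFFSET:GIADDR_OFFSET + 4] = socket.inet_aton(ingress_ip)
    return bytes(packet)


# Example: a bare request header relayed from a hypothetical 10.1.20.1 interface.
relayed = relay_request(bytes([1, 1, 6, 0]) + bytes(232), "10.1.20.1")
print(socket.inet_ntoa(relayed[GIADDR_OFFSET:GIADDR_OFFSET + 4]))
```

The server sends its reply back to the giaddr, so a firewall log entry showing the DHCP server talking to the subnet's IP interface would be consistent with a relayed reply.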

Additionally, a stateful firewall can appear to be the source of intermittent problems because it may block or unblock traffic based on the context of the traffic that came before. That suggests you should focus the troubleshooting on the firewall.

Sarah Seidl

Thank you