What would you do to mitigate/prevent loops in this scenario?

  • Problem
  • Updated 11 months ago
  • Solved
I've got a challenging situation and I'd love to hear opinions on the right way to deal with it.

The variables:
We are currently understaffed, so anything requiring a lot of administrative time/effort will be hard to maintain.
We have a large number of small unmanaged switches that don't send out BPDUs when plugged in. These will be replaced, but that will take time due to budget constraints and end-of-year timing.
We have a large, complex network supporting a manufacturing floor where equipment and workstations move far too often, sometimes several times in a week.

Last week we had an outage when a helpful person on the floor saw a stray cable lying on a desk and plugged it into a small knockoff 5-port gigabit switch. This caused a loop that became very difficult to track down due to other issues.

The stack in question was a 6-switch stack, with an EAPS ring connected to the core via 20 Gb LAG ports.

ELRP seems like a good idea, except that it requires constant updates to the ELRP configuration whenever VLANs are moved between ports as manufacturing moves desks/test equipment/printers around. This feels really prone to human error.

BPDU guard seems like a good idea, except these Chinese knockoff switches don't send out BPDUs and happily just loop away.

STP doesn't work without BPDUs being sent.

Broadcast limits seem reasonable, maybe on the uplink ports?  I've noticed setting broadcast/multicast limits on large stacks (300+ ports) can cause a sustained CPU load that makes me uncomfortable.

Thoughts? I'd like a sustainable solution to this problem that will get us through the next 3 months, until I can replace all these little desk switches with managed 430-8 models.

Ron Prague

Posted 11 months ago

David Coglianese, Embassador

You should be able to run ELRP on a "NoLoop" VLAN that you tag on every port. This way when you change the user VLAN you do not need to do anything to the ELRP config.
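
A minimal sketch of that approach in EXOS CLI. The VLAN name "NoLoop" comes from the post; the tag 4000, the port ranges, and the timer values are assumptions for illustration, and the exact keywords can differ between EXOS releases, so check the command reference for the version running on the stack. It also assumes the desk switches pass 802.1Q-tagged frames unchanged.

```
# Dedicated loop-detection VLAN, tagged on the edge ports (tag and port range are hypothetical)
create vlan NoLoop tag 4000
configure vlan NoLoop add ports 1:1-1:48 tagged

# Send ELRP probes every 2 seconds; on a loop, log, trap, and disable the port for 300 seconds
enable elrp-client
configure elrp-client periodic NoLoop ports all interval 2 log-and-trap disable-port duration 300

# Extra safety: never let ELRP disable the uplinks (hypothetical uplink ports)
configure elrp-client disable-port exclude 1:49-1:50
```

Because the probes ride on NoLoop rather than on the user VLANs, moving a desk to a different user VLAN should not require any ELRP change.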

Ron Prague

I hadn't considered that option David, thank you.

Patrick Voss, Alum

The STP option should work just fine with edge safeguard configured. The port watches for one of its own BPDUs to come back and is shut down if that happens. This does not require the other side to be configured for STP or to send BPDUs. The loop will take care of that automatically.
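
For reference, a minimal sketch of edge safeguard on EXOS. The domain name s0 (the default STP domain) and the port numbers are assumptions, the ports must already belong to that domain, and syntax can vary by release. It can only catch a loop through a device that actually forwards the switch's BPDUs back.

```
# Enable the STP domain (s0 is assumed to be the domain in use)
enable stpd s0

# Mark the edge ports as edge links with loop protection (hypothetical port range)
configure stpd s0 ports link-type edge 1:1-1:48 edge-safeguard enable

# Inspect the STP state of a port under test (hypothetical port)
show stpd s0 ports 1:17
```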

Ron Prague

Yeah Patrick, that's the behavior I expected to see, but in testing on Sunday STP never brought the looped port down and there were no STP events logged.

Patrick Voss, Alum

That's interesting. Can you send us the "show config stp" and "show stp <domain name>" from one of the switches you were testing with?

Brandon Clay, Escalation Support Engineer

Sounds like the edge switches might not forward BPDUs at all. I think David's suggestion might be worth a shot, assuming these edge switches will forward tagged frames.

Ron Prague

I'll be happy to this weekend once production shuts down.

One thing I didn't post is that in testing, BPDU guard worked as expected with both an X440-G2-12p and a Netgear GS724T, so I think the STP configuration on the stack is fine. I really think the problem is the knockoff switch that our manufacturing company purchased without involving us.

EtherMAN, Embassador

I am curious as to why configuring a hardware-based function like broadcast or multicast rate limiting would drive the CPU up in a large stack. I could see a spike when you apply the config, because the CLI and its commands run in software, but I can't see why the CPU would stay high just because the function is enabled. We use this system-wide on all customer-facing interfaces, and getting a trap directly from the offending port makes a loop easy to track down.

Evan Kuckelheim

I've had the same issue with small unmanaged switch loops. K12 environment here, and it sounds very similar to yours. It appears that the unmanaged switches that were purchased for us do not pass tagged packets for my STPD's tagged VLAN. I enabled rate limiting with a disable-port action on the edge ports only, not the uplink ports.

configure port 1:17 rate-limit flood broadcast 40000 out-actions log trap disable-port
configure port 1:17 rate-limit flood multicast 40000 out-actions log trap disable-port
configure port 1:17 rate-limit flood unknown-destmac 40000 out-actions log trap disable-port

This works great for me in my environment.
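
The same idea can be applied to a whole block of edge ports at once while leaving the uplinks untouched. This is only a sketch: it assumes the command accepts a port list in place of a single port, the range 1:1-1:48 is hypothetical, and 40000 is simply the threshold from the commands above, not a recommendation.

```
# Edge ports only; uplink ports are deliberately left without flood limits
configure ports 1:1-1:48 rate-limit flood broadcast 40000 out-actions log trap disable-port
configure ports 1:1-1:48 rate-limit flood multicast 40000 out-actions log trap disable-port
configure ports 1:1-1:48 rate-limit flood unknown-destmac 40000 out-actions log trap disable-port

# Check the flood rate-limit counters and status for a suspect port
show ports 1:17 rate-limit flood no-refresh
```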