Logs showing loop on uplink ports only

  • Problem
  • Updated 1 year ago
  • Solved
On almost all of my network closets I am getting:
<Warn:ELRP.Report.Message> Slot-1: [CLI:Voice-Vlan:3] LOOP DETECTED : 94547 transmitted, 935607 received, ingress slot:port (1:49) egress slot:port (1:49)

It always shows the VLAN we use for IP phones and the uplink ports. The only difference is that sometimes it shows CLI:Voice-Vlan:2 instead of CLI:Voice-Vlan:3.
We have ELRP set up on all ports, with the 1:49 and 2:49 uplinks excluded. Those are set up in an LACP LAG, which on many of the switch stacks quit working and won't reconnect.

Is there any way to find out what other port, or whatever else, is causing the loop? Bandwidth isn't an issue on any of the switches: all are below 1% utilization, so there isn't any network storm.

Jason Hilt


Posted 1 year ago
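When several closets log the same warning, it can help to pull the fields out of each message so they can be compared. The following is a minimal Python sketch, assuming the message always has the shape of the single line quoted above; the regular expression, the function name `parse_elrp_loop`, and the field names are illustrative, and other firmware versions may format the message differently.

```python
import re

# Pattern modelled only on the one log line quoted in this thread.
ELRP_LOOP = re.compile(
    r"\[CLI:(?P<vlan>[^:\]]+):(?P<tag_number>\d+)\]\s+LOOP DETECTED\s*:\s*"
    r"(?P<tx>\d+) transmitted,\s*(?P<rx>\d+) received,\s*"
    r"ingress slot:port \((?P<ingress>[\d:]+)\)\s*"
    r"egress slot:port \((?P<egress>[\d:]+)\)"
)


def parse_elrp_loop(line):
    """Return the fields of an ELRP loop report, or None if the line does not match."""
    m = ELRP_LOOP.search(line)
    if m is None:
        return None
    return {
        "vlan": m.group("vlan"),
        # Trailing number in the bracketed tag; its meaning is not stated in the thread.
        "tag_number": int(m.group("tag_number")),
        "transmitted": int(m.group("tx")),
        "received": int(m.group("rx")),
        "ingress": m.group("ingress"),
        "egress": m.group("egress"),
        # True when the packet came back in on the port it was sent out of.
        "same_port": m.group("ingress") == m.group("egress"),
    }


SAMPLE = (
    "<Warn:ELRP.Report.Message> Slot-1: [CLI:Voice-Vlan:3] LOOP DETECTED : "
    "94547 transmitted, 935607 received, ingress slot:port (1:49) "
    "egress slot:port (1:49)"
)

if __name__ == "__main__":
    print(parse_elrp_loop(SAMPLE))
```

A report where `same_port` is true and the port is an uplink suggests the loop sits somewhere beyond that uplink rather than on the local stack.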


Patrick Voss, Employee

If it is showing the same port for ingress and egress, then you need to follow the path. This assumes ELRP is configured on the entire network.

Jason Hilt

The path leads back to the core switch on all of the edge switches. We aren't running ELRP on the core since it does not handle any connections other than switches. It wouldn't do any good, since all the uplinks would be excluded.

EtherMAN, Embassador

Odds are a VoIP phone has its LAN cable plugged back into a hot LAN port, or a user brought in a small switch and plugged it into both the LAN side of the phone and another switch port, so the storm is only 10/100 Mb/s. Look on your edge switches and find a port where TX and RX are maxed out or equal; that would be your loop. Finding small loops in a multi-ten-gig core network is always fun.
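The "TX and RX maxed out or equal" check can be sketched in Python once per-port utilization has been collected by whatever means is available. Everything here is an assumption for illustration: the function name `suspect_loop_ports`, the thresholds, and the sample figures are hypothetical, not values from the thread or from any switch.

```python
def suspect_loop_ports(port_util, maxed=90.0, floor=5.0, tolerance=1.0):
    """Return ports whose TX and RX utilization look like a loop.

    port_util maps a port name to (tx_percent, rx_percent).
    A port is flagged when both directions are at or above `maxed`,
    or when both are at or above `floor` and within `tolerance`
    percentage points of each other. All thresholds are guesses.
    """
    suspects = []
    for port, (tx, rx) in sorted(port_util.items()):
        both_maxed = tx >= maxed and rx >= maxed
        nearly_equal = min(tx, rx) >= floor and abs(tx - rx) <= tolerance
        if both_maxed or nearly_equal:
            suspects.append(port)
    return suspects


# Hypothetical utilization figures, in percent of link speed.
sample_util = {
    "1:5": (0.1, 0.05),    # idle desk port
    "1:12": (98.0, 97.5),  # both directions saturated
    "1:20": (12.0, 12.3),  # moderate but suspiciously symmetric
    "2:7": (40.0, 2.0),    # busy in one direction only
}

if __name__ == "__main__":
    print(suspect_loop_ports(sample_util))
```

The `floor` parameter keeps idle ports, which are trivially "equal" near zero, from being flagged.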

Jason Hilt

That was my first thought, but the loop is only showing on the voice VLAN, which is tagged, and not on the untagged computer VLAN. The voice VLAN doesn't traverse the PC port on the phone; only the computer VLAN does. So if the cable were plugged back into the wall, or even if a SOHO switch were plugged in, the only VLAN that would connect would be the untagged computer VLAN.
I'll look through the switches and see if any are running with high bandwidth.

Brandon Clay, Escalation Support Engineer

I've seen this before when a stack gets broken up but not fully reconfigured, and you end up with two stacks that are both using the same MAC address. Then they see each other's ELRP packets (sourced from a MAC that they think they own) and assume it is their own ELRP packet (logging it as a loop).

It may be worthwhile to check the stacks that are reporting this to ensure that they are not using the same MAC address.

Another thing that may be helpful is to turn on ELRP on the core (but do not configure it to disable any ports). That way, you can at least see if it tends to point towards one particular edge switch.
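To see whether the core's reports point towards one particular port, the loop messages can be tallied by ingress port. This is a rough Python sketch that assumes the core logs messages in the same shape as the edge-switch line quoted at the top of the thread; the sample lines below are made up for illustration (including their port numbers) and are not real output.

```python
import re
from collections import Counter

# Matches only the ingress part of a loop report shaped like the one in this thread.
INGRESS = re.compile(r"LOOP DETECTED.*?ingress slot:port \((?P<port>[\d:]+)\)")


def tally_ingress_ports(log_lines):
    """Count loop reports per ingress port; lines that do not match are ignored."""
    counts = Counter()
    for line in log_lines:
        m = INGRESS.search(line)
        if m:
            counts[m.group("port")] += 1
    return counts


# Hypothetical lines built from the quoted message format; not real switch output.
TEMPLATE = (
    "<Warn:ELRP.Report.Message> Slot-1: [CLI:Voice-Vlan:3] LOOP DETECTED : "
    "10 transmitted, 20 received, ingress slot:port ({port}) "
    "egress slot:port ({port})"
)
sample_lines = [
    TEMPLATE.format(port="3:1"),
    TEMPLATE.format(port="3:1"),
    TEMPLATE.format(port="4:2"),
    "some unrelated log line",
]

if __name__ == "__main__":
    for port, count in tally_ingress_ports(sample_lines).most_common():
        print(port, count)
```

The port with the highest count is the first place to look; the link behind it leads towards the device returning the ELRP packets.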

Jason Hilt

How do you check the MAC address they are using? I am pretty new to Extreme switches.
How do you check if a stack is broken? I had already checked each stack for bandwidth, and the only ports running at more than 1% are the IP cameras. All the other ports are at 0.10% or less, and not constantly.
Should I run ELRP on all 36 VLANs or just the voice VLAN?

Brandon Clay, Escalation Support Engineer

'show switch | inc MAC' will get the MAC address for you. When I said 'broken', I meant as in split up to be used elsewhere without being reconfigured, not necessarily in a broken state. If you aren't seeing any network issues, I wouldn't expect that there is anything seriously wrong with the stacks.

Regarding ELRP on the core, I'd just do the voice vlan for now. Ultimately, it may not be a bad idea to run it on all the VLANs, but for now I'd stick with the ones that are reporting loops on your edge switches.
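Once the MAC address has been collected from each stack with the command above, comparing them by hand across many closets is error-prone. Here is a small Python sketch of the comparison; the stack names and MAC addresses are hypothetical placeholders, and the function names are illustrative.

```python
from collections import defaultdict


def normalize_mac(mac):
    """Lower-case the MAC and use ':' separators so equal addresses compare equal."""
    return mac.strip().lower().replace("-", ":")


def stacks_sharing_mac(stack_macs):
    """Return {mac: [stack, ...]} for every MAC reported by more than one stack."""
    by_mac = defaultdict(list)
    for stack, mac in stack_macs.items():
        by_mac[normalize_mac(mac)].append(stack)
    return {mac: sorted(stacks) for mac, stacks in by_mac.items() if len(stacks) > 1}


# Hypothetical inventory: stack name -> MAC copied from each stack's output.
inventory = {
    "closet-a": "00:00:5E:00:53:01",
    "closet-b": "00:00:5e:00:53:02",
    "closet-c": "00-00-5e-00-53-01",  # same address as closet-a, different notation
}

if __name__ == "__main__":
    print(stacks_sharing_mac(inventory))
```

An empty result means no two stacks in the inventory report the same MAC address.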

Jason Hilt

'sh lacp lag 1:49 detail' shows this:

Lag   Actor    Actor  Partner           Partner  Partner Agg   Actor
      Sys-Pri  Key    MAC               Sys-Pri  Key     Count MAC
--------------------------------------------------------------------------------
1:49        0  0x0419 00:04:96:9b:37:50       0  0x03f5      2 02:04:96:9a:eb:4f

Enabled          : Yes
LAG State        : Up
Unack count      : 0
Wait-for-count   : 0
Current timeout  : Long
Activity mode    : Active
Defaulted Action : Delete
Fallback         : Disabled
Fallback timeout : 60 seconds
Receive state    : Enabled
Transmit state   : Enabled
Minimum active   : 1
Selected count   : 2
Standby count    : 0
LAG Id flag      : Yes
  S.pri:0   , S.id:00:04:96:9b:37:50, K:0x03f5
  T.pri:0   , T.id:02:04:96:9a:eb:4f, L:0x0419

Port list:

Member     Port      Rx           Sel          Mux            Actor     Partner
Port       Priority  State        Logic        State          Flags     Port
--------------------------------------------------------------------------------
1:49       0         Current      Selected     Collect-Dist   A-GSCD--  1013
2:49       0         Current      Selected     Collect-Dist   A-GSCD--  1014
================================================================================
Actor Flags: A-Activity, T-Timeout, G-Aggregation, S-Synchronization
             C-Collecting, D-Distributing, F-Defaulted, E-Expired

On four different stacks all of the Partner MACs are the same but the Partner Key is different.  Is that normal?

Jason Hilt

Thank you Brandon Clay!
I configured ELRP on the core to just log, and excluded all ports from being disabled. It ended up being a loop from an IP phone on one of our Cisco stacks that connects directly to the core. The phone was in an empty office, so there wasn't anyone to complain about it not working.
Who would have thought that Cisco would still process the ELRP packets on a blocked port?

How do you mark this as solved?

Brandon Clay, Escalation Support Engineer

Awesome! I'm glad to hear you were able to track this down. 

I'll mark the thread as solved.