ELRP + UPM = Number of UPM Events in Queue for execution

  • Question
  • Updated 3 years ago
  • Answered
Hello!

I want a UPM profile to execute after every detected loop, without queuing.

Right now I see a couple of loops, but the UPM executions are queued.


  • 07/22/2015 16:58:08.98 <Warn:ELRP.Report.Message> [CLI:v3000:4] LOOP DETECTED : 3241 transmited, 177 received, ingress slot:port (3) egress slot:port (1)
  • 07/22/2015 16:58:07.98 <Warn:ELRP.Report.Message> [CLI:v3002:6] LOOP DETECTED : 170 transmited, 97 received, ingress slot:port (3) egress slot:port (1)
  • 07/22/2015 16:58:05.98 <Warn:ELRP.Report.Message> [CLI:v3000:4] LOOP DETECTED : 3238 transmited, 171 received, ingress slot:port (1) egress slot:port (3)
  • 07/22/2015 16:58:04.98 <Warn:ELRP.Report.Message> [CLI:v3002:6] LOOP DETECTED : 167 transmited, 91 received, ingress slot:port (3) egress slot:port (1)
  • 07/22/2015 16:58:02.98 <Warn:ELRP.Report.Message> [CLI:v3000:4] LOOP DETECTED : 3235 transmited, 165 received, ingress slot:port (3) egress slot:port (1)
  • 07/22/2015 16:58:01.98 <Warn:ELRP.Report.Message> [CLI:v3002:6] LOOP DETECTED : 164 transmited, 85 received, ingress slot:port (3) egress slot:port (1)
  • 07/22/2015 16:57:59.98 <Warn:ELRP.Report.Message> [CLI:v3000:4] LOOP DETECTED : 3232 transmited, 159 received, ingress slot:port (1) egress slot:port (3)

* exos_vm_sw1.91 # show upm history
--------------------------------------------------------------------------------
Exec  Event/              Profile          Port Status Time Launched       
Id    Timer/ Log filter
--------------------------------------------------------------------------------
100   Log-Message(loopdete loopdetect       --- Running 2015-07-22 17:22:57   
99    Log-Message(loopdete loopdetect       --- Pass    2015-07-22 17:21:56   
98    Log-Message(loopdete loopdetect       --- Pass    2015-07-22 17:20:56   
97    Log-Message(loopdete loopdetect       --- Pass    2015-07-22 17:19:56   
96    Log-Message(loopdete loopdetect       --- Pass    2015-07-22 17:18:56   
95    Log-Message(loopdete loopdetect       --- Pass    2015-07-22 17:17:56   
--------------------------------------------------------------------------------
Number of UPM Events in Queue for execution: 10
* exos_vm_sw1.92 # 


* exos_vm_sw1.92 # show upm profile "loopdetect"
Created at : 2015-07-22 10:43:41
Last edited at : 2015-07-22 16:30:09

************Profile Contents Begin************
configure vlan $EVENT.LOG_PARAM_1 delete ports $EVENT.LOG_PARAM_7
set var temp $TCL(after [expr 60*1000])
configure vlan $EVENT.LOG_PARAM_1 add ports $EVENT.LOG_PARAM_7 tagged

************Profile Contents Ends*************

Profile State: Enabled
Profile Maximimum Execution Time: 75
Events and ports configured on the profile:
===========================================================
Event                           Port list/Log filter
===========================================================
device-detect               :
device-undetect             :
user-authenticated          :
user-unauthenticated        :
log-message                 :     loopdetect                      
identity-detect             :
identity-undetect           :
identity-role-associate     :
identity-role-dissociate    :
===========================================================

Pavel Koroteev


Posted 3 years ago


OscarK, ESE

Hello Pavel, what do you mean by executing UPM without a queue? A UPM profile is always run by adding it to the UPM scheduler, which runs it as soon as possible.
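
To illustrate why the queue counter keeps growing, here is a minimal Python sketch. It assumes the scheduler behaves like a single-worker FIFO queue (an assumption for illustration, not a documented description of UPM internals), that each profile run takes about 60 s because of the `after` pause, and that loop reports arrive roughly every 1.5 s, which approximates the timestamps in the log above. The function names are hypothetical.

```python
def simulate_serial_queue(arrivals, run_seconds):
    """Single-worker FIFO: return (arrival, start) pairs for each event."""
    schedule = []
    free_at = 0.0  # time at which the worker finishes its current run
    for arrival in sorted(arrivals):
        start = max(arrival, free_at)  # wait until the worker is free
        free_at = start + run_seconds
        schedule.append((arrival, start))
    return schedule


def queued_at(schedule, now):
    """Count events that have arrived by `now` but have not started yet."""
    return sum(1 for arrival, start in schedule if arrival <= now < start)


# Ten loop reports, one every 1.5 s, each handled by a 60 s profile run.
arrivals = [1.5 * i for i in range(10)]
schedule = simulate_serial_queue(arrivals, run_seconds=60.0)
backlog = queued_at(schedule, now=14.0)
```

Under these assumptions, fourteen seconds after the first report nine of the ten events are still waiting, and the last one does not start until 540 s have passed. That matches the pattern in `show upm history`, where runs are launched exactly one minute apart.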

Pavel Koroteev

OK, then UPM is not what I need.

But what tool do I need to run commands in parallel, or something similar?

Here is what I want to do every time a loop is detected:

configure vlan $EVENT.LOG_PARAM_1 delete ports $EVENT.LOG_PARAM_7
set var temp $TCL(after [expr 60*1000])
configure vlan $EVENT.LOG_PARAM_1 add ports $EVENT.LOG_PARAM_7 tagged

OscarK, ESE

Hello Pavel, you don't solve the loop; you temporarily remove the VLAN and add it back, after which the loop starts again. I suggest you use UPM to disable the port reported by ELRP so the loop will not return.

Pavel Koroteev

Why do you think the loop won't be resolved after the one minute that the VLAN is removed from the port? It can be a very temporary loop, created by a rebooting device for example.

If ELRP receives only one packet, why do I need to shut the port and wait for an admin to recover it? (Again, it is not an edge port; it is a peering link, an interconnect, and so on. It carries anywhere from a couple to tens of VLANs, and if one of them loops, the others must keep working.)


It is much better for the box to recover automatically by itself, and if the loop is detected again, to block/remove the VLAN again.

Here is how it works on D-Link switches, flawlessly I think:


18337 2014-03-09 07:51:28 CRIT(2) Port 23 VID 1075 LBD loop occurred. Packet discard begun
18336 2014-03-09 07:51:28 CRIT(2) Port 23 VID 200 LBD loop occurred. Packet discard begun
18335 2014-03-09 07:51:27 INFO(6) Port 23 VID 1075 LBD recovered. Loop detection restarted
18334 2014-03-09 07:51:27 INFO(6) Port 23 VID 200 LBD recovered. Loop detection restarted
18333 2014-03-09 07:50:28 CRIT(2) Port 23 VID 1075 LBD loop occurred. Packet discard begun
18332 2014-03-09 07:50:28 CRIT(2) Port 23 VID 200 LBD loop occurred. Packet discard begun
18331 2014-03-09 07:50:26 INFO(6) Port 23 VID 1075 LBD recovered. Loop detection restarted
18330 2014-03-09 07:50:26 INFO(6) Port 23 VID 200 LBD recovered. Loop detection restarted
18329 2014-03-09 07:49:27 CRIT(2) Port 23 VID 1075 LBD loop occurred. Packet dis

Grosjean, Stephane, Employee

Hi,
Can't you just let ELRP block the port?

Pavel Koroteev

No, blocking the port is not appropriate. I think we need to block the VLAN that has the loop, and then unblock it, not a port carrying hundreds of VLANs and gigabits of traffic.

Or do you think I'm wrong?

Grosjean, Stephane, Employee

OK, that makes sense. Then I assume this is because you have a loop between two edge switches with a direct (mis)connection, and it creates a loop through the core, right? And you don't want to shut the uplink.

If this is the case: Have you excluded the uplink from ELRP? (exclude-list parameter)

Another possibility is to look at 16.1 and the new ELRP capability to shut either the ingress or the egress port (where the loop has been detected).

If that can't help because of some specific design consideration, then your approach makes sense, but I wouldn't try to re-add the faulty port that way. Rather, send a trap and let someone fix the loop first.

Edit: and if you're using Python, you could also send an email to report the loop ;)

Pavel Koroteev

If the loop is created on the uplink, what should I do?

16.1 is no better. Again, if the loop is created on the egress port and I shut down the ingress port, will the loop on the egress port disappear? No, traffic sent to the egress port will still loop back to me.

Yes, sending a trap and writing a log is better than nothing. But I prefer that the box re-add the VLAN to the port by itself; then, if the loop is detected again, the script will delete it once more, and so on.

Here is how it works on D-Link switches:

18337 2014-03-09 07:51:28 CRIT(2) Port 23 VID 1075 LBD loop occurred. Packet discard begun
18336 2014-03-09 07:51:28 CRIT(2) Port 23 VID 200 LBD loop occurred. Packet discard begun
18335 2014-03-09 07:51:27 INFO(6) Port 23 VID 1075 LBD recovered. Loop detection restarted
18334 2014-03-09 07:51:27 INFO(6) Port 23 VID 200 LBD recovered. Loop detection restarted
 

Grosjean, Stephane, Employee

Hi Pavel,

You said: "Again, if the loop is created on the egress port and I shut down the ingress port, will the loop on the egress port disappear? No, traffic sent to the egress port will still loop back to me."

I don't understand that part. A loop implies that 2 ports are (wrongly) linked together in the same vlan(s). Shutting down any of the two ports, or some carrier ports (the uplinks) will break the loop. So the idea is to have an ELRP configuration that leaves the carrier links open (so that you don't block tens of VLANs) and just blocks one of the two ports creating the loop (which are most likely in a single VLAN).

This can be achieved with the exclude-list and possibly by selecting ingress or egress behavior (16.1), depending on the design.

Unless you have a very specific design, that should address most of the use cases. If this is happening on the core, where every port carries a lot of VLANs, a cabling error would impact every VLAN as well (either you have several loops in all broadcast domains, or you have a mix of loops and blackholes), so shutting down the port is fine imho.

Now, if you are monitoring that in the Core and you want to react to an edge created loop, the best approach is to shut the port at the edge, not in the core.

If you can't shut the port at the edge because this is not your administrative domain and you are forced to do it only on your carrier port, then your approach makes perfect sense to me. I'm a bit worried about having a long pause in a UPM script, because it pauses every other UPM script.

Maybe then, a Python App would be a better approach. I need to validate a few things on that.

Pavel Koroteev

You said "I don't understand that part. A loop implies that 2 ports are (wrongly) linked together in the same vlan(s). Shutting down any of the two ports, or some carrier ports (the uplinks) will break the loop."

Let's imagine something simpler. I have only one so-called 'carrier' port on an x670 switch, with about ten VLANs on it. ELRP shows a loop on one of them, VLAN v1000.
All ten VLANs terminate on that x670 and have IP addresses.
If traffic keeps looping on v1000, it becomes a real problem for the switch CPU and processes, and the x670 may 'turn into a pumpkin'.
1. There are no other ports for ELRP to shut.
2. The looping traffic uses only one port, back and forth.
3. I have some options:
 3.1. Shut the port, but then I lose connectivity on all ten VLANs.
 3.2. Do nothing, but then I lose connectivity on the whole switch.
 3.3. Somehow block traffic on v1000 on that port. The simplest way is to delete the VLAN from the port: 'configure vlan $EVENT.LOG_PARAM_1 delete ports $EVENT.LOG_PARAM_7'. Then I could send a trap or an email and wait for an admin to reconfigure. But I don't want to wait for a human; I want the device to try to recover by itself.
 3.4. Somehow block traffic on v1000 on that port, wait 60-300 seconds, try to recover, and so on.
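
As a rough sketch of the first half of option 3.4, the Python below parses an ELRP report line like the ones in the first post and builds the per-VLAN block and restore commands from the profile. The regular expression is written against the sample log lines only, the function names are hypothetical, and which of the two reported ports corresponds to `$EVENT.LOG_PARAM_7` is not stated in the thread, so the port choice is left as a parameter.

```python
import re

# Pattern derived from the sample ELRP lines in this thread (including the
# "transmited" spelling that appears in them); not an official message format.
ELRP_RE = re.compile(
    r"\[CLI:(?P<vlan>[^:\]]+):\d+\]\s+LOOP DETECTED\s*:\s*"
    r"(?P<tx>\d+) transmited, (?P<rx>\d+) received, "
    r"ingress slot:port \((?P<ingress>[^)]+)\) "
    r"egress slot:port \((?P<egress>[^)]+)\)"
)


def parse_elrp(line):
    """Return a dict describing the loop report, or None if the line differs."""
    match = ELRP_RE.search(line)
    if match is None:
        return None
    return {
        "vlan": match.group("vlan"),
        "tx": int(match.group("tx")),
        "rx": int(match.group("rx")),
        "ingress": match.group("ingress"),
        "egress": match.group("egress"),
    }


def vlan_block_commands(event, port_key="ingress"):
    """Build the (block, restore) CLI strings used in the UPM profile."""
    vlan, port = event["vlan"], event[port_key]
    block = f"configure vlan {vlan} delete ports {port}"
    restore = f"configure vlan {vlan} add ports {port} tagged"
    return block, restore


sample = (
    "07/22/2015 16:58:05.98 <Warn:ELRP.Report.Message> [CLI:v3000:4] "
    "LOOP DETECTED : 3238 transmited, 171 received, "
    "ingress slot:port (1) egress slot:port (3)"
)
event = parse_elrp(sample)
block_cmd, restore_cmd = vlan_block_commands(event)
```

Keeping the parsing and command building as pure functions makes it possible to test them off the switch before wiring them to anything that waits and then restores the VLAN.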


You said "If you can't shut the port at the edge because this is not your administrative domain and you are forced to do it only on your carrier port, then your approach makes perfect sense to me." - Exactly!
Or if it is in my admin domain, but the switch there can't do loop detection.

You said "I'm a bit worried about having a long pause in a UPM script, because it pauses every other UPM script."

Until recently I thought UPM could be multithreaded, so that there would be no queue.

So maybe UPM is not the right choice for this.

Grosjean, Stephane, Employee

A Python app (15.7 and above) can be multithreaded. Maybe that's where you want to look. We'd need to figure out a way to detect the loop, maybe still through UPM. I need to validate that.
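
A minimal sketch of what the multithreaded approach could look like, using only the Python standard library. The class name `VlanHoldDown` and the `run_cli` callable are hypothetical: `run_cli` stands in for whatever CLI-execution call the switch's Python environment provides, which is not confirmed in this thread. Each detected loop removes the VLAN from the port right away and arms its own `threading.Timer` to re-add it later, so one 60 s wait does not hold up the handling of other VLANs.

```python
import threading


class VlanHoldDown:
    """Remove a VLAN from a port on loop detection and re-add it after a delay."""

    def __init__(self, run_cli, hold_seconds=60.0):
        self._run_cli = run_cli        # hypothetical: callable taking one CLI string
        self._hold = hold_seconds
        self._lock = threading.Lock()
        self._timers = {}              # (vlan, port) -> pending restore timer

    def on_loop(self, vlan, port):
        """Handle one loop report; return False if this VLAN/port is already held."""
        key = (vlan, port)
        with self._lock:
            if key in self._timers:
                return False           # still blocked, ignore repeated reports
            timer = threading.Timer(self._hold, self._restore, args=(vlan, port))
            timer.daemon = True
            self._timers[key] = timer
        self._run_cli(f"configure vlan {vlan} delete ports {port}")
        timer.start()                  # returns immediately; no blocking sleep
        return True

    def _restore(self, vlan, port):
        """Timer callback: put the VLAN back and allow new reports for it."""
        self._run_cli(f"configure vlan {vlan} add ports {port} tagged")
        with self._lock:
            self._timers.pop((vlan, port), None)
```

Off the switch, the class can be exercised with a plain list: `calls = []; VlanHoldDown(calls.append, hold_seconds=1).on_loop("v3000", "1")` records the delete command immediately and the add command about a second later. Repeated reports for a VLAN that is already held are dropped, which also keeps the roughly once-per-1.5 s reports seen in the log above from piling up.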

Daniel Flouret, Employee

Pavel,

ELRP has two configuration options that can help you.

The first one lets you exclude certain ports that you don't want disabled under any circumstance (uplinks, EAPS ports, etc.). You should add your uplink/trunk ports to this list so that a loop in one VLAN does not affect tens or hundreds of VLANs.
configure elrp-client disable-ports [exclude | include] [<ports> | eaps-ring-ports]

The second option lets you disable a port for a period of time and then re-enable it.
configure elrp-client periodic <vlan_name> ports [<ports> | all] interval <sec> [log | log-and-trap | trap] {disable-port {{duration <seconds>} | permanent}}
In this case, ELRP will disable the offending port for the period of time you configured and then re-enable it. If the loop was temporary, the port will resume its normal activity. If the loop remains, the port will be disabled and re-enabled again and again until some external action is taken.

Pavel Koroteev


Hi, Daniel.


You don't understand what I need; please reread my posts.


1) Blocking another port will not help me, because the loop is on the port you are calling the uplink, and not on the whole port but on one or a couple of VLANs.

You may call it a downlink as well. The loop is created by another device or another network, whichever you like.


2) 'disable port' is not what I need, because only one VLAN is looping. Why must I shut the port and thereby stop traffic in all VLANs?


This is not a discussion of "why Pavel does not need what he wants; let's show him that."


It is a discussion of "how to build what Pavel needs."