ExtremeSwitching (EXOS)

Custom Python process killed when changing the switch's configuration

    Posted 02-21-2020 12:54

    Hi community.

     


     

    #!/usr/bin/python

    '''This script creates a socket in EXOS that listens for ARP packets on the data plane and forwards each ARP request to the VLAN "foo".

    We have "silent hosts" that must answer ARP requests in VLAN foo. Once a silent host sends its ARP reply, we can assign a role and VLAN using Identity Manager.
    '''

    import sys, logging
    from exos.api import expkt as socket

    def main():
        # The expkt API is only available inside the EXOS Python environment (expy).
        if not hasattr(sys, 'expy') or not sys.expy:
            print("Must be run within expy")
            return

        # Raw EXOS packet socket that receives ARP frames (ethertype 0x0806).
        sock = socket.socket(socket.PF_EXOS_PACKET, socket.SOCK_RAW, socket.htons(0x0806))
        sock.setup_filter("filter", action=socket.UP_AND_CONTINUE, ethertype=0x0806)

        # Loop forever: re-send every received ARP frame into VLAN "foo".
        while True:
            pkt, addr = sock.recvfrom(8192)
            sock.sendto(pkt, vlan_name="foo")

    try:
        main()
    except BaseException as e:
        print("Exception on startup, {}".format(e))

     


     

    I am testing on an X460G2-24t-G4 with software versions 21.1.5.2 patch1-7 and 30.5.1.15; both show the issue described below.

    I create the “arpClone” process:

     

    * TEST_LAB.69 # create process arpClone python-module arpClone start auto vr VR-Default

    * TEST_LAB.69 # show process "arpClone"

    Process Name     Version  Restart    State             Start Time

    -------------------------------------------------------------------------

    arpClone         User        0    LoadCfg      Fri Feb 21 13:47:47 2020

    * TEST_LAB.70 # debug ems show trace "arpClone" all

    02/21/2020 13:47:48.589265 [1] <arpClone:ipml> Created service group 1

    02/21/2020 13:47:48.589734 [2] <arpClone:ipml> Added service 0x41aff0 to group 1

    02/21/2020 13:47:48.589773 [3] <arpClone:ipml> Added service 0x41aee0 to group 1

    …...

    02/21/2020 13:47:48.747425 [99] <arpClone:dm> Added Object: upm, ID 50000022, Index 34, State READY, Flags 0000, (nil), LEN 104

    ...

     

    The process starts fine, can be traced, and sends the ARP requests through the socket to VLAN “foo”.

    So far so good!

    But there is a problem…

    The process is killed after any change to the switch's configuration. For example, if I create a VLAN:

    * TEST_LAB.71 # create vlan 45

    (the prompt hangs for about 5 seconds)

    * TEST_LAB.72 # debug ems show trace "arpClone" all

    ERROR: Process "arpClone" did not respond in time with trace information.

     

    At this point the process is still working: the socket keeps sending ARP requests to VLAN “foo”.

    But after about 6 minutes the process is killed, and I see this log:

    * TEST_LAB.72 # debug ems show trace "arpClone" all

    ERROR: Process "arpClone" did not respond in time with trace information.

    * TEST_LAB.74 # show process "arpClone"

    Process Name     Version  Restart    State             Start Time

    -------------------------------------------------------------------------

    arpClone         User        0    LoadCfg      Fri Feb 21 13:47:47 2020

    02/21/2020 10:52:47.26 <Warn:cm.database> Configuration database unlocked

    02/21/2020 10:52:47.25 <Erro:DM.Error> Process arpClone Failed

    02/21/2020 10:52:47.24 <Warn:cm.database> Configuration database locked

    02/21/2020 10:52:47.24 <Erro:EPM.proc_conn_lost> Connection lost with process arpClone

    02/21/2020 10:52:45.76 <Crit:Kern.Alert> 776cc15c  00000000

    ...

    02/21/2020 10:52:45.50 <Crit:Kern.Alert> 776cc13c  00000000

    02/21/2020 10:52:45.49 <Crit:Kern.Alert> Code:

    02/21/2020 10:52:45.49 <Crit:Kern.Alert>

    02/21/2020 10:52:45.49 <Crit:Kern.Alert> Process expy pid 4203 died with signal 6

    02/21/2020 10:52:45.40 <Warn:EPM.proc_kill> Process arpClone ID 4203 killed

    02/21/2020 10:52:45.40 <Erro:EPM.Msg.timer_thread> Because the main (2012753728) thread of process 4203, has not responded within 41 periods of 10 seconds, the process will be terminated.
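
    As a quick sanity check, the timeout named in the EPM.Msg.timer_thread line is roughly consistent with the "about 6 minutes" delay before the kill. A minimal sketch; the variable names are illustrative, only the values 41 and 10 come from the log:

```python
# Values taken from the EPM.Msg.timer_thread log line:
# "has not responded within 41 periods of 10 seconds"
periods = 41
period_seconds = 10

timeout_seconds = periods * period_seconds
timeout_minutes = timeout_seconds / 60.0

print("%d s = %.1f min" % (timeout_seconds, timeout_minutes))
```

    This gives 410 seconds, a little under 7 minutes.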

     

     

    How can I solve this? Did I forget to configure something so that the process refreshes its information when the configuration changes?
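
    One possible direction, sketched under the assumption that the watchdog in the log expects the main thread to stay responsive: run the blocking receive/forward loop in a worker thread so that main() can return. This is untested on a real switch. StubSocket, forward_loop and start_forwarder are hypothetical names; the stub only stands in for the expkt socket so the sketch runs outside EXOS.

```python
import threading

class StubSocket(object):
    """Hypothetical stand-in for the expkt socket (not part of EXOS)."""
    def __init__(self, frames):
        self._frames = list(frames)
        self.sent = []

    def recvfrom(self, bufsize):
        # Hand out the queued frames, then None to end the demo.
        if self._frames:
            return self._frames.pop(0), None
        return None, None

    def sendto(self, pkt, vlan_name=None):
        self.sent.append((pkt, vlan_name))

def forward_loop(sock, vlan_name):
    # Same receive/forward loop as in the script, meant to run in a worker thread.
    while True:
        pkt, addr = sock.recvfrom(8192)
        if pkt is None:  # only the stub ever returns None
            break
        sock.sendto(pkt, vlan_name=vlan_name)

def start_forwarder(sock, vlan_name):
    # Daemon thread, so it does not keep the process alive on shutdown.
    worker = threading.Thread(target=forward_loop, args=(sock, vlan_name))
    worker.daemon = True
    worker.start()
    return worker  # the caller (the main thread) is free again immediately

# Demo with the stub: two fake frames are forwarded to VLAN "foo".
demo_sock = StubSocket([b"arp-1", b"arp-2"])
worker = start_forwarder(demo_sock, "foo")
worker.join(5)
print(demo_sock.sent)
```

    In the real script, main() would create the expkt socket as before and call start_forwarder(sock, "foo") instead of looping itself.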

     

    Can anyone help me?

     

    Thanks and best regards.

    Daniel Delgado,