LACP problem between Extreme network stack and Juniper VCP.

  • Problem
  • Updated 2 years ago
  • Not a Problem

Hi,


We are currently facing a problem with a port sharing configuration.
On one end we have an X670 stack (2 switches) and on the other end we have two
Juniper switches in a virtual chassis (VCP).

The config is pretty straightforward. Extreme:
enable sharing 1:37 grouping 1:37,2:37 algorithm address-based L3_L4 lacp

Juniper:
xe-0/1/0 {
    description "008";
    ether-options {
        802.3ad ae2;
    }
}
xe-2/1/0 {
    description "009";
    ether-options {
        802.3ad ae2;
    }
}
ae2 {
    description "to Mica IT";
    aggregated-ether-options {
        lacp {
            active;
            periodic slow;
        }
    }
}

The problem with this configuration is that port 2:37 (connected to port xe-0/1/0) frequently drops out of the LAG without the interface going down:

12/12/2016 14:59:51.01 <Info:LACP.RemPortFromAggr> Slot-1: Remove port 2:37 from aggregator
12/12/2016 14:59:51.00 <Info:vlan.dbg.info> Slot-1: Port 2:37 is Down, remove from aggregator 1:37
12/12/2016 14:59:51.00 <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 2:37 link down - remote fault
12/12/2016 14:59:26.39 <Info:LACP.AddPortToAggr> Slot-1: Add port 2:37 to aggregator
12/12/2016 14:58:59.20 <Info:LACP.RemPortFromAggr> Slot-1: Remove port 2:37 from aggregator
12/12/2016 14:57:26.14 <Info:LACP.AddPortToAggr> Slot-1: Add port 2:37 to aggregator
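
To put numbers on the flapping, the LACP add/remove events above can be paired up to see how long the port stayed in the aggregator each time. The following is only a minimal sketch: the function name `membership_durations` is made up for illustration, the timestamps are assumed to be MM/DD/YYYY, and only the four LACP lines from the log above are used.

```python
import re
from datetime import datetime

# The LACP add/remove lines from the log above (newest first, as EXOS prints them).
LOG = """\
12/12/2016 14:59:51.01 <Info:LACP.RemPortFromAggr> Slot-1: Remove port 2:37 from aggregator
12/12/2016 14:59:26.39 <Info:LACP.AddPortToAggr> Slot-1: Add port 2:37 to aggregator
12/12/2016 14:58:59.20 <Info:LACP.RemPortFromAggr> Slot-1: Remove port 2:37 from aggregator
12/12/2016 14:57:26.14 <Info:LACP.AddPortToAggr> Slot-1: Add port 2:37 to aggregator
"""

# Captures: timestamp, event type, port.
EVENT = re.compile(
    r"^(\d\d/\d\d/\d{4} \d\d:\d\d:\d\d\.\d+) "
    r"<Info:LACP\.(AddPortToAggr|RemPortFromAggr)> .*? port (\S+) "
)


def membership_durations(log_text, port):
    """Return the seconds the port spent in the aggregator per add/remove pair."""
    events = []
    for line in log_text.splitlines():
        m = EVENT.match(line)
        if m and m.group(3) == port:
            # Assumption: the date is MM/DD/YYYY.
            ts = datetime.strptime(m.group(1), "%m/%d/%Y %H:%M:%S.%f")
            events.append((ts, m.group(2)))
    events.sort()  # oldest first

    durations, added_at = [], None
    for ts, kind in events:
        if kind == "AddPortToAggr":
            added_at = ts
        elif added_at is not None:
            durations.append((ts - added_at).total_seconds())
            added_at = None
    return durations


print(membership_durations(LOG, "2:37"))  # [93.06, 24.62]
```

The first interval, about 93 seconds, happens to be close to the 90-second LACP long timeout that goes with slow periodic PDUs (three missed 30-second intervals). That might hint at LACPDUs not arriving on this link, but nothing in this thread confirms it.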

Things I’ve tried to fix this problem:
- Created a new LAG (ports 1:36 and 2:36) > no difference.
- Swapped the optics > no difference, the problem still persists on port 2:37.
- Swapped the fibers > no difference, the problem still persists on port 2:37 but moves to the other port (xe-2/1/0) on the Juniper side.

The problem always seems to be on the "second" port on the Extreme stack.
Does anybody have any idea what could be going wrong in this scenario?

dilu

Posted 2 years ago

Steven Lin, Employee

Hi dilu,
Could you please provide the "show ports 2:37 information detail" and "debug hal show optic-info port 2:37 ddmi" output?

dilu

Hi,

Of course:


# debug hal show optic-info ddmi slot 2 port 37

Port                       37
SFP or SFP+:               SFP+
Signal:                    present
TX Fault:                  no
SFP/SFP+ Vendor:           FLEXOPTIX
SFP/SFP+ Part Number:      P.1396.10
SFP/SFP+ Serial Number:    F820KJQ
SFP/SFP+ Manufacture Date: 160719
SFP/SFP+ Type:             SFP/SFP+
Connector:                 LC
Type:                      LR
Supported:                 yes
Wavelength:                1310

GBIC supports DDMI.  MonitorType: 68
Optic is Internally Calibrated
Temperature High Alarm        :  95 C (0x5f00)
Temperature High Warning      :  90 C (0x5a00)
Temperature                   :  24 C (0x1846)
Temperature Low Warning       :  -20 C (0xffffec00)
Temperature Low Alarm         :  -25 C (0xffffe700)
Temperature Status            :  Normal
Voltage High Alarm            :  3800 V (0x9470)
Voltage High Warning          :  3700 V (0x9088)
Voltage                       :  3306 V (0x812c)
Voltage Low Warning           :  2900 V (0x7148)
Voltage Low Alarm             :  2800 V (0x6d60)
Voltage Status                :  Normal
Tx Bias High Alarm            :  90000 uA (0xafc8)
Tx Bias High Warning          :  80000 uA (0x9c40)
Tx Bias                       :  28832 uA (0x3850)
Tx Bias Low Warning           :  3000 uA (0x5dc)
Tx Bias Low Alarm             :  2000 uA (0x3e8)
Tx Bias Status                :  Normal
Tx Power High Alarm           :  1778 uW (0x4577)
Tx Power High Warning         :  1412 uW (0x372d)
Tx Power                      :  559 uW (0x15df)
Tx Power Low Warning          :  251 uW (0x9d0)
Tx Power Low Alarm            :  199 uW (0x7cb)
Tx Power Status               :  Normal
Rx Power High Alarm           :  1258 uW (0x312d)
Rx Power High Warning         :  1122 uW (0x2bd4)
Rx Power                      :  641 uW (0x190e)
Rx Power Low Warning          :  31 uW (0x13c)
Rx Power Low Alarm            :  25 uW (0xfb)
Rx Power Status               :  Normal
Temperature High Alarm Int   :  0
Temperature Low Alarm Int    :  0
Temperature High Warning Int :  0
Temperature Low Warning Int  :  0
Tx Bias High Alarm Int       :  0
Tx Bias Low Alarm Int        :  0
Tx Bias High Warning Int     :  0
Tx Bias Low Warning Int      :  0
Tx Power High Alarm Int      :  0
Tx Power Low Alarm Int       :  0
Tx Power High Warning Int    :  0
Tx Power Low Warning Int     :  0
Rx Power High Alarm Int      :  0
Rx Power Low Alarm Int       :  0
Rx Power High Warning Int    :  0
Rx Power Low Warning Int     :  0
exCalRxPower [0.000000] [0.000000] [0.000000] [1.000000] [0.000000]
exCalTx_Islope 256 exCalTx_Ioffset 256
exCalTx_PWRslope 256 exCalTx_PWRoffset 0
exCalTempSlope 256 exCalTempOffset 0
exCalTempSlope 256 exCalTempOffset 0
aux1 0 aux2 0
status 0x10



# show ports 2:37 information detail

Port:   2:37
        Virtual-router: VR-Default
        Type:           SF+_LR
        Random Early drop:      Unsupported
        Admin state:    Enabled with  10G full-duplex
        Link State:     Active, 10Gbps, full-duplex
        Link Ups:       9        Last: Mon Dec 12 15:51:00 2016
        Link Downs:     8        Last: Mon Dec 12 14:59:51 2016

        VLAN cfg:
                 Name: VLAN39, 802.1Q Tag = 39, MAC-limit = No-limit, Virtual router:   VR-Default
                       Port-specific VLAN ID: 2007
                 Name: VLAN4000, 802.1Q Tag = 4000, MAC-limit = No-limit, Virtual router:   VR-Default
                       Port-specific VLAN ID:  317,  318,  319,  320, 3001, 3002, 3003, 3004,
                                              3012, 3013, 3026, 3031
        STP cfg:

        Protocol:
        Trunking:       Cfg master port is 1:37

        EDP:            Enabled

        ELSM:           Disabled
        Ethernet OAM:           Disabled
        Learning:       Enabled
        Unicast Flooding:       Enabled
        Multicast Flooding:     Enabled
        Broadcast Flooding:     Enabled
        Jumbo:          Disabled
        Flow Control:   Rx-Pause: Enabled       Tx-Pause: Disabled
        Priority Flow Control: Disabled
        Reflective Relay:       Disabled
        Link up/down SNMP trap filter setting:  Enabled
        Egress Port Rate:       No-limit
        Broadcast Rate:         No-limit
        Multicast Rate:         No-limit
        Unknown Dest Mac Rate:  No-limit
        QoS Profile:    None configured
        Ingress Rate Shaping :          Unsupported
        Ingress IPTOS Examination:      Disabled
        Ingress 802.1p Examination:     Enabled
        Ingress 802.1p Inner Exam:      Disabled
        Ingress 802.1p Priority:        0
        Egress IPTOS Replacement:       Disabled
        Egress 802.1p Replacement:      Disabled
        NetLogin:                       Disabled
        NetLogin port mode:             Port based VLANs
        Smart redundancy:               Enabled
        Software redundant port:        Disabled
        IPFIX:   Disabled               Metering:  Ingress, All Packets, All Traffic
                IPv4 Flow Key Mask:     SIP: 255.255.255.255            DIP: 255.255.255.255
                IPv6 Flow Key Mask:     SIP: ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
                                        DIP: ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff

        Far-End-Fault-Indication:       Disabled
        Shared packet buffer:           default
        VMAN CEP egress filtering:      Disabled
        Isolation:                      Off
        PTP Configured:                 Disabled
        Time-Stamping Mode:             None
        Synchronous Ethernet:           Unsupported
        Dynamic VLAN Uplink:            Disabled
        VM Tracking Dynamic VLANs:      Disabled


Steven Lin, Employee

Hi Dilu,
I suggest replacing the SFP, if you have a spare, to see whether that resolves the issue.
The article below explains the "link down - remote fault" log message:

https://gtacknowledge.extremenetworks.com/articles/Q_A/what-is-the-difference-between-local-fault-an...

Henrique, Employee

Hi Dilu, 

Could you please share the outputs for "show lacp counters"?

Have you tried changing the Extreme side to passive mode, since the Juniper side is configured as active?

To change the LACP activity-mode:

                 configure sharing port lacp activity-mode passive

Default is active mode.

Could you also create a LAG using just one port (port 2:37) and check whether the same issue occurs?
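
For background on the active/passive suggestion: LACP only negotiates when at least one end is active, because a passive end does not start sending LACPDUs by itself. The sketch below spells out that rule. The function name `lacp_can_negotiate` is made up for illustration and is not an EXOS or Junos API.

```python
def lacp_can_negotiate(local_mode, remote_mode):
    """Return True if LACP can come up for the given activity modes.

    A passive end only answers LACPDUs it receives, so at least one
    end has to be active to start the exchange.
    """
    modes = {"active", "passive"}
    if local_mode not in modes or remote_mode not in modes:
        raise ValueError("mode must be 'active' or 'passive'")
    return "active" in (local_mode, remote_mode)


# Print the result for every combination of modes.
for local in ("active", "passive"):
    for remote in ("active", "passive"):
        print(f"{local:8s} / {remote:8s} -> {lacp_can_negotiate(local, remote)}")
```

Active on both ends, as in the configs posted above, is a valid combination. Setting the Extreme side to passive is therefore a troubleshooting experiment, not a fix for a misconfiguration.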

dilu

Hi Guys,

A little update: things got really strange.
What I did:
- Removed the sharing config on port 1:37.
- Removed all VLANs on port 1:37.
- Added the sharing config again (same config).
- Added the VLANs again (same VLANs).

After this point everything was running stable, so I tried rebooting the stack
to see if everything would run smoothly after a reboot. After that the problems
started again, this time on all sharing ports (2:42 and 2:37):

<Info:LACP.AddPortToAggr> Slot-1: Add port 2:37 to aggregator
<Info:LACP.AddPortToAggr> Slot-1: Add port 2:42 to aggregator
<Info:LACP.RemPortFromAggr> Slot-1: Remove port 2:37 from aggregator
<Info:LACP.RemPortFromAggr> Slot-1: Remove port 2:42 from aggregator
<Info:LACP.AddPortToAggr> Slot-1: Add port 2:37 to aggregator
<Info:LACP.AddPortToAggr> Slot-1: Add port 2:42 to aggregator

The config is still the same, and the 1:42,2:42 LAG had been running stable for 7 months:
enable sharing 1:42 grouping 1:42,2:42 algorithm address-based L3_L4 lacp
enable sharing 1:37 grouping 1:37,2:37 algorithm address-based L3_L4 lacp

So at this point I did the same “trick” of removing the VLANs and the ports from the
sharing groups, etc., until it was stable again (and it still is after 2 weeks).

While this is not a perfect solution, things were stable for a while, until yesterday.
We had a change in our OSPF (broadcast) environment (not Extreme OSPF), and after that we
encountered very strange behaviour. After a lot of troubleshooting we eventually narrowed
the problem down to the same stack that had the LACP problems before.
When I disabled port 2:42 everything immediately came back to life; when I did the
opposite (enabling 2:42 and disabling 1:42) things again did not work.
So I am starting to second-guess the second node in the stack. Does anybody have
an idea why this is all happening?

This is the network layout (stack 1 and 2 are MLAG peers, stack 3 is connected through mlag).


Steven Lin, Employee

Hello dilu,
Please open a case with GTAC and include a detailed problem description; you will get better support and analysis there. The GTAC user guide below explains how to open a case.
http://extrcdn.extremenetworks.com/wp-content/uploads/2015/02/GTAC-Users-Guide_v6.pdf

Aleixo Gomes, Employee

Hi Dilu,

Case 01269755 has been created.
Please provide the requested logs.

Stachal

This may not be the issue, but it is something to check. I only mention it because I have seen it recently and it produced the same very odd behavior. With LR optics you're using single mode at 1310 nm, so verify that the fiber and all patches are single mode (generally yellow, not aqua). You will get link when running single-mode optics over multimode fiber, but the behavior will be very unpredictable. I just ran into an issue where a fiber patch panel was mislabeled as single mode when it was in fact multimode, and everyone was scratching their heads as to why the link would randomly drop.

Always clean your fiber before you terminate it; single mode is more susceptible to dust. Also, if you have an OTDR, test the fiber to verify loss, bends and kinks. One more thing to check: with single-mode LR optics over a short distance, you may have to put a fiber optic attenuator in place to reduce reflection, depending on the light level readings.
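
On the light level readings: the DDMI output posted earlier reports power in uW, while optic datasheets and attenuator ratings are usually given in dBm. A minimal conversion sketch follows, using only values from that DDMI output. The helper name `uw_to_dbm` is made up for illustration.

```python
import math


def uw_to_dbm(microwatts):
    """Convert optical power from microwatts to dBm (0 dBm = 1 mW = 1000 uW)."""
    return 10 * math.log10(microwatts / 1000.0)


# Values copied from the DDMI output for port 2:37 posted above.
readings = {
    "Tx Power": 559,
    "Rx Power": 641,
    "Rx Power High Warning": 1122,
    "Rx Power Low Warning": 31,
}

for name, uw in readings.items():
    print(f"{name:22s} {uw:5d} uW  {uw_to_dbm(uw):7.2f} dBm")
```

With these numbers the Rx power of 641 uW is roughly -1.9 dBm. That sits between the optic's own low and high warning thresholds (about -15.1 dBm and +0.5 dBm), consistent with the "Rx Power Status: Normal" line in the output.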

Stachal

What do the logs on the Juniper indicate? Can you post those as well?

Tripathy, Priya Ranjan, ESE

Dilu,

I can see that the OSPF and LACP issues are already being handled in the case opened with GTAC, with the assistance of my colleague Aleixo Gomes. The OSPF issue was found not to be with the Extreme device. Let us follow up on the LACP issue in that same case. If you would like the broader audience here to look at the Juniper-side logs, please do post them here, as Stachal suggested.