Header Only - DO NOT REMOVE - Extreme Networks
Question

LACP problem between Extreme network stack and Juniper VCP.


Hi,

We are currently facing a problem with a port-sharing configuration.
On one end we have an X670 stack (2 switches), and on the other end we have two
Juniper switches in a virtual chassis (VCP).

The config is pretty straightforward. Extreme:
enable sharing 1:37 grouping 1:37,2:37 algorithm address-based L3_L4 lacp

Juniper:
xe-0/1/0 {
    description "008";
    ether-options {
        802.3ad ae2;
    }
}

xe-2/1/0 {
    description "009";
    ether-options {
        802.3ad ae2;
    }
}

ae2 {
    description "to Mica IT";
    aggregated-ether-options {
        lacp {
            active;
            periodic slow;
        }
    }
}

The problem with this configuration is that port 2:37 (connected to port xe-0/1/0) frequently drops out of the LAG without the interface going down:

12/12/2016 14:59:51.01 [i] Slot-1: Remove port 2:37 from aggregator
12/12/2016 14:59:51.00 [i] Slot-1: Port 2:37 is Down, remove from aggregator 1:37
12/12/2016 14:59:51.00 [i] Slot-1: Port 2:37 link down - remote fault
12/12/2016 14:59:26.39 [i] Slot-1: Add port 2:37 to aggregator
12/12/2016 14:58:59.20 [i] Slot-1: Remove port 2:37 from aggregator
12/12/2016 14:57:26.14 [i] Slot-1: Add port 2:37 to aggregator
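
For reference, the LACP state on the Extreme side can be checked while this is happening with commands along these lines (port numbers taken from the config above; exact syntax may vary by EXOS release):

show sharing
show lacp lag 1:37 detail
show lacp counters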

The things I've tried to fix this problem:
- Created a new LAG (ports 1:36 and 2:36) > no difference.
- Swapped the optics > no difference; the problem still persists on port 2:37.
- Swapped the fibers > no difference; the problem still persists on port 2:37, but now towards the other port (xe-2/1/0) on the Juniper side.

The problem always seems to be on the "second" port of the Extreme stack.
Does anybody have an idea what could be going wrong in this scenario?

10 replies

Userlevel 4
Hi dilu,
Could you please provide the "show ports 2:37 information detail" and "debug hal show optic-info port 2:37 ddmi" output?
Hi,

Of course:

# debug hal show optic-info ddmi slot 2 port 37

Port 37
SFP or SFP+: SFP+
Signal: present
TX Fault: no
SFP/SFP+ Vendor: FLEXOPTIX
SFP/SFP+ Part Number: P.1396.10
SFP/SFP+ Serial Number: F820KJQ
SFP/SFP+ Manufacture Date: 160719
SFP/SFP+ Type: SFP/SFP+
Connector: LC
Type: LR
Supported: yes
Wavelength: 1310

GBIC supports DDMI. MonitorType: 68
Optic is Internally Calibrated
Temperature High Alarm : 95 C (0x5f00)
Temperature High Warning : 90 C (0x5a00)
Temperature : 24 C (0x1846)
Temperature Low Warning : -20 C (0xffffec00)
Temperature Low Alarm : -25 C (0xffffe700)
Temperature Status : Normal
Voltage High Alarm : 3800 V (0x9470)
Voltage High Warning : 3700 V (0x9088)
Voltage : 3306 V (0x812c)
Voltage Low Warning : 2900 V (0x7148)
Voltage Low Alarm : 2800 V (0x6d60)
Voltage Status : Normal
Tx Bias High Alarm : 90000 uA (0xafc8)
Tx Bias High Warning : 80000 uA (0x9c40)
Tx Bias : 28832 uA (0x3850)
Tx Bias Low Warning : 3000 uA (0x5dc)
Tx Bias Low Alarm : 2000 uA (0x3e8)
Tx Bias Status : Normal
Tx Power High Alarm : 1778 uW (0x4577)
Tx Power High Warning : 1412 uW (0x372d)
Tx Power : 559 uW (0x15df)
Tx Power Low Warning : 251 uW (0x9d0)
Tx Power Low Alarm : 199 uW (0x7cb)
Tx Power Status : Normal
Rx Power High Alarm : 1258 uW (0x312d)
Rx Power High Warning : 1122 uW (0x2bd4)
Rx Power : 641 uW (0x190e)
Rx Power Low Warning : 31 uW (0x13c)
Rx Power Low Alarm : 25 uW (0xfb)
Rx Power Status : Normal
Temperature High Alarm Int : 0
Temperature Low Alarm Int : 0
Temperature High Warning Int : 0
Temperature Low Warning Int : 0
Tx Bias High Alarm Int : 0
Tx Bias Low Alarm Int : 0
Tx Bias High Warning Int : 0
Tx Bias Low Warning Int : 0
Tx Power High Alarm Int : 0
Tx Power Low Alarm Int : 0
Tx Power High Warning Int : 0
Tx Power Low Warning Int : 0
Rx Power High Alarm Int : 0
Rx Power Low Alarm Int : 0
Rx Power High Warning Int : 0
Rx Power Low Warning Int : 0
exCalRxPower [0.000000] [0.000000] [0.000000] [1.000000] [0.000000]
exCalTx_Islope 256 exCalTx_Ioffset 256
exCalTx_PWRslope 256 exCalTx_PWRoffset 0
exCalTempSlope 256 exCalTempOffset 0
exCalTempSlope 256 exCalTempOffset 0
aux1 0 aux2 0
status 0x10

# show ports 2:37 information detail

Port: 2:37
Virtual-router: VR-Default
Type: SF+_LR
Random Early drop: Unsupported
Admin state: Enabled with 10G full-duplex
Link State: Active, 10Gbps, full-duplex
Link Ups: 9 Last: Mon Dec 12 15:51:00 2016
Link Downs: 8 Last: Mon Dec 12 14:59:51 2016

VLAN cfg:
Name: VLAN39, 802.1Q Tag = 39, MAC-limit = No-limit, Virtual router: VR-Default
Port-specific VLAN ID: 2007
Name: VLAN4000, 802.1Q Tag = 4000, MAC-limit = No-limit, Virtual router: VR-Default
Port-specific VLAN ID: 317, 318, 319, 320, 3001, 3002, 3003, 3004,
3012, 3013, 3026, 3031
STP cfg:

Protocol:
Trunking: Cfg master port is 1:37

EDP: Enabled

ELSM: Disabled
Ethernet OAM: Disabled
Learning: Enabled
Unicast Flooding: Enabled
Multicast Flooding: Enabled
Broadcast Flooding: Enabled
Jumbo: Disabled
Flow Control: Rx-Pause: Enabled Tx-Pause: Disabled
Priority Flow Control: Disabled
Reflective Relay: Disabled
Link up/down SNMP trap filter setting: Enabled
Egress Port Rate: No-limit
Broadcast Rate: No-limit
Multicast Rate: No-limit
Unknown Dest Mac Rate: No-limit
QoS Profile: None configured
Ingress Rate Shaping : Unsupported
Ingress IPTOS Examination: Disabled
Ingress 802.1p Examination: Enabled
Ingress 802.1p Inner Exam: Disabled
Ingress 802.1p Priority: 0
Egress IPTOS Replacement: Disabled
Egress 802.1p Replacement: Disabled
NetLogin: Disabled
NetLogin port mode: Port based VLANs
Smart redundancy: Enabled
Software redundant port: Disabled
IPFIX: Disabled Metering: Ingress, All Packets, All Traffic
IPv4 Flow Key Mask: SIP: 255.255.255.255 DIP: 255.255.255.255
IPv6 Flow Key Mask: SIP: ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
DIP: ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff

Far-End-Fault-Indication: Disabled
Shared packet buffer: default
VMAN CEP egress filtering: Disabled
Isolation: Off
PTP Configured: Disabled
Time-Stamping Mode: None
Synchronous Ethernet: Unsupported
Dynamic VLAN Uplink: Disabled
VM Tracking Dynamic VLANs: Disabled
Userlevel 4
Hi Dilu,
I suggest replacing the SFP, if you have a spare, to see whether that resolves the issue.
Below is an article explaining the "link down - remote fault" log message:

https://gtacknowledge.extremenetworks.com/articles/Q_A/what-is-the-difference-between-local-fault-an...
Userlevel 6
Hi Dilu,

Could you please share the output of "show lacp counters"?

Did you try changing the Extreme side to passive mode, since the Juniper side is configured as active?

To change the LACP activity mode:

configure sharing <master_port> lacp activity-mode passive

The default is active mode.

Any chance you could create a LAG using just one port (port 2:37) and check whether the same issue occurs?
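
For reference, applied to the ports from your original post those two suggestions would look roughly like this (assuming 1:37 stays the master port; exact syntax can differ between EXOS versions, so treat this as a sketch):

configure sharing 1:37 lacp activity-mode passive
configure sharing 1:37 delete ports 2:37
enable sharing 2:37 grouping 2:37 algorithm address-based L3_L4 lacp

The last two lines would move 2:37 out of the existing group into a single-port test LAG; the test LAG would still need to be added to the relevant VLANs, and the change reverted afterwards.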
Hi Guys,

A little update: things got really strange.
What I did (a rough command sketch follows the list):
- Removed the sharing config on port 1:37.
- Removed all VLANs from port 1:37.
- Added the sharing config again (same config).
- Added the VLANs again (same VLANs).
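
Roughly, that sequence amounts to something like this (VLAN names are the ones from the port output above; exact syntax may differ per EXOS version, and the port-specific VLAN IDs shown in that output would also need to be re-applied):

disable sharing 1:37
configure vlan VLAN39 delete ports 1:37
configure vlan VLAN4000 delete ports 1:37
enable sharing 1:37 grouping 1:37,2:37 algorithm address-based L3_L4 lacp
configure vlan VLAN39 add ports 1:37 tagged
configure vlan VLAN4000 add ports 1:37 tagged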

After this, everything ran stable, so I rebooted the stack to see whether it
would still run smoothly after a reboot. After the reboot the problems
started again, this time on all sharing ports (2:42 and 2:37):

[i] Slot-1: Add port 2:37 to aggregator
[i] Slot-1: Add port 2:42 to aggregator
[i] Slot-1: Remove port 2:37 from aggregator
[i] Slot-1: Remove port 2:42 from aggregator
[i] Slot-1: Add port 2:37 to aggregator
[i] Slot-1: Add port 2:42 to aggregator

The config is still the same, and 1:42,2:42 had been running stable for 7 months:
enable sharing 1:42 grouping 1:42,2:42 algorithm address-based L3_L4 lacp
enable sharing 1:37 grouping 1:37,2:37 algorithm address-based L3_L4 lacp

So at this point I did the same "trick" of removing the VLANs and the ports from the
sharing groups, etc., until it was stable again (and it still is after 2 weeks).

While this is not a perfect solution, things were stable for a while, until yesterday.
We had a change in our OSPF (broadcast) environment (not Extreme OSPF), and after that we
encountered very strange behaviour. After a lot of troubleshooting we eventually narrowed
the problem down to the same stack that had the LACP problems before.
When I disabled port 2:42, everything immediately came back to life; when I did the
opposite (enabled 2:42 and disabled 1:42), things again did not work.
So I am starting to second-guess the second node in the stack. Does anybody have
an idea why this is all happening?

This is the network layout (stacks 1 and 2 are MLAG peers; stack 3 is connected through MLAG).

Userlevel 4
Hello dilu,
Please open a case with GTAC with a detailed problem description; you will get better support and analysis there. Below is the GTAC user guide, which explains how to open a case:
http://extrcdn.extremenetworks.com/wp-content/uploads/2015/02/GTAC-Users-Guide_v6.pdf
Userlevel 1
Hi Dilu,

Case created: 01269755.
Please provide the requested logs.
Userlevel 2
This may not be the issue, but it is something to check. I only mention it because I have seen this recently and it caused the same very odd behaviour. You have LR optics, so you are running single-mode 1310 nm: verify that the fiber and all patches are single-mode (generally yellow, not aqua). You will get link when running single-mode optics over multimode fiber, but the behaviour will be very unpredictable. I just ran into an issue where a fiber patch panel was mislabeled as single-mode when it was in fact multimode, and everyone was scratching their heads as to why the link would randomly drop.

Always clean your fiber before you terminate it; single-mode is more susceptible to dust. Also, if you have an OTDR, test the fiber to verify loss/bends/kinks. One more thing to check: with single-mode LR optics over a short distance, you may have to put a fiber-optic attenuator in place to reduce reflections, depending on the light-level readings.
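
If you want to compare light levels on both ends first, the built-in diagnostics should be enough; roughly (Extreme side first, Juniper side second; commands may differ slightly per EXOS/Junos version):

show ports 2:37 transceiver information detail
show interfaces diagnostics optics xe-0/1/0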
Userlevel 2
What do the logs indicate on the Juniper side? Can you post those as well?
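
Something along these lines would be a good start (adjust to your Junos version; the match filter is only a rough example):

show lacp interfaces ae2 extensive
show lacp statistics interfaces ae2
show log messages | match "ae2|lacp"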
Userlevel 4
Dilu,

I can see that the OSPF and LACP issues are already being handled in the open GTAC case, assisted by my colleague Aleixo Gomes; the OSPF issue, however, was found not to be with the Extreme device. Let us follow up on the LACP issue in that case. But if you would like the broader audience here to look at the Juniper-side logs, then, as suggested by Stachal, please do provide them here.
