Problems with port sharing on X670


Userlevel 2
Hello,

Something odd is happening here, and the solution was even odder.

We have a sharing (LAG) with a Juniper MX80 over two 10G ports, 47 and 48 on the Extreme side.
After issuing the command:

configure sharing 47 add ports 45,46

The hashing does not seem to work properly.
Traffic dropped significantly, and the two added ports (45, 46) seem to be more affected than 47 and 48.

Then comes the odd "solution": after disabling the sharing and creating it again with port 45 as the master, the problem was solved.

This should be looked into, because it's the second time I've had the same problem.
I'm using version 15.6.3.1 v1563b1-patch1-5
on an X670.


Userlevel 3
Hi,

Do you mean traffic from the X670 to the MX80, or from the MX80 to the X670?
Which hash algorithm do you use?

--
Jarek
Userlevel 2
From the MX to the X670.
I'm using custom, and I tried all of the algorithms.
I know it's strange, because the hash only applies to transmitted (TX) traffic.
We tried some changes on the Juniper, but nothing solved it.
Only disabling and re-enabling the sharing did.
Userlevel 3
The Juniper MX80 has a Trio chipset and can only balance per flow, not per packet.
If you have one high-traffic flow on a port, it will stay on that port.
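
For illustration, here is a minimal sketch of per-flow hashing, assuming a CRC-32 over the flow 5-tuple taken modulo the number of member links. This is not the actual Trio or X670 hash; the addresses, ports, and the `pick_member` helper are made up for the example. It only shows why a single heavy flow never spreads over several ports:

```python
import zlib
from collections import Counter

def pick_member(flow, members):
    """Map a flow 5-tuple to one LAG member (assumed scheme: CRC-32 modulo member count)."""
    key = "|".join(str(field) for field in flow).encode()
    return members[zlib.crc32(key) % len(members)]

members = [45, 46, 47, 48]

# Hypothetical flows: (src IP, dst IP, protocol, src port, dst port).
flows = [("10.0.0.%d" % i, "192.0.2.1", 6, 40000 + i, 179) for i in range(1, 201)]

# Every packet of one flow hashes to the same member, however large the flow is.
elephant = ("10.0.0.1", "192.0.2.1", 6, 40001, 179)
assert len({pick_member(elephant, members) for _ in range(1000)}) == 1

# Many distinct flows spread across the members, though not necessarily evenly.
print(Counter(pick_member(f, members) for f in flows))
```

With only a few large flows, the per-port counts can be very uneven even though the hash itself works as designed.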
Userlevel 2
But then why does the traffic flow well after disabling and re-enabling the Extreme sharing?
The packet loss stopped too.
Userlevel 3
I think the flows (active connections) are dropped by the sharing reconfiguration, and all the new ones are then well balanced. Have you tried adding a port, waiting about 10-15 minutes, and then checking the port utilization?
Userlevel 3
Where do you observe the packet loss? On the X670 or on the Juniper?
Userlevel 2
It's hard to say; the counters don't show any drops. But BGP sessions drop, and ping shows some packet loss.
Userlevel 3
The BGP sessions are probably dropped because of the sharing reconfiguration.

Have you tried adding a port, waiting about 10-15 minutes,
and then checking the port utilization?

When do you observe the packet loss?
After you add a new port, or after you change the master port of the LAG?
--
Jarek
Userlevel 2
Hello Jarek,
We tried for 5 minutes; we couldn't keep it running longer than that.
I observe the loss after adding the two ports to the LAG.
Userlevel 3
Are you pinging the Juniper interface?
If so, have you tried pinging something behind the Juniper?
Maybe the CPU on the Juniper is busy...

If I understand correctly, you have two ports in the LAG, 47 and 48, and 47 is the master port.
When you add 45 and 46 to this LAG, you see problems with traffic.
Did you change the hash algorithm?
Do you use the L3_L4 hash or the custom hash? I ask because the outputs you pasted below show both of them.

--
Jarek
Userlevel 2
Hello Jarek,
Yes, we pinged the customers directly.
No high CPU usage was found on the Juniper.

That's right, I see the packet loss when I add ports 45 and 46.
Yes, I tried all of them, both fixed and custom.
Userlevel 3
Is there anything in the logs on the X670?
Did you check the congestion counters on the ports?
What does "debug hal show congestion" show (run it a few times)?

Did you check the CPU utilization on the X670 after adding a port to the LAG?

--
Jarek
Userlevel 2
Nothing in the logs.
Yes, I checked the congestion counters, but nothing incremented.
I don't have the problem now, because it stopped after I re-created the LAG.
No, I haven't looked at the CPU usage.
Userlevel 6
Can you send the "show sharing" output and explain how you are determining the traffic is being affected?
Userlevel 2
Load Sharing Monitor
Config    Current   Agg       Ld Share    Ld Share   Agg   Link    Link Up
Master    Master    Control   Algorithm   Group      Mbr   State   Transitions
==============================================================================
8         8         LACP      L3_L4       8          Y     A       3
                              L3_L4       9          Y     A       3
27        27        LACP      L3_L4       27         Y     A       1
                              L3_L4       28         Y     A       1
29        29        LACP      L3_L4       29         Y     A       1
                              L3_L4       30         Y     A       1
                              L3_L4       31         Y     A       1
47        45        LACP      L3_L4       45         Y     A       1
                              L3_L4       46         -     R       0
                              L3_L4       47         Y     A       2
                              L3_L4       48         Y     A       2
==============================================================================

Below, I enabled just one of the added ports to see whether the problem would happen.
After that, the traffic looked like this:

Port      Link    Link    Rx            Peak Rx       Tx            Peak Tx
          State   Speed   % bandwidth   % bandwidth   % bandwidth   % bandwidth
================================================================================
45        A       10000   1.59          2.96          0.99          1.09
46        R       0       0.00          0.00          0.00          0.00
47        A       10000   26.33         38.23         34.27         50.37
48        A       10000   24.27         40.51         43.32         63.29

Port 45 carries much less traffic than the other ports.
We have about 5 LACP groups between this switch and the Juniper, and none of the others had this kind of problem.
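
To put a number on the imbalance, here is a small sketch that turns the Rx % bandwidth values from the two "show ports utilization bandwidth" outputs in this post into per-port shares. The `rx_share` helper is just for illustration; only the figures come from the switch output. Port 45 ends up with about 3% of the received traffic in the first case, and all four ports are close to 25% each after re-creating the sharing:

```python
def rx_share(rx_percent):
    """Return each port's share (in %) of the total Rx utilization."""
    total = sum(rx_percent.values())
    return {port: round(100 * value / total, 1) for port, value in rx_percent.items()}

# Rx % bandwidth values copied from the two utilization outputs in this post.
before = {45: 1.59, 47: 26.33, 48: 24.27}  # port 46 was still in Ready state
after = {45: 15.95, 46: 15.75, 47: 15.87, 48: 15.99}

print(rx_share(before))
print(rx_share(after))
```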

The only difference was that the added ports have lower numbers than the sharing's master port.
As I said before, after disabling and re-enabling the sharing with port 45 set as master, the traffic became OK.

Here's the output now:

X670 # sh sharing
Load Sharing Monitor
Config    Current   Agg       Ld Share    Ld Share   Agg   Link    Link Up
Master    Master    Control   Algorithm   Group      Mbr   State   Transitions
==============================================================================
8         8         LACP      L3_L4       8          Y     A       4
                              L3_L4       9          Y     A       3
27        27        LACP      L3_L4       27         Y     A       1
                              L3_L4       28         Y     A       1
29        29        LACP      L3_L4       29         Y     A       1
                              L3_L4       30         Y     A       1
                              L3_L4       31         Y     A       1
45        45        LACP      custom      45         Y     A       2
                              custom      46         Y     A       2
                              custom      47         Y     A       2
                              custom      48         Y     A       2
==============================================================================
Link State: A-Active, D-Disabled, R-Ready, NP-Port not present, L-Loopback
Load Sharing Algorithm: (L2) Layer 2 address based, (L3) Layer 3 address based
(L3_L4) Layer 3 address and Layer 4 port based
(custom) User-selected address-based configuration
Custom Algorithm Configuration: ipv4 L3-and-L4, crc-32 upper
Number of load sharing trunks: 4

X670 # sh ports 45-48 utilization bandwidth
Port      Link    Link    Rx            Peak Rx       Tx            Peak Tx
          State   Speed   % bandwidth   % bandwidth   % bandwidth   % bandwidth
================================================================================
45        A       10000   15.95         15.95         44.13         44.13
46        A       10000   15.75         15.75         38.12         38.12
47        A       10000   15.87         38.23         4.10          51.62
48        A       10000   15.99         40.51         5.93          63.29
================================================================================
> indicates Port Display Name truncated past 8 characters
Link State: A-Active, R-Ready, NP-Port Not Present, L-Loopback
Userlevel 7
Based on this, it looks like what Jarek said is probably what happened. When the sharing was disabled on the X670, LACP would have gone down, causing the ports to be removed from the LAG on the MX80 as well.

When it came back up, the existing flows from the MX80 were rehashed across all members, including the new ports.

-Brandon
Userlevel 2
I agree with that, but why doesn't this problem occur when I do the same thing on other switches?
I mean, the issue seems tied to the master port number: if you add ports that are not sequential after the master, the same behavior appears.
I tested that with another switch and another MX, which had different firmware,
and I saw the same symptoms.
