Port Load sharing

  • Problem
  • Updated 3 years ago
  • Solved
After I enabled LACP, the ports are not sharing the load. Utilization is high on Link 1 only.
I want traffic to be load balanced across both links.
Please see the output below and advise. Thanks.

* I tried L2 static, L3_L4, and LACP. The result is the same in every case: the ports are not load balanced. *

enable sharing 1 grouping 1,4

enable sharing 1 grouping 1,4 L3_L4

enable sharing 1 grouping 1,4 algorithm address-based L3_L4 lacp


CoreSW # sh port 1,4 utilization bandwidth
Port     Link    Link   Rx             Peak Rx       Tx            Peak Tx
         State   Speed  % bandwidth    % bandwidth   % bandwidth   % bandwidth
================================================================================
Link_1> A       1000     82.06         94.94         48.23           55.72
Link_2> A       1000      0.25          0.30          1.34            1.57
================================================================================
          > indicates Port Display Name truncated past 8 characters
          Link State: A-Active, R-Ready, NP-Port Not Present, L-Loopback

CoreSW #sh sharing 
Load Sharing Monitor
Config    Current    Agg       Ld Share    Ld Share  Agg   Link    Link Up
Master    Master     Control   Algorithm   Group     Mbr   State   Transitions
==============================================================================
     1      1        LACP      L3_L4       1          Y      A        0
                               L3_L4       4          Y      A        0
==============================================================================
Link State: A-Active, D-Disabled, R-Ready, NP-Port not present, L-Loopback
Load Sharing Algorithm: (L2) Layer 2 address based
                        (L3_L4) Layer 3 address and Layer 4 port based
Number of load sharing trunks: 1


Core2 # sh lacp lag 1
Lag   Actor    Actor  Partner           Partner  Partner Agg   Actor            
      Sys-Pri  Key    MAC               Sys-Pri  Key     Count MAC              
--------------------------------------------------------------------------------
1           0  0x03e9 00:04:96:34:b2:e1       0  0x03e9      2 00:04:96:34:b2:e0

Port list:

Member     Port      Rx           Sel          Mux            Actor     Partner 
Port       Priority  State        Logic        State          Flags     Port    
--------------------------------------------------------------------------------
1          0         Current      Selected     Collect-Dist   A-GSCD--  1001     
4          0         Current      Selected     Collect-Dist   A-GSCD--  1012     
================================================================================
Actor Flags: A-Activity, T-Timeout, G-Aggregation, S-Synchronization
             C-Collecting, D-Distributing, F-Defaulted, E-Expired

Paul

Posted 4 years ago


Jarek
Which model are you using in this example (Summit X250, 460, BD)?

Is the LACP link between two Extreme switches, or between a server (or another vendor's device) and one Extreme switch?

Which EXOS version is running on the switch?

--
Jarek

Paul
Core2 # sh sharing 
Load Sharing Monitor
Config    Current    Agg       Ld Share    Ld Share  Agg   Link    Link Up
Master    Master     Control   Algorithm   Group     Mbr   State   Transitions
==============================================================================
     1      1        LACP      L3_L4       1          Y      A        0
                               L3_L4       4          Y      A        0
==============================================================================
Link State: A-Active, D-Disabled, R-Ready, NP-Port not present, L-Loopback
Load Sharing Algorithm: (L2) Layer 2 address based
                        (L3_L4) Layer 3 address and Layer 4 port based
Number of load sharing trunks: 1


************************************************************************************
************************************************************************************

Core2 # sh lacp 

LACP Up                             : Yes
LACP Enabled                        : Yes
System MAC                          : 00:04:96:34:b2:e0
LACP PDUs dropped on non-LACP ports : 19  

Lag        Actor    Actor   Partner            Partner  Partner  Agg   
           Sys-Pri  Key     MAC                Sys-Pri  Key      Count 
--------------------------------------------------------------------------------
1              0    0x03e9  00:04:96:34:b2:e1      0    0x03e9   2
================================================================================



Core2 # sh lacp lag 1

Lag   Actor    Actor  Partner           Partner  Partner Agg   Actor            
      Sys-Pri  Key    MAC               Sys-Pri  Key     Count MAC              
--------------------------------------------------------------------------------
1           0  0x03e9 00:04:96:34:b2:e1       0  0x03e9      2 00:04:96:34:b2:e0

Port list:

Member     Port      Rx           Sel          Mux            Actor     Partner 
Port       Priority  State        Logic        State          Flags     Port    
--------------------------------------------------------------------------------
1          0         Current      Selected     Collect-Dist   A-GSCD--  1001     
4          0         Current      Selected     Collect-Dist   A-GSCD--  1012     
================================================================================
Actor Flags: A-Activity, T-Timeout, G-Aggregation, S-Synchronization
             C-Collecting, D-Distributing, F-Defaulted, E-Expired

************************************************************************************
************************************************************************************

Core2 # sh lacp lag 1 detail 

Lag   Actor    Actor  Partner           Partner  Partner Agg   Actor            
      Sys-Pri  Key    MAC               Sys-Pri  Key     Count MAC              
--------------------------------------------------------------------------------
1           0  0x03e9 00:04:96:34:b2:e1       0  0x03e9      2 00:04:96:34:b2:e0

Up               : Yes
Enabled          : Yes
Unack count      : 0
Wait-for-count   : 0
Current timeout  : Long
Activity mode    : Active
Defaulted Action : Delete
Receive state    : Enabled
Transmit state   : Enabled
Selected count   : 2
Standby count    : 0
LAG Id flag      : Yes
  S.pri:0   , S.id:00:04:96:34:b2:e0, K:0x03e9
  T.pri:0   , T.id:00:04:96:34:b2:e1, L:0x03e9

Port list:

Member     Port      Rx           Sel          Mux            Actor     Partner 
Port       Priority  State        Logic        State          Flags     Port    
--------------------------------------------------------------------------------
1          0         Current      Selected     Collect-Dist   A-GSCD--  1001     
4          0         Current      Selected     Collect-Dist   A-GSCD--  1012     
================================================================================
Actor Flags: A-Activity, T-Timeout, G-Aggregation, S-Synchronization
             C-Collecting, D-Distributing, F-Defaulted, E-Expired

************************************************************************************
************************************************************************************
Core1# show ports sharing 
Load Sharing Monitor
Config    Current    Agg       Ld Share    Ld Share  Agg   Link    Link Up
Master    Master     Control   Algorithm   Group     Mbr   State   Transitions
==============================================================================
     1      1        LACP      L3_L4       1          Y      A        0
                               L3_L4       12         Y      A        0
==============================================================================
Link State: A-Active, D-Disabled, R-Ready, NP-Port not present, L-Loopback
Load Sharing Algorithm: (L2) Layer 2 address based
                        (L3_L4) Layer 3 address and Layer 4 port based
Number of load sharing trunks: 1


I don't currently have the log for Core1; I will provide it as well. Thanks.

Jarek
Paul, do you have ports 1 and 4 in the LACP group on Core2, and ports 1 and 12 on Core1?

Paul
Hi Jarek, yes, that's correct. Does it make any difference?

Jarek
From show sharing I see that Link Up Transitions is 0 on both switches.
Can you show the output of the following from both switches:
sh lacp counters
sh configuration | inc shari

I asked about the ports because I had understood that you configured this on both switches:
enable sharing 1 grouping 1,4 algorithm address-based L3_L4 lacp

Paul
Thanks Jarek,

Core 1
enable sharing 1 grouping 1, 12 algorithm address-based L3_L4 lacp

Core 2
sh configuration | inc shari
enable sharing 1 grouping 1, 4 algorithm address-based L3_L4 lacp

I will provide the command output tomorrow. Thanks.

PARTHIBAN CHINNAYA, Alum
Try these steps:
1. configure sharing address-based custom

Check whether it load shares.

2. If it does not, change the hash algorithm:
configure sharing address-based custom hash-algorithm [xor | crc-16 | crc-32]
Try crc-16 first; if that doesn't help, try crc-32.

One way or another, changing the hashing should help.

Paul
Hi Parthiban,

I will try this tomorrow morning. Thanks a lot for the information.

Paul
Hi Parthiban,

I tried this command, but it is not supported on the X450a-24x running EXOS 15.3.3.5:

"configure sharing address-based custom"

Is there any other way to change the hash (for example, to CRC) on the X450a-24x? Thanks.

BRgds,

Drew C., Community Manager
Load sharing is not the same as load balancing.  EXOS does load sharing (link aggregation), but not load balancing.  As Parthiban and Jarek have mentioned, you can adjust the hash algorithm to attempt to better spread the traffic across the links.

I'd like to try to understand your use case. Is there a problem with the traffic favoring one link over another, or is it just a preference to have it balanced? What type of traffic is between these two switches? Using the L3_L4 algorithm, you could see one link more saturated than the other if the IP addresses and protocol ports are the same for a majority of the traffic. If you use the L2 algorithm, the choice is based on the MAC addresses of the source and destination systems; if they are the same for all traffic, only one link will be chosen.
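
The effect described above can be sketched in a few lines of Python. This is only an illustration, assuming a generic "hash the key, then take it modulo the number of links" selection; the real EXOS hardware hash is not described in this thread, and `zlib.crc32`, the MAC/IP values, and the names `pick_link`, `l2_key`, and `l3_l4_key` are all made up for the example.

```python
import zlib
from collections import Counter

def pick_link(key: bytes, num_links: int) -> int:
    """Map a flow key to a LAG member index (hash, then modulo)."""
    return zlib.crc32(key) % num_links

def l2_key(flow: dict) -> bytes:
    # L2-style key: only the source and destination MAC addresses.
    return f"{flow['src_mac']}|{flow['dst_mac']}".encode()

def l3_l4_key(flow: dict) -> bytes:
    # L3_L4-style key: IP addresses plus Layer 4 ports.
    return (f"{flow['src_ip']}|{flow['dst_ip']}|"
            f"{flow['src_port']}|{flow['dst_port']}").encode()

# Hypothetical traffic: 200 TCP connections between the same two hosts.
# Only the source port differs from one connection to the next.
flows = [
    {"src_mac": "02:00:00:00:00:01", "dst_mac": "02:00:00:00:00:02",
     "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
     "src_port": 40000 + i, "dst_port": 443}
    for i in range(200)
]

l2_usage = Counter(pick_link(l2_key(f), 2) for f in flows)
l3_l4_usage = Counter(pick_link(l3_l4_key(f), 2) for f in flows)

print("L2 key:   ", dict(l2_usage))     # one link carries all 200 flows
print("L3_L4 key:", dict(l3_l4_usage))  # split depends on the hash and the ports
```

With the L2-style key every connection produces the same key, so the second link stays idle no matter how many connections exist. The L3_L4-style key can only spread traffic if the addresses or ports actually vary between flows.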

Drew C., Community Manager
Hi Paul,
I found your case number in our system and will let the case owner know about this thread and the urgency.

We're here to help, no worries on being "messy" :)

You could try to add a 3rd link to the port group - depending on how many streams there are, the additional link could make a difference in the way they are hashed.  If you could upgrade the links to 10G (via XGM module for X450a), then this problem would likely clear itself.  Stacking the two switches could also be an option, but would be considerably more difficult to implement since the network is already configured.

-Drew

Paul
Hi Drew,

Could you please explain more about upgrading the links to 10G via the XGM module for the X450a?

Our current model has 24 1000BASE-X SFP ports plus 4 10/100/1000BASE-T ports.
Is there any way we can upgrade to a 10G uplink?

Best regards,
Paul

Drew C., Community Manager
Hi Paul,
You'll need to purchase a module and optics for installing into the slot on the back of each switch. For the X450a, I would get the XGM2-2sf.

Once installed, it's a simple configuration to move over to the new 10G link(s).

Paul
Thanks Drew, 

It's very informative and a good solution for users.
Does the module have 2x 10G ports?
Since we are now at this stage, could you please share the steps for the simple configuration?
We will need downtime to install this module, right?

Drew C., Community Manager
Hi Paul,
Yes - that module is 2x 10G SFP+ ports.  You will need to power off each switch to install it.

The configuration is easy to do, but will require looking at the existing VLAN config for the current LAG ports.  I feel like that is outside the scope of the community here and would be best left to GTAC to help.  If you're already planning on purchasing the modules and optics (be sure to get Extreme Networks Certified Optics!), go ahead and open a new case with GTAC to request help moving the configuration from the two LAG ports to the two new ones - that's the best way to prepare.

-Drew

rbrt_weiler
To add to Drew's comment: LACP's main use case is to add or improve redundancy. That it also does some kind of load sharing is a nice benefit, nothing more. If real load balancing is necessary, some kind of L3 routing protocol, for example OSPF, would be a solution. Some routing protocols (if not all; I am no routing expert) offer "equal path load balancing", which actually looks at the link usage and actively distributes the load.

PARTHIBAN CHINNAYA, Alum
As far as I know, multicast traffic will not be load shared in a LAG.
It will always take one link in the LAG. There were known PDs for this.
This feature may have been implemented in a newer EXOS release, but not before 15.3, I guess.

Drew C., Community Manager
This statement is misleading.  Multicast can be hashed and load shared in a lag, but because much of the traffic is the same source and destination, a single stream can "stick" to one link.  Different streams should hash to different links, unless some aspect of the traffic puts them together due to hashing.

PARTHIBAN CHINNAYA, Alum
I am 100% sure there was a PD which said multicast traffic will not load share in a lag.

The 15.6 Concepts Guide documents that this is supported:

Link Aggregation Algorithms
SummitStack supports address-based load sharing. (This platform does not support port-based load sharing.)
The following are the types of traffic to which address-based algorithms apply and the traffic components used to select egress links:
• Layer 2 frames and non-IP traffic: the source and destination MAC addresses.
• IPv4 and IPv6 packets:
  • L2 algorithm: Layer 2 source and destination MAC addresses. Available on SummitStack and all Summit family switches.
• Broadcast, multicast, and unknown unicast packets (not configurable): depends on traffic type:
  • IPv4 and IPv6 packets: the source and destination IP addresses.
  • Non-IP traffic: the source and destination MAC addresses.
You can control the field examined by the switch for address-based load sharing when the load-sharing group is created by using the following command:

Drew C., Community Manager
I can't find it at the moment, but that doesn't mean it doesn't exist...

Since this is an X450a, the last supported version is 15.3.x.  Paul, I would suggest first making sure you're on the most recent software for your cores.  At the moment, that is EXOS v15.3.4.6-patch1-8.  If there's a software bug, it will hopefully be resolved by updating.  TAC will need to manage the case beyond that.

Paul
Hi Drew,

We are not able to purchase the XGM2-2sf module for the X450a. We were told it is EOL. Do you have any idea where we could order it?

Thanks.

Drew C., Community Manager
Hi Paul,
There are other outside outlets for purchasing older gear that you can find by doing some Googling.  You should be able to find an option in your region that carries this part and the optics.  If not, your local Extreme Networks sales team may have some other recommendations for upgrade.  Sorry I can't be of further help on this question right now.

Venko Velev
Hello guys,

I can share some experience regarding this case. I built a sharing group with 6x10G ports between a 4-node stack and an X670 switch. The problem was that at peak time the first and second ports ran flat out at 10G while the other ports were loaded at, for example, 5G. My EXOS version is 15.3.2.11.
In my case the solution was:
1. Change the custom address-based algorithm from XOR to CRC-16.
2. Disable the sharing group and create a new one with the address-based custom algorithm.

Commands entered on the device:


configure sharing address-based custom hash-algorithm crc-16
disable sharing 30
enable sharing 30 grouping 30-35 algorithm address-based custom lacp

Hope this was useful !

Best regards,

Venko Velev
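
To illustrate why switching the hash from XOR to CRC-16 can change the distribution, here is a small Python sketch. It is not the switch's implementation: the key layout (source and destination IPv4 addresses only), the 16-bit XOR fold, and the use of `binascii.crc_hqx` as a stand-in CRC-16 are all assumptions, and the traffic pattern is deliberately contrived so that the XOR of source and destination is constant.

```python
import binascii
import ipaddress
import struct

def flow_key(src_ip: str, dst_ip: str) -> bytes:
    """Pack source and destination IPv4 addresses into an 8-byte key."""
    return (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed)

def xor_hash(key: bytes) -> int:
    """Fold the key into 16 bits by XOR-ing its 16-bit words together."""
    h = 0
    for (word,) in struct.iter_unpack(">H", key):
        h ^= word
    return h

def crc16_hash(key: bytes) -> int:
    """CRC-CCITT from the standard library, used here as a stand-in CRC-16."""
    return binascii.crc_hqx(key, 0)

def spread(hash_fn, keys, num_links):
    """Count how many flows each LAG member receives."""
    counts = [0] * num_links
    for key in keys:
        counts[hash_fn(key) % num_links] += 1
    return counts

# Contrived pattern: host i in one subnet talks to host i in another subnet.
keys = [flow_key(f"10.0.0.{i}", f"10.0.1.{i}") for i in range(1, 61)]

print("xor   :", spread(xor_hash, keys, 6))    # all 60 flows on one member
print("crc-16:", spread(crc16_hash, keys, 6))  # result depends on the CRC
```

In this pattern the varying octet appears in both addresses and cancels out under XOR, so every flow gets the same hash value. A CRC mixes the input bits differently, which is why changing the algorithm can help for some traffic mixes; whether it helps on a given network still depends on the real traffic.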

Manish S
@ Venko Velev

So I assume that after you made these changes, the traffic was actually balanced across the links in the LAG?

Regards,
Manish

Harkanwaljeet Singh
Hi Guys,
This whole thread is quite informative and helped me correct my understanding of load sharing on Extreme X450 switches.

But I have a couple of questions.

Brief:
In my case, two X450a switches are connected to each other with static load sharing (2 physical ports, address-based L2). One of the ports is 80% utilized, while the 2nd port in the LAG is below 5%. I assume that the majority of the traffic has the same source and destination MAC, though I do not have Wireshark captures or other logs to support this.

Problematic scenario:
Massive outages were reported, and during troubleshooting on these Extreme switches we found that utilization of one port in the load-sharing group reached 100%, while the 2nd port stayed the same as before.

Activities performed at the time:
1) Disabled/enabled the 2nd port on both switches, but there was no change.
2) Disabled/enabled sharing on one switch.
3) Added a 3rd link to the LAG, which started carrying traffic, but the older 2nd port of the LAG still showed the same behavior.
-------
We then received an update that the issue was resolved. I am not sure whether the 2nd or 3rd activity, or some activity on other nodes (not the Extreme switches but the application servers; other teams carried out a few activities on the application servers at the same time), solved the issue.
------

4) We then changed the LAG configuration from L2 to L3_L4. The traffic pattern and nodes are under observation now.

Now comes the real question, for both configurations (sharing L2 as well as L3_L4):

1) What is the behavior when utilization of one of the ports in the load-sharing group reaches 100%?
2) Will the extra traffic be dropped or shifted to the 2nd port?
3) How can I check whether the over-utilized port in the LAG is dropping traffic?

Your inputs are much awaited.

Thanks
Harkanwal

Grosjean, Stephane, Employee
Traffic would be dropped.
Changing the hashing algo to l3_l4 is a good try.
If most of your traffic is between 2 end-systems, you need to find some entropy in the headers to help load-balancing. L4 usually helps there.
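
A rough Python sketch of why the excess is dropped rather than moved: the hash assigns each flow to a member without looking at how busy that member is. The flow list, the rates, and the function names below are hypothetical, and the model ignores buffering entirely.

```python
def offered_load(flows, num_links):
    """Sum the offered rate (Mb/s) per LAG member from a fixed flow-to-link map."""
    load = [0.0] * num_links
    for link, rate_mbps in flows:
        load[link] += rate_mbps
    return load

def dropped(load, capacity_mbps=1000.0):
    """Traffic above a member's capacity is dropped, not moved to another member."""
    return [max(0.0, offered - capacity_mbps) for offered in load]

# Hypothetical flows: (member index chosen by the hash, offered rate in Mb/s).
flows = [(0, 700.0), (0, 450.0), (1, 50.0)]

load = offered_load(flows, 2)
print("offered:", load)           # [1150.0, 50.0]
print("dropped:", dropped(load))  # [150.0, 0.0]
```

Member 1 has 950 Mb/s of headroom in this example, but the two large flows stay on member 0 because their hash result does not change. Only a different hash input (for example, adding L4 ports) or a different traffic mix would move them.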

Prashanth KG, Employee
In addition to Stephane's comments, packets dropped at the port level can be monitored using the command "show port <port number> congestion".

Sunil Pratap Singh
Router---------Extreme switch (X450a-48t)--------(port aggregation)-------Extreme switch (Summit48si)

On the X450a-48t: enable sharing 2:2 grouping 2:2-4 algorithm address-based L3_L4
On the Summit48si: enable sharing 1 grouping 1,2,3 algorithm address-based

We are seeing congestion on all aggregated ports on the X450a-48t but none on the Summit48si. There is sufficient bandwidth. What could be the problem?

Jarek
Hi,

The congestion counter increases when the switch's egress buffer has insufficient space for packets.

Traffic mostly flows from the router to the access switch, which is why you see congestion only on the X450a.
For example, you may have about 500 Mb of egress traffic per 1G port on average.
Internet traffic is not constant; sometimes there are microbursts in which the switch tries to send 1G or more.

What do you see in the output of 'show port buffer'?

What you can do:
- enable flow control
- if you use qosprofiles, delete unused qosprofiles
- tune the port buffer (don't change it if you don't know what you are doing :)

--
Jarek
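
The microburst effect described above can be shown with a toy simulation in Python. Everything here is an assumption made for illustration: the tick-based model, the buffer size, the units, and the function name `simulate_egress` have nothing to do with the real X450a buffer layout.

```python
def simulate_egress(arrivals_kb, drain_kb_per_tick, buffer_kb):
    """Simulate a FIFO egress buffer; return (kilobytes dropped, peak queue depth)."""
    queue = 0
    dropped = 0
    peak = 0
    for arriving in arrivals_kb:
        queue += arriving
        if queue > buffer_kb:
            # Buffer is full: whatever does not fit is tail-dropped.
            dropped += queue - buffer_kb
            queue = buffer_kb
        peak = max(peak, queue)
        # The port transmits at line rate during this tick.
        queue = max(0, queue - drain_kb_per_tick)
    return dropped, peak

# Both patterns offer 1000 KB over 20 ticks, i.e. 50% of what the port can drain.
smooth = [50] * 20
bursty = ([500] + [0] * 9) * 2

print("smooth:", simulate_egress(smooth, 100, 200))  # (0, 50)
print("bursty:", simulate_egress(bursty, 100, 200))  # (600, 200)
```

Both patterns have the same average utilization, yet only the bursty one overflows the buffer. That is how a port can report congestion drops while its average bandwidth looks sufficient.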