MLAG vs. stacking: what am I missing?

  • Question
  • Updated 3 years ago
  • Answered
I can't seem to wrap my head around the reason for using MLAG instead of stacking. I'm planning to use two X670s as a 10-gigabit aggregation hub for our remote sites, but I want redundancy, so the idea was to run one fiber to each of the X670s and run MLAG, so if one X670 fails, ta-da, we're still up and working...

But then I got to thinking: if I stack those two X670s and use a standard LAG group from 1:1 and 2:1 to the remote site, isn't it IDENTICAL? Except I get the benefit of not dealing with MLAG and not having to manage two core switches, and I still keep the same load balancing, same redundancy, same resilience and high availability?

I feel like there's got to be something here I'm missing.

Chris Chance


Posted 3 years ago


Paul Russo, Alum

Hey Chris

From a functional perspective you are right: the failover of the links is the same.

The reason why MLAG is better is that it allows you to do software upgrades of the core without an outage. If it is a stack, then you have to take down the whole stack, which takes down both links. MLAG involves more management, as you mention, but if redundancy is what you need, it is the better design.

I hope that helps.

P

Mrxlazuardin

I'm interested in this too and will likely have the same question, since my users are asking me about it as well. The reason is that full-capacity stacking requires buying additional parts. Are there any pros and cons, or a comparison between stacking and MLAG? What about compatibility with other vendors' products and endpoints (e.g. servers)?

Grosjean, Stephane, Employee

Hi,

MLAG is as transparent as stacking to any other device for L2/VRRP. To that device, it is just a LAG.
MLAG offers natural local switching compared to stacking, i.e. the ISC should carry unicast traffic only when a link fails (if everything is dual-homed), while stacking may use the stacking links more intensively.
MLAG also offers more resiliency: each node has its own control plane.
But there are 2 nodes to manage, and when it comes to L3/MPLS, this is really like 2 routers.

Chris Chance

Got to agree, a pro/con list or comparison between the two would be great, beyond just the software upgrade point.

eyeV

When we were choosing between MLAG and stacking, we ran lots of tests. We chose MLAG for several reasons; one of them is more efficient redundancy between the two switches. It's a bit hard to explain because of my poor English, but I'm going to try.

We use two X670 + VIM4-40G4X-1. These boxes are geographically separated. When we use stacking, we combine two 40G ports into one logical stack port, so we have 2 stack ports.



So... if one 40G port goes down, S1 for instance, the whole of StackPort1 goes down. But if we use MLAG, we still have three working ports. Pretty bad explanation, guys... I tried...

Ed McGuigan

I have wondered the same about my location where the decision has generally been to run two separate systems and MLAG to them.

In actual fact, we seem to have gotten ourselves in a mess by not carefully duplicating config on our parallel routers and there is a need to be very committed to this aspect if one chooses MLAG over stacking. I have also seen upgrading of stacks be problematic so it would be scary to have one stack and hit problems with an upgrade.

Drew C., Community Manager

I wanted to mention that this is a great topic for discussion. Hopefully others will pop in and contribute!

Daniel Flouret, Employee

Stephane mentioned one important difference between MLAG and stacking:

"MLAG offers natural local switching compared to stacking, i.e. the ISC should carry unicast traffic only when a link fails (if everything is dual-homed), while stacking may use the stacking links more intensively."

Let me translate his words into drawings...

You have switches C1 and C2 stacked together. The stack connects to switch A through LAG1 and to switch B through LAG2.

Traffic flowing from switch A to switch B should (hopefully) be equally balanced across all links in LAG1 and LAG2.

We should expect something like this:

Some (half?) of the flows would traverse the stacking links.

If you use MLAG instead of stacking you would see this traffic pattern:

The ISC has filters that prevent unicast traffic from traversing it, except when there's a link failure.

If the link between C1 and B happened to fail, this is what you would see:

BTW, in case you were wondering, this is also the traffic pattern you'd see if you were using stacking and had the same link failure...
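
To put rough numbers on the patterns described above, here is a minimal Python sketch. It is a toy model and not how EXOS actually hashes traffic: the names `pick` and `crossing_flows`, the SHA-256 based hash, and the 10,000 synthetic flows are all assumptions made for illustration. It counts how many A-to-B flows would have to traverse the link between C1 and C2 (the stacking link or the ISC) in each design.

```python
import hashlib

def pick(flow_id: int, salt: str, n_links: int) -> int:
    """Toy LAG hash (an assumption, not the real switch hash): flow -> member index."""
    digest = hashlib.sha256(f"{salt}:{flow_id}".encode()).digest()
    return digest[0] % n_links

def crossing_flows(mode: str, n_flows: int = 10_000, c1_to_b_up: bool = True) -> int:
    """Count A->B flows that traverse the C1<->C2 link (stacking link or ISC)."""
    crossing = 0
    for flow in range(n_flows):
        ingress = pick(flow, "A-LAG1", 2)  # 0 = enters at C1, 1 = enters at C2
        if mode == "stack":
            # One logical switch: hash over every live LAG2 member, ignoring locality.
            live = [0, 1] if c1_to_b_up else [1]
            egress = live[pick(flow, "core-LAG2", len(live))]
        elif mode == "mlag":
            # Local egress port is used; the ISC is used only if the local link to B is down.
            egress = 1 if (ingress == 0 and not c1_to_b_up) else ingress
        else:
            raise ValueError(f"unknown mode: {mode}")
        crossing += int(ingress != egress)
    return crossing

for up in (True, False):
    for mode in ("stack", "mlag"):
        print(f"C1-B link up={up} {mode}: {crossing_flows(mode, c1_to_b_up=up)} flows cross")
```

With all links up, the MLAG count is zero and the stack count comes out at roughly half of the flows. With the C1-to-B link down, both modes give the same count, which is the point made just above.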

Mrxlazuardin

Hi Daniel,

Do you mean that MLAG cannot be used for load sharing like standard LAG?

Best regards,

Grosjean, Stephane, Employee

Hi,

Why are you saying that? The end systems are using regular LAGs, and load sharing happens normally. If you look at the middle picture from Daniel's post, A has a LAG and traffic is flowing through it on both links.

Ed McGuigan

I think the point with the first diagram is that the two switches function as one logical switch. Traffic flows from A along one of the MLAG ports into one side of this logical switch. That traffic must now flow to B and the logical switch sees that it has a LAG going to B. It takes no account of the fact that one port in the LAG is on the local slot and the other on the other slot and just splits the traffic. This results in traffic transiting the stacking links that are viewed as the "backplane" of the logical switch.

Paul Russo, Alum

This has been a great discussion.  I just want to make a few points.

From the edge device's point of view, it has a LAG and it will hash the traffic out to the core the same way whether the core uses MLAG or a stack. To it there is no difference, and if a link or MLAG switch fails it will move its connections no differently than if it were going to a stack.

The ISC link is only used for redundancy as Daniel drew or for any device that is not dual homed into both switches.  If the core is done correctly there should be little traffic going across this link.

Lastly, in order to get load distribution across both cores in an L3 environment, you need to have the two MLAG switches running as active/active. This is explained in the concepts guide. Today this is done by having an ACL on the ISC links that blocks the VRRP updates. Going forward, maybe in 15.7 (I can't remember), you will be able to do this without an ACL.

In my opinion, if you are looking for redundancy in the network you really should be using MLAG. The cost and equipment are the same, assuming you are not using dedicated stacking ports. What I mean by that is that if you use 460s or 670s or any other switch type and use native 10G/40G ports to connect them together for MLAG, your cost is the same as for stacking. The ports are either set as regular 10G ports or as stacking ports.

Hope that helps
P

Ed McGuigan

You've got me thinking now Paul. We have a layer 3 environment and we don't have VRRP blocked across the ISC as best as I remember. So we are talking about both routers believing they are the Master and that not being problematic.

Will need to read the concepts guide.

Paul Russo, Alum

Hey Ed, yes. And if you are not in active/active, then all of the routed traffic going to the slave will go across the ISC.

P

Daniel Flouret, Employee

For those of you familiar with EOS, a similar problem can be found in VSB (Virtual Switch Bonding), the virtual "stacking" functionality available in S-Series, K-Series and 7100 Series switches.


EOS allows the management of user traffic across bonding ports (the equivalent of the stacking ports) with the command
set lacp outportLocalPreference [none | weak | strong | all-local]
to encourage a chassis-bonded switch to use local egress ports on a LAG, where
None = Do not prefer LAG ports based on chassis
Weak = Use a weak preference towards ports on the local chassis
Strong = Use a strong preference towards ports on the local chassis
All-local = Force all packets onto local chassis ports, if possible

Grosjean, Stephane, Employee

It is worth noting that on certain platforms and EXOS versions, you can also do a port-based LAG, meaning you can have a local port used in a stack.

Carsten Buchenau

Great thread & comments, thanks guys!

I have 2 additional thoughts:

1) Dual-homed: Any device connected to the MLAG enabled switches must be dual-homed aka connected to both switches. Taking Daniel's image from just above, that server on the right hand side must not be connected to only one of the 2 switches if they were not stacked but in an MLAG relationship. Otherwise, use a stack.

2) Bigger picture: All examples above are very simple. I believe MLAG gets really interesting when we have a 2-tier environment and more. We are currently working on a 2-Datacenter setup with 2 Cores in each, and several ToR combos (always 2 per rack) behind the Cores. Configuring at least the Cores as MLAG, and possibly the ToRs, prevents us from using Spanning Tree plus gives us the feature of 0-downtime firmware upgrades - and we can still configure each Core as individual router / VRRP instance.

Hope that makes sense....

eyeV

Hi, everybody!

I've got a classic MLAG topology like Daniel posted.




I've shut down the port between the top router and one of the MLAG switches for some reason. So... I see a strange situation. All the bottom S switches tend to send more traffic to the X1 switch. Why?

It would be great if someone could suggest something.

Bill Stritzinger, Alum

The sending switch determines the distribution based on the hash selected. What are the switches' LAGs configured for? Have you tried changing the hash to see if you can get better distribution? What is the makeup of the traffic?

Bill
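
As a rough illustration of why the hash key and the traffic makeup matter, here is a small Python sketch. Everything in it is hypothetical: `lag_member` is a toy stand-in for a LAG hash (the real switch algorithm is not shown in this thread), and the MAC and IP addresses are made up. It compares a key built from MAC addresses only with a key built from IP addresses, for traffic where every frame carries the same source and destination MAC.

```python
import hashlib
from collections import Counter

def lag_member(key: tuple, n_links: int = 2) -> int:
    """Toy stand-in for a LAG hash: map a header tuple to a member link index."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return digest[0] % n_links

# Hypothetical routed traffic: one MAC pair in front of 1000 distinct IP flows.
flows = [
    ("00:00:00:00:00:01", "00:00:00:00:00:fe", f"10.0.{i // 250}.{i % 250}", "192.0.2.1")
    for i in range(1000)
]

# Key on MAC addresses only: every flow produces the same key.
l2_spread = Counter(lag_member((src_mac, dst_mac)) for src_mac, dst_mac, _, _ in flows)
# Key on IP addresses: each flow produces a different key.
l3_spread = Counter(lag_member((src_ip, dst_ip)) for _, _, src_ip, dst_ip in flows)

print("MAC-only key:", dict(l2_spread))
print("IP key:      ", dict(l3_spread))
```

In this toy case the MAC-only key puts all 1000 flows on a single member link, while the IP key spreads them over both. Whether a real network behaves like this depends entirely on how many distinct address pairs the sending switch actually sees.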

eyeV

The bottom switches have one line about LAG in their configs:
enable sharing 25 grouping 25-26 algorithm address-based L2

I can't change the hash, because it's a temporary situation. We are going to "normalize" it tomorrow.

Daniel Flouret, Employee

eyeV,

This is a WILD guess.

When traffic is sent from S switches to the router, it gets evenly balanced (sort of...) among the LAG members going to the X switches.

But when traffic is sent from the router to the S switches it gets to X1 and this switch will send traffic to the S switches using its local LAG links and not through X2 links, which would require traversing the ISC. Thus, you'll see a lot more traffic on the links connecting to X1.

I don't know if this is exactly what is happening, but as my Italian grandmother used to say...

se non è vero, è ben trovato (If it is not true, it is a good story). Hahahaha
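
The guess above can be turned into a back-of-the-envelope calculation. The Python sketch below is only a toy model under stated assumptions: hashing is perfectly even, an X switch always egresses on its local link, and the function name `link_load` and the traffic figures are made up for illustration.

```python
def link_load(up_mbps: float, down_mbps: float, router_to_x2_up: bool) -> dict:
    """Per-link load for one S switch dual-homed to X1 and X2 (toy model, even hashing)."""
    # Upstream: the S switch hashes evenly across its two LAG members.
    s_x1 = up_mbps / 2
    s_x2 = up_mbps / 2
    if router_to_x2_up:
        # The router also hashes evenly; each X switch forwards down its local link.
        s_x1 += down_mbps / 2
        s_x2 += down_mbps / 2
        isc = 0.0
    else:
        # The router reaches only X1; X1 egresses locally, X2 relays upstream traffic over the ISC.
        s_x1 += down_mbps
        isc = up_mbps / 2
    return {"S-X1": s_x1, "S-X2": s_x2, "ISC": isc}

print(link_load(100, 400, router_to_x2_up=True))
print(link_load(100, 400, router_to_x2_up=False))
```

With 100 Mb/s up and 400 Mb/s down, the model gives 250/250 on the two S links when both router links are up, and 450/50 (plus 50 over the ISC) when the router-to-X2 link is shut, so the X1 links look much busier even though the S switch still hashes its own traffic evenly.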

eyeV

Hi guys! It was my mistake. Sorry )