EXOS: X670V-48x link-up delay?

  • Question
  • Updated 2 years ago
  • Answered
We have two X670V-48x stacks (two switches each) for redundant server connections. We use both hardware servers (Windows 2008/2012) and virtual servers (VMware 5.5). The redundancy mechanism is triggered by port link status! (Everyone knows this is not ideal, but it cannot be changed.) The servers in question are connected via 1 Gb TP (twisted-pair) GBICs (original Extreme parts).

We are having trouble during server redundancy tests:
+ If we power down the switches (by pulling the power cable), the expected switchover to the backup server is fast (4 seconds) - perfect!
+ If we reboot the switch (warm start via CLI), the server needs much more time (nearly 1 minute) to switch over to the backup server!

The problem is that when the switch is rebooted, the software shuts down the switching logic quickly, but the port link (GBIC) stays up. Only at the end of the switch shutdown process does the GBIC lose link (= port down). Only then does the server recognize that the link has failed and switch over. At startup we see the same problem: the switch brings the GBIC link up first, while the switching logic (the EXOS processes) is not yet fully running. So the server thinks the link is OK and tries to communicate, but this fails because the switch needs time to start up. This behaviour is made worse by the fact that the X670V needs a long time to boot (compared to the X670-G2).

My question now is: can we control the GBIC link state? In my scenario it would be best if the link came up only once the switch has booted completely (maybe through a delay of x seconds), and if the link were shut down immediately on reboot (not at the end of the shutdown process).

Any ideas, or feedback if you are fighting the same problem?

M.Nees, Embassador

Posted 2 years ago


Andreas Larsen

Maybe use an event script that triggers on the reboot command?
It would be nice to see an official answer. I think the shutdown process should be synchronous with powering off the ports, not only logically shutting them down.

M.Nees, Embassador

Hi Extreme Guys,

any statement on my issue?

Regards

Stephen Williams, Employee

I'm not sure anything can be done about the shutdown delay on link down or the startup delay.  Doesn't the server have a sticky failover setting?

You could disable the port --> save the config --> then enable the port and leave the config unsaved.  If the switch reboots the port will stay down until you enable it.

UPM could work.

M.Nees, Embassador

Hi Stephen,

the steps you recommend may be fine for the lab, but not for a production customer environment (if we want the customer to buy Extreme switches again ...).

The main problem is that at startup the link comes up too early (the switch is not fully booted) and at shutdown the link goes down too late (switching processes are killed, but the link is still up).

So config commands can avoid the issue only at shutdown, not at startup!


Any better suggestions? I think I am not the only customer who has this kind of problem!

Regards,
Matthias

EtherMAN, Embassador

It seems you are asking for an option that would be the same for our servers... I can pretty much guarantee that if you issue a restart on any server or PC you will see the same result: the hardware layer-one link comes up before the software is fully loaded and passing traffic... Switches and routers are the same: the OS will always take longer to load than the layer-one ports take to come up...

One thought that might help: if I recall, hardware checks run by default as part of the boot-up process, so I am not sure whether you could trim the boot time by turning them off... They are on by default, if I recall...

It seems your only option on planned reboots would be to disable the ports... save... reboot... enable the ports... save... This would help you with planned work but not with a random software failure where the switch reboots on its own.

Good luck 

André Herkenrath, Employee

Did you think of enabling the ports on the EMS Message "AAA available" ?

Prashanth KG, Employee

Hi Matthias,

I was about to test with a script to avoid the link up delay during the boot up. However, I realised from the logs that the switch becomes operational first and then the links come up. 

09/24/2015 17:39:40.47 <Info:HAL.Sys.Info> PSU-1 EDPS-300AB A-S7 is present.
09/24/2015 17:39:39.36 <Info:vlan.msgs.portLinkStateUp> Port 1 link UP at speed 1 Gbps and full-duplex
09/24/2015 17:39:36.80 <Info:vlan.msgs.portLinkStateDown> Port 1 link down
09/24/2015 17:39:36.42 <Info:vlan.dbg.info> Media BASET is inserted into Port 3
09/24/2015 17:39:36.42 <Info:vlan.dbg.info> Media BASET is inserted into Port 1
09/24/2015 17:39:36.00 <Info:vlan.msgs.portLinkStateUp> Port 1 link UP at speed 1 Gbps and full-duplex
09/24/2015 17:39:35.49 <Info:vlan.msgs.portLinkStateUp> Port Mgmt link UP at speed 1 Gbps and full-duplex
09/24/2015 17:39:35.16 <Info:HAL.Card.Info> Switch is operational
09/24/2015 17:39:30.38 <Noti:EPM.system_stable> System is stable. Change to warm reset mode
09/24/2015 17:39:27.34 <Info:EPM.wdg_enable> Watchdog enabled

Of course, I tested with an X460. However, I do not expect much difference in the behaviour of the X670 either. Are you using any additional software features such as MLAG/LACP on these ports?
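
The ordering in the log above (newest entry first) can also be checked mechanically. The following is a minimal Python sketch, not an Extreme tool: it assumes the line format shown in the log, the sample is four lines copied from it, and the names `parse_log` and `seconds_from_operational_to_link_up` are made up for illustration. It sorts the entries chronologically and measures how long after "Switch is operational" the first link-up for a port was logged.

```python
import re
from datetime import datetime

# Four entries copied from the log excerpt above (newest first, as the switch prints them).
SAMPLE_LOG = """\
09/24/2015 17:39:39.36 <Info:vlan.msgs.portLinkStateUp> Port 1 link UP at speed 1 Gbps and full-duplex
09/24/2015 17:39:36.80 <Info:vlan.msgs.portLinkStateDown> Port 1 link down
09/24/2015 17:39:36.00 <Info:vlan.msgs.portLinkStateUp> Port 1 link UP at speed 1 Gbps and full-duplex
09/24/2015 17:39:35.16 <Info:HAL.Card.Info> Switch is operational
"""

# Assumed layout: "MM/DD/YYYY HH:MM:SS.cc <Severity:Component> message"
ENTRY = re.compile(
    r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{2}) <(\w+):([\w.]+)> (.*)"
)


def parse_log(text):
    """Return (timestamp, component, message) tuples in chronological order."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S.%f")
            entries.append((stamp, match.group(3), match.group(4)))
    return sorted(entries)


def seconds_from_operational_to_link_up(entries, port="1"):
    """Seconds between 'Switch is operational' and the first later link-up of `port`."""
    operational = next(
        ts for ts, _, msg in entries if msg == "Switch is operational"
    )
    link_up = next(
        ts
        for ts, component, msg in entries
        if component == "vlan.msgs.portLinkStateUp"
        and msg.startswith("Port %s link UP" % port)
        and ts >= operational
    )
    return (link_up - operational).total_seconds()


if __name__ == "__main__":
    print(seconds_from_operational_to_link_up(parse_log(SAMPLE_LOG)))
```

Running the same check against a log from the affected X670V stack would show whether the order is the same there. It says nothing about whether MLAG or other daemons are ready at that point.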

M.Nees, Embassador

Hi Prashanth,
many thanks for looking into my problem!

That is true for your lab environment.

But we are using the X670V, which needs a long time to boot (because it is G1). The problem is made worse by several additional things:
+ Stacking
+ MLAG
+ Problems with 1 Gb TP links (problem with GBICs and links which do not come up reliably, GTAC ticket 01200294)

I will open a GTAC case to provide you with a show tech of the switch in question.

Another thing!
If the switch tells you in the log "Switch is operational", this is surely true for "naked" EXOS but NOT necessarily for the daemons for LAGs, MLAGs, routing, etc. - all the active daemons which manage additional functions. You can see this if you set up your lab with end systems and let them ping each other.

Generally I think EXOS needs what I originally asked for:
"Link up delay of X seconds" OR "a dependency on an EMS event"!

Regards

M.Nees, Embassador

GTAC Case 01201431 is opened!

M.Nees, Embassador

One other thought regarding link up on a switch:

On an edge switch/link, port link up has to be very fast because of the end-system boot-up - this avoids problems with Windows booting. In this case it is not so important whether the switch is fully operational - what matters is that the NIC is up and Windows boots normally.

On a server switch/link, it is better if the link comes up only once the switch is 100% operational. It does not matter if that takes a few seconds/minutes longer. If the link comes up and the switch is not fully working, that is very bad.


The question is: how can we serve these two opposing demands with EXOS?

Grosjean, Stephane, Employee

For link up, you can use a script to keep the ports disabled for a while (1 min?), to be sure the switch is "fully" up. This script can be a Python process (15.7+) or CLI scripting launched from autoexec.xsf, for example. If the script is static it's easy to do; if you need something more dynamic, and you know how to find which ports should be disabled, that sounds doable without too much effort.
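
As a rough illustration of the static variant, here is a minimal Python sketch of the disable / wait / enable sequence. It is not a tested EXOS script: the function name `link_up_delay`, the port list `1:1-24`, and the 180-second default are placeholders, and the CLI runner is passed in as a parameter so the logic can be dry-run off the switch. On the switch, the assumption is that `exsh.clicmd` from the EXOS Python environment would be passed as `run_cli`.

```python
import time


def link_up_delay(run_cli, ports="1:1-24", delay_s=180, sleep=time.sleep):
    """Keep `ports` disabled for `delay_s` seconds, then enable them again.

    run_cli: callable taking one CLI command string
             (assumption: exsh.clicmd when running on the switch).
    sleep:   injectable so the sequence can be dry-run without waiting.
    """
    run_cli('create log entry "Link-up delay: disabling ports %s"' % ports)
    run_cli("disable port %s" % ports)

    # Give the remaining EXOS processes time to come up before servers see link.
    sleep(delay_s)

    # Enable only the ports this script disabled, not every port on the switch.
    run_cli("enable port %s" % ports)
    run_cli('create log entry "Link-up delay: ports %s enabled"' % ports)
```

For a dry run on a workstation, pass `print` as `run_cli` and a no-op such as `lambda s: None` as `sleep`; the commands are then printed in order instead of being executed. The string formatting is kept Python 2 compatible on the assumption that this is what EXOS 15.7 ships.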

M.Nees, Embassador

Hi Stephane,

can you provide an example of how we can do this:
1) switch starting (planned or unplanned) - server links down (disabled)
2) wait 180 seconds
3) enable server links

If possible, a CLI script via autoexec.xsf is preferred.

Regards

Stephen Williams, Employee

I tried using a UPM script to disable the port, and it prevented the port from being enabled or linking up until I rebooted the switch.  This is probably because the port wasn't fully configured from bootup when the UPM script changed it.  It could work but I'm not sure.  Let us know if you get it working.  I was using the log message "Switch is operational".

Esa Kuusisto

How about LACP? We have plenty of X670Vs. There is a problem with 1G Base-T, which should be fixed in the newest software releases. No problem with 10G connections. What is your software version?

M.Nees, Embassador

Hi Esa,
LACP would be my preferred method, but it is not possible because VMware needs an additional license for this function.
Maybe it will be possible in the future.

M.Nees, Embassador

Hi,

I solved my problem completely through an autoexec.xsf script!

Works fine!
### autoexec.xsf script for automatic link-up delay

# Set a regular script timeout
configure cli script timeout 600
enable cli scripting

create log entry "Disabling ports 1:1-24"
disable port 1:1-24

# wait a few minutes for the switch to come up fully
create log entry "Waiting 3 minutes for switch to come up (Link Up Delay)"
set var temp $TCL(after [expr 3 * 60 * 1000])
create log entry "Waiting done"

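# Note: this re-enables every port, not only 1:1-24; use "enable port 1:1-24" to leave other disabled ports untouched
create log entry "Enabling all ports"
enable port all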

disable cli scripting