Need to PXE boot on X650 on a 2x10GbE LACP LAG

  • Question
  • Updated 4 years ago
Hi all,

We are currently attempting to boot a 2x10GbE server with Intel 82599 NICs with Boot Agent (XE). We have both NICs set to PXE boot. We get an IP address from DHCP on one of the NICs, but we cannot fetch the boot file pxelinux.0 over TFTP; it just hangs.

If we put a machine on the same VLAN, it gets the file with no issues; it's the LACP LAG that it seems to choke on.

What's the proper switch config (single switch, no stack) that allows a PXE boot from a LACP LAG? I've tried a huge number of things and can't figure it out.


brendan

Posted 4 years ago

rbrt_weiler

Hi Brendan,

I would not recommend connecting the server with two links while booting, as LACP has to be configured on the host as well, and I doubt that Intel's Boot Agent does LACP by default.
  1. Connect the server with one NIC only.
  2. Install the operating system via TFTP.
  3. Configure LACP in the operating system once it is installed.
  4. Connect the second NIC of the server.
At least that is the way we have done it for the last few years. Even then, you should expect a few seconds of downtime once you connect the second NIC (LACP initialization etc.).

-Robert
brendan

Hi Robert,

Thanks for the quick reply.

I netboot a ramdisk image over the network for this cluster. We don't use the local disk in a permanent way in this cluster.

With other switches, the solution has been LACP fallback, where the channel falls back to single ports until LACP PDUs are received.

This feature is known as "port-channel lacp fallback" on one vendor. On another it's:

"ether-options 802.3ad lacp force-up", "aggregated-ether-options lacp active" and "aggregated-ether-options lacp periodic fast"

On another it's:

"lacp ungroup member-independent port-channel <1-128>"

On another it's:

"lacp suspend-individual" and "no lacp graceful-convergence"

etc.
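
To make the behavior I am after concrete, here is a rough Python sketch of what I mean by fallback. It is only a toy model, not any vendor's implementation: the `FallbackPort` class, its field names, and the 5-second timeout are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FallbackPort:
    """Toy model of one LAG member port on a switch with LACP fallback."""
    name: str
    link_up_at: float = 0.0            # time the link came up (seconds)
    fallback_timeout: float = 5.0      # hypothetical value, not a vendor default
    last_lacpdu_at: Optional[float] = None  # time of the last LACPDU, if any

    def receive_lacpdu(self, now: float) -> None:
        """Record that the attached host sent an LACPDU at time `now`."""
        self.last_lacpdu_at = now

    def state(self, now: float) -> str:
        """Return 'waiting', 'individual' (fallback), or 'bundled'."""
        if self.last_lacpdu_at is not None:
            # LACPDUs seen recently: the port is part of the bundle.
            if now - self.last_lacpdu_at < self.fallback_timeout:
                return "bundled"
            # LACPDUs stopped (e.g. host rebooted into PXE): fall back.
            return "individual"
        # No LACPDU ever seen: wait for the timeout, then fall back
        # so the port forwards as a plain access port.
        if now - self.link_up_at < self.fallback_timeout:
            return "waiting"
        return "individual"


port = FallbackPort(name="1")
print(port.state(now=1.0))    # PXE ROM is silent, still waiting
print(port.state(now=6.0))    # timeout passed, port forwards on its own
port.receive_lacpdu(now=60.0) # OS is up and starts speaking LACP
print(port.state(now=61.0))   # port joins the bundle
```

The point of the sketch is that the PXE ROM never has to speak LACP: the port forwards on its own until the booted OS starts sending LACPDUs.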

I figured that, since this is a fairly common use case these days with cloud-style architectures, it should be available in ExtremeXOS.



rbrt_weiler

If I understand the Concept Guide correctly that's what EXOS does by default.

All ports configured in a LAG begin in an unselected state. Based on the LACPDUs exchanged with the remote link, those ports that have a matching key are moved into a selected state. If there is no matching key, the ports in the LAG remain in the unselected state.

Active mode can be configured:

configure sharing <port> lacp activity-mode [active | passive]

Have you tried booting the server with one NIC only? Maybe it's not a LACP issue at all.
brendan

Hi Robert,

I will try booting without the LAG. I think the issue I am seeing is that both ports are "active", which can lead to L2 confusion because PXE ROMs are neither 802.1Q-aware (this is an untagged VLAN, so that's not the issue) nor able to deal with LACP.

Thanks for the help. Will update.
Schmidt, Louis, Employee

I've run into this issue in another area, and here's a GTAC-built script that will help. Note the "period" variable: given how fast BOOTP usually times out, I'd change it to something lower, like 10.

From Oscar Koot: 
I used CLEAR-Flow with the rule-true-count function to ensure the script/command runs only when the delta changes (the first time no LACP packets are received and the first time LACP packets are received).
 
The policy would look like this.
Create one ACL for each LAG and apply it only to the master port.
 
Policies at Policy Server:
Policy: cnt_lacp
entry LACP1 {
if match all {
    ethernet-destination-address 01:80:c2:00:00:02 ;
    ethernet-type 0x8809 ;
}
then {
    count LACPpkt ;
    permit  ;
}
}
entry LACP_detect {
if match all {
    delta LACPpkt > 0 ;
    period 40 ;  
}
then {
}
}
entry LACP_notdetect {
if match all {
    delta LACPpkt == 0 ;
    period 40 ;
}
then {
}
}
entry CF_enasharing {
if match all {
    rule-true-count LACP_detect == 1 ;
    period 40 ;
}
then {
    syslog Enable_Sharing info ;
    cli "either run command here to enable sharing or do that with script that does additional checks"
}
}
entry CF_dissharing {
if match all {
    rule-true-count LACP_notdetect == 1 ;
    period 40 ;
}
then {
    syslog Disable_Sharing info ;
    cli "either run command here to disable sharing or do that with script that does additional checks"
}
}
Number of clients bound to policy: 1
Client: acl bound once
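
For anyone trying to follow the logic of the policy above, here is a small Python sketch of how I read it. It is a simulation only, not EXOS code: the function names and the action strings are made up, and it assumes that rule-true-count counts consecutive true evaluations and resets to zero when the rule is false, which is what Oscar's description implies. It also evaluates all rules in the same sampling tick, which may not match the switch's exact timing.

```python
from typing import List, Optional

LACP_DST_MAC = "01:80:c2:00:00:02"   # Slow Protocols multicast address
SLOW_PROTOCOLS_ETHERTYPE = 0x8809    # EtherType matched by entry LACP1


def is_lacp_match(dst_mac: str, ethertype: int) -> bool:
    """Mirror the two match conditions of entry LACP1."""
    return (dst_mac.lower() == LACP_DST_MAC
            and ethertype == SLOW_PROTOCOLS_ETHERTYPE)


def simulate(deltas: List[int]) -> List[Optional[str]]:
    """Given the LACPpkt delta seen in each sampling period,
    return the action (if any) that fires in that period."""
    detect_true_count = 0      # models rule-true-count LACP_detect
    notdetect_true_count = 0   # models rule-true-count LACP_notdetect
    actions: List[Optional[str]] = []
    for delta in deltas:
        # LACP_detect is true when delta > 0, LACP_notdetect when delta == 0.
        detect_true_count = detect_true_count + 1 if delta > 0 else 0
        notdetect_true_count = notdetect_true_count + 1 if delta == 0 else 0
        if detect_true_count == 1:
            actions.append("enable_sharing")    # CF_enasharing fires once
        elif notdetect_true_count == 1:
            actions.append("disable_sharing")   # CF_dissharing fires once
        else:
            actions.append(None)                # steady state, nothing to do
    return actions


# Host is silent for two periods (PXE), speaks LACP for two, then reboots.
print(simulate([0, 0, 2, 3, 0]))
```

The "== 1" comparison is what makes the actions edge-triggered: each command runs once on the transition rather than every period.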