03-27-2020 09:45 AM
I have recurring packet loss lasting between 20 seconds and 2 minutes and need some help debugging.
We are using the following components and configuration:
4x VSP 8600 Version 22.214.171.124
200x Summit x440G2 Version 126.96.36.199
The VSPs are configured in an SPBM cloud. Two VSPs per location are configured as an SMLT cluster.
Every x440G2 is connected via 2 x 10 Gbit interfaces to two of the VSPs with LACP. The VSP uplinks to the x440G2 are SMLT/RSMLT trunks and are FA-enabled.
Routing is solely done on the VSP switches. The management-vlan gets tagged to the uplink ports via FA.
I'm monitoring the management interfaces of the x440G2 with Icinga. A few times a day the switches are not pingable for a short time and raise an alarm. Often this happens on many switches at the same time.
I'm guessing the management VLAN, which is preconfigured on the switches and not defined via the FA management-vlan configuration, loses its assignment to one of the VSP uplinks. Are there any timeouts that can be set so that the VLAN never disappears?
Can you give me a hint as to what could cause these problems, or how I could go about debugging them?
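Since the alarms often hit many switches at once, one way to narrow this down is to group the outage start times from the monitoring history and see which switches drop together. The following is only a minimal sketch: the host names, timestamps, and the 30-second window are made-up placeholders, and it assumes the outage events have already been exported from the monitoring system as (timestamp, host) pairs.

```python
from datetime import datetime, timedelta


def group_outages(events, window=timedelta(seconds=30)):
    """Group (timestamp, host) events that start within `window`
    of the previous event in the same group."""
    groups = []
    for ts, host in sorted(events):
        if groups and ts - groups[-1][-1][0] <= window:
            groups[-1].append((ts, host))
        else:
            groups.append([(ts, host)])
    return groups


# Placeholder data, not taken from the real network.
events = [
    (datetime(2020, 3, 27, 9, 0, 5), "sw-a"),
    (datetime(2020, 3, 27, 9, 0, 12), "sw-b"),
    (datetime(2020, 3, 27, 9, 0, 20), "sw-c"),
    (datetime(2020, 3, 27, 13, 30, 0), "sw-a"),
]

for group in group_outages(events):
    hosts = sorted({host for _, host in group})
    print(group[0][0].isoformat(), len(hosts), hosts)
```

If the switches in a group all hang off the same VSP pair, that points towards the core side rather than an individual edge switch.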
04-02-2020 03:45 PM
If you do configure the FA mgmt VLAN on the VSP8600 FA-enabled ports (or MLT interfaces) at least the Mgmt VLAN will be permanently plumbed on the VSP side (and the XOS should dynamically do the same on the port where it sees the VSP FA Server).
If instead it is the XOS switches that are configured with the VLAN + I-SID (NSI), then the VLAN is dynamically requested by the XOS FA Proxy switch and plumbed on both sides of the uplinks (VSP and XOS) if the VSP FA Server accepts the binding. If for any reason the XOS switch stops requesting that mgmt VLAN via FA LLDP signalling, the VSP removes the mgmt VLAN after a timeout period.
There is no harm in doing both (setting the FA mgmt VLAN on the VSP side + configuring the VLAN and I-SID on the XOS side) even if they pertain to the same VLAN. Generally the 1st approach is preferred.
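To illustrate the difference between the two approaches, here is a toy model of the behaviour described above. It is not how the VSP implements it: the class, the method names, the VLAN/I-SID numbers, and the 240-second timeout are all hypothetical and only serve to show that a statically configured binding stays, while a dynamically requested one ages out once the refreshes stop.

```python
class BindingTable:
    """Toy model of VLAN/I-SID bindings on an FA Server port."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # hypothetical ageing time
        self.static = set()          # configured on the server side
        self.last_seen = {}          # dynamic bindings -> last refresh time

    def add_static(self, vlan, isid):
        self.static.add((vlan, isid))

    def refresh(self, vlan, isid, now):
        """Record that the proxy requested this binding at time `now`."""
        self.last_seen[(vlan, isid)] = now

    def expire(self, now):
        """Drop dynamic bindings not refreshed within the timeout."""
        expired = [b for b, t in self.last_seen.items()
                   if now - t > self.timeout_s]
        for binding in expired:
            del self.last_seen[binding]
        return expired

    def active(self):
        return sorted(self.static | set(self.last_seen))


table = BindingTable(timeout_s=240)
table.add_static(100, 10100)        # approach 1: set on the VSP side
table.refresh(200, 10200, now=0)    # approach 2: requested by the proxy
table.expire(now=300)               # no refresh arrived for 300 s
print(table.active())
```

In this model only the dynamically requested binding can disappear, which is why configuring the mgmt VLAN on the server side is the safer choice.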
03-30-2020 07:44 PM
First of all, the issue seems quite severe and recurrent, so I would open a case with GTAC to get help solving it.
With the info provided there is not much to say. As a first check I would make sure that no spanning tree is active on the SMLT. Only use SLPP as the loop protection protocol on the uplinks towards the edges, with different rx thresholds.
Have a look at the logs on the edges and cores; something should show up.
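When going through the logs, it helps to pull out only the lines around a known outage time. A minimal sketch, assuming the logs have been exported to text with a leading "YYYY-MM-DD HH:MM:SS" timestamp (the real log format on the switches may differ, and the sample lines below are placeholders, not real log messages):

```python
from datetime import datetime, timedelta


def lines_near(log_lines, outage_start, margin=timedelta(seconds=60)):
    """Return log lines whose timestamp is within `margin` of the outage."""
    hits = []
    for line in log_lines:
        try:
            ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines without the assumed timestamp prefix
        if abs(ts - outage_start) <= margin:
            hits.append(line)
    return hits


# Placeholder lines in the assumed format.
sample = [
    "2020-03-27 08:30:00 placeholder message A",
    "2020-03-27 09:00:01 placeholder message B",
    "continuation line without timestamp",
    "2020-03-27 09:00:40 placeholder message C",
]

for line in lines_near(sample, datetime(2020, 3, 27, 9, 0, 10)):
    print(line)
```

Running this against both the edge and the core logs for the same outage time makes it easier to see which side reported something first.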