Header Only - DO NOT REMOVE - Extreme Networks

link down - Local Fault


I have been seeing an issue in my virtual environment where connectivity to iSCSI LUNs is randomly lost. My hosts are showing a loss of connectivity. I looked at the logs on my X670V-48t and found the port going down in what looks to be the same time frame, with these messages:
05/22/2016 19:23:28.19 [i] Slot-1: Port 1:21 link down - Local fault
05/22/2016 19:23:57.64 [i] Slot-1: Port 1:21 link UP at speed 10 Gbps and full-duplex
05/22/2016 19:23:58.99 Slot-1: Configuration mismatch detected by DCBX (Baseline v1.01) for the PFC TLV on port 1:21.
05/22/2016 19:23:59.99 Slot-1: Configuration mismatch resolved by DCBX (Baseline v1.01) for the PFC TLV on port 1:21.

I need a little insight on what is possibly going on here.

8 replies

Userlevel 5
Jason,

If we look at the first message we see that the link went down and was detected on the local port. The port detects the link fault on its TX or RX pair based on IEEE 802.3ae-2002.

The final message is based on LLDP reporting that the configuration mismatch has been resolved (once the port comes back up). The configuration now matches, and the DCB feature should be operating properly.

I would focus on ruling out any Layer 1 issues, such as the cabling and the port, at this time. The port appears to go down briefly and then come back up.
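To put a number on "briefly", the down/up messages in the log can be paired and timed. The following is a minimal Python sketch, not an EXOS tool: the regular expression and the function name `link_outages` are assumptions based only on the four log lines quoted in the original post, so other message formats may need adjusting.

```python
import re
from datetime import datetime

# The four log lines from the original post, copied verbatim.
LOG = """\
05/22/2016 19:23:28.19 [i] Slot-1: Port 1:21 link down - Local fault
05/22/2016 19:23:57.64 [i] Slot-1: Port 1:21 link UP at speed 10 Gbps and full-duplex
05/22/2016 19:23:58.99 Slot-1: Configuration mismatch detected by DCBX (Baseline v1.01) for the PFC TLV on port 1:21.
05/22/2016 19:23:59.99 Slot-1: Configuration mismatch resolved by DCBX (Baseline v1.01) for the PFC TLV on port 1:21.
"""

# Assumed pattern: timestamp, anything, then "Port S:P link down|UP".
EVENT = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{2}) .*?"
    r"Port (?P<port>\d+:\d+) link (?P<state>down|UP)\b"
)


def link_outages(log_text):
    """Return a list of (port, seconds_down) for each down/up pair found."""
    down_since = {}   # port -> time of the most recent "link down"
    outages = []
    for line in log_text.splitlines():
        m = EVENT.match(line)
        if not m:
            continue  # DCBX and other messages are ignored
        ts = datetime.strptime(m["ts"], "%m/%d/%Y %H:%M:%S.%f")
        port = m["port"]
        if m["state"] == "down":
            down_since[port] = ts
        elif port in down_since:
            seconds = (ts - down_since.pop(port)).total_seconds()
            outages.append((port, seconds))
    return outages


for port, seconds in link_outages(LOG):
    print(f"port {port} was down for {seconds:.2f} s")
```

For the lines above this reports a single outage on port 1:21 of roughly 29 seconds, which is long enough for an iSCSI session to notice.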
Sorry, but "The port detects the link fault on its TX or RX pair based on IEEE 802.3ae-2002" is a pretty vague statement, especially when I can't view the standard because it has to be bought for over $1000. How about a little more clarification on that? What defines a fault here, and what makes it local?
I understand the concept of the remote fault, since the remote device sends that signal, but how does the port know that the fault is local?
This happened several times, several minutes apart. I am not currently seeing the issue, and the cabling has not been changed or touched. I was not seeing this issue before; it seems to have just appeared. It has also happened on some other ports, so I don't really think that a bunch of cables just went bad. One cable, OK, but 10? I don't believe that.
Do you have another suggestion for a possible direction to troubleshoot?
Userlevel 5
So it appears that the issue is happening on multiple ports. Could you provide the following information:

1. Are the same ports affected each time?
2. Are the ports part of a Link Aggregation Group?
3. How frequently is the issue occurring?
4. Which version of EXOS is installed?

More information on the fault detection process can be found in the following article:

https://gtacknowledge.extremenetworks.com/articles/Q_A/what-is-the-difference-between-local-fault-an...

Isolating the issue and ruling out any Layer 1 problem is generally the first step in troubleshooting. If multiple ports are affected, please isolate the issue by moving one of the problematic links to another port to see whether the problem follows. Also check for any errors on the port using the commands show ports rxerrors and show ports txerrors.
Userlevel 4
- Local Fault indicates loss of signal detected on the receive data path of a local port
- Remote Fault indicates a fault on the transmit path
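
To make the distinction concrete: a 10 Gigabit port reports Local Fault when its own receive side stops seeing a valid signal, and it then signals Remote Fault back to the link partner. The sketch below models that behaviour as I understand the link fault signaling in IEEE 802.3ae. It is a simplified illustration only; the class, enum, and function names are made up for this post and are not taken from the standard or from EXOS.

```python
from enum import Enum


class Rx(Enum):
    """What the local receiver currently sees (simplified)."""
    SIGNAL_OK = "valid data or idle"
    NO_SIGNAL = "no valid signal on the local receive path"
    REMOTE_FAULT_SEQ = "Remote Fault sequence sent by the link partner"


class LinkFault(Enum):
    OK = "OK"
    LOCAL = "Local Fault"
    REMOTE = "Remote Fault"


def classify(rx: Rx) -> LinkFault:
    """Map the receive condition to the fault state the port reports."""
    if rx is Rx.NO_SIGNAL:
        # The port's own receiver noticed the problem: no message needed.
        return LinkFault.LOCAL
    if rx is Rx.REMOTE_FAULT_SEQ:
        # The partner is telling us that *its* receiver lost our signal.
        return LinkFault.REMOTE
    return LinkFault.OK


def tx_response(fault: LinkFault) -> str:
    """What the port transmits while in each fault state."""
    if fault is LinkFault.LOCAL:
        return "send Remote Fault"    # inform the partner
    if fault is LinkFault.REMOTE:
        return "send Idle"            # stop sending data frames
    return "send normal traffic"


for rx in Rx:
    fault = classify(rx)
    print(f"{rx.name:17} -> {fault.value:12} -> {tx_response(fault)}")
```

In other words, "local" means the port's own receiver detected the loss, so nothing has to arrive from the other end for the switch to log it. The cause can still be the cable, the optic, or the far-end transmitter.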

The ports affected are the same set of ports each time.
None of the ports are part of an aggregation.
It happened to each of the ports 3 times in a 2hr period.
XOS version is 15.3.3.5

This was happening to 9 different ports, which were attached to 3 different hosts, 3 ports per host.
Host A - 1:3 mgmt, 2:26 iSCSI, 2:18 VM data (multiple vlans)
Host B - 1:38 mgmt, 1:21 iSCSI, 2:8 VM data (multiple vlans)
Host C - 2:21 mgmt, 2:14 iSCSI, 1:12 VM data (multiple vlans)

Two of the hosts, Host A and Host B, were affected enough that they needed to be rebooted. Host C for some reason did not show any signs of distress.
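
One quick check on the port list above is whether the affected ports share a slot or a host, since that would narrow down where to look. A minimal Python sketch using only the port assignments listed above; the dictionary layout and the helper name `ports_by_slot` are just for illustration.

```python
from collections import defaultdict

# Port assignments exactly as listed in this post (slot:port).
HOST_PORTS = {
    "Host A": {"mgmt": "1:3", "iSCSI": "2:26", "VM data": "2:18"},
    "Host B": {"mgmt": "1:38", "iSCSI": "1:21", "VM data": "2:8"},
    "Host C": {"mgmt": "2:21", "iSCSI": "2:14", "VM data": "1:12"},
}


def ports_by_slot(host_ports):
    """Group (port, host, role) tuples by the slot number of each port."""
    slots = defaultdict(list)
    for host, roles in host_ports.items():
        for role, port in roles.items():
            slot = int(port.split(":")[0])
            slots[slot].append((port, host, role))
    return dict(slots)


for slot, entries in sorted(ports_by_slot(HOST_PORTS).items()):
    print(f"slot {slot}: {len(entries)} affected ports")
    for port, host, role in entries:
        print(f"  {port:5} {host} ({role})")
```

The affected ports are spread over both slots and all three hosts, so neither a single slot nor a single host accounts for all of them.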

Here is the full log from the time frame:
Userlevel 7

Jason, are you still having link troubles?

I am still having trouble, but I don't believe it is an issue with the Extreme equipment. I believe it is another issue. Thank you.
Userlevel 2
Just an FYI for anyone else who sees this: I just had this issue with a 10 Gb connection between an X620 and an X460. I swapped out the SFP+ cable and the problem went away.
