10 Gb optical port goes down with remote fault, needs a power cycle once in a while to get it back

  • Problem
  • Updated 3 years ago
  • Solved
I recently configured an Extreme switch, and its optical ports are not working well. The link state stays active for a while (a couple of hours to a couple of days), then changes to "ready" and never comes back until I power the whole switch down and up.

Recent log entries related to this issue:
09/29/2015 22:05:50.48 <Info:vlan.msgs.portLinkStateDown> Port 45 link down - remote fault
09/29/2015 22:05:50.47 <Info:vlan.msgs.portLinkStateDown> Port 46 link down - remote fault
09/29/2015 07:02:32.41 <Info:vlan.msgs.portLinkStateUp> Port 45 link UP at speed 10 Gbps and full-duplex
09/29/2015 07:02:32.40 <Info:vlan.msgs.portLinkStateUp> Port 46 link UP at speed 10 Gbps and full-duplex
09/29/2015 07:02:27.37 <Info:vlan.msgs.portLinkStateDown> Port 45 link down - Local fault
09/29/2015 07:02:27.36 <Info:vlan.msgs.portLinkStateDown> Port 46 link down - Local fault

It seems the switch can recover from a local fault by itself, but when a remote fault occurs the port won't come back. Does anybody know what the remote fault really means?
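To make the pattern described above easier to spot in a longer log, here is a minimal Python sketch that finds the newest link event for each port and lists the ports whose last event was a "remote fault" link down. It assumes the log lines look exactly like the six entries quoted above; the function names (`last_event_per_port`, `stuck_ports`) are made up for illustration and are not part of EXOS. For background, as far as I understand 10 Gigabit Ethernet link fault signaling, a local fault is raised when a port's own receive side loses a valid signal, and a remote fault is what the far end sends back when it cannot receive properly.

```python
import re
from datetime import datetime

# Sample lines copied from the log quoted above (newest first).
LOG = """\
09/29/2015 22:05:50.48 <Info:vlan.msgs.portLinkStateDown> Port 45 link down - remote fault
09/29/2015 22:05:50.47 <Info:vlan.msgs.portLinkStateDown> Port 46 link down - remote fault
09/29/2015 07:02:32.41 <Info:vlan.msgs.portLinkStateUp> Port 45 link UP at speed 10 Gbps and full-duplex
09/29/2015 07:02:32.40 <Info:vlan.msgs.portLinkStateUp> Port 46 link UP at speed 10 Gbps and full-duplex
09/29/2015 07:02:27.37 <Info:vlan.msgs.portLinkStateDown> Port 45 link down - Local fault
09/29/2015 07:02:27.36 <Info:vlan.msgs.portLinkStateDown> Port 46 link down - Local fault
"""

# Assumed line layout: timestamp, event tag, port number, optional "- reason".
EVENT = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{2}) "
    r"<Info:vlan\.msgs\.portLinkState(?P<state>Up|Down)> "
    r"Port (?P<port>\d+) link (?:UP|down)(?: - (?P<reason>.+))?"
)


def last_event_per_port(log_text):
    """Return {port: (timestamp, state, reason)} for the newest event of each port."""
    events = {}
    for line in log_text.splitlines():
        m = EVENT.match(line.strip())
        if not m:
            continue  # skip anything that is not a link up/down message
        ts = datetime.strptime(m["ts"], "%m/%d/%Y %H:%M:%S.%f")
        port = int(m["port"])
        reason = (m["reason"] or "").strip().lower()
        if port not in events or ts > events[port][0]:
            events[port] = (ts, m["state"], reason)
    return events


def stuck_ports(log_text):
    """Ports whose most recent event is a link down caused by a remote fault."""
    return sorted(
        port
        for port, (_, state, reason) in last_event_per_port(log_text).items()
        if state == "Down" and reason == "remote fault"
    )


print(stuck_ports(LOG))
```

With the six sample lines, both ports 45 and 46 end on a remote fault, which matches the "never comes back" behavior described in the post.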


Switch Type: X670V-48t
EXOS version: 15.2.2.7

Detailed transceiver information on the Extreme switch side:
SFP or SFP+:               SFP+
Signal:                    present
TX Fault:                  no
SFP/SFP+ Vendor:           AVAGO
SFP/SFP+ Part Number:      AFBR-709SMZ
SFP/SFP+ Serial Number:    AD15163022E
SFP/SFP+ Manufacture Date: 150413
SFP/SFP+ Type:             SFP
Connector:                 LC
Type:                      SR
Supported:                 no

Wavelength:                850
GBIC supports DDMI.  MonitorType: 68

On the other end, we connect to 10 Gb optical ports on an EXFO QA-805, using transceivers that EXFO supports.

Thanks!
Mary Peng

Posted 3 years ago


Patrick Voss, Employee
The remote fault means that the disconnect is coming from the other side. Can you check the output of "show port <port#> rxerror" on both of those ports?

Mary Peng
Hi Patrick, thanks for your quick response. I think the show command shows no errors, as in the following output.

X670V-48t.2 # show port 46 rxerror
Port Rx Error Monitor                                  Tue Sep 29 23:17:16 2015
Port      Link     Rx      Rx      Rx        Rx      Rx         Rx         Rx
          State    Crc    Over    Under     Frag    Jabber      Align      Lost
================================================================================
46        R        0       0        0        0        0          0          0
================================================================================
          > indicates Port Display Name truncated past 8 characters
          Link State: A-Active, R-Ready, NP-Port Not Present, L-Loopback
          0->Clear Counters  U->page up  D->page down ESC->exit
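If the same check has to be repeated on several ports, the counter row can be read mechanically. The following is a small Python sketch, assuming the output layout shown above (a port number, a link state, then seven integer counters). The function name `parse_rxerror` and the counter key names are invented for this example.

```python
# Sample taken from the "show port 46 rxerror" output above.
SAMPLE = """\
Port      Link     Rx      Rx      Rx        Rx      Rx         Rx         Rx
          State    Crc    Over    Under     Frag    Jabber      Align      Lost
================================================================================
46        R        0       0        0        0        0          0          0
================================================================================
"""

# Key names chosen for this sketch, in the column order of the header above.
COUNTERS = ["crc", "over", "under", "frag", "jabber", "align", "lost"]


def parse_rxerror(text):
    """Return {port: {"link_state": ..., counter: value, ...}} for each data row."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row is: numeric port, link state, then seven integer counters.
        # Slot:port names such as "1:46" are not handled by this simple check.
        if (
            len(fields) == 9
            and fields[0].isdigit()
            and all(f.isdigit() for f in fields[2:])
        ):
            counters = dict(zip(COUNTERS, (int(f) for f in fields[2:])))
            rows[int(fields[0])] = {"link_state": fields[1], **counters}
    return rows


rows = parse_rxerror(SAMPLE)
for port, row in rows.items():
    errors = {name: row[name] for name in COUNTERS if row[name] > 0}
    print(port, row["link_state"], errors or "no rx errors")
```

For the sample above this reports port 46 in state R (Ready) with no receive errors.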

Patrick Voss, Employee
Hi Mary,

You are correct. I don't see any errors. The error counters do reset after a switch reboot, and if the port never became active it wouldn't have received any errors. Can you try replacing the GBIC that is having issues with a known working one? Check both sides as well; this could be related to the device on the other side.

Mary Peng
Hi Patrick,

I heard that other teams in our company use the same GBIC with another Extreme switch with no problem. I didn't reboot the switch; the port was still in the 'ready' state when I ran the show command. Is there some other command I can try to see the errors that happened on those ports?

The device on the other side doesn't show any errors, and I don't need to reboot it to get the port back. You are right, I will also contact them to see if they know about this issue. But whatever the reason for the error on the Extreme switch port, I think the port should restart or reactivate itself instead of staying in the 'ready' state forever.

Thanks,
Mary

EtherMAN, Embassador
You need to get light readings if you can. I would guess you are also using type 3 MM patches? We have found that if there is a bad patch or a single high loss in one fiber, the link will also randomly go down, and you have to pull and reseat the SFP+ to get the link back up. We have also seen that some SFP+ modules just don't seat properly electrically and can be a bit flaky.
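As a rough illustration of what to do with light readings once you have them, the Python sketch below converts power to dBm, computes the loss between the transmit level at one end and the receive level at the other, and checks the receive level against a sensitivity limit plus a safety margin. All numbers in the example are hypothetical placeholders, not values for the AFBR-709SMZ or the EXFO optics; take the real limits from the transceiver datasheets. The function names are made up for this sketch.

```python
import math


def mw_to_dbm(milliwatts):
    """Convert optical power from milliwatts to dBm."""
    return 10 * math.log10(milliwatts)


def link_loss_db(tx_dbm, rx_dbm):
    """Loss over the fiber path: transmit power minus received power, in dB."""
    return tx_dbm - rx_dbm


def check_link(tx_dbm, rx_dbm, rx_sensitivity_dbm, margin_db=3.0):
    """Return (loss_db, ok); ok means rx power clears sensitivity plus margin."""
    loss = link_loss_db(tx_dbm, rx_dbm)
    ok = rx_dbm >= rx_sensitivity_dbm + margin_db
    return loss, ok


# Hypothetical readings, for illustration only.
tx_dbm = -2.0               # transmit power reported by the sending module
rx_dbm = -4.5               # receive power reported by the far-end module
rx_sensitivity_dbm = -10.0  # placeholder; use the datasheet value

loss, ok = check_link(tx_dbm, rx_dbm, rx_sensitivity_dbm)
print(f"loss = {loss:.1f} dB, within margin: {ok}")
```

Measuring each direction separately matters here, because a single bad fiber in the pair shows up as high loss in only one direction.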

Mary Peng
Thanks for your reply. I'm using MM 50/125 (OM3) fiber patches and the SFP+ modules stated above. I think the patch might be the issue, if the following experiment is valid: I looped port 45 back to port 46 using the fiber patch, and after a while the link went down with "local fault" in the event log, with no way to bring it up again without a cold reboot.

Alexandr P, Embassador
Hi!

You can also try increasing the value of the debounce timer on ports 45-46.

Do you use ports 47-48 as alternate stack ports? (When they are used that way, ports 45-46 should not be used as data ports. This applies to the X670, but it was an issue with the X670V too.)

Thank you!

Mary Peng
Hi Alexandr P, I'm not very familiar with stack ports. Are they set by default? I'm not sure whether we are using stack ports or not.

I tried the following command to set the debounce timer:
X670V-48t.1 # configure port 45 debounce time 4000
                                  ^
%% Invalid input detected at '^' marker.

But the command does not work on my console.

Mary Peng
I ran show stacking-support and found the following:
Stack    Available Ports
Port    Native  Alternate  Configured  Current
-----   -----------------  ----------  ----------
1       No      47         Native      N/A
2       No      48         Native      N/A
stacking-support:          Disabled    N/A

Flags: * - Current stack port selection


Does that mean I'm using port 47 and 48 as alternate ports?

christopher madison
We have had this issue in the past, with our 10G ports going down with a remote fault that would not clear until reboot. We ended up putting in an RMA for our devices and haven't had the issue since. I can tell you that putting in a different SFP of the same type on both sides seems to clear the remote fault.

Step 1. Remove the SFPs.
Step 2. Put a different SFP in on both sides.
Step 3. Reinsert the original SFP on both sides.
Step 4. Connect the fibers back.

This has worked for me in the past as a workaround to get things back on the air for a short period of time.

christopher madison
Please never, ever look at the laser. If you look into any laser you can damage your eye. Please review your fiber safety documentation. You should never be able to see a laser with your bare eye.

Mary Peng
Thank you! I will check it via a camera later.

Ryan Mathews, Alum
Good callout, Christopher. Thanks for keeping an eye on the big picture here... literally.

David Coglianese, Embassador
FYI, a phone camera works great as long as the camera does not have a filter built in.

Mary Peng
Thanks David!