x450e stack port errors

  • Problem
  • Updated 2 years ago
  • Solved
Multiple stacks of 450e switches are logging large numbers of RX over errors on all stack ports.

Image 12.3.1.2

We are trying to upgrade to 15.3 but are having trouble getting the code through.

Does anyone have any ideas for this?

David Coglianese, Embassador

Posted 2 years ago

Patrick Voss, Employee

Hello David,

How long have these stacks been up?

David Coglianese, Embassador

A long time. They went in around 2009. In the pic, the first switch has been up 900 days and the other 2 have been up about 20 days.

Thanks

Colatuno, Joe, Escalation Support Engineer

Hey David,

RX over counts good (jumbo) frames larger than 1518 bytes. I wouldn't expect this to be a problem when upgrading the stack. Can you provide a little more detail on where in the upgrade process it is failing?

When packets traverse between stack nodes, some of them can certainly exceed 1518 bytes, because a 12-byte hi-gig header is added when they go over the hi-gig links.
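To make that arithmetic concrete, here is a minimal Python sketch. It is only an illustration: the function names are made up, and it assumes the counter simply compares the frame size including the 12-byte hi-gig header against the 1518-byte limit mentioned above, which may not match exactly how the hardware counts.

```python
# Illustrative only: names are hypothetical, and the comparison rule
# (encapsulated size > 1518 bytes) is an assumption, not vendor behavior.
MAX_STANDARD_FRAME = 1518  # bytes, the limit quoted in this thread
HIGIG_HEADER = 12          # bytes, the header size quoted in this thread


def size_on_stack_link(frame_len: int, header_len: int = HIGIG_HEADER) -> int:
    """Frame size once the hi-gig header has been added."""
    return frame_len + header_len


def counts_as_rx_over(frame_len: int, limit: int = MAX_STANDARD_FRAME) -> bool:
    """True if the encapsulated frame is larger than the limit."""
    return size_on_stack_link(frame_len) > limit


for frame_len in (64, 1506, 1507, 1518):
    print(frame_len, size_on_stack_link(frame_len), counts_as_rx_over(frame_len))
```

Under these assumptions, any front-panel frame of 1507 bytes or more would be counted as "over" on the stack link even though it is a perfectly normal frame, which would explain counters that climb steadily under ordinary traffic.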

My recommendation, if the upgrade is adding the image to the master node but not the other nodes, would be to download/install to the master and then synchronize the remaining nodes to see if that works.

Tony Thornton, Extreme Alumnus

Hi David,

When you mention having problems getting code through, are there any other associated errors during download or timeouts, etc.?


Regards
Tony

Brandon Clay, Escalation Support Engineer

Like Joe said, the RX over errors indicate that packets larger than the configured MTU size were received.

Normally, jumbo frames should be enabled on the stack ports internally with the max MTU size, and it should not be possible to disable it on the stack ports.

Since they have been up for so long and are running an old version of EXOS, it might be worth a shot to just reboot the stack and then try the upgrade after the reboot. It seems like the port config on the stack ports may be stuck in some odd state.

Patrick Voss, Employee

Hey David,

You may be running into another, unrelated issue on the stack. I know that CPU utilization can become high with long uptime on older versions of code. Would it be possible to look at top and see if it is running high? A reboot may be needed before the upgrade is attempted.

David Coglianese, Embassador

Thanks for all the feedback.

I have seen failures due to timeouts, which I theorized could have been caused by retransmits.

We also had slot 2 of a 3-slot stack fail after code was pushed out. I have not had access to fully troubleshoot this failure. I thought it possible that the code did not make it to all the slots; if the stack then rebooted due to a power event, there would have been a code mismatch, which could explain the slot failure.

Unfortunately, I cannot reboot the slots unless I am onsite, because the customer is concerned that some switches won't make it back.

To confirm:
You're saying the overs are not something I should be concerned with?
Is there a way to check that the stack ports are properly configured for jumbo frames?
Are there any commands I could run that might reduce the numbers I am seeing?


Here are about 16 hours' worth of errors:

We did reboot a building last night while we were onsite installing a 10g TOR backbone and those stacks appear to be running clean now.

I hate to ask, but if the reboot does fix the issue, how often should they be rebooted? Two of the stacks in the image above have only been up 20 days.

Thanks,