SNMP Error Timeout

  • Problem
  • Updated 3 years ago
  • In Progress

I have a customer with two sites: a couple of S8s at the core of one and S4s at the core of the other.

The edge consists of multiple stacks of C5s, with the top switch in each stack being a C5K for 10Gb.

A couple of the many stacks persistently give the following error:

SNMP Contact Lost: No SNMP reply from device 192.168.xxx.xx caused by SNMP Error: Timeout[4098], last uptime was 21 Days 23:14:26.4

When Netsight loses SNMP contact with the switch, you can still ping it and SSH to it, and the switch shows no other adverse effect apart from what seems to be a random loss of polling.

During one brief event I was able to get onto the device and tried to ping the IP address of the Netsight server, which failed! I wasn't able to do any more testing before it came back online.

The uplinks consist of 1 x 10Gb and 2 x 1Gb as a LAG. MSTP is configured with the data VLAN using the 10Gb link and the LAG used for the voice VLAN.

I have looked at spanning tree: there have been no topology changes, the ports have remained up continuously, and the RMON stats show no errors.

Has anyone seen this before, and can you offer any suggestions?

Many thanks

Martin Flammia

Posted 4 years ago

Jason Parker, Employee
Take a look at the SecureStacks and type
show port flowcontrol
If you are receiving pause frames, look at the end stations sending them, as there may be an issue there.
If you do not have time to look into this, you could run the command clear port advertise port# pause
Warning #1: the command will drop the link, renegotiate the options (speed/duplex), and bring the link back up.
Warning #2: if you run the command on the uplink port(s), it will cause a topology change, clearing the FDB (filtering database) of all local switches, so please take note.
Good luck
Jason
Martin Flammia

Just to let you know, it seems you were right on the money there. The 10Gb link between the core and the edge is experiencing problems:

show port flowcontrol tg.3.2

Port         TX Admin  TX Oper RX Admin  RX Oper TX Pause Count RX Pause Count
------------ -------- -------- -------- -------- -------------- --------------
tg.3.2        enabled  enabled  enabled  enabled              0        4597790


KGH_SDP5-2_US1(su)->show port flowcontrol tg.1.49
 port           TX Pause Count      RX Pause Count
--------       ---------------      --------------
 tg.1.49        31303                0

Now I just have to work out why.

Many thanks for your help.

Jason Parker, Employee
Please note that this could be as simple as multicast traffic flooding the network; some PCs do not cope well with traffic they end up dropping.

Here are a few steps to take:
1. See which ports have high RX counts.
2. Check for duplex/speed issues (printers and low-end machines).
3. If the switches are daisy-chained, the traffic may be coming from another switch that has a bad client (for example, one stuck at a Blue Screen of Death).

Good Luck
Jason
 
Martin Flammia

Hi Jason,

Here are some of the ports that have RX counts:

 port           TX Pause Count      RX Pause Count
--------       ---------------      --------------
 ge.1.7         0                    720
 ge.1.14        0                    82
 ge.1.25        0                    734
 ge.1.26        0                    734
 ge.1.29        0                    26838
 ge.1.45        0                    706
 ge.1.46        0                    228596

So I'll investigate what these are, especially on port ge.1.46.
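For anyone who wants to rank ports the same way across several stacks, here is a minimal Python sketch that parses pasted `show port flowcontrol` output and lists the ports with the highest RX pause counts. It assumes the three-column layout shown in the table above (port, TX Pause Count, RX Pause Count); the wider layout with Admin/Oper columns is not handled. The names `parse_pause_counts` and `top_rx_ports` are made up for this illustration.

```python
import re

# Matches data rows of the three-column table: port, TX pause count, RX pause count.
# Header and separator lines do not match and are skipped.
ROW = re.compile(r"^\s*([a-z]+\.\d+\.\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_pause_counts(text):
    """Return {port: (tx_pause, rx_pause)} from pasted 'show port flowcontrol' output."""
    counts = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            counts[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return counts


def top_rx_ports(counts, limit=3):
    """Ports with a non-zero RX pause count, highest count first."""
    busy = [(port, rx) for port, (tx, rx) in counts.items() if rx > 0]
    return sorted(busy, key=lambda item: item[1], reverse=True)[:limit]


SAMPLE = """\
 port           TX Pause Count      RX Pause Count
--------       ---------------      --------------
 ge.1.7         0                    720
 ge.1.14        0                    82
 ge.1.25        0                    734
 ge.1.26        0                    734
 ge.1.29        0                    26838
 ge.1.45        0                    706
 ge.1.46        0                    228596
"""

for port, rx in top_rx_ports(parse_pause_counts(SAMPLE)):
    print(port, rx)
```

With the sample above, ge.1.46 comes out on top, followed by ge.1.29.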

Out of interest, IGMP snooping has been enabled on the switch and all the user ports, although they don't use multicast. Could that have any bearing?

Thanks

Jason Parker, Employee
Run the command show igmpsnooping MFDB (multicast filtering database) to see what you are running.
Note:
If there is no querier on the network, the IGMP commands are not really helpful. Also, some companies that use multicast may rely on it for communication between the controller and the APs.

In most cases a sniffer (Wireshark is free) can help you isolate any traffic concerns.

The easy way to handle this is to document the numbers on each switch, then select ports not enabled (or connected) and run the command clear port advertise ge.x.x pause on each port.

Another option is to set flowcontrol disabled. This does much the same thing as clear port advertise pause, in that the links are dropped and reconnected.

Jason

 
James A, Embassador
I've been getting SNMP error timeouts for a while now. I had a look last week on our S4, and the pause count on the port our Linux router was connected to was occasionally increasing. That router had a hardware failure, so we're now running on the backup router, which is connected to a G3, and I'm seeing nearly constant increases in pause frames. From today:
G3(su)->show port flowcontrol tg.3.3
 port           TX Pause Count      RX Pause Count
--------       ---------------      --------------
 tg.3.3         0                    14858653
and about 5 hours later:
G3(su)->show port flowcontrol tg.3.3
 port           TX Pause Count      RX Pause Count
--------       ---------------      --------------
 tg.3.3         0                    17704492
For reference, the routers have 10Gb Myricom cards, and today are pushing 800-1200Mbps.
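To put the two readings above into perspective, a small Python sketch can turn them into an average rate. It assumes the interval really was five hours (the post says "about 5 hours") and that the counter did not wrap or get cleared in between; `pause_rate` is a name invented for this example.

```python
from datetime import timedelta


def pause_rate(first_count, second_count, elapsed):
    """Average pause frames per second between two counter readings."""
    seconds = elapsed.total_seconds()
    if seconds <= 0:
        raise ValueError("elapsed must be positive")
    return (second_count - first_count) / seconds


# RX Pause Count on tg.3.3 from the two outputs above.
rate = pause_rate(14858653, 17704492, timedelta(hours=5))
print(f"{rate:.0f} pause frames/s")
```

That works out to roughly 158 pause frames per second on average, which fits the description of a nearly constant increase rather than an occasional burst.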
Jason Parker, Employee
clear port advertise tg.1.3 pause would work, but remember that the link will drop for a second.
Jason Parker, Employee
If you call the GTAC to open a ticket, we can assist you. Just mention this site (HUB) and ask them to make me a contributor or co-owner.
Then I can assist over the phone. The ticket is needed to document the situation.
Thanks
Jason Parker
James A, Embassador
I've turned off the tx flowcontrol on the network card, and now it's dropping packets (dropped_no_small_buffer is increasing) so it's looking like a problem with the host itself. I do have a ticket open (01047722) but it doesn't look like a problem on the network side. Thanks, James
Jason Parker, Employee
So, to look at the big picture:

If you turn it off at the switch port, we will drop the pause frames, but you still need to find the underlying cause.
Why is the traffic so high on the port?
Is it broadcasts or multicasts? Are there spanning tree issues?

I would consider either disabling pause on all ports or disabling flow control on the switch.
Capture traffic with Wireshark (free), either on an open port or by setting up port mirroring with a source port and a destination port, and review the traffic.
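When reviewing a capture for the broadcast and multicast question above, the destination MAC address is enough for a first pass. The Python sketch below classifies a raw, untagged Ethernet frame; it relies on standard Ethernet facts (the all-ones broadcast address, the group bit in the first octet, and 802.3x pause frames using EtherType 0x8808 with opcode 0x0001). The function name `classify_frame` is hypothetical, and VLAN-tagged frames are not handled. Pause frames are link-local and are often consumed by the NIC, so they may never show up in a capture at all.

```python
PAUSE_DST = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001


def classify_frame(frame: bytes) -> str:
    """Classify an untagged Ethernet frame as pause, broadcast, multicast or unicast."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst = frame[0:6]
    ethertype = int.from_bytes(frame[12:14], "big")
    # MAC Control frames carry a 2-byte opcode right after the header.
    if ethertype == MAC_CONTROL_ETHERTYPE and len(frame) >= 16:
        if int.from_bytes(frame[14:16], "big") == PAUSE_OPCODE:
            return "pause"
    if dst == b"\xff" * 6:
        return "broadcast"
    if dst[0] & 0x01:  # group bit set in the first octet
        return "multicast"
    return "unicast"


# Hand-built example: a pause frame requesting the maximum pause time.
sample = PAUSE_DST + bytes(6) + bytes.fromhex("8808") + bytes.fromhex("0001ffff")
print(classify_frame(sample))
```

Counting the results per class over a capture gives a rough idea of whether flooding is worth chasing before changing flow control settings.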
bw447
Is there a way to look at the number of paused packets on an Extreme switch? I don't see the command that would show this info.
Thanks!
Anil Waghmode
I am also facing the same problem: some devices are getting this kind of error. I am using Netsight.
Jason Parker, Employee
I will try to answer both questions.
Question 1. On the Extreme side, a command that lets you see the number of pause frames is being added: "Expose per-port flow-control (rx/tx pause & PFC) counters", in 15.7 or later.

Question 2. Port errors would be seen on the C2 etc., as they do not have the show port flowcontrol command.

Please let us know if you have any issues by calling the GTAC and opening a ticket. Those tickets have a higher priority, but we do try to get back to the HUB to assist when we can.
Thank you
Jason