SNMP Error Timeout

Userlevel 5
Have a customer that has two different sites with a couple S8's at the core of one and S4's at the core of another.

The edge consists of multiple stacks of C5's, with the top switch in each being a C5K for 10Gb.

There seem to be a couple, out of the many stacks, that persistently give the following error:

SNMP Contact Lost: No SNMP reply from device caused by SNMP Error: Timeout[4098], last uptime was 21 Days 23:14:26.4

When Netsight loses SNMP contact with the switch you can still ping and SSH to it, and there seems to be no other adverse effect on the switch other than what looks like a random loss of polling.

When an event briefly happened I was able to get onto the device and tried to ping the IP address of the Netsight server, which failed! I wasn't able to do any more testing before it came back online.

The uplink ports consist of 1 x 10Gb and 2 x 1Gb as a LAG. MSTP is configured with the data VLAN using the 10Gb and the LAG used for the voice VLAN.

I have looked at spanning tree, and there have been no topology changes; the ports have remained continually up and show no errors in the RMON stats.

Not sure if anyone has seen this before and can provide any suggestions?

Many thanks

13 replies

Userlevel 4
Take a look at the SecureStacks and type
show port flowcontrol
If you are receiving any pause frames, you need to look at the end stations sending them, as there may be an issue.
If you do not have the time to look at this, then I would run the command clear port advertise port# pause
Warning #1: the command will drop the link, renegotiate the options (speed/duplex), and bring the link back up.
Warning #2: if you choose to run the command on the uplink port(s), this will cause a topology change, thus clearing the FDB (filtering database) of all local switches, so please take note.
Good luck
Userlevel 5
Just to let you know, it seems you were right on the money there. The 10Gb link between the core and the edge is experiencing problems:

show port flowcontrol tg.3.2

Port         TX Admin TX Oper  RX Admin RX Oper  TX Pause Count RX Pause Count
------------ -------- -------- -------- -------- -------------- --------------
tg.3.2       enabled  enabled  enabled  enabled  0              4597790

KGH_SDP5-2_US1(su)->show port flowcontrol tg.1.49
port     TX Pause Count  RX Pause Count
-------- --------------- --------------
tg.1.49  31303           0

Now I just have to work out why.

Many thanks for your help.
Userlevel 4
Please note that this could be as simple as multicast traffic flooding the network; some PCs do not cope well with traffic that they end up dropping.

Here are a few steps to take:
1. See which ports have high RX pause counts.
2. Check for duplex/speed issues (printers and low-end machines).
3. If the switches are daisy-chained, then the traffic may be coming from another switch that has a bad client (e.g. one stuck on a Blue Screen of Death).

Good Luck
Userlevel 5
Hi Jason,

Here are some of the ports that have RX counts:

port     TX Pause Count  RX Pause Count
-------- --------------- --------------
ge.1.7   0               720
ge.1.14  0               82
ge.1.25  0               734
ge.1.26  0               734
ge.1.29  0               26838
ge.1.45  0               706
ge.1.46  0               228596

So I'll investigate what these are, especially on port ge.1.46.
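
For anyone who has to repeat this across many stacks, a minimal Python sketch for ranking ports by RX pause count is below. It assumes the short three-column output format shown above (port, TX pause, RX pause); the function name rank_rx_pause and the threshold value are illustrative choices, not anything provided by the switch or by Netsight, and the longer format with the Admin/Oper columns would need a different pattern.

```python
import re

# Matches data rows such as "ge.1.46  0  228596" (port, TX pause, RX pause).
# Header and separator lines do not match and are skipped.
ROW = re.compile(r"^\s*([a-z]+\.\d+\.\d+)\s+(\d+)\s+(\d+)\s*$")


def rank_rx_pause(cli_output, threshold=0):
    """Return (port, rx_pause) pairs with rx_pause > threshold, highest first."""
    rows = []
    for line in cli_output.splitlines():
        match = ROW.match(line)
        if match:
            port = match.group(1)
            rx_pause = int(match.group(3))
            if rx_pause > threshold:
                rows.append((port, rx_pause))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Output pasted from "show port flowcontrol" earlier in this post.
SAMPLE = """\
port     TX Pause Count  RX Pause Count
-------- --------------- --------------
ge.1.7   0               720
ge.1.14  0               82
ge.1.25  0               734
ge.1.26  0               734
ge.1.29  0               26838
ge.1.45  0               706
ge.1.46  0               228596
"""

if __name__ == "__main__":
    # The threshold of 1000 is an arbitrary cut-off for illustration.
    for port, rx_pause in rank_rx_pause(SAMPLE, threshold=1000):
        print(f"{port:10} {rx_pause}")
```

With the sample above and a threshold of 1000, only ge.1.46 and ge.1.29 are listed, in that order.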

Out of interest, IGMP snooping has been enabled on the switch and all the user ports, although they don't use multicasting. Could that have any bearing?

Userlevel 4
Run the command show igmpsnooping MFDB (multicast filtering database) to see what you are running.
If there is no querier on the network, then the IGMP commands are really not helpful. Also, some companies that use multicast may rely on it for the communication between controllers and APs.

But in most cases a sniffer (Wireshark is free) can help you isolate any traffic concerns.

The easy way to handle this is to document the numbers on each switch, then select ports that are not enabled (or connected) and run the command clear port advertise ge.x.x pause on each port.

Another option is to set flowcontrol disabled. (This does the same thing as clear port advertise pause, in that the links will be dropped and reconnected.)

I've been getting SNMP error timeouts for a while now. I had a look last week on our S4, and the pause frame count on the port our Linux router was connected to was occasionally increasing. That router had a hardware failure, and now we're running on the backup router, which is connected to a G3, and I'm seeing nearly constant increases in pause frames. From today:

G3(su)->show port flowcontrol tg.3.3
port     TX Pause Count  RX Pause Count
-------- --------------- --------------
tg.3.3   0               14858653

and about 5 hours later:

G3(su)->show port flowcontrol tg.3.3
port     TX Pause Count  RX Pause Count
-------- --------------- --------------
tg.3.3   0               17704492

For reference, the routers have 10Gb Myricom cards, and today are pushing 800-1200Mbps.
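
To put those two readings in perspective, here is a rough Python sketch that turns them into an average rate. It assumes the interval was exactly 5 hours (the post only says "about 5 hours") and that the counter was neither cleared nor wrapped in between; pause_rate is just an illustrative helper name.

```python
def pause_rate(first_count, second_count, hours_apart):
    """Return (delta, average frames per second) between two counter readings."""
    delta = second_count - first_count
    return delta, delta / (hours_apart * 3600)


if __name__ == "__main__":
    # RX Pause Count on tg.3.3 from the two readings above.
    delta, per_second = pause_rate(14_858_653, 17_704_492, 5)
    print(f"{delta} pause frames in ~5 h, roughly {per_second:.0f} per second")
```

That works out to about 2.8 million pause frames, or on the order of 158 per second on average, which fits the "nearly constant" description.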
Userlevel 4
clear port advertise tg.1.3 pause would work, but a reminder that the link will drop for a second.
Userlevel 4
If you call into the GTAC to open a ticket, then we can assist you. Just mention this site (the HUB) and ask them to make me a contributor or co-owner.
Then I can assist over the phone. This would be needed to document the situation.
Jason Parker
I've turned off the tx flowcontrol on the network card, and now it's dropping packets (dropped_no_small_buffer is increasing) so it's looking like a problem with the host itself. I do have a ticket open (01047722) but it doesn't look like a problem on the network side. Thanks, James
Userlevel 4
So, to look at the big picture:

If you turn it off at the switch port, then we will drop the pause frames, but you really have to look at the big picture.
Why is the traffic so high on the port?
Is it broadcasts or multicasts, or are there spanning tree issues?

I would consider disabling pause on all ports and flow control on the switch.
Capture traffic with Wireshark (free): either capture on an open port, or set up port mirroring (set port mirroring create srcport destport) and review the traffic.
Userlevel 2
Is there a way to look at the number of paused packets on an Extreme switch? I don't see the command that would show this info. Thanks!
I am also facing the same problem; some devices are getting this kind of error. I am using Netsight.
Userlevel 4
Anil Waghmode wrote:

I am also facing the same problem; some devices are getting this kind of error. I am using Netsight.

I will try to answer both questions.
Question 1. On the Extreme side, a command is being added that lets you see the number of pause frames: "Expose per-port flow-control (rx/tx pause & PFC) counters", in 15.7 or later.

Question 2. Port errors would be seen on the C2 etc., as they do not have the command show port flowcontrol.

Please let us know if you have any issues by calling the GTAC and opening a ticket. Those tickets have a higher priority, but we do try to get back to the HUB to assist when we can.
Thank You