Latency while pinging the gateways of the VLANs configured on our core switch

  • Problem
  • Updated 2 years ago
  • Solved
Hello,
We have a BD8810 core switch and several models of Extreme EXOS edge switches (such as the X430, X440, and X460). All of our servers are connected to the core switch, and clients are connected to the edge switches.
The core switch has multiple VLANs, each with its own tag ID. The required VLANs are created on the edge switches with the same tag IDs as on the core switch.
For the past two days, we have been seeing high latency when pinging the gateway IP addresses of the VLANs configured on the core.
The reply time is now more than 150 ms; it was previously less than 1 ms. We don't know where the actual problem lies: in the configuration, in a loop, or in a slot problem on the core.
Please help us to resolve this issue.

Thavamani Shanmugam

Posted 2 years ago

Hagemann, Olaf, Employee
Did you check the CPU utilization of the core switch?

Darren Saliva
A lot of things could be causing the issue. Was a change made two days ago? How are the edge switches connected to the core? What are the port configurations on both ends of the connections (i.e., port speed and duplex)? What is the output of the "top" command?

Steve Ballantyne
I had a similar issue crop up a few weeks ago. It was suggested that I upgrade my firmware. I did that, and after everything was upgraded and rebooted the issue went away for me.

Here is that thread: https://community.extremenetworks.com/extreme/topics/sudden-drop-in-speed-and-response-time-across-all-ssid-and-radios?utm_source=notification&utm_medium=email&utm_campaign=new_comment&utm_content=topic_link

EtherMAN, Embassador
As Olaf said, ping response is CPU-driven, so running the top command should give you some insight. In the past I have seen a few things drive up the CPU on our 8900s. One was MAC address churn: our tables held 100k or so MAC addresses, and apparently enough of them were timing out and having to be re-learned that it drove the CPU up. Increasing the timeout fixed that.
Next was SNMP queries. We have multiple systems polling for bandwidth, management, port up/down state on trunk ports, etc., and this drove up the CPU. It has never affected services passing through our systems, but it does affect polling and ping responses.

Jeremy, Embassador
You can use the ELRP client with the one-shot command to check for loops on the various VLANs that you have tagged to the edge.

Vellachery, Sumeesh, Employee
Thavamani,

To narrow down the issue, collect the following information:
- Run "top" to see whether any particular process is spiking.
- Run "clear l2stats".
- Run "show l2stats".
- Repeat "show l2stats" 3-4 times and note which VLANs have a large number of packets going to the CPU.

- Once you have identified the VLANs with a large number of packets going to the CPU, run ELRP on those VLANs.
- Type enable elrp-client
    Enables the Extreme Loop Recovery Protocol (ELRP) client (standalone ELRP) globally.
- Type configure elrp-client one-shot <vlan_name> ports all print-and-log
    Starts one-time, non-periodic ELRP packet transmission on the specified ports of the VLAN.

If a layer 2 loop is detected, it will be printed and logged; in that case, check the physical connections.
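
As a rough illustration of the "run it 3-4 times and compare" step above, the sketch below takes two sets of per-VLAN CPU packet counters and ranks the VLANs by how much each counter grew. It does not parse switch output: the VLAN names and counts are made-up placeholders that you would copy by hand from two "show l2stats" runs, and the function name cpu_packet_deltas is hypothetical.

```python
from typing import Dict, List, Tuple


def cpu_packet_deltas(before: Dict[str, int], after: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (vlan, increase) pairs, largest increase first."""
    deltas = []
    for vlan, count in after.items():
        # A VLAN missing from the first run is treated as starting at 0.
        deltas.append((vlan, count - before.get(vlan, 0)))
    return sorted(deltas, key=lambda item: item[1], reverse=True)


# Placeholder counters, typed in by hand from two runs of "show l2stats".
first_run = {"Default": 1200, "Servers": 3400, "WiFi": 5000}
second_run = {"Default": 1250, "Servers": 3500, "WiFi": 95000}

for vlan, increase in cpu_packet_deltas(first_run, second_run):
    print(f"{vlan}: +{increase} packets to CPU")
```

The VLAN at the top of the list would be the first candidate for the ELRP one-shot check.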

Vellachery, Sumeesh, Employee
Thavamani,
 
What is the EXOS version running on the BD8K switch?

The articles below should also be helpful:

https://gtacknowledge.extremenetworks.com/articles/Q_A/Which-commands-can-be-used-to-detect-a-loop

https://gtacknowledge.extremenetworks.com/articles/How_To/How-to-configure-ELRP-to-disable-ports 

Thavamani Shanmugam
Sir,

Thanks for your reply. Please see the image below, which shows the output of the top command.

Vellachery, Sumeesh, Employee
Thavamani,

The "top" output indicates that the bcmRX process is spiking. The bcmRX process handles traffic that is received by the switch's CPU.

In most situations this indicates a possible loop in the network. Please follow the steps I suggested earlier.

Roy Noh, Employee
Most of the cases I've seen were caused by loops, but there was one interesting case that might be helpful for you.
One of my customers had a syslog server attached to a distribution switch, and the server had died for some reason.
Lots of syslog packets were still coming from other servers, and they were hitting the switch CPU.
This happened because the ARP and FDB entries could not be resolved on the switch, so the packets were handled as slow-path traffic.
The issue went away after they replaced the syslog server.
We confirmed it with tcpdump in debug mode.

Thavamani Shanmugam
Sir,
Sorry for the delayed response. We are still facing the same issue.
We used the ELRP client on all the VLANs to check whether a loop exists in the network, but no loop was detected.
However, CPU utilization is still very high. Around 80 edge switches are connected to our core switch. How can we check which edge switch is causing the high CPU utilization on the core switch? Please help us resolve this issue.

Prashanth KG, Employee
Hi,

Since this is having a real-time impact on your production environment, it would be better to open a case with GTAC so that we can help you diagnose this through a remote session.

Looking at the output of top, the fdb process is also consuming a considerable amount of CPU.
So we could suspect MAC moves on the switch.

Please refer to the article below for configuring MAC tracking.

https://gtacknowledge.extremenetworks.com/articles/How_To/How-to-configure-MAC-tracking-in-EXOS

After configuring the commands mentioned there, collect the output of "show log".
If there is any MAC move, it will be displayed in the log.

Hope this helps!

Thavamani Shanmugam
Sir,

Thanks for the response.

Please see the image below, which shows the log of the core switch.

The log repeatedly shows DoS-protect messages about exceeded packet counts. Is there any way to identify where this traffic originates?

Please help us to resolve this issue.

Prashanth KG, Employee
Hi,

The article below explains the dos-protect messages:

https://gtacknowledge.extremenetworks.com/articles/Q_A/DOS-protect-log-message 

I see that the threshold is set to a very low value of 150. Usually, if the traffic has a pattern (i.e., from a specific source or to a specific destination), the pattern will be displayed in the log.

In any case, only notify-threshold log messages are seen here. If the alert threshold is reached, the log could display the traffic pattern or it could say "No traffic pattern found".

If you need help with the packet capture, open a case with GTAC.

https://gtacknowledge.extremenetworks.com/articles/How_To/How-to-contact-Extreme-Networks-Global-Technical-Assistance-Center-GTAC
Hope this helps!

Thavamani Shanmugam
Sir,
Thanks for the reply.
We ran the packet capture and exported it to a TFTP server. How can we use the captured data to identify where the high traffic originates? We tried opening the capture file in Wireshark, but we were not able to identify the source that is sending the most packets to the CPU.

Thanks in advance.

Thavamani Shanmugam
Sir,
We have analyzed the packet capture file with Wireshark. Most of the entries show the MDNS protocol along with the IP addresses 224.0.0.251 and 224.0.0.252. All of these packets come from the WiFi networks. Is this the source of the high CPU utilization on our core switch?
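
One way to find the heaviest senders in a capture like this is to export the packet list from Wireshark as CSV and count rows per source, destination, and protocol. The sketch below assumes the export uses column headers named "Source", "Destination", and "Protocol"; the sample rows and addresses are invented for illustration, and the function name top_talkers is hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Flow = Tuple[str, str, str]  # (source, destination, protocol)


def top_talkers(rows: Iterable[Dict[str, str]], limit: int = 5) -> List[Tuple[Flow, int]]:
    """Count packets per (source, destination, protocol), busiest first."""
    counts = Counter(
        (row["Source"], row["Destination"], row["Protocol"]) for row in rows
    )
    return counts.most_common(limit)


# Invented sample in the shape of a CSV packet export (assumed column names).
SAMPLE_CSV = '''\
"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000","10.1.20.15","224.0.0.251","MDNS","87","Standard query"
"2","0.010","10.1.20.15","224.0.0.251","MDNS","87","Standard query"
"3","0.020","10.1.20.31","224.0.0.252","LLMNR","75","Standard query"
"4","0.030","10.1.20.15","224.0.0.251","MDNS","87","Standard query"
"5","0.040","10.1.10.2","10.1.10.1","ICMP","74","Echo (ping) request"
'''

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for (source, destination, protocol), packets in top_talkers(rows):
    print(f"{packets:>5}  {source} -> {destination}  ({protocol})")
```

For a real export, replace the io.StringIO sample with a file opened via open("capture.csv", newline=""), where capture.csv is a placeholder file name.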

Prashanth KG, Employee

Thavamani Shanmugam
Sir,
Thanks. We are planning to add an ACL entry to block mDNS packets. Where should this ACL be added: on the core switch, or on all the edge switches? Knowing this would be very helpful for fixing our issue.

Michal Rz
Thavamani, did you fix your problem by blocking mDNS?