Link Problems, Flapping, Flood Rate Limit Activated, clear eee stats Feature unavailable?

  • Problem
  • Updated 3 years ago
  • Solved

I'm looking for advice on the issues we are experiencing and on features I could enable, e.g. link-flap detection. The problems seem to manifest as stack reboots. We also recently introduced an ACL that blocks all mDNS traffic, because a lot of it was hitting the CPU and possibly causing problems; we are still monitoring the results.

The log below shows a catalogue of link problems: ports flapping, Flood Rate Limiting being activated, and 'could not clear eee stats Feature unavailable' errors.

The flood rate limiting is configured as follows:

config port <edge-ports> rate-limit flood broadcast 30

config port <uplink-ports> rate-limit flood broadcast 300

I'm not sure how EXOS handles link-flap detection or what the best rate for it would be, and I don't know what the 'clear eee stats' error means.

The switch is part of a stack, the details are given below.

Various ports have shown a lot of dropped packets and flow-control pauses.

There are lots of things we can check, such as auto-negotiation, auto-polarity, flow control, cabling and traffic patterns.

A lot of the dropped packets occur on the first switch in the stack. I know how ridiculous this is, but there is only a single 1 Gb link back to the core, which might explain why so many drops show up on the master switch and why there are pauses, and it may be part of the problem.

I'm interested in any advice anyone might have in case I'm missing anything.

Many thanks in advance.

-------------------------------------------------------------------------------------------------------------------------------

Port Congestion Monitor
Port      Link      Packet
          State     Drop
================================================================================
1:26      A         4591892
1:27      A         4591491
1:29      A         1862
1:32      A         4591992
1:36      A         4591896
1:38      A         4591887
1:39      A         4591932
1:41      A         4591935 (17.09%)

Port      Link     Tx Pkt       Tx Byte       Rx Pkt       Rx Byte Rx Pkt Rx Pkt
          State    Count        Count         Count        Count    Bcast  Mcast
================================================================================
1:26      A      26848222  17874875011         3808       358537        1        0
1:27      A      26818272  17864049115        25786      5126611    22346        0
1:29      A      32150904  24780183184       703100    190815959    11314    11823
1:32      A      26899680  17944102409        42204      3840482      227     3057
1:36      A      26919035  17890895923        66028     13053687        8        0
1:38      A      26876334  17906031574        17479      1510846      239     3028
1:39      A      27002777  17910602583       143252     30106649       33        0
1:41      A      26870551  17881547943        10248       701332        0        0

1:26                  (0002)              E     A     100   FULL
1:27                  (0002)              E     A     100   FULL
1:36                  (0002)              E     A     100   FULL
1:38                  (0002)              E     A     100   FULL
1:39                  (0002)              E     A     100   FULL
1:41                  (0002)              E     A     100   FULL

Flow Control Frames Received
Port     Pause    PFC0    PFC1    PFC2    PFC3    PFC4    PFC5    PFC6    PFC7
          Rcvs    Rcvs    Rcvs    Rcvs    Rcvs    Rcvs    Rcvs    Rcvs    Rcvs
==============================================================================
1:2         30       -       -       -       -       -       -       -       -
1:35        44       -       -       -       -       -       -       -       -
2:6        214       -       -       -       -       -       -       -       -
2:16       105       -       -       -       -       -       -       -       -
2:42       818       -       -       -       -       -       -       -       -
3:10       811       -       -       -       -       -       -       -       -
3:14       989       -       -       -       -       -       -       -       -
3:22        97       -       -       -       -       -       -       -       -
3:48       232       -       -       -       -       -       -       -       -
4:39      1432       -       -       -       -       -       -       -       -

Port Summary
Port  Display         VLAN Name          Port  Link  Speed  Duplex
#     String          (or # VLANs)       State State Actual Actual
==================================================================
1:2                   (0002)              E     A     1000  FULL
1:35                  (0002)              E     R
2:6                   (0002)              E     A     1000  FULL
2:16                  (0002)              E     R
2:42                  (0002)              E     R
3:10                  (0002)              E     R
3:14                  (0002)              E     R
3:22                  (0002)              E     R
3:48                  (0002)              E     A     1000  FULL
4:39                  (0002)              E     A     100   FULL


Flow Control Frames Transmitted
Port     Pause    PFC0    PFC1    PFC2    PFC3    PFC4    PFC5    PFC6    PFC7
          Xmts    Xmts    Xmts    Xmts    Xmts    Xmts    Xmts    Xmts    Xmts
==============================================================================
1:34         6       -       -       -       -       -       -       -       -
CoreUp> 299841       -       -       -       -       -       -       -       -


System Type:      X440-48p (Stack)

SysHealth check:  Enabled (Normal)
Recovery Mode:    All
System Watchdog:  Enabled

Current Time:     Sat Sep 12 23:39:09 2015
Timezone:         [Auto DST Disabled] GMT Offset: 0 minutes, name is UTC.
Boot Time:        Wed Aug 26 17:04:05 2015
Boot Count:       161
Next Reboot:      None scheduled
System UpTime:    17 days 6 hours 35 minutes 4 seconds

Slot:             Slot-1 *                     Slot-2
                  ------------------------     ------------------------
Current State:    MASTER                       BACKUP (In Sync)

Image Selected:   secondary                    secondary
Image Booted:     secondary                    secondary
Primary ver:      15.3.1.4                     15.3.1.4
Secondary ver:    15.5.4.2                     15.5.4.2
                  patch1-5                     patch1-5

Config Selected:  primary.cfg
Config Booted:    primary.cfg

primary.cfg       Created by ExtremeXOS version 15.5.4.2
                  2255898 bytes saved on Sat Sep 12 16:46:35 2015

-----------------------------------------------------------------------------------------------------------------------

Show log:

09/12/2015 23:26:08.64 <Info:vlan.msgs.portLinkStateUp> Slot-1: Port 4:23 link UP at speed 100 Mbps and full-duplex
09/12/2015 23:26:06.42 <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 4:23 link down
09/12/2015 23:23:58.35 <Info:AAA.logout> Slot-1: User admin logout from ssh (172.17.4.164)
09/12/2015 23:19:42.39 <Info:HAL.Port.RateLimit> Slot-1: Flood Rate Limiting activated on Port 3:44
09/12/2015 23:16:10.44 <Info:vlan.msgs.portLinkStateUp> Slot-1: Port 4:17 link UP at speed 1 Gbps and full-duplex
09/12/2015 23:16:07.95 <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 4:17 link down
09/12/2015 23:16:06.99 <Info:vlan.msgs.portLinkStateUp> Slot-1: Port 4:17 link UP at speed 10 Mbps and half-duplex
09/12/2015 23:16:04.72 <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 4:17 link down
09/12/2015 23:09:39.89 <Info:HAL.Port.RateLimit> Slot-1: Flood Rate Limiting activated on Port 3:36
09/12/2015 23:09:39.89 <Info:HAL.Port.RateLimit> Slot-1: Flood Rate Limiting activated on Port 2:4
09/12/2015 23:04:38.64 <Info:HAL.Port.RateLimit> Slot-1: Flood Rate Limiting activated on Port 4:10
09/12/2015 23:04:38.64 <Info:HAL.Port.RateLimit> Slot-1: Flood Rate Limiting activated on Port 3:28
09/12/2015 23:02:05.38 <Info:vlan.msgs.portLinkStateUp> Slot-1: Port 3:28 link UP at speed 1 Gbps and full-duplex
09/12/2015 23:02:02.68 <Info:vlan.msgs.portLinkStateUp> Slot-1: Port 4:17 link UP at speed 1 Gbps and full-duplex
09/12/2015 23:02:01.17 <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 3:28 link down
09/12/2015 23:02:00.20 <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 4:17 link down
09/12/2015 23:01:59.25 <Info:vlan.msgs.portLinkStateUp> Slot-1: Port 4:17 link UP at speed 10 Mbps and half-duplex
09/12/2015 23:01:56.55 <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 4:17 link down
09/12/2015 22:59:37.35 <Info:HAL.Port.RateLimit> Slot-1: Flood Rate Limiting activated on Port 4:16
09/12/2015 22:59:37.35 <Info:HAL.Port.RateLimit> Slot-1: Flood Rate Limiting activated on Port 2:13
09/12/2015 22:59:37.35 <Info:HAL.Port.RateLimit> Slot-1: Flood Rate Limiting activated on Port 1:48
09/12/2015 22:58:05.83 <Info:vlan.msgs.portLinkStateUp> Slot-1: Port 4:16 link UP at speed 1 Gbps and full-duplex
09/12/2015 22:58:01.51 <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 4:16 link down
09/12/2015 22:57:18.26 <Erro:HAL.Port.Error> Slot-1: could not clear eee stats Feature unavailable for slot 3: port 46

09/12/2015 22:57:17.61 <Erro:HAL.Port.Error> Slot-1: could not clear eee stats Feature unavailable for slot 2: port 46

09/12/2015 22:57:16.02 <Noti:DM.Notice> Slot-1: Clearing all counters
09/12/2015 22:57:15.99 <Info:cli.logRemoteCmd> Slot-1: :: (ssh) admin: clear counters
09/12/2015 22:54:20.29 <Info:AAA.LogSsh> Slot-1: Msg from Master : Did password authentication for user admin (172.17.4.164)
09/12/2015 22:54:20.27 <Info:AAA.authPass> Slot-1: Login passed for user admin through ssh (172.17.4.164)
09/12/2015 22:47:54.71 <Info:vlan.msgs.portLinkStateUp> Slot-1: Port 4:16 link UP at speed 100 Mbps and full-duplex
09/12/2015 22:47:49.20 <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 4:16 link down
09/12/2015 22:42:42.96 <Info:vlan.msgs.portLinkStateUp> Slot-1: Port 4:17 link UP at speed 1 Gbps and full-duplex
09/12/2015 22:42:40.54 <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 4:17 link down
09/12/2015 22:42:37.15 <Info:vlan.msgs.portLinkStateUp> Slot-1: Port 4:17 link UP at speed 10 Mbps and half-duplex
09/12/2015 22:42:34.80 <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 4:17 link down
09/12/2015 22:31:55.08 <Info:vlan.msgs.portLinkStateUp> Slot-1: Port 4:17 link UP at speed 1 Gbps and full-duplex


Martin Flammia

Posted 3 years ago

Martin Flammia

Just as an update: I noticed the switch had qosprofiles 2-6 configured, although only QP6 was being used, so port buffer space was being reserved unnecessarily. After removing the unused qosprofiles, the packet drops seem to have improved a lot.

There is still a lot to do, and I would appreciate any feedback on the above. Thanks

Grosjean, Stephane, Employee

Hi,

Do you have Flow Control enabled? If yes, how and why?

From the logs, port 4:17 appears very often and keeps flapping between 10 Mbps half-duplex and 1 Gbps full-duplex. What is connected to it? 10 Mbps half-duplex may be the result of an autonegotiation mismatch (usually only one side has autoneg configured).
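
To see which ports flap most, the link-down messages in a saved copy of the log can be tallied per port. The following is a minimal Python sketch, not an EXOS feature; the function name `count_link_downs` and the regular expression are assumptions based only on the log lines quoted in this thread.

```python
import re
from collections import Counter

# Matches lines such as:
#   <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 4:17 link down
# The pattern is an assumption derived from the log excerpt in this thread.
LINK_DOWN = re.compile(r"portLinkStateDown> \S+ Port (\d+:\d+) link down")

def count_link_downs(log_text):
    """Return a Counter mapping port -> number of link-down events."""
    return Counter(m.group(1) for m in LINK_DOWN.finditer(log_text))

# A few lines copied from the log in the original post.
sample = """\
09/12/2015 23:26:06.42 <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 4:23 link down
09/12/2015 23:16:07.95 <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 4:17 link down
09/12/2015 23:16:04.72 <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 4:17 link down
09/12/2015 23:02:01.17 <Info:vlan.msgs.portLinkStateDown> Slot-1: Port 3:28 link down
"""

# Print ports ordered by how often they went down.
for port, downs in count_link_downs(sample).most_common():
    print(port, downs)
```

Running this over the full log rather than the four sample lines would show whether 4:17 really stands out or whether several ports flap at a similar rate.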

As for the flood rate, I believe 30 / 300 are very low values. The hardware does not monitor over a 1 s interval; it works at a much higher frequency and extrapolates your configured values into other values (I don't remember the exact math off the top of my head; check with GTAC for a deeper analysis and explanation). A small burst can trigger the threshold even though, on a 1 s basis, the traffic is under your setting. If you need an accurate threshold, I would recommend using an ACL with meters instead of the flood-rate knob.
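
The burst effect can be illustrated with a small Python sketch. It assumes, purely for illustration, that a per-second limit is scaled linearly down to a shorter measurement window; the 10 ms window and the function names are hypothetical, since the thread does not state the real hardware interval or formula.

```python
def window_budget(pps_limit, window_s):
    """Packets allowed per measurement window if a per-second limit is
    scaled down linearly to a shorter window (an assumed model)."""
    return pps_limit * window_s

def burst_trips_limit(burst_packets, pps_limit, window_s):
    """True if a burst arriving within one window exceeds the scaled budget."""
    return burst_packets > window_budget(pps_limit, window_s)

# Hypothetical 10 ms window; the real hardware interval is not given in the thread.
WINDOW_S = 0.010

# Five broadcasts arriving back to back, then silence for the rest of the second:
# far below 30 packets per second on average, yet above the per-window budget.
print(burst_trips_limit(5, 30, WINDOW_S))   # edge-port setting (30 pps)
print(burst_trips_limit(5, 300, WINDOW_S))  # uplink setting (300 pps)
```

Under this assumed model, both settings would be tripped by a five-packet burst, which is consistent with the "Flood Rate Limiting activated" messages appearing on otherwise quiet ports.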

My 2 cents.

Prashanth KG, Employee

Hi Martin,

As Stephane asked, we need to know whether flow control is necessary for your environment. When the switch receives pause frames on a port, it stops transmitting packets on that port for the specified time. During this time the port's egress buffer can fill up, which could result in the packet drops you are noticing on the ports. Similarly, with tx-pause enabled on a switch port, the switch can send pause frames to the connected device.
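
To give a feel for how long a single pause frame can hold a port, here is a short Python sketch. It relies on the IEEE 802.3 definition of pause time (a 16-bit value counted in quanta of 512 bit times); the function name `pause_seconds` is just for illustration, and the pause values actually requested by the attached devices are not visible in the counters posted here.

```python
PAUSE_QUANTUM_BITS = 512  # one pause quantum is 512 bit times (IEEE 802.3 Annex 31B)
MAX_QUANTA = 0xFFFF       # pause_time is a 16-bit field

def pause_seconds(quanta, link_bps):
    """Time a port stops transmitting for a pause frame requesting `quanta`."""
    return quanta * PAUSE_QUANTUM_BITS / link_bps

# Upper bound per pause frame at the two link speeds seen in this stack.
for name, bps in (("100 Mbps", 100_000_000), ("1 Gbps", 1_000_000_000)):
    ms = pause_seconds(MAX_QUANTA, bps) * 1000
    print(f"{name}: up to {ms:.2f} ms per pause frame")
```

At 100 Mbps the upper bound is roughly a third of a second per frame, so even a modest number of received pause frames could leave an egress queue backed up long enough to drop traffic.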

Also, please check whether EEE support is enabled on the ports.

Command:
configure port <> eee on

This command enables EEE on the switch. It specifies that the port advertises to its link partner that it is EEE capable at certain speeds. If both sides determine during auto-negotiation that they have EEE on and are compatible speed-wise, they negotiate the other parameters (how long it takes to come out of sleep, how long it takes to wake up) and the link comes up. During periods of inactivity, the link shuts down parts of the port to save energy; this is called LPI, for low power idle. When one side needs to send something, it wakes up the remote side and then transmits. You can find more details in the command reference guide.

The error below could be because ports 3:46 and 2:46 do not support the EEE feature.

09/12/2015 22:57:18.26 <Erro:HAL.Port.Error> Slot-1: could not clear eee stats Feature unavailable for slot 3: port 46

09/12/2015 22:57:17.61 <Erro:HAL.Port.Error> Slot-1: could not clear eee stats Feature unavailable for slot 2: port 46 

Please let us know which switch models slots 2 and 3 are, and whether port 46 is a copper or a fiber port.

Hope this helps! 

Martin Flammia

Thanks for posting back.

Regarding EEE, it is not configured on this switch. Port 3:46 is currently not active, and 2:46 shows a Dell device attached; it is a copper port:

Port:   2:46
        Virtual-router: VR-Default
        Type:           UTP
        Redundant Type: NONE
        Random Early drop:      Unsupported
        Admin state:    Enabled
        Copper Medium Configuration:     auto-speed sensing  auto-duplex auto-polarity on
        Fiber Medium Configuration:      auto-speed sensing  auto-duplex
        Link State:     Active, 1Gbps, full-duplex
        Link Ups:       2        Last: Mon Sep 14 06:37:07 2015
        Link Downs:     1        Last: Mon Sep 14 06:37:04 2015

With regard to broadcast control, what would you recommend as a value for 100 Mb / 1 Gb links? And if it is to be done by ACL metering, what would the best default value / configuration be?

Is there anything in EXOS equivalent to configuring link flap? I have been struggling to find anything.

We are still experiencing packet loss, although only on ports on the first (master) switch in the stack, which consists of 4 x X440s. Interestingly, the values are all pretty much the same:

1:26      A         548181
1:27      A         547201
1:32      A         548213
1:36      A         548182
1:38      A         548182
1:39      A         548204
1:41      A         548206

These ports are all 100 Mb and are possibly connected to IP phones; maybe there is a PC piggybacking on them (I need to investigate), a negotiation issue, a contention issue with a 1 Gb PC port, etc. The master switch seems to have congestion problems as a whole, judging by the output below, yet port utilisation is next to nil.
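
How close the drop counters are can be quantified with a few lines of Python. This is only a sketch over the seven values posted above; `relative_spread` is a hypothetical helper, not an EXOS metric.

```python
# Drop counters copied from the congestion output above.
drops = {
    "1:26": 548181, "1:27": 547201, "1:32": 548213, "1:36": 548182,
    "1:38": 548182, "1:39": 548204, "1:41": 548206,
}

def relative_spread(values):
    """(max - min) / mean: a value near 0 means the counters move in lockstep."""
    values = list(values)
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean

print(f"spread: {relative_spread(drops.values()):.4%}")
```

Counters that track each other this closely would be consistent with the same traffic (for example flooded or multicast frames) being queued to every one of these ports, but that is an inference to verify rather than something the outputs here prove.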

Stack 1.19 # show stacking
Stack Topology is a Ring
Active Topology is a Ring
Node MAC Address    Slot  Stack State  Role     Flags
------------------  ----  -----------  -------  ---
*00:04:96:82:46:c1  1     Active       Master   CA-
 00:04:96:82:46:ec  2     Active       Backup   CA-
 00:04:96:82:10:07  3     Active       Standby  CA-
 00:04:96:82:44:34  4     Active       Standby  CA-

Stack 1.24 # show stacking configuration
Stack MAC in use: 02:04:96:82:46:c1
Node               Slot         Alternate          Alternate
MAC Address        Cfg Cur Prio Mgmt IP / Mask     Gateway         Flags     Lic
------------------ --- --- ---- ------------------ --------------- --------- ---
*00:04:96:82:46:c1 1   1   50   <none>             <none>          CcEeMm-Nn --
 00:04:96:82:46:ec 2   2   45   <none>             <none>          CcEeMm-Nn --
 00:04:96:82:10:07 3   3   1    <none>             <none>          --EeMm-Nn --
 00:04:96:82:44:34 4   4   Auto <none>             <none>          --EeMm-Nn --

Stack 1.18 # debug hal show congestion
Congestion information for slot 1 type X440-48p since last query
  Switch fabric congestion present: 2559080

Congestion information for slot 2 type X440-48p since last query
  No switch fabric and CPU congestion present

Congestion information for slot 3 type X440-48p since last query
  No switch fabric and CPU congestion present

Congestion information for slot 4 type X440-48p since last query
  Switch fabric congestion present: 504

Before the QoS profiles were corrected, the CPU also seemed to be experiencing congestion:

CPU congestion present: 237

Here is the port information for 1:26 that is experiencing packet loss:


Port:   1:26
        Virtual-router: VR-Default
        Type:           UTP
        Random Early drop:      Unsupported
        Admin state:    Enabled with  auto-speed sensing  auto-duplex
        Link State:     Active, 100Mbps, full-duplex
        Link Ups:       0        Last: Sun Sep 13 10:13:19 2015
        Link Downs:     0        Last: Sun Sep 13 10:13:10 2015

        VLAN cfg:
                 Name: HH-Data, Internal Tag = 100, MAC-limit = No-limit, Virtual router:   VR-Default
                 Name: HH-Voice, 802.1Q Tag = 200, MAC-limit = No-limit, Virtual router:   VR-Default
                       Port-specific VLAN ID:  200
        STP cfg:
                s0(enable), Tag=(none), Mode=802.1D, State=FORWARDING
                s1(enable), Tag=(none), Mode=802.1D, State=FORWARDING

        Protocol:
                 Name: HH-Data      Protocol: ANY      Match all protocols.
        Trunking:       Load sharing is not enabled.

        EDP:            Enabled

        EEE:            Disabled
        ELSM:           Disabled
        Ethernet OAM:           Disabled
        Learning:       Enabled
        Unicast Flooding:       Enabled
        Multicast Flooding:     Enabled
        Broadcast Flooding:     Enabled
        Jumbo:          Disabled
        Flow Control:   Rx-Pause: Enabled       Tx-Pause: Enabled
        Priority Flow Control: Disabled
        Reflective Relay:       Disabled
        Link up/down SNMP trap filter setting:  Enabled
        Egress Port Rate:       No-limit
        Broadcast Rate:         30 packets-per-second
        Multicast Rate:         No-limit
        Unknown Dest Mac Rate:  No-limit
        QoS Profile:    None configured
        Ingress Rate Shaping :          Unsupported
        Ingress IPTOS Examination:      Enabled
        Ingress 802.1p Examination:     Enabled
        Ingress 802.1p Inner Exam:      Disabled
        Egress IPTOS Replacement:       Disabled
        Egress 802.1p Replacement:      Disabled
        NetLogin:                       Disabled
        NetLogin port mode:             Port based VLANs
        Smart redundancy:               Enabled
        Software redundant port:        Disabled
        IPFIX:   Disabled               Metering:  Ingress, All Packets, All Traffic
                IPv4 Flow Key Mask:     SIP: 255.255.255.255            DIP: 255.255.255.255
                IPv6 Flow Key Mask:     SIP: ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
                                        DIP: ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff

        auto-polarity:                  Enabled
        Shared packet buffer:           100%
        VMAN CEP egress filtering:      Disabled
        Isolation:                      Off
        PTP Configured:                 Disabled
        Time-Stamping Mode:             None
        Synchronous Ethernet:           Unsupported
        Dynamic VLAN Uplink:            Disabled
        VM Tracking Dynamic VLANs:      Disabled

As for flow control, RX and TX have been enabled on ALL ports. I'm not sure why, and in my view it doesn't seem necessary on this site, but I'm interested in your view of when it should and shouldn't be enabled in either direction.

Here is the output of port utilisation:

 show port utilization bandwidth
Port     Link    Link   Rx             Peak Rx       Tx            Peak Tx
         State   Speed  % bandwidth    % bandwidth   % bandwidth   % bandwidth
================================================================================
1:1       A       100       0.00          0.00          0.40            0.40
1:2       R       0         0.00          0.00          0.00            0.00
1:3       A       1000      0.00          0.00          0.00            0.00
1:4       R       0         0.00          0.00          0.00            0.00
1:5       A       100       0.00          0.00          0.40            0.40
1:6       A       1000      0.00          0.00          0.00            0.00
1:7       A       100       0.00          0.00          0.40            0.40
1:8       R       0         0.00          0.00          0.00            0.00
1:9       R       0         0.00          0.00          0.00            0.00
1:10      A       1000      0.00          0.00          0.00            0.00
1:11      A       1000      0.00          0.00          0.00            0.00
1:12      A       1000      0.00          0.00          0.00            0.00
1:13      A       100       0.00          0.00          0.40            0.40
1:14      A       1000      0.00          0.00          0.00            0.00
1:15      A       1000      0.00          0.00          0.00            0.00
1:16      R       0         0.00          0.00          0.00            0.00
1:17      R       0         0.00          0.00          0.00            0.00
1:18      A       1000      0.00          0.00          0.00            0.00
1:19      A       1000      0.00          0.00          0.00            0.00
1:20      R       0         0.00          0.00          0.00            0.00
1:21      A       100       0.00          0.00          0.40            0.40
1:22      R       0         0.00          0.00          0.00            0.00
1:23      R       0         0.00          0.00          0.00            0.00
1:24      A       10        0.00          0.00          4.02            4.02
1:25      A       1000      0.00          0.00          0.00            0.00
1:26      A       100       0.00          0.00          0.43            0.43
1:27      A       100       0.00          0.00          0.43            0.43
1:28      A       1000      0.00          0.00          0.00            0.00
1:29      R       0         0.00          0.00          0.00            0.00
1:30      R       0         0.00          0.00          0.00            0.00
1:31      A       1000      0.00          0.00          0.00            0.00
1:32      A       100       0.00          0.00          0.43            0.43
1:33      A       1000      0.00          0.00          0.00            0.00
1:34      A       1000      0.00          0.00          0.05            0.05
1:35      R       0         0.00          0.00          0.00            0.00
1:36      A       100       0.00          0.00          0.43            0.43
1:37      R       0         0.00          0.00          0.00            0.00
1:38      A       100       0.00          0.00          0.43            0.43
1:39      A       100       0.00          0.00          0.43            0.43
1:40      A       100       0.00          0.00          4.92            4.92
1:41      A       100       0.00          0.00          0.43            0.43
1:42      R       0         0.00          0.00          0.00            0.00
1:43      A       1000      0.00          0.00          0.05            0.05
1:44      R       0         0.00          0.00          0.00            0.00
1:45      R       0         0.00          0.00          0.00            0.00
1:46      R       0         0.00          0.00          0.00            0.00
1:47      R       0         0.00          0.00          0.00            0.00
CoreUpli> A       1000      1.39          1.39          0.02            0.02
2:1       A       100       0.00          0.00          0.40            0.40
2:2       R       0         0.00          0.00          0.00            0.00
2:3       A       100       0.00          0.00          0.40            0.40
2:4       A       1000      0.00          0.00          0.04            0.04
2:5       A       100       0.00          0.00          0.40            0.40
2:6       A       100       0.01          0.01          0.41            0.41
2:7       R       0         0.00          0.00          0.00            0.00
2:8       R       0         0.00          0.00          0.00            0.00
2:9       R       0         0.00          0.00          0.00            0.00
2:10      R       0         0.00          0.00          0.00            0.00
2:11      R       0         0.00          0.00          0.00            0.00
2:12      R       0         0.00          0.00          0.00            0.00
2:13      A       1000      0.01          0.01          0.41            0.41
2:14      A       100       0.00          0.00          0.40            0.40
2:15      A       100       0.00          0.00          0.40            0.40
2:16      A       1000      0.00          0.00          0.00            0.00
2:17      R       0         0.00          0.00          0.00            0.00
2:18      R       0         0.00          0.00          0.00            0.00
2:19      A       100       0.00          0.00          4.88            4.88
2:20      A       100       0.00          0.00          3.42            3.42
2:21      R       0         0.00          0.00          0.00            0.00
2:22      R       0         0.00          0.00          0.00            0.00
2:23      A       100       0.00          0.00          0.40            0.40
2:24      R       0         0.00          0.00          0.00            0.00
2:25      R       0         0.00          0.00          0.00            0.00
2:26      R       0         0.00          0.00          0.00            0.00
2:27      A       1000      0.00          0.00          4.02            4.02
2:28      R       0         0.00          0.00          0.00            0.00
2:29      R       0         0.00          0.00          0.00            0.00
2:30      R       0         0.00          0.00          0.00            0.00
2:31      A       100       0.00          0.00          0.40            0.40
2:32      A       100       0.00          0.00          0.40            0.40
2:33      R       0         0.00          0.00          0.00            0.00
2:34      R       0         0.00          0.00          0.00            0.00
2:35      A       100       0.00          0.00          0.40            0.40
2:36      R       0         0.00          0.00          0.00            0.00
2:37      R       0         0.00          0.00          0.00            0.00
2:38      A       100       0.00          0.00          0.40            0.40
2:39      A       1000      0.00          0.00          0.04            0.04
2:40      R       0         0.00          0.00          0.00            0.00
2:41      A       100       0.00          0.00          0.40            0.40
2:42      R       0         0.00          0.00          0.00            0.00
2:43      A       100       0.00          0.00          0.40            0.40
2:44      A       1000      0.00          0.00          0.04            0.04
2:45      A       100       0.00          0.00          0.40            0.40
2:46      A       1000      0.00          0.00          0.00            0.00
2:47      R       0         0.00          0.00          0.00            0.00
CoreUpli> R       0         0.00          0.00          0.00            0.00
3:1       A       100       0.00          0.00          0.40            0.40
ACU       A       100       0.00          0.00          0.40            0.40
3:3       R       0         0.00          0.00          0.00            0.00
ACU       A       1000      0.00          0.00          0.00            0.00
3:5       A       100       0.00          0.00          0.40            0.40
3:6       R       0         0.00          0.00          0.00            0.00
3:7       R       0         0.00          0.00          0.00            0.00
3:8       R       0         0.00          0.00          0.00            0.00
3:9       A       100       0.00          0.00          0.40            0.40
3:10      A       1000      0.00          0.00          0.04            0.04
3:11      R       0         0.00          0.00          0.00            0.00
3:12      A       100       0.00          0.00          0.00            0.00
3:13      R       0         0.00          0.00          0.00            0.00
3:14      A       1000      0.00          0.00          0.04            0.04
3:15      A       1000      0.00          0.00          0.00            0.00
3:16      A       100       0.00          0.00          0.00            0.00
3:17      R       0         0.00          0.00          0.00            0.00
3:18      R       0         0.00          0.00          0.00            0.00
3:19      R       0         0.00          0.00          0.00            0.00
3:20      R       0         0.00          0.00          0.00            0.00
3:21      A       100       0.00          0.00          0.40            0.40
3:22      R       0         0.00          0.00          0.00            0.00
3:23      R       0         0.00          0.00          0.00            0.00
3:24      R       0         0.00          0.00          0.00            0.00
3:25      R       0         0.00          0.00          0.00            0.00
3:26      A       100       0.00          0.00          4.88            4.88
3:27      A       100       0.00          0.00          0.40            0.40
3:28      A       1000      0.00          0.00          0.00            0.00
3:29      R       0         0.00          0.00          0.00            0.00
3:30      R       0         0.00          0.00          0.00            0.00
3:31      A       1000      0.00          0.00          0.00            0.00
3:32      A       100       0.00          0.00          0.40            0.40
3:33      R       0         0.00          0.00          0.00            0.00
3:34      A       100       0.00          0.00          0.40            0.40
3:35      A       100       0.00          0.00          0.40            0.40
3:36      R       0         0.00          0.00          0.00            0.00
3:37      A       100       0.00          0.00          4.88            4.88
3:38      R       0         0.00          0.00          0.00            0.00
3:39      R       0         0.00          0.00          0.00            0.00
3:40      A       1000      0.00          0.00          0.00            0.00
3:41      R       0         0.00          0.00          0.00            0.00
3:42      R       0         0.00          0.00          0.00            0.00
3:43      A       100       0.00          0.00          0.40            0.40
3:44      R       0         0.00          0.00          0.00            0.00
3:45      A       100       0.00          0.00          0.40            0.40
3:46      R       0         0.00          0.00          0.00            0.00
3:47      A       100       0.00          0.04          0.40           60.06
3:48      A       1000      0.00          0.00          0.00            0.00
4:1       A       100       0.00          0.00          0.40            0.40
4:2       A       100       0.00          0.00          0.40            0.40
4:3       R       0         0.00          0.00          0.00            0.00
4:4       R       0         0.00          0.00          0.00            0.00
4:5       A       100       0.00          0.00          0.40            0.40
4:6       A       1000      0.00          0.00          0.00            0.00
4:7       R       0         0.00          0.00          0.00            0.00
4:8       A       100       0.00          0.00          0.40            0.40
4:9       A       1000      0.00          0.00          0.04            0.04
4:10      A       1000      0.00          0.00          0.04            0.04
4:11      A       100       0.00          0.00          0.40            0.40
4:12      R       0         0.00          0.00          0.00            0.00
4:13      A       1000      0.00          0.00          0.00            0.00
4:14      R       0         0.00          0.00          0.00            0.00
4:15      R       0         0.00          0.00          0.00            0.00
4:16      R       0         0.00          0.00          0.00            0.00
4:17      A       1000&

Prashanth KG, Employee

Martin,

Thanks for the reply. 

Please share the following outputs so we can investigate the eee counters error.

show conf | include eee
show slot

Regarding the flow-control, on the ports where the congestion is seen, please try disabling the rx pause with the following command: 

disable flow-control rx-pause port <port number>

This ensures that the port keeps transmitting even if a pause frame is received.

Monitor and let us know if that helps!! 

Prashanth KG, Employee

Regarding the fabric congestion, please run 'debug hal show congestion' multiple times and check whether that counter is incrementing in real time.

Martin Flammia

Hi,

Info below:

Stack 1.1 # show config | include eee
Stack 1.2 # show slot
Slots    Type                 Configured           State       Ports
--------------------------------------------------------------------
Slot-1   X440-48p             X440-48p             Operational   48
Slot-2   X440-48p             X440-48p             Operational   48
Slot-3   X440-48p             X440-48p             Operational   48
Slot-4   X440-48p             X440-48p             Operational   48
Slot-5                                             Empty          0
Slot-6                                             Empty          0
Slot-7                                             Empty          0
Slot-8                                             Empty          0

Turned off flow control (both rx and tx) on all ports, and it made port congestion (packet drops) worse.

Stack 1.2 # show port congestion no-refresh
Port Congestion Monitor
Port      Link      Packet
          State     Drop
================================================================================
1:1       A         46853
1:5       A         46836
1:7       A         46836
1:13      A         46851
1:21      A         46850
1:26      A         34991
1:27      A         46838
1:32      A         46835
1:36      A         46836
1:38      A         46836
1:39      A         46836
1:41      A         46836
2:1       A         46846
2:3       A         46840
2:5       A         46840
2:14      A         46836
2:15      A         46836
2:17      A         46844
2:23      A         46842
2:26      A         55801
2:31      A         46836
2:32      A         46829
2:35      A         46845
2:38      A         46856
2:41      A         46839
2:43      A         46833
2:45      A         46825
3:1       A         46861
ACU       A         46859
3:5       A         46864
3:9       A         46865
3:21      A         46867
3:27      A         46853
3:32      A         46868
3:34      A         46868
3:35      A         46868
3:43      A         46862
3:45      A         46868
3:46      A         46866
3:47      A         46868
4:1       A         46851
4:2       A         46842
4:5       A         46858
4:8       A         46855
4:11      A         46858
4:18      A         46865
4:19      A         46847
4:23      A         46861
4:39      A         46844
4:44      A         46869

================================================================================
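
As a quick way to check how uniform these drop counters are, the sketch below parses a few rows copied from the table above and flags ports whose count differs from the median by more than 1%. The function names and the 1% threshold are assumptions picked for illustration; nothing here is an EXOS feature.

```python
import re
from statistics import median

# A few rows copied from the 'show port congestion no-refresh' output above.
SAMPLE = """\
1:1       A         46853
1:5       A         46836
1:26      A         34991
2:26      A         55801
3:1       A         46861
4:44      A         46869
"""

# Matches "slot:port  link-state  drop-count" rows only.
ROW = re.compile(r"^(\d+:\d+)\s+\S+\s+(\d+)\s*$")

def parse_drops(text):
    """Return {port: drop_count} for every row that looks like a port entry."""
    drops = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            drops[m.group(1)] = int(m.group(2))
    return drops

def outliers(drops, tolerance=0.01):
    """Ports whose drop count is more than `tolerance` away from the median."""
    mid = median(drops.values())
    return sorted(p for p, n in drops.items() if abs(n - mid) > tolerance * mid)

if __name__ == "__main__":
    print(outliers(parse_drops(SAMPLE)))
```

On these sample rows only 1:26 and 2:26 stand apart; the other ports sit within 1% of each other, which may point to a shared cause rather than independent per-port faults.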

Interestingly, all of these ports are 100 Mb ports. I turned off auto-negotiation and fixed the speed and duplex, but I am still getting packet drops! The other interesting thing is that, judging by the MAC addresses on these ports, they are all connected to Avaya IP phones, so I need to check the physical port settings.


Here is the output of 'debug hal show congestion' when I entered it multiple times:


Congestion information for slot 1 type X440-48p since last query
  Switch fabric congestion present: 7174025

Congestion information for slot 2 type X440-48p since last query
  Switch fabric congestion present: 34594

Congestion information for slot 3 type X440-48p since last query
  No switch fabric and CPU congestion present

Congestion information for slot 4 type X440-48p since last query
  Switch fabric congestion present: 34004

----------------------------------------------------------------

Stack 1.2 # debug hal show congestion
Congestion information for slot 1 type X440-48p since last query
  No switch fabric and CPU congestion present

Congestion information for slot 2 type X440-48p since last query
  No switch fabric and CPU congestion present

Congestion information for slot 3 type X440-48p since last query
  No switch fabric and CPU congestion present

Congestion information for slot 4 type X440-48p since last query
  No switch fabric and CPU congestion present

----------------------------------------------------------------

Stack 1.2 # debug hal show congestion
Congestion information for slot 1 type X440-48p since last query
  No switch fabric and CPU congestion present

Congestion information for slot 2 type X440-48p since last query
  No switch fabric and CPU congestion present

Congestion information for slot 3 type X440-48p since last query
  No switch fabric and CPU congestion present

Congestion information for slot 4 type X440-48p since last query
  No switch fabric and CPU congestion present

----------------------------------------------------------------

Stack 1.2 # debug hal show congestion
Congestion information for slot 1 type X440-48p since last query
  No switch fabric and CPU congestion present

Congestion information for slot 2 type X440-48p since last query
  No switch fabric and CPU congestion present

Congestion information for slot 3 type X440-48p since last query
  No switch fabric and CPU congestion present

Congestion information for slot 4 type X440-48p since last query
  No switch fabric and CPU congestion present

----------------------------------------------------------------

Stack 1.2 # debug hal show congestion
Congestion information for slot 1 type X440-48p since last query
  No switch fabric and CPU congestion present

Congestion information for slot 2 type X440-48p since last query
  No switch fabric and CPU congestion present

Congestion information for slot 3 type X440-48p since last query
  No switch fabric and CPU congestion present

Congestion information for slot 4 type X440-48p since last query
  No switch fabric and CPU congestion present
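
To compare repeated runs more easily, the output can be reduced to one counter per slot. The following is a minimal sketch that assumes the output format stays exactly as pasted above; the function name is made up for illustration.

```python
import re

# First query copied from the output above; the later queries reported no congestion.
FIRST_QUERY = """\
Congestion information for slot 1 type X440-48p since last query
  Switch fabric congestion present: 7174025

Congestion information for slot 2 type X440-48p since last query
  Switch fabric congestion present: 34594

Congestion information for slot 3 type X440-48p since last query
  No switch fabric and CPU congestion present

Congestion information for slot 4 type X440-48p since last query
  Switch fabric congestion present: 34004
"""

SLOT = re.compile(r"Congestion information for slot (\d+)")
FABRIC = re.compile(r"Switch fabric congestion present:\s*(\d+)")

def fabric_congestion(text):
    """Return {slot: counter}; slots reporting no congestion map to 0."""
    result, slot = {}, None
    for line in text.splitlines():
        m = SLOT.search(line)
        if m:
            slot = int(m.group(1))
            result[slot] = 0
            continue
        m = FABRIC.search(line)
        if m and slot is not None:
            result[slot] = int(m.group(1))
    return result

if __name__ == "__main__":
    counters = fabric_congestion(FIRST_QUERY)
    print(counters, "highest:", max(counters, key=counters.get))
```

Since the counters are reported "since last query", the first reading covers an unknown period before the command was first run, so the later readings are probably the better indicator of whether congestion is occurring right now.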



Martin Flammia

Enabled LLDP on a 100 Mb port so that I could get details of the phone:


LLDP Port 1:5 detected 1 neighbor
  Neighbor: (5.1)172.17.255.35/70:38:EE:D2:36:C3, age 11 seconds
    - Chassis ID type: Network address (5); Address type: IPv4 (1)
      Chassis ID     : 172.17.255.35
    - Port ID type: MAC address (3)
      Port ID     : 70:38:EE:D2:36:C3
    - Time To Live: 120 seconds
    - System Name: "AVXD236C3"
    - System Capabilities : "Telephone"
      Enabled Capabilities: "Telephone"
    - Management Address Subtype: IPv4 (1)
      Management Address        : 172.17.255.35
      Interface Number Subtype  : System Port Number (3)
      Interface Number          : 1
      Object ID String          : "1.3.6.1.4.1.6889.1.69.3.1"
    - IEEE802.3 MAC/PHY Configuration/Status
      Auto-negotiation       : Supported, Enabled (0x03)
      Operational MAU Type   : 100BaseTXHD (15)
    - MED Capabilities: "MED Capabilities, Network Policy, Extended Power via MDI - PD, Inventory"
      MED Device Type : Endpoint Class III (3)
    - MED Extended Power-via-MDI
      Power Type    : PD Device (1)
      Power Source  : Unknown (0)
      Power Priority: High (2)
      Power Value   : 5.1 Watts
    - MED Network Policy
      Application Type  : Voice (1)
      Policy Flags      : Known Policy, Untagged (0x0)
      VLAN ID           : 200
      L2 Priority       : 6
      DSCP Value        : 46
    - MED Network Policy
      Application Type  : Voice Signaling (2)
      Policy Flags      : Known Policy, Untagged (0x0)
      VLAN ID           : 200
      L2 Priority       : 6
      DSCP Value        : 46
    - MED Hardware Revision: "1603D02A"
    - MED Firmware Revision: "hb1603ua1_350B.bin"
    - MED Software Revision: "ha1603ua1_350B.bin"
    - MED Serial Number: "12WZ274604JK"
    - MED Manufacturer Name: "Avaya"
    - MED Model Name: "1603"
    - Avaya/Extreme Conservation Level Support
      Current Conservation Level: 0
      Typical Power Value       : 4.4 Watts
      Maximum Power Value       : 5.1 Watts
      Conservation Power Level  : 1=3.8W
    - Avaya/Extreme Call Server(s): 172.17.255.240
    - Avaya/Extreme IP Phone Address: 172.17.255.35 255.255.254.0
      Default Gateway Address       : 172.17.254.1
    - Avaya/Extreme File Server(s): 0.0.0.0
    - Avaya/Extreme IEEE 802.1q Framing: Tagged
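
One detail worth pulling out of the output above is the Operational MAU Type, which the phone reports as 100BaseTXHD. In the standard MAU type names an "HD" suffix means half duplex and "FD" means full duplex. The sketch below, with a hypothetical helper name, extracts that field from a single LLDP line so it can be checked across many ports.

```python
import re

# Line copied from the LLDP neighbor output above.
LLDP_LINE = "      Operational MAU Type   : 100BaseTXHD (15)"

MAU = re.compile(r"Operational MAU Type\s*:\s*(\S+)\s*\((\d+)\)")

def mau_duplex(line):
    """Return (mau_name, duplex) where duplex is 'half', 'full' or 'unknown'."""
    m = MAU.search(line)
    if not m:
        return None
    name = m.group(1)
    if name.endswith("HD"):
        duplex = "half"
    elif name.endswith("FD"):
        duplex = "full"
    else:
        duplex = "unknown"
    return name, duplex

if __name__ == "__main__":
    print(mau_duplex(LLDP_LINE))
```

What a phone advertises in LLDP does not necessarily match what the switch port actually negotiated, so the result is best compared against the switch's own view of the port.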


I might try disabling auto-negotiation on the port and see what happens.


What's also interesting is that when running 'show port congestion', all the ports increment their dropped packets in the same time frame (i.e. they won't increment for a few seconds, then they all increment at the same time) and at the same rate.

Mike D, Alum

I know sleep has been touched on earlier in the thread, but end stations in power-down mode often drop to 10 Mb half duplex and are infamous for their negative network impact:
1) continuous pause frames (impact can go far beyond the local link)
2) high bandwidth ipv6 ND

These often hit a stack in distributed fashion since 
* enterprises often use the same hardware and OS/drivers cookie cutter style 
* a new version of a popular OS gets released and finds its way to many users PCs.
* at 5PM or whenever work or school gets out multiple end stations sleep at the same time

No docs on the pause behavior but see multicast ipv6 packet blurb here:
https://communities.intel.com/thread/48051.   

http://community.spiceworks.com/topic/422869-dell-optiplex-9020-blasting-icmpv6-multicast-listener-discovery-during-s1-sleep

If it is broadcast or flood related, add Wireshark to your growing list of analysis tools.
There is plenty of sound direction here already, but when there's free time, try connecting to a troubled port with a Wireshark laptop or other capture tool, wide open. The buffer will probably fill fast, so a short snapshot of a couple of seconds is best. The capture may provide some insight.
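
For anyone post-processing such a capture, here is a minimal sketch of how an 802.3x PAUSE frame can be recognised in raw frame bytes: MAC Control EtherType 0x8808 with opcode 0x0001, followed by a 16-bit pause time in quanta. The helper names and the source MAC in the usage line are hypothetical, and many NICs consume MAC Control frames in hardware, so they may never show up in a capture at all.

```python
import struct

PAUSE_DST = bytes.fromhex("0180c2000001")  # reserved multicast address used by PAUSE
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001

def pause_quanta(frame):
    """Return the pause time in quanta if `frame` is a PAUSE frame, else None."""
    if len(frame) < 18:
        return None
    # Bytes 12-13: EtherType, 14-15: MAC Control opcode, 16-17: pause time.
    ethertype, opcode, quanta = struct.unpack("!HHH", frame[12:18])
    if ethertype == MAC_CONTROL_ETHERTYPE and opcode == PAUSE_OPCODE:
        return quanta
    return None

def build_pause(src_mac, quanta):
    """Build a minimal PAUSE frame (no padding or FCS), for testing only."""
    header = PAUSE_DST + src_mac
    return header + struct.pack("!HHH", MAC_CONTROL_ETHERTYPE, PAUSE_OPCODE, quanta)

if __name__ == "__main__":
    # Hypothetical locally administered source MAC, not taken from the thread.
    frame = build_pause(bytes.fromhex("020000000001"), 0xFFFF)
    print(pause_quanta(frame))
```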


Regards,
Mike


 
Martin Flammia

Thanks for the info, that's great advice and I will certainly look into it.

My thought, though, is that since I turned off flow control the number of ports showing packet drops has grown significantly, so perhaps it was these ports that were sending the pauses?

With that in mind, is there any way (aside from attaching a sniffer) that I could see which packets are being dropped on the ports? Some examples could be:

  • debug command that outputs drops
  • turn on a filter that outputs drops
  • tcpdump on a single port
  • write an ACL for all traffic on port and log

I'd be interested to know whether the same can be done to look at what is causing congestion on the switch fabric and CPU.

At the moment both ends are set to auto-negotiate and they are both negotiating 100 Mb full duplex. I want to try fixing the speed and duplex at both ends, but I am currently looking at the issue remotely.

There seems to be no logical reason for packets to be dropped: there are no other errors (like CRC), there is plenty of bandwidth, and there is no other device attached to the phone. I could increase the buffer size for QP6, as it's currently set to default, but I really shouldn't be getting any contention when the traffic is so low.

If I could see what's being dropped, then that might give me a clue as to what's happening.

I'd also appreciate knowing whether there is any equivalent to configuring Link Flap in EXOS.

Many thanks.

Martin Flammia

Created the following ACL to try to capture the traffic and see what might be being dropped. I haven't gone through it properly yet, but thought I would post it in case anyone sees anything useful.

create access-list Debug-Port-Ingress " ; " "  permit  ; log  ; mirror-cpu  ; count Debug-Port-Ingress ;" application "Cli"

configure access-list add Debug-Port-Ingress last priority 0 zone SYSTEM ports 1:5 ingress


09/15/2015 14:39:25.22 <Info:Kern.Card.Info> Slot-1: seq: 0xa17f385c ackSeq: 0xf5cc4b21 win: 0x4470 urgPtr: 0x0 ack
09/15/2015 14:39:25.22 <Info:Kern.Card.Info> Slot-1: 172.17.255.35:1678 -> 172.17.255.240:1720 TCP v4 hLen: 20 ttl: 64 tos: 0x0 tLen: 40
09/15/2015 14:39:25.22 <Info:Kern.Card.Info> Slot-1: 70:38:ee:d2:36:c3 -> e4:1f:13:32:95:44 IP
09/15/2015 14:39:25.22 <Info:Kern.Card.Info> Slot-1: 60-byte packet from 1:5 (vlanId=200) matches rule Debug-Port-Ingress

09/15/2015 14:39:15.67 <Info:Kern.Card.Info> Slot-1: 70:38:ee:d2:36:c3 -> 01:80:c2:00:00:0e EtherType: 0x88cc
09/15/2015 14:39:15.67 <Info:Kern.Card.Info> Slot-1: 291-byte packet from 1:5 (vlanId=200) matches rule Debug-Port-Ingress

09/15/2015 14:39:10.21 <Info:Kern.Card.Info> Slot-1: 70:38:ee:d2:36:c3 172.17.255.35 -> e4:1f:13:32:95:44 172.17.255.240
09/15/2015 14:39:10.21 <Info:Kern.Card.Info> Slot-1: hwType: 0x1 protoType: 0x800 op: 2 hdrLen: 6 protoLen: 4
09/15/2015 14:39:10.21 <Info:Kern.Card.Info> Slot-1: 70:38:ee:d2:36:c3 -> e4:1f:13:32:95:44 EtherType: 0x0806
09/15/2015 14:39:10.21 <Info:Kern.Card.Info> Slot-1: 64-byte packet from 1:5 (vlanId=200) matches rule Debug-Port-Ingress
09/15/2015 14:39:05.21 <Info:Kern.Card.Info> Slot-1: seq: 0xa17f385d ackSeq: 0xf5cc4b21 win: 0x4470 urgPtr: 0x0 ack
09/15/2015 14:39:05.21 <Info:Kern.Card.Info> Slot-1: 172.17.255.35:1678 -> 172.17.255.240:1720 TCP v4 hLen: 20 ttl: 64 tos: 0xb8 tLen: 40
09/15/2015 14:39:05.21 <Info:Kern.Card.Info> Slot-1: 70:38:ee:d2:36:c3 -> e4:1f:13:32:95:44 IP
09/15/2015 14:39:05.21 <Info:Kern.Card.Info> Slot-1: 60-byte packet from 1:5 (vlanId=200) matches rule Debug-Port-Ingress

09/15/2015 14:38:45.67 <Info:Kern.Card.Info> Slot-1: 70:38:ee:d2:36:c3 -> 01:80:c2:00:00:0e EtherType: 0x88cc
09/15/2015 14:38:45.67 <Info:Kern.Card.Info> Slot-1: 291-byte packet from 1:5 (vlanId=200) matches rule Debug-Port-Ingress

09/15/2015 14:38:45.26 <Info:Kern.Card.Info> Slot-1: seq: 0xa17f385c ackSeq: 0xf5cc4b21 win: 0x4470 urgPtr: 0x0 ack
09/15/2015 14:38:45.26 <Info:Kern.Card.Info> Slot-1: 172.17.255.35:1678 -> 172.17.255.240:1720 TCP v4 hLen: 20 ttl: 64 tos: 0x0 tLen: 40
09/15/2015 14:38:45.26 <Info:Kern.Card.Info> Slot-1: 70:38:ee:d2:36:c3 -> e4:1f:13:32:95:44 IP
09/15/2015 14:38:45.26 <Info:Kern.Card.Info> Slot-1: 60-byte packet from 1:5 (vlanId=200) matches rule Debug-Port-Ingress

09/15/2015 14:38:30.24 <Info:Kern.Card.Info> Slot-1: 70:38:ee:d2:36:c3 172.17.255.35 -> e4:1f:13:32:95:44 172.17.255.240
09/15/2015 14:38:30.24 <Info:Kern.Card.Info> Slot-1: hwType: 0x1 protoType: 0x800 op: 2 hdrLen: 6 protoLen: 4
09/15/2015 14:38:30.24 <Info:Kern.Card.Info> Slot-1: 70:38:ee:d2:36:c3 -> e4:1f:13:32:95:44 EtherType: 0x0806
09/15/2015 14:38:30.24 <Info:Kern.Card.Info> Slot-1: 64-byte packet from 1:5 (vlanId=200) matches rule Debug-Port-Ingress

09/15/2015 14:38:25.22 <Info:Kern.Card.Info> Slot-1: seq: 0xa17f385c ackSeq: 0xf5cc4b21 win: 0x4470 urgPtr: 0x0 ack
09/15/2015 14:38:25.22 <Info:Kern.Card.Info> Slot-1: 172.17.255.35:1678 -> 172.17.255.240:1720 TCP v4 hLen: 20 ttl: 64 tos: 0x0 tLen: 40
09/15/2015 14:38:25.22 <Info:Kern.Card.Info> Slot-1: 70:38:ee:d2:36:c3 -> e4:1f:13:32:95:44 IP
09/15/2015 14:38:25.22 <Info:Kern.Card.Info> Slot-1: 60-byte packet from 1:5 (vlanId=200) matches rule Debug-Port-Ingress

09/15/2015 14:38:15.67 <Info:Kern.Card.Info> Slot-1: 70:38:ee:d2:36:c3 -> 01:80:c2:00:00:0e EtherType: 0x88cc
09/15/2015 14:38:15.67 <Info:Kern.Card.Info> Slot-1: 291-byte packet from 1:5 (vlanId=200) matches rule Debug-Port-Ingress

09/15/2015 14:31:25.04 <Info:Kern.Card.Info> Slot-1: seq: 0xa17f3803 ackSeq: 0xf5cc47b4 win: 0x4470 urgPtr: 0x0 ack
09/15/2015 14:31:25.04 <Info:Kern.Card.Info> Slot-1: 172.17.255.35:1678 -> 172.17.255.240:1720 TCP v4 hLen: 20 ttl: 64 tos: 0x0 tLen: 40
09/15/2015 14:31:25.04 <Info:Kern.Card.Info> Slot-1: 70:38:ee:d2:36:c3 -> e4:1f:13:32:95:44 IP
09/15/2015 14:31:25.04 <Info:Kern.Card.Info> Slot-1: 60-byte packet from 1:5 (vlanId=200) matches rule Debug-Port-Ingress
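
One thing that can be read straight from these log lines is the QoS marking: the DSCP value is the upper six bits of the logged ToS byte, so tos 0xb8 corresponds to DSCP 46 (the value in the phone's LLDP-MED network policy) while tos 0x0 is unmarked. A small sketch of that arithmetic, using two lines copied from the log above and a hypothetical helper name:

```python
import re

# Two lines copied from the ACL log output above.
LOG = """\
09/15/2015 14:39:25.22 <Info:Kern.Card.Info> Slot-1: 172.17.255.35:1678 -> 172.17.255.240:1720 TCP v4 hLen: 20 ttl: 64 tos: 0x0 tLen: 40
09/15/2015 14:39:05.21 <Info:Kern.Card.Info> Slot-1: 172.17.255.35:1678 -> 172.17.255.240:1720 TCP v4 hLen: 20 ttl: 64 tos: 0xb8 tLen: 40
"""

TOS = re.compile(r"tos:\s*(0x[0-9a-fA-F]+)")

def dscp_values(text):
    """Return the DSCP (ToS byte shifted right by two bits) for each logged packet."""
    return [int(m.group(1), 16) >> 2 for m in TOS.finditer(text)]

if __name__ == "__main__":
    print(dscp_values(LOG))
```

So the same TCP flow to port 1720 appears in the log both with and without the DSCP 46 marking.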

Martin Flammia

Got a little bit further with this, so I thought I would share the update.

When looking at the qosmonitor congestion I noticed that it was all showing on QP1, whereas all the QoS-marked voice traffic uses QP6.

Currently port 1:5 has the voice VLAN configured as tagged and the data VLAN as untagged. Based on this I removed the data VLAN from the port, and now the packets have stopped dropping.

So I guess my next step is to get a packet trace of the traffic on the data VLAN, unless anyone knows how I can do a tcpdump on the switch port directly; that would be really useful.

I wonder if I'm simply seeing contention from a 1 Gb PC being connected to a 100 Mb phone?

Thanks.


 Stack 1.1 # show port 1:5 congestion no-refresh
Port Congestion Monitor
Port      Link      Packet
          State     Drop
================================================================================
1:5       A         425053
================================================================================


Stack 1.2 # show port 1:5 qosmonitor no-refresh
Port Qos Monitor
Port   QP1      QP2      QP3      QP4      QP5      QP6      QP7      QP8
       Pkt      Pkt      Pkt      Pkt      Pkt      Pkt      Pkt      Pkt
       Xmts     Xmts     Xmts     Xmts     Xmts     Xmts     Xmts     Xmts
===============================================================================
1:5    1160455  0        0        0        0        7069     0        170


Stack 1.3 # show port 1:5 qosmonitor congestion no-refresh
Port Qos Monitor
Port   QP1      QP2      QP3      QP4      QP5      QP6      QP7      QP8
       Pkt      Pkt      Pkt      Pkt      Pkt      Pkt      Pkt      Pkt
       Cong     Cong     Cong     Cong     Cong     Cong     Cong     Cong
===============================================================================
1:5    425122   0        0        0        0        0        0        0
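
The two qosmonitor rows above can be lined up per queue to make the QP1 versus QP6 split explicit. This is only a sketch: the helper names are made up, the two rows were captured at slightly different moments, and treating "Pkt Cong" as packets that were not transmitted is an assumption, so the computed share is a rough indication at best.

```python
QUEUES = ["QP%d" % i for i in range(1, 9)]

# Data rows copied from the two qosmonitor outputs above.
XMIT_ROW = "1:5    1160455  0        0        0        0        7069     0        170"
CONG_ROW = "1:5    425122   0        0        0        0        0        0        0"

def parse_row(row):
    """Return (port, {queue: count}) from one qosmonitor data row."""
    port, *counts = row.split()
    return port, dict(zip(QUEUES, map(int, counts)))

def congested_queues(cong_row):
    """Queues with a non-zero congestion counter."""
    _, cong = parse_row(cong_row)
    return [q for q in QUEUES if cong[q] > 0]

if __name__ == "__main__":
    _, xmit = parse_row(XMIT_ROW)
    _, cong = parse_row(CONG_ROW)
    print("congested queues:", congested_queues(CONG_ROW))
    # Assumption: congested packets are in addition to transmitted ones.
    share = cong["QP1"] / (cong["QP1"] + xmit["QP1"])
    print("rough QP1 congestion share: %.1f%%" % (100 * share))
```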

Prashanth KG, Employee

Hi Martin,

You might be interested in the article below on performing a packet capture at the port level from the CLI.

https://gtacknowledge.extremenetworks.com/articles/How_To/How-to-perform-a-local-packet-capture-on-an-EXOS-switch

If you want to stop the capture at any time, just hit CTRL+C. Make sure to specify the cmd-args -c <count of packets>, since the switch is in production.

Hope this helps! 

Martin Flammia

That's exactly what I have been looking for...... Thanks Prashanth