
QoS on ERS 4900 HW:03

Userlevel 1
We have received complaints from users about low network performance. The users are behind newly deployed ERS 4900 switches with 10Gbps uplinks. After some investigation we identified it as a QoS issue. When a queue-set other than 1 is configured, it is not possible to use the full bandwidth of the access port. For example, if you set queue-set to 4, the speed of a 1Gbps port is shaped to 250Mbps (1000/4) and it is not possible to reach a higher speed. This problem can be reproduced only on switches with HW revision 03. We also have switches with HW02 and they behave correctly.

Is there anyone else experiencing the same issue?

As a workaround we configured only one queue, which technically disables QoS, but the users are much happier.
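The ceiling reported above follows a simple pattern: port speed divided by the queue-set number. A minimal sketch of that arithmetic is below. The function name is hypothetical, and the formula is only the poster's observation generalized from the queue-set 4 example, not documented switch behavior.

```python
def observed_ceiling_mbps(port_speed_mbps, queue_set):
    """Throughput ceiling as observed in this thread: port speed / queue-set.

    Assumption: this generalizes the single reported data point
    (1000 Mbps port, queue-set 4, about 250 Mbps). It is not a vendor formula.
    """
    if queue_set < 1:
        raise ValueError("queue_set must be 1 or greater")
    return port_speed_mbps / queue_set


if __name__ == "__main__":
    for qs in (1, 2, 4, 8):
        print(f"queue-set {qs}: about {observed_ceiling_mbps(1000, qs):.0f} Mbps")
```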


Userlevel 2
Martin, let me do some digging here. Roger
Userlevel 2
Hello Martin
I'm not aware of such an issue.
You should open a ticket with our GTAC.
Best regards
Userlevel 1
Hello guys, thanks for the quick reply. I have already opened a case. This post was just to help others who might eventually experience the same issue.

Take care!
Latest code?

If you can get the same workaround effect with queue-set 2, that would be better. Setting it to 1 lumps control traffic and best-effort traffic into the same queue, and it becomes a free-for-all.

What is your qos agent buffer setting on the HW02 and HW03 switches? I believe the default is Large. Sometimes people set it to Maximum (if only because Maximum sounds better), but that causes more problems than it fixes, especially on switches with low-speed or older devices. Certainly, if your queue-set and agent buffer settings aren't the same on both switches, then it is not a fair comparison.

Are you seeing Dropped on No Resources incrementing when you are testing?

If you have an extra HW03 switch you can test with (not in production), then I would recommend you factory default it and then test with just a sender and receiver. If you see the same behavior then it is a bug and Extreme support should be able to reproduce it easily.
Userlevel 1
Yes, version 7.6.0.

But we have been digging further into this and found that it is not tied to a specific HW revision. We were misled by the fact that the problem could not be reproduced on the switches with HW revision 02. The actual difference between those switches is the uplink speed. When the switch is connected to the core via a 1G SFP, it performs as expected. When we use a 10G SFP+, the speed drops.

Also, it cannot be reproduced with the factory default (boot default) configuration. As soon as even a basic set of commands (set switch IP, create two VLANs, set tagging for the uplink and membership for the ports) is applied, the performance is affected.

Yes, we are seeing the Dropped on No Resources counter increasing on the access port when the issue exists.
Martin, disable the rate limit on the 10G SFP port in question.
Userlevel 1
Hello Alan,

Rate-limit can be set only for broadcast and multicast packets. In any case, there is no rate-limit set on the uplink.
Userlevel 2
If you are seeing Dropped on No Resources on the access ports, it means packets are being QoS tail dropped as they are queued on the access ports (beware of that stat: it only counts packets tail dropped on the best-effort queue; as of 5.8 you have QoS queue stats which work across all queues). With the 10G uplinks, the traffic may arrive in bursts which saturate the available QoS queues on the 1G access ports.
Apart from setting the qos queue-set to 1 (which effectively completely disables QOS on the switch), the only setting which can help is the QOS buffer sharing setting: Regular, Large, Maximum. This determines how much of the unused QOS queues from other ports/queues on the system a given port can use. You do not say what setting you have here. You need to try increasing this setting, and if to no avail, then reducing QOS queue-set is the only option.
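To illustrate why a burst arriving at 10G can be tail dropped on a 1G port, and why a larger share of buffers helps, here is a rough sketch of a single tail-drop queue. Everything in it is an assumption for illustration: the function name, the packet-per-tick rates (10 in, 1 out, mimicking the 10:1 speed mismatch) and the buffer depths are hypothetical and do not correspond to real ERS buffer sizes or to the Regular/Large/Maximum values.

```python
def simulate_burst(burst_packets, ingress_per_tick, egress_per_tick, buffer_packets):
    """Feed one burst into a tail-drop egress queue; return (delivered, dropped).

    Hypothetical model: packets arrive at ingress_per_tick, drain at
    egress_per_tick, and anything that does not fit in buffer_packets
    is dropped at the tail.
    """
    queued = delivered = dropped = 0
    remaining = burst_packets
    while remaining > 0 or queued > 0:
        # Packets arriving from the fast uplink during this tick.
        arriving = min(ingress_per_tick, remaining)
        remaining -= arriving
        # Tail drop: accept only what fits in the free buffer space.
        accepted = min(arriving, buffer_packets - queued)
        dropped += arriving - accepted
        queued += accepted
        # The slower access port drains the queue.
        sent = min(egress_per_tick, queued)
        queued -= sent
        delivered += sent
    return delivered, dropped


if __name__ == "__main__":
    # Same 100-packet burst, 10:1 speed mismatch, two made-up buffer depths.
    for depth in (20, 100):
        ok, lost = simulate_burst(100, 10, 1, depth)
        print(f"buffer={depth:3d} packets: delivered={ok}, dropped={lost}")
```

In this toy model the shallow buffer loses most of the burst while the deeper one absorbs it. That is the effect the buffer sharing setting aims for, at the cost of letting one port consume buffers that other ports might need.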
Userlevel 1
Hello Ludovico,

You nailed it. I hadn't changed the buffer sharing setting initially, so it was at the default, Large. Changing it to Maximum helps.