Slow file copying to site using Windows SMB over MPLS circuit

Hello folks,

This is not necessarily an Extreme Networks specific question, but I am hoping the community here can give me a clue about a problem that I have exhausted myself on.

I have a remote site connected to my home office over a Spectrum MPLS circuit (20Mbps). When *downloading* a file from the home office to a client PC, the speed is around 512KBps. At times it will leap to 700KBps and then immediately drop. There are also long pauses before the transfer starts, and several more during the transfer. If I upload that same file to that same server, the speed is 2MBps (which is what I expect).

Some facts about the issue:
  1. A simple speed test indicates I am getting a full 20Mbps in both directions.
  2. Downloading a file using FTP, I achieve full speed.
  3. Downloading anything from the Internet - I achieve full speed.
  4. I have verified speed and duplex settings on the interfaces which connect the two sites, and they are fine.
  5. I have uninstalled the Antivirus software on the client, with no change.
  6. Downloading while connected to my Extreme Networks wireless access point at this site - I achieve full speed!?!? :-)
  7. Lowering the MTU of the client had no effect (I played the game of ping -f to find the 'perfect MTU value'). I did find, though, that the native MTU it was choosing for the file transfer was 1514 (while the MPLS circuit is 1500).
  8. The switch at the site is an old Cisco Catalyst. I have exchanged that switch with a known good spare which had no effect.
  9. I have replaced all the cables with brand new Cat6e cables, which had no effect.
I am going to try to attach some Wireshark captures demonstrating a "fast" and a "slow" transfer. The difference is that the fast transfer was made over wireless, and the slow transfer was made while connected by cable to the switch.

Here is a SLOW file transfer: CLICK ME
Here is a FAST file transfer: CLICK ME

Steve Ballantyne

Posted 12 months ago

Jarek

Hi,

I don't know your network layout (connections, etc.),
but I see for example:
- in the slow transfer: frame length is 1514, plus many TCP out-of-order and TCP Dup ACK segments
- in the fast transfer: frame length is 1450

If you download from server to client, you can try lowering the MTU on the server side,
for example to 1450.
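The frame lengths quoted above are on-the-wire Ethernet sizes as a capture tool reports them, so they are not directly comparable to an IP MTU. A minimal sketch of the conversion follows. The header sizes are assumptions not stated in the thread (untagged Ethernet II, IPv4 and TCP without options, no FCS in the capture), and the function names are made up for illustration.

```python
# Assumed header sizes (not stated in the thread): untagged Ethernet II,
# IPv4 without options, TCP without options, FCS not included in the capture.
ETHERNET_HEADER = 14
IPV4_HEADER = 20
TCP_HEADER = 20


def ip_packet_size(frame_length: int) -> int:
    """IP packet size carried by a captured Ethernet frame of this length."""
    return frame_length - ETHERNET_HEADER


def tcp_payload_size(frame_length: int) -> int:
    """TCP payload bytes in the frame, assuming no IP or TCP options."""
    return ip_packet_size(frame_length) - IPV4_HEADER - TCP_HEADER


# Frame lengths reported for the slow (wired) and fast (wireless) captures.
for label, frame_length in (("slow", 1514), ("fast", 1450)):
    print(label, frame_length, ip_packet_size(frame_length), tcp_payload_size(frame_length))
```

Under these assumptions a 1514-byte frame carries a 1500-byte IP packet, which would fit a 1500-byte circuit MTU, while the 1450-byte frames carry 1436-byte packets.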

--
Jarek

Steve Ballantyne

Hello Jarek, I actually tried lowering the MTU value so that it matched what I was seeing in the fast transfer. But it didn't make any difference. I should have used a better example that compares uploading a file to downloading a file (using the same wired source and the same server/PC).

When I make that comparison, the MTU in both is 1514. The only difference I am seeing in the packet capture is that this message appears frequently: "TCP Previous segment not captured". As far as I can tell, this is an indication of packet loss: the capture shows a segment arriving whose predecessor never showed up. But why packet loss specifically when downloading a file with SMB?

Jarek

Hmm... could you draw a simple picture of what is where,
and write down when it is OK and when it is not?

I think this could help me/us to better understand the problem :).

--
Jarek

Steve Ballantyne

Sure, sure. Here is "the world's worst Visio". ;-)

Note that this problem only exists at this particular MPLS site (I have many). Also it doesn't seem to matter what VLAN I put the end workstation on - or what VLAN the server is on that I am copying files from.

Erik Auerswald, Embassador

Hi Steve,

in the slow trace, the TCP window size does not go above 16425 bytes. In the fast trace, the window size goes up to 65263 (nearly a factor of 4). Window size handling depends on the implementations on the end systems.

If the window size is smaller than the BDP (bandwidth delay product), the TCP window size limits the throughput.

From the traces it seems to me that the client does not want to receive more than 16425 bytes without sending an acknowledgement. You might want to look into tuning the client's TCP stack.
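To make the window-versus-BDP point concrete, here is a rough sketch of the arithmetic. The link speed and the two window sizes come from this thread; the round-trip time is a purely hypothetical value (it is not given anywhere above) and should be replaced with a measured one, for example from ping. The function names are made up for illustration.

```python
def max_throughput_bytes_per_s(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on TCP throughput: at most one window per round trip."""
    return window_bytes / rtt_s


def bdp_bytes(link_bits_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bits_per_s / 8 * rtt_s


LINK_BPS = 20_000_000    # 20 Mbps circuit, from the thread
ASSUMED_RTT_S = 0.030    # hypothetical 30 ms round trip, NOT a measured value

for window in (16425, 65263):    # window sizes seen in the slow and fast traces
    rate = max_throughput_bytes_per_s(window, ASSUMED_RTT_S)
    print(f"window {window} B -> at most {rate / 1000:.0f} KB/s")

print(f"BDP at 20 Mbps: {bdp_bytes(LINK_BPS, ASSUMED_RTT_S):.0f} bytes")
```

Whenever the window is smaller than the BDP, the first figure is the ceiling on the transfer rate no matter how fast the circuit is.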

I have heard the rumor that recent Windows client versions show bad WAN performance because of the default TCP settings, but I am not a Windows expert.

Thanks,
Erik

Steve Ballantyne

That was good advice. The trick is that you can no longer manually set the TCP window size in Windows 10. It took me a while to find the article, but TechNet states "this value is ignored with Windows 8 and later".

A colleague of mine found a note saying that when you disable the all-new "TCP autotuning" feature, you end up with a 64KB window size. I used this snazzy little TCP Optimizer utility to test the theory, and it certainly held true: a packet capture showed my window size had leaped to 63-64KB.
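One plausible reading of the 64KB figure is that it is simply the largest window TCP can advertise without window scaling, since the window field in the TCP header is 16 bits wide. The sketch below shows that arithmetic; whether Windows actually stops scaling when autotuning is disabled is an assumption here, and the shift of 8 is a hypothetical value for comparison only.

```python
TCP_WINDOW_FIELD_BITS = 16   # width of the window field in the TCP header
MAX_WINDOW_SHIFT = 14        # largest window-scale shift allowed by RFC 7323


def advertised_window(raw_window: int, shift: int = 0) -> int:
    """Receive window in bytes for a raw header value and a window-scale shift."""
    if not 0 <= raw_window < 2 ** TCP_WINDOW_FIELD_BITS:
        raise ValueError("raw window must fit in 16 bits")
    if not 0 <= shift <= MAX_WINDOW_SHIFT:
        raise ValueError("shift must be between 0 and 14")
    return raw_window << shift


print(advertised_window(0xFFFF))       # largest window without scaling
print(advertised_window(0xFFFF, 8))    # same raw value with a hypothetical shift of 8
```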

HOWEVER ... the speed issue still plagues me. I can run Wireshark while copying a file down and I can see that every time the download "hangs", I am getting a lot of "TCP Previous segment not captured" messages.

I would like to think that the speed is slightly better. It seems I can get closer to 2MBps, but with the constant halting and stalling, copying a file takes just as long.

I am very tempted to drop a small firewall at this site and encrypt all my data through a tunnel just to put this stupid issue to bed!

Erik Auerswald, Embassador

Hi Steve,

that sounds a bit like bufferbloat to me.

Manually forcing the TCP window too high could create congestion. Together with too much buffering (perhaps even a well-meaning shaper configuration), this would result in lots of TCP segments being tail-dropped once the bottleneck's oversized buffer eventually fills.

Even with dynamic TCP window sizes, bufferbloat will result in just what you have seen: high throughput, followed by stalling, and then the same cycle again from the start.

You can work around bufferbloat in the WAN by policing traffic entering the WAN to just below the bottleneck bandwidth (or configured shaper rate) and applying AQM or QoS on your own equipment.
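As an illustration of what policing to just below the bottleneck rate means, here is a minimal token-bucket sketch. It is a generic textbook policer, not the configuration of any particular switch or router; the class name, the 95% rate and the 15 KB burst are all hypothetical choices.

```python
class TokenBucketPolicer:
    """Drops packets that exceed a configured rate; it never queues them."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float) -> None:
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes    # start with a full bucket
        self.last_time = 0.0

    def allow(self, now_s: float, packet_bytes: int) -> bool:
        """Return True if the packet conforms, False if it should be dropped."""
        elapsed = now_s - self.last_time
        self.last_time = now_s
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False


# Hypothetical numbers: police to 95% of the 20 Mbps circuit with a 15 KB burst.
policer = TokenBucketPolicer(rate_bytes_per_s=0.95 * 20_000_000 / 8, burst_bytes=15_000)
# Offer 100 full-size packets back to back at t=0: only the burst gets through.
passed = sum(policer.allow(0.0, 1500) for _ in range(100))
print(passed)
```

Because the excess is dropped at the edge instead of queued, the oversized buffer in the WAN never gets the chance to fill.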

You can test the WAN's buffer depth by sending a steady stream of UDP packets and observing the additional delay added by buffering of excess traffic. See e.g. my notes about bufferbloat experiences for ideas. Testing that might help you rule out this specific issue as well, of course.
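Once the extra delay under load has been measured, it can be turned into a rough buffer-depth estimate: the data queued ahead of a packet is about the added delay multiplied by the bottleneck rate. A small sketch, where both round-trip times are hypothetical example measurements and the function name is made up for illustration:

```python
def queue_depth_bytes(base_rtt_s: float, loaded_rtt_s: float, bottleneck_bits_per_s: float) -> float:
    """Estimate bytes queued at the bottleneck from the delay added under load."""
    added_delay_s = max(0.0, loaded_rtt_s - base_rtt_s)
    return added_delay_s * bottleneck_bits_per_s / 8


# Hypothetical measurements: 30 ms when idle, 430 ms while a UDP stream saturates the link.
depth = queue_depth_bytes(base_rtt_s=0.030, loaded_rtt_s=0.430, bottleneck_bits_per_s=20_000_000)
print(f"approx. {depth / 1000:.0f} KB buffered ahead of each packet")
```

If the loaded delay barely rises above the idle delay, the estimate stays near zero and bufferbloat becomes an unlikely explanation.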

Thanks,
Erik

Grosjean, Stephane, Employee

The problem here is that it works when connected to an AP. Is that AP connected to a different switch?

Steve Ballantyne

Hello Stephane - that's the funny part of it. No, the AP is connected to the same switch that all the workstations are! Now you can see what is so frustrating about this.

When the client is traversing the encapsulated WASSP protocol on the AP, it seems to shake this issue of slow SMB/CIFS transfers.

I am tempted to install a VPN tunnel, or a GRE tunnel. I really think that it's the nesting of the CIFS protocol inside a tunnel that "fixes" this issue.

Jarek

Hmmm... I think you can also try disabling Large Send Offload (LSO) on the client and checking again. You can find this option in the network card's properties, on the Advanced tab.

--
Jarek