TCP/IP MTU WTFery

Back in July, our home internet service provider decided to muck with some network settings which ended up "breaking the internet" for us. Since I work from home, entirely online, a non-functioning internet connection is a bit of a problem. So, we kicked into gear to try to get things working again. I learned some interesting stuff about TCP/IP networking that I didn't know before, so I thought I'd write it up in case anyone else is interested.

Reboot

First step: reboot ALL THE THINGS! We rebooted the ISP's router, and our wireless access point/router, just in case something had gone wrong in either of those, but the network behavior didn't seem to improve.

Strangely, DNS seemed to be working. I could ping several servers. But if I tried to open web pages like Twitter or Facebook, the browser would hang, and eventually complain that the connection had been closed. Interestingly, Google loaded, and could respond to searches.

Then I tried connecting to my Linux server. SSH would connect, but then tmux would automatically start and try to send a window full of text. At that point, the connection would hang, and eventually get dropped. If I tried to SSH and run a command that didn't output too much text, the problem wouldn't happen. (At least, not immediately.)

"Aha! It's failing when it tries to send me lots of data. It sounds like the MTU is too big."

So, I lowered the MTU on the router from its default 1500 down to a conservative 1400, which (unfortunately) required a reboot. When the router came back online, suddenly everything worked. Web sites loaded properly, and my SSH connection would successfully render a page full of text without dropping its connection.

Wait ... how did that fix things?

But then I realized that I wasn't entirely sure how that fixed things. And in my experience, that's a great opportunity to learn something new. First, let me recap my knowledge of networking at that point:

I knew that the MTU was the "Maximum Transmission Unit", the size of the largest data packet that a network can support. Packets larger than that might not work correctly on a network, so should be broken up into smaller units. You can do that in a couple ways: either you just make smaller packets, by putting less data into each one, or you use packet fragmentation to break up an existing large packet into several smaller packets that the other end of the connection will have to reassemble.
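To make the fragmentation option a bit more concrete, here's a minimal sketch of how an oversized IPv4 packet's payload could be carved into fragments. It assumes a 20-byte IPv4 header with no options, and the function name `fragment_sizes` is mine, not anything from a real network stack. The one real-world rule it encodes is that every fragment except the last has to carry a multiple of 8 bytes, because the IPv4 fragment offset field counts in 8-byte units.

```python
IP_HEADER_LEN = 20  # assumption: IPv4 header without options


def fragment_sizes(payload_len, mtu, ip_header_len=IP_HEADER_LEN):
    """Return a list of (offset_in_bytes, length, more_fragments) tuples."""
    # Largest data size per fragment, rounded down to a multiple of 8
    # because fragment offsets are expressed in 8-byte units.
    max_data = (mtu - ip_header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")

    fragments = []
    offset = 0
    while offset < payload_len:
        length = min(max_data, payload_len - offset)
        more_fragments = offset + length < payload_len
        fragments.append((offset, length, more_fragments))
        offset += length
    return fragments


# A full 1500-byte packet carries 1480 bytes after its IP header.
# Squeezing that through a 1400-byte MTU link takes two fragments.
for fragment in fragment_sizes(1480, 1400):
    print(fragment)
```

Each fragment gets its own IP header on the wire, which is part of why fragmenting costs more than just sending smaller packets in the first place.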

Which left me with two pieces that I didn't understand:

  1. I knew that IP routers can perform fragmentation. If they receive packets from one network that are too large for another network's MTU, they can break them up into fragments small enough for the other network. Why wasn't that happening for my network connections?
  2. I also knew that the MTU was the Maximum Transmission Unit. The MTU I set in my router tells the router the maximum packet size it's allowed to send to my ISP. But the problem seemed to occur when I was receiving packets that were too big. How did lowering my MTU stop me from receiving a packet that was too big?

To Wireshark!

If you're not familiar, Wireshark is an awesome tool that lets you inspect network traffic that your computer is sending and receiving, all the way down to the level of the bits in each packet. So, I reverted my router's MTU back to the "broken" 1500, fired up Wireshark, and tried connecting to my Linux server again.

When I looked at the packets in Wireshark, it helpfully automatically color-coded some of the packets with red text on a black background. That seemed like a good place to start. Here's a summary of what I saw:

  • Packet #57, sent from my server to my laptop, had a sequence number (tcp.seq) of 3232, and a tcp.len of 1016.
  • Packet #58, also coming from my server, had a tcp.seq of 5696, and a tcp.len of 592.

Both had a TCP header length (tcp.hdr_len, in Wireshark) of 32 bytes. But #58 was marked in red, and its "info" line read: "[TCP Previous segment not captured]."

It turns out that a TCP receiver can tell that it has lost a packet by looking at each segment's sequence number and length. Since packet #57 had a SEQ of 3232 and a length of 1016, you would expect the next packet to have a sequence number of 4248 (3232+1016). But instead, the next packet received had SEQ=5696, a full 1448 bytes too far ahead. So it looked like a single segment carrying 1448 bytes of data had gone missing. If you add 32 bytes (the TCP header size that the server seemed to be using) and 20 bytes of IP header, you get a packet of exactly 1500 bytes. That matched the MTU setting that was failing.
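Here's that arithmetic as a tiny sketch, using the numbers from my capture. The helper name `missing_bytes` is made up for illustration, and it deliberately ignores sequence number wraparound (real sequence numbers are 32-bit and wrap; Wireshark shows small relative values by default, which is what these are).

```python
IP_HEADER_LEN = 20   # assumption: IPv4 header without options
TCP_HEADER_LEN = 32  # tcp.hdr_len as seen in the capture


def missing_bytes(prev_seq, prev_len, next_seq):
    """Bytes missing between two consecutive captured segments (0 if none).

    Simplification: does not handle 32-bit sequence number wraparound.
    """
    expected_seq = prev_seq + prev_len
    return max(0, next_seq - expected_seq)


gap = missing_bytes(prev_seq=3232, prev_len=1016, next_seq=5696)
print(gap)                                   # 1448 bytes of data never arrived
print(gap + TCP_HEADER_LEN + IP_HEADER_LEN)  # 1500, the size of the lost packet
```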

But I still didn't know how lowering my MTU was reducing the size of the packets that I was receiving, so I lowered the MTU (this time to 1492), started a new Wireshark capture, and connected to my server again. Here's what I saw this time:

  • The connection was surprisingly deterministic. The packets even got the same numbers and sizes!
  • Packet #57 again had SEQ=3232, len=1016.
  • Packet #58 had SEQ=4248, len=1440.

1440 + 32 (TCP header length in use) + 20 (IP header) = 1492, the new MTU. My server started sending me smaller packets!
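The same sum, run in the other direction: given an MTU, how much TCP data fits in one packet? This is just a sketch with the header sizes from my captures baked in as defaults. A bare TCP header is 20 bytes, so the 32 bytes I saw means 12 bytes of TCP options were in use on this connection; other connections may differ.

```python
def max_tcp_payload(mtu, ip_header_len=20, tcp_header_len=32):
    """Largest TCP payload that fits in a single unfragmented packet."""
    return mtu - ip_header_len - tcp_header_len


for mtu in (1500, 1492):
    print(mtu, max_tcp_payload(mtu))
```

For 1500 that gives the 1448 bytes that went missing in the first capture, and for 1492 it gives the 1440 bytes that packet #58 carried in the second.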

But how did a server, on the other side of the internet, know that my home router's MTU had changed!? Surely there's some information exchange going on here.

I started inspecting the way that the TCP/IP connections were opened, wondering if they have some way of communicating MTUs to each other. Sure enough, there's quite a bit of information transmitted when you open up a new TCP connection.

IP Fragmentation Considered Harmful

The first thing that I noticed was that my connection had the IP flag "Don't fragment" set on packets coming from both the client and server. Why would anyone disable that!? ... it turns out that the general consensus is that fragmentation is a thing to be avoided. IPv6 doesn't even support IP fragmentation performed by routers!

Well, that would explain why IP fragmentation wasn't saving me. Can't rely on that. Next!

TCP MSS

Then I dug into the TCP packets that set up the connection between hosts. There I found a promising-looking MSS ("Maximum Segment Size") option. It tells the other end the largest amount of data that it should send in each TCP packet for this connection. There are lots of hand-wavy details about this which get ironed out in RFC 879, but the (very) short version is that, nowadays at least, for IPv4, MSS = MTU - 40 (20 bytes of IP header plus 20 bytes of TCP header).
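If you're curious what that looks like on the wire, here's a rough sketch of pulling the MSS out of the options area of a TCP header. TCP options are a sequence of (kind, length, data) entries, with two single-byte special cases: kind 0 ends the list and kind 1 is padding. MSS is kind 2 with a 2-byte value. The function name `find_mss` and the sample bytes are made up for illustration; they're not from my capture.

```python
import struct

OPT_END = 0  # End of Option List
OPT_NOP = 1  # No-Operation (one byte of padding)
OPT_MSS = 2  # Maximum Segment Size


def find_mss(options: bytes):
    """Return the MSS advertised in a TCP options field, or None if absent."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == OPT_END:
            break
        if kind == OPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option, give up
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed option, give up
        if kind == OPT_MSS and length == 4:
            return struct.unpack("!H", options[i + 2:i + 4])[0]
        i += length
    return None


# Hypothetical SYN options: MSS=1460, one NOP, window scale=7
syn_options = bytes([2, 4, 0x05, 0xB4, 1, 3, 3, 7])
mss = find_mss(syn_options)
print(mss, "-> implies an MTU of", mss + 40)
```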

OK! So that's how the MTU is getting communicated with the other end! Or, so I thought. But when I compared my "before" and "after" network captures, both tried to set up connections with MSS=1460 (MTU=1500). At this point, I got frustrated and gave up for the day.

Helpful Router Gremlins

But the next day, I couldn't stop thinking about it. The MTU setting for my router was the MTU for the ISP's network, on the outside port of the router. Inside my router, my private network's MTU would be completely different. So of course my laptop couldn't know about the MTU of the outside network. And yet, obviously somehow that data was getting communicated.

I got the idea to capture the network connection from the other side of the network (on the server). And that's when I was surprised to see... the TCP SYN packet that it received had had its MSS lowered! Though my laptop was always sending out the MSS for the MTU that it knew about (1500), my home router was mangling my outbound SYN packets to let remote servers know that they shouldn't send packets bigger than the MTU that it knew about! This seems like a total hack and I can't tell if it's disgusting or beautiful.
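This trick is commonly called "MSS clamping". I don't know how my router actually implements it, but here's a minimal sketch of the idea: walk the TCP options in an outbound SYN, and if the MSS is bigger than what the outside MTU allows, overwrite it. The function name `clamp_mss` and the sample bytes are hypothetical, and a real router would also have to fix up the TCP checksum after changing the packet, which this sketch skips.

```python
import struct


def clamp_mss(options: bytes, outside_mtu: int) -> bytes:
    """Return a copy of the TCP options with the MSS lowered to fit outside_mtu.

    Assumes IPv4 with 20-byte IP and TCP headers, so the limit is MTU - 40.
    Does NOT recompute the TCP checksum, which a real implementation must do.
    """
    limit = outside_mtu - 40
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == 0:  # End of Option List
            break
        if kind == 1:  # No-Operation padding byte
            i += 1
            continue
        if i + 1 >= len(out):
            break  # truncated option, leave the rest alone
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break  # malformed option, leave the rest alone
        if kind == 2 and length == 4:  # Maximum Segment Size
            (mss,) = struct.unpack_from("!H", out, i + 2)
            if mss > limit:
                struct.pack_into("!H", out, i + 2, limit)
        i += length
    return bytes(out)


# Hypothetical SYN options from the laptop: MSS=1460, NOP, window scale=7
laptop_options = bytes([2, 4, 0x05, 0xB4, 1, 3, 3, 7])
clamped = clamp_mss(laptop_options, outside_mtu=1492)
print(struct.unpack_from("!H", clamped, 2)[0])  # MSS the server would see
```

Note that the MSS is only ever lowered, never raised, so a host that already advertises something smaller is left alone.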

I tested another consumer router and found similar behavior, though it seemed to advertise a much more conservative MTU of 1300, so apparently this isn't all that uncommon. I wonder -- if my ISP changed the MTU for their network, shouldn't they have done the same sort of thing?

How Does Anything Ever Work?

Once all this new info was in my head, the state of TCP started looking pretty crappy. How does any of this ever work?

There's a really informative blog post by CloudFlare that filled me in on some things:

  • When "Do Not Fragment" is set, a router that can't forward a packet because it's too big should respond with an ICMP message saying so.
  • Buuut, you can't always rely on that ICMP message getting back to you, because so many routers drop them. (And I'd imagine when you throw NAT into the mix, the likelihood of ICMPs getting lost is even higher.)
  • There's a work-around to have your TCP stack automatically probe the network to find a workable MTU, but it's not enabled by default on Linux. (D'oh!)
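On Linux, that probing behavior is controlled by the `net.ipv4.tcp_mtu_probing` sysctl. Here's a small sketch for checking how a machine is configured. The helper names are mine; the path and the meanings of the values (0, 1, 2) are from the kernel's sysctl documentation as I understand it, so double-check against your kernel version.

```python
from pathlib import Path

MTU_PROBING_PATH = "/proc/sys/net/ipv4/tcp_mtu_probing"

MODES = {
    0: "disabled",
    1: "enabled only after an ICMP black hole is detected",
    2: "always enabled",
}


def describe_mode(value):
    """Translate a tcp_mtu_probing value into a human-readable description."""
    return MODES.get(value, f"unrecognized value {value}")


def read_mtu_probing(path=MTU_PROBING_PATH):
    """Return the current setting as an int, or None if it can't be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None  # not Linux, or the file is missing or unreadable


setting = read_mtu_probing()
if setting is None:
    print("tcp_mtu_probing: could not read the setting on this machine")
else:
    print("tcp_mtu_probing:", describe_mode(setting))
```

Changing it is a job for root, along the lines of `sysctl -w net.ipv4.tcp_mtu_probing=1`.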

Summary

  • TCP/IP is complicated.
  • If many of your internet connections seem to hang and eventually drop, but you can load Google.com, you might be able to fix it by lowering your router's MTU.
  • If you run a server, and you really want to make sure your customers can access your services, use a conservative MTU or enable MTU probing as detailed in the blog post I mentioned above. (That's why Google worked when other popular sites failed!)