Thursday, October 18, 2012

Linux NAT/Masquerading and TCP keepalive

I have a home office setup that consists of a desktop running Linux and a laptop running Windows for work.

My Linux workstation has a Cisco VPN connection to the office. Due to the nature of the Cisco VPN setup, only one machine can maintain a VPN connection. So I simply have my Linux machine set up as the default gateway, and it routes packets onto my employer's network as needed.

This mostly works pretty well. Until today.

One application on my Windows laptop was behaving very erratically. It was not able to stay connected to the server on the head office network.

So I ran Wireshark on tun0 - the VPN interface on the Linux host. This showed the issue - the application would send TCP keepalives after only 30 seconds of idle time. The server was responding in a timely manner, but the application would respond with a RST.

Interestingly, the TCP keepalive packets had a sequence number less than the already ACKed data.
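
To make that sequence number observation concrete, here is a minimal Python sketch. It assumes the common behaviour in which a keepalive probe carries zero or one byte of payload and a sequence number one below what the peer has already ACKed. The function name and its parameters are made up for illustration; they are not taken from Wireshark or the kernel.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def looks_like_keepalive(seq, payload_len, peer_ack):
    """Return True if a segment resembles a TCP keepalive probe.

    seq         -- sequence number of the segment under inspection
    payload_len -- bytes of payload carried by that segment
    peer_ack    -- highest ACK number already received from the peer
    """
    # Assumption: a probe re-uses the last already-acknowledged
    # sequence number, i.e. one below the peer's ACK, modulo 2**32.
    one_below_ack = (peer_ack - 1) % SEQ_MOD
    return payload_len <= 1 and seq == one_below_ack


# Hypothetical numbers: the peer has ACKed everything up to 1001.
print(looks_like_keepalive(1000, 0, 1001))  # keepalive-style probe
print(looks_like_keepalive(1001, 0, 1001))  # ordinary in-window segment
```

A strict window check that only accepts sequence numbers at or above the ACKed point would treat the first segment as out of window, which is the situation described here.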

Aha! Windows has a bug? Googling turned up that no, this is how it is meant to work.

Next I ran Wireshark on my internal network - lo and behold, Windows was not the one generating the RST - it was the Linux machine.

My guess was that the connection tracking in the NAT code was seeing the out-of-window packet and generating the RST.

Now I could change my network config so that Windows runs the VPN client and is the default gateway and router for my Linux machine. The problem is that the Cisco VPN client is a GUI app that requires the password to be entered every time it has to connect. So I tried to get the open source VPNC running on Windows - it would connect but would not get the routing right. I didn't really like this solution anyway.

I did turn up a couple of posts from 2009 from someone with the same problem:


http://www.spinics.net/lists/netfilter/msg45996.html
http://www.spinics.net/lists/netfilter-devel/msg09465.html

but no real answer in those threads.

I started reading the source

http://lxr.linux.no/#linux+v3.6.2/net/netfilter/nf_conntrack_proto_tcp.c

but was not making quick progress.

I thought it might be worth poking around my netfilter settings. I found a couple of interesting files:

/proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal
/proc/sys/net/netfilter/nf_conntrack_tcp_loose

Now loose was already set to 1, but liberal was not. Setting it to 1 resolved the problem. Netfilter was no longer generating the RST for the out-of-window keepalives.
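
For reference, checking and flipping the flag can be sketched in a few lines of Python. This is only an illustration: the helper names read_flag and write_flag are my own, and the base parameter exists just so the helpers can be tried against a scratch directory instead of the real /proc tree.

```python
from pathlib import Path

# Directory holding the two conntrack files mentioned above.
NETFILTER_DIR = Path("/proc/sys/net/netfilter")


def read_flag(name, base=NETFILTER_DIR):
    """Return the integer stored in the sysctl file `name`."""
    return int((Path(base) / name).read_text().strip())


def write_flag(name, value, base=NETFILTER_DIR):
    """Write an integer to the sysctl file `name`.

    Writing under /proc/sys needs root privileges.
    """
    (Path(base) / name).write_text(f"{value}\n")
```

Run as root, write_flag("nf_conntrack_tcp_be_liberal", 1) followed by read_flag("nf_conntrack_tcp_be_liberal") should report 1. A value written this way lives only in the running kernel, so it has to be applied again after a reboot.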

  • nf_conntrack_tcp_be_liberal - If zero, drop out-of-window packets.
  • nf_conntrack_tcp_loose - If zero, don't pick up already established connections.

So I thought I'd just put this in a blog post for the next time I forget about the solution, and for the next person who struggles to find an answer on Google.