I have a laptop running FreeBSD 11.0 amd64, a server that was running FreeBSD 10.1 i386, and a gateway machine running FreeBSD 10.2 i386. Everything worked fine. I recently "upgraded" the server machine to FreeBSD 11.0 amd64, and TCP started falling apart between the laptop and the server. The cross-architecture "upgrade" procedure involved installing FreeBSD 11.0 fresh, bringing back the old /usr/local, running "pkg bootstrap -f" and "pkg upgrade -f" for all the ports, and bringing back old filesystems (like /home) unchanged. Then I recompiled the programs I wrote (and sometimes discovered unjustified int == long assumptions). This same procedure worked a couple of months back when upgrading the laptop from FreeBSD 10.2 i386 to FreeBSD 11.0 amd64, without any problems.
Reproduction sequence: log in to the laptop. "slogin server". cd to a directory containing about 400 files (actually 50 files may be enough). Type the command "ls -l". The output whizzes by with some 5-second to 10-second pauses between bursts, then stops. (If I do this on the server console, there are no pauses and no lockup). This lockup is about 99.5% consistent. The only way out of that state I can figure is to disconnect and re-log-in to the server. If I leave the connection like that, it eventually (after a few minutes) says "Fssh_packet_write_wait: Connection to 192.168.1.3 port 22: broken pipe".
I can stay logged in and get work done if I remember to run any long output (which seems to be about 40 lines of text) through "more". If I make the xterm window 48 lines long, that is too much of a data burst and it will often lock up. "make" output causes no problems as long as the compiles are slow. If they are almost instant, it will lock up.
The problem is not limited to ssh connections. FTP gets stuck quickly. So do MySQL queries involving a lot of data (e.g. 4 Kbytes). The worst is trying to access a web server on the server with firefox on the laptop, which usually gets a partially complete page.
The same thing happens if I take the laptop off of wifi and put all 3 machines on the same ethernet wire (100Mbps). The laptop can use an xterm or a virtual console. Also, note I am using the exact same hardware that's been working for years, just the OS version is different. And it seems that FreeBSD 11.0 only has trouble talking to another FreeBSD 11.0. It doesn't matter whether I turn the (ipfw) firewall on or off on laptop, server, or both. The firewall allows pretty much everything on the LAN anyway.
The server can get stuff from the Internet via the gateway machine, such as downloading package updates, with no problem.
This smells like an MTU/MSS problem: anything that involves sending a full-size ethernet packet gets stuck, possibly because something along the line is refusing to deal with large packets. But I don't see where that happens on my LAN. (I have seen it happen on DSL modems where the modem encapsulates the packets, adding a small header, and neither end knows about it, and someone may be blocking ICMP, but the connection to the internet is the part that *is* working.) I tried my little program to edit the MSS in TCP connection startup packets involving the server (even localhost!), but it didn't make any noticeable difference.
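To take ssh, FTP and MySQL out of the picture, a bare TCP burst test might help narrow this down. The following is only a rough sketch, not something I have run on the affected machines; the names mss_for_mtu, send_burst, receive_burst and run_loopback_probe are made up for illustration, and the 20-byte IP and TCP header sizes assume IPv4 with no options. The idea is that one side writes a burst of a chosen size as fast as it can, and the other side reports how many bytes arrived and the longest pause between reads.

```python
import socket
import threading
import time


def mss_for_mtu(mtu, ip_header=20, tcp_header=20):
    """Largest TCP payload that fits in one IPv4 packet of the given MTU.

    Assumes IPv4 with no IP or TCP options.
    """
    return mtu - ip_header - tcp_header


def send_burst(listener, size):
    """Accept one connection on `listener` and write `size` bytes at once."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(b"x" * size)


def receive_burst(host, port, expected, stall_timeout=5.0):
    """Read until `expected` bytes arrive, the peer closes, or nothing
    arrives for `stall_timeout` seconds.

    Returns (bytes_received, longest_gap_seconds).
    """
    received = 0
    longest_gap = 0.0
    with socket.create_connection((host, port), timeout=stall_timeout) as sock:
        last = time.monotonic()
        while received < expected:
            try:
                chunk = sock.recv(65536)
            except socket.timeout:
                break  # no data within stall_timeout: treat as a stall
            if not chunk:
                break  # peer closed the connection early
            now = time.monotonic()
            longest_gap = max(longest_gap, now - last)
            last = now
            received += len(chunk)
    return received, longest_gap


def run_loopback_probe(size):
    """Run sender and receiver in one process over 127.0.0.1 (sanity check)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    sender = threading.Thread(target=send_burst, args=(listener, size), daemon=True)
    sender.start()
    try:
        return receive_burst("127.0.0.1", port, size)
    finally:
        sender.join(timeout=10)
        listener.close()
```

For the real test, send_burst would run on the server with a listener bound to its LAN address, and receive_burst on the laptop. Trying sizes just below and just above mss_for_mtu(1500) would show whether a single full-size segment is enough to trigger the stall.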
Are there any TCP options that changed defaults between FreeBSD 10.1 and FreeBSD 11.0? Like those net.inet.tcp.rfc* sysctl variables? Residual parts of HPN in sshd I need to configure OFF? Changes in the de0 driver?
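Since the gateway is still on a 10.x release, one way to look for changed settings would be to save the output of "sysctl net.inet.tcp" on the gateway and on the server and compare the two. Here is a small sketch of such a comparison, assuming the usual "name: value" output format; parse_sysctl and diff_sysctl are hypothetical helper names, not existing tools.

```python
def parse_sysctl(text):
    """Parse sysctl output lines of the form 'name: value' into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines that do not contain a colon
            values[name.strip()] = value.strip()
    return values


def diff_sysctl(old, new):
    """Return {name: (old_value, new_value)} for every setting that differs.

    A setting present on only one side shows up with None on the other.
    """
    changed = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changed[name] = (old.get(name), new.get(name))
    return changed
```

Usage would be something like diff_sysctl(parse_sysctl(gateway_text), parse_sysctl(server_text)), where the two strings are the saved outputs. Anything that shows up as different or as present on only one side is a candidate to look at.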
Any ideas what I should check?