[Solved] Only able to saturate gigabit link for a few minutes, then drops to 100Mbps

I'm running FreeBSD 14.1-RELEASE-p6 on a server with two NICs, both plugged in and both using the igb(4) driver.

The dmesg output for these NICs looks like this:
Code:
igb0: <Intel(R) I210 (Copper)> port 0xe000-0xe01f mem 0xfc700000-0xfc77ffff,0xfc780000-0xfc783fff at device 0.0 on pci5
igb0: EEPROM V3.16-0 eTrack 0x800004d6
igb0: Using 1024 TX descriptors and 1024 RX descriptors
igb0: Using 4 RX queues 4 TX queues
igb0: Using MSI-X interrupts with 5 vectors
igb0: Ethernet address: d0:50:99:ff:c2:41
igb0: netmap queues/slots: TX 4/1024, RX 4/1024
pcib6: <ACPI PCI-PCI bridge> at device 5.0 on pci3
pci6: <ACPI PCI bus> on pcib6
igb1: <Intel(R) I210 (Copper)> port 0xd000-0xd01f mem 0xfc600000-0xfc67ffff,0xfc680000-0xfc683fff at device 0.0 on pci6
igb1: EEPROM V3.16-0 eTrack 0x800004d6
igb1: Using 1024 TX descriptors and 1024 RX descriptors
igb1: Using 4 RX queues 4 TX queues
igb1: Using MSI-X interrupts with 5 vectors
igb1: Ethernet address: d0:50:99:ff:c2:40
igb1: netmap queues/slots: TX 4/1024, RX 4/1024

I have an iSCSI volume exported on the system, available on the igb0 port (renamed "mgmt" on my server). ifconfig shows:
Code:
mgmt: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
    options=4e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
    ether d0:50:99:ff:c2:41
    inet 10.0.5.0 netmask 0xfffff000 broadcast 10.0.15.255
    inet6 fe80::d250:99ff:feff:c241%mgmt prefixlen 64 scopeid 0x1
    inet6 fd2f:972c:cd64:0:d250:99ff:feff:c241 prefixlen 64 autoconf
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
    nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>

I'm not doing anything fancy on this interface; it exists primarily as a way to always be able to reach the host system even if the jails somehow saturate the connection on the other port. Yesterday I decided to add an iSCSI volume to extend the storage capacity of my personal desktop (running Linux), which was fairly simple to set up in /etc/ctl.conf.

The problem: the initial transfer is a couple of terabytes, and while on gigabit it should only take a few hours, the link drops to 100Mbps after a short period of time. I have replaced every cable between the computers and even replaced and added network switches, but the problem persists. I noticed that each time I unplug and replug the cables, the speed returns to full gigabit for a few minutes before dropping again. My workaround so far is running ifconfig mgmt down; ifconfig mgmt up every time the speed drops, which temporarily restores gigabit performance. (Since iSCSI operates over TCP, I'm fairly confident the brief downtime isn't a real problem.)

I have read tuning(7) and raised net.inet.tcp.sendspace and net.inet.tcp.recvspace to no avail (all the way up to 1048576...). I found old forum posts suggesting things like disabling TSO on the interface, which does not correct the problem; disabling LRO didn't help either. I'm at a loss for how to fix this permanently so I don't need to babysit the machine and bounce the interface. Babysitting it through the initial transfer might work, but I definitely don't want to be doing this long-term.
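Until the root cause turns up, the down/up workaround could in principle be scripted. A minimal sketch follows; the interface name, threshold, interval, and the netstat(1) column being parsed are all assumptions, and note that an idle link also reads as "slow", so this only makes sense while the transfer is actually running:

```sh
#!/bin/sh
# Hypothetical watchdog: bounce the interface when observed input
# throughput falls below a threshold. All values are assumptions.
IFACE=mgmt
THRESHOLD_MBPS=300   # bounce when the rate drops below this
INTERVAL=10          # seconds per sample window

# Input-bytes counter for $IFACE; on FreeBSD, `netstat -I ifN -bn`
# prints Ibytes in column 8 of the first data row (verify locally).
bytes_in() {
    netstat -I "$IFACE" -bn | awk 'NR == 2 { print $8; exit }'
}

# True (exit 0) when the observed rate is below the threshold.
# Args: bytes_before bytes_after seconds threshold_mbps
rate_below() {
    awk -v b0="$1" -v b1="$2" -v t="$3" -v thr="$4" \
        'BEGIN { exit !((b1 - b0) * 8 / t / 1000000 < thr) }'
}

# Pass "run" to start the loop; without it, only the functions load.
if [ "$1" = "run" ]; then
    while :; do
        b0=$(bytes_in); sleep "$INTERVAL"; b1=$(bytes_in)
        if rate_below "$b0" "$b1" "$INTERVAL" "$THRESHOLD_MBPS"; then
            ifconfig "$IFACE" down && ifconfig "$IFACE" up
        fi
    done
fi
```

This is only a stopgap for the initial transfer, not a fix; it automates exactly what the manual workaround does.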
 
Just to be sure that it's not the iSCSI/disk layer, can you test it with iperf3?
Also, when you say the network link drops to 100Mbps, do you mean the actual media speed drops from 1000baseT to 100baseTX, or just that the transfer speed drops to 100Mbps?
 
Just to be sure that it's not the iSCSI/disk layer, can you test it with iperf3?
I've done a test with nc and it shows the same speed drop.
Also, when you say the network link drops to 100Mbps, do you mean the actual media speed drops from 1000baseT to 100baseTX, or just that the transfer speed drops to 100Mbps?
ifconfig shows 1000baseT <full-duplex> the entire time. It's just the transfer speed.
 
You have to isolate the issue.

I'm assuming that you have the FreeBSD system connected via a switch to a separate Linux system. If that's not the case and both hosts are on the same hardware, then you absolutely have to remove the NIC hardware assists.

I would second the suggestion of first removing the file transfer from the equation and testing with something like iperf or iperf3 between the machines. I don't think nc would be the best choice.

If you still see an issue, if possible, remove the switch from the equation.
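For reference, a minimal iperf3 test between the two hosts could look like the following. The address is the server's from the ifconfig output above; iperf3 is third-party software, installable from ports/pkg:

```sh
# On the FreeBSD server (pkg install iperf3): run in server mode.
iperf3 -s

# On the Linux desktop: run long enough to catch the slowdown,
# printing an interval report every 10 seconds.
iperf3 -c 10.0.5.0 -t 600 -i 10

# Add -R to test the reverse direction (server sends to client).
```

The per-interval reports are the useful part here: they show exactly when the rate collapses, which a single end-of-transfer number can't.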
 
you absolutely have to remove the nic hardware assists.
What are nic hardware assists?
test with something like iperf or iperf3 between the servers. I don't think nc would be the best choice.
I could, but why wouldn't nc be a viable option? It's part of the FreeBSD base system and it just opens a network socket; I can measure throughput simply by piping /dev/zero through it...

I just don't understand what the third-party tool offers that nc doesn't.
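For what it's worth, an nc-based throughput test looks roughly like this. The address and port are placeholders, and nc's listen syntax differs between the FreeBSD base version and some Linux variants:

```sh
# Receiver (Linux desktop): discard everything arriving on port 5001.
# Traditional Linux nc wants `nc -l -p 5001` instead.
nc -l 5001 > /dev/null

# Sender (FreeBSD server): push 10 GiB of zeroes; dd(1) prints the
# achieved transfer rate when it completes. Note bs=1m is FreeBSD
# dd syntax; GNU dd uses bs=1M.
dd if=/dev/zero bs=1m count=10240 | nc 10.0.5.1 5001
```

The limitation compared to iperf3 is that dd only reports an overall average at the end, so a mid-transfer slowdown is smeared into one number.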
If you still see an issue, if possible, remove the switch from the equation.
I don't think it's realistically possible to remove switches from the equation. They have to be on the same network somehow. Given I've already gone through the process of replacing every cable and every switch in an attempt to see if it was a hardware issue, I believe it's all software, possibly the igb(4) driver itself.
 
I don't think it's realistically possible to remove switches from the equation. They have to be on the same network somehow. Given I've already gone through the process of replacing every cable and every switch in an attempt to see if it was a hardware issue, I believe it's all software, possibly the igb(4) driver itself.
I think the suggestion is just for the purpose of testing/elimination. Connect two machines directly with a cable and make sure nothing else is involved (like drives or caches etc.) and the iperf suggestion is related to a tool for the job. Might be worth a try.
 
I'd have to run a very long ethernet cable across multiple rooms to attempt that (plus reconfiguring network on both computers and taking the machine down for its general server purposes...). I have strong doubts that it'll achieve better results, either. For the moment, I'm not willing/able to perform such a test.
 
I suspect that a switch might be responsible, too.

Instead of laying a long new cable you could use a $20 switch where the old cables centralize.
 
What are nic hardware assists?
Hardware offloads.
TSO, LSO, etc.
Whatever your adapter supports.
They may be listed as CAPS.
Query your hardware to see what it offers, then disable them.

-tso4 -tso6 -lro -vlanhwtso

There are probably more for your Ethernet adapter.
ifconfig igb0 will tell you what you have under "options":

RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS
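As a sketch of how these get toggled (assuming igb0 and the flags above; check ifconfig(8) for the exact option names your driver exposes):

```sh
# One-shot: takes effect immediately, but is lost on reboot.
ifconfig igb0 -tso4 -tso6 -lro -vlanhwtso

# Persistent: append the same flags to the interface line in
# /etc/rc.conf. The address/prefix shown is from the ifconfig
# output earlier in the thread.
ifconfig_igb0="inet 10.0.5.0/20 -tso4 -tso6 -lro -vlanhwtso"
```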
 
Hardware offloads.
TSO, LSO, etc.
Whatever your adapter supports.
They may be listed as CAPS.
Query your hardware to see what it offers, then disable them.
Ah yeah, as I mentioned in the first post, I already tried toggling those to no effect.

After some more serious hardware reconfiguration, I managed to get a direct cable across, and the problem didn't occur. There is a switch in my bedroom that the direct run completely bypassed, and I believe it has been causing my problems all along. Since I first noticed the issue on my FreeBSD server, that whole system turned out to be a red herring.

At least it seems that I've sorted it... besides needing to buy a new switch :)
 