[Solved] Tor relay keeps running out of nmbclusters

Hello everyone,

I've been running a Tor relay on a small VPS with 1 vCore and 1 GiB RAM for a few months, but ever since upgrading to Tor 0.4.7.8, my FreeBSD host keeps running out of nmbclusters.
Every time this happens, tor and sshd stop working and I have to log into the cloud dashboard and manually restart the FreeBSD 13.1 VPS.

Even after upgrading the VPS to 2 vCores and 2 GiB RAM, reducing the advertised bandwidth from 10 MB/s to 3 MB/s, and increasing kern.ipc.nmbclusters to 200,000, it keeps happening.
I've been trying to fix this issue for almost 2 months now, and at this point I don't know whether it's caused by my settings, FreeBSD, the VPS hosting provider, or Tor itself (or maybe even a DDoS?).
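
For reference, this is roughly how I applied the nmbclusters change (set at runtime, plus /etc/sysctl.conf so it survives reboots):
Code:
# set at runtime
sysctl kern.ipc.nmbclusters=200000
# persist across reboots
echo 'kern.ipc.nmbclusters=200000' >> /etc/sysctl.conf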

I would be very grateful if someone could help me with this.
If you need more information or logs, I'll send them of course.
Thanks in advance.
 
Did you also increase kern.ipc.nmbjumbop? It might help to have a small script that dumps the output of netstat -m periodically (e.g. every minute) into a logfile.
Relevant logs are never a bad idea, so please post relevant log entries (dmesg?).
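Something along these lines should do (a minimal sketch; the logfile path is just an example):
Code:
#!/bin/sh
# append a timestamped netstat -m snapshot to a logfile;
# run it every minute from cron
LOG=/var/log/netstat-m.log
echo "=== $(date '+%Y-%m-%d %H:%M:%S') ===" >> "$LOG"
netstat -m >> "$LOG"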
 
I didn't increase kern.ipc.nmbjumbop, as I thought it only matters for frames/packets larger than 1500 bytes - I can try it though.
The script with netstat -m sounds like a good idea; I'll try it later and post the output of dmesg and /var/log/messages as well.
 
Here is the output of netstat -m:
Code:
6157/45683/51840 mbufs in use (current/cache/total)
2827/7697/10524/125563 mbuf clusters in use (current/cache/total/max)
772/1006 mbuf+clusters out of packet secondary zone in use (current/cache)
1202/736/1938/62781 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/18602 9k jumbo clusters in use (current/cache/total/max)
0/0/0/10463 16k jumbo clusters in use (current/cache/total/max)
12001K/29758K/41760K bytes allocated to network (current/cache/total)
0/6969201/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/838/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were valid and substituted to bogus page
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed

I also attached the output of /var/log/messages and dmesg.
 


I would increase kern.ipc.nmbclusters to 500,000 and dump something like echo `date +%Y-%m-%d_%H:%M && (netstat -m | grep -e "requests for mbufs denied" -e "mbuf clusters in use")` >> /var/log/my-netstat.log (or even more) on a per-minute basis via cron into a logfile, so you can see when something is happening. However, an external factor could just as easily be the source of your issues. Oh, and I noticed that you use VMware - make sure the host is at least on version 7.
 
I would increase kern.ipc.nmbclusters to 500,000
I just calculated how high kern.ipc.nmbclusters on my VPS should be:

I have around 6,000-7,000 TCP connections (typical for Tor), and each connection can take up to 96 KiB of socket buffer space:
  • sysctl net.inet.tcp.sendspace: 32,768
  • sysctl net.inet.tcp.recvspace: 65,536
  • 32,768 B + 65,536 B = 98,304 B = 96 KiB
Multiplying 96 KiB by the number of connections gives 672,000 KiB:
  • 7,000 x 96 KiB = 672,000 KiB
Each mbuf cluster occupies 2,048 bytes, so dividing 672,000 KiB by 2 KiB gives 336,000 mbuf clusters:
  • 672,000 KiB / 2 KiB = 336,000
I think I'll try that first; if it doesn't help, I'll increase kern.ipc.nmbclusters to 500,000.
If my VPS runs out of RAM, I'll probably need to upgrade it to 4 GiB.
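
The same calculation as a quick shell check (a sketch; the connection count is my estimate from above):
Code:
#!/bin/sh
# worst case: every connection fills both TCP socket buffers
SENDSPACE=$(sysctl -n net.inet.tcp.sendspace)  # 32768 here
RECVSPACE=$(sysctl -n net.inet.tcp.recvspace)  # 65536 here
CONNS=7000        # estimated peak number of Tor connections
CLUSTER=2048      # bytes per mbuf cluster
echo $(( CONNS * (SENDSPACE + RECVSPACE) / CLUSTER ))  # prints 336000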

and dump something like echo `date +%Y-%m-%d_%H:%M && (netstat -m | grep -e "requests for mbufs denied" -e "mbuf clusters in use")` >> /var/log/my-netstat.log (or even more) on a per-minute basis via cron into a logfile, so you can see when something is happening.
I'm trying to get this to work in cron but haven't been successful (I've never worked with cron before).
I've tried putting "* * * * * root" and "*/1 * * * * root" in front of the command, but it still doesn't do anything.
 
I got the crontab working now; I had to escape the percent signs with a backslash (\%) and put a newline after the command.
I also increased kern.ipc.nmbclusters to 350,000 to handle peak loads. I hope that fixes the problem, but since it usually takes about 5 days between crashes, we'll have to wait and see...
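
For reference, the /etc/crontab entry now looks roughly like this (note the escaped \% signs, which cron would otherwise treat as line separators):
Code:
* * * * * root echo `date +\%Y-\%m-\%d_\%H:\%M && (netstat -m | grep -e "requests for mbufs denied" -e "mbuf clusters in use")` >> /var/log/my-netstat.log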

Update

It seems the reason for all my problems is simply DDoS attacks on the entire Tor network, specifically on the directory services.
Since this means there is nothing wrong with my FreeBSD VPS, my settings, or my hosting provider, I can now look for a solution to that problem instead.
Thanks anyway for the help; the info will still be useful.
 