Swap on ZFS in 2018?

In the past there were some interesting discussions about swap with ZFS, and its performance when a machine actually runs out of memory. These include:
2012 thread
2016 thread
2015 bug report

Many people, perhaps most, set up swap when they set up ZFS. But have any of you actually put it to the test, completely running out of memory, swapping things like crazy? Did your system go BOOM?!

What version of FreeBSD did this happen on? How did you recover? Was your swap space on a file, a partition, or a separate drive?
 
No.

My system uses only 3gig of memory (it physically has 4, but being 32-bit it can only use 3). I run ZFS on it, with 3 pools which have a total of a handful of TB of disk space. I run a variety of standard servers, and occasionally development stuff, but never any GUI (it is purely a server). Swap is to a dedicated partition on the system disk (I think 6gig). Never had any serious problem with it, never had it run out of memory and crash. I think I've been running FreeBSD in production on my server for ~8 years now, and swap has never been the issue. Right now, the swap is completely unused, but my uptime is only a few weeks (had a big power outage a few weeks ago due to rain, and didn't feel like running the generator, since it was the middle of the night).

I think the trick to avoiding swapping is to not run software which leaks memory. And in my (not at all humble) opinion, all GUI software leaks memory (and other resources), which is why desktop machines should be rebooted regularly. Servers are a different kettle of fish.
 
I have 16GB of RAM, and the thing can swap like crazy when building a bunch of big ports. Otherwise, rarely used. :)
 
Interesting topic; it made me fire up the build monster and look. I'm using ZFS for the file system, but I couldn't find swap until I looked in /etc/fstab and then checked top. It is set at 1 GB, but no swap is listed when I do a zfs list. I also have 96 GB of RAM, so I have never hit swap and would not expect to during a build, unless something runs amok and eats up RAM.
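For anyone else hunting for where their swap actually lives, a quick sketch of the usual places to check on a stock FreeBSD install (only the fstab path is system-specific):

```shell
# Three usual places swap can be configured:
swapinfo -h            # active swap devices and current usage
grep -w sw /etc/fstab  # swap entries activated at boot
zfs list -t volume     # swap zvols; a partition-based swap won't show up here
```

If swapinfo shows a device but zfs list shows no volume, the swap is on a plain partition (or file) rather than inside the pool.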

Running ports-mgmt/synth and building 700+ ports, 16-18 at a time, it never gets above 15 or 16 GB of memory used.
 
I don't see where even parallel builds with a reasonable number of processes (dozens, a good match for the number of cores times a safety factor of 2 or 3 for better IO performance) would use swap on a 16 or 96 gig machine. Most compilers use dozens or hundreds of megabytes of core for reasonably-sized programs (even big ones). I think these days to use swap you either are using gigantic technical/scientific/supercomputing workloads (in which case you should have sized your code to match existing memory), or you have a memory leak.

I used to compile the Linux kernel, glibc and gcc on a 4MB 386. Admittedly it was very slow; kernel compiles took several hours, and glibc/gcc several days. A year or so later I upgraded the machine to 16meg, and it ran much better.
 
I think these days to use swap you either are using gigantic technical/scientific/supercomputing workloads (in which case you should have sized your code to match existing memory), or you have a memory leak

Agreed. I’ve never run out of RAM, ever. I only use swap out of habit. Even a 500GB SSD costs $100 these days, so using swap isn’t harming anything. I am particularly curious about people’s experiences with ZFS and swap under heavy loads.
 
I use top. Sometimes, not very often, maybe only after 24+ hours uptime, it reports 10-11 MB of swap usage on this plasma5-plasma desktop I'm running right now. I have 8 tabs open in firefox plus dolphin and one Konsole Terminal, on a 3 GB RAM i386 Dell Dimension 4700 with 12.0-RELEASE-p1. Here's my current top summary:
Code:
last pid:  2124;  load averages:  2.06,  1.09,  0.70    up 0+10:33:15  17:47:18
62 processes:  1 running, 61 sleeping
CPU:  5.1% user,  0.0% nice,  2.1% system,  0.0% interrupt, 92.8% idle
Mem: 386M Active, 1319M Inact, 105M Laundry, 140M Wired, 81M Buf, 1041M Free
Swap: 5953M Total, 5953M Free
It will first go into a state where Inactive memory is high and Free is low, under 100 MB. Then I might have a surge of activity, maybe running something in an open terminal window while still using Firefox, and it'll use 10-11 MB of swap, which remains, seemingly, until the next reboot. When I go back to sddm after an X session, I'm usually using more Active memory than I was before I started the session, which might be partly from memory leaks, although it might also be attributable to other causes. Compared to a near-identical setup running 11.2-RELEASE on a different partition of the same host machine, this 12.0-RELEASE system seems to be noticeably better at memory usage. This isn't to say that 11.2 wasn't efficient enough, because it was, but 12.0 seems to run a little more briskly, with fewer and shorter moments of lag and less overall latency.
 
I also use swap on ZFS, but it's not under really heavy load:
Code:
top
Mem: 8822M Active, 20G Inact, 23G Wired, 280K Cache, 10G Free
ARC: 16G Total, 4633M MFU, 11G MRU, 5916K Anon, 117M Header, 510M Other
Swap: 16G Total, 6496K Used, 16G Free
Not very busy today :)

The swap is on a ZFS volume (zvol):
Code:
zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
...
zroot/swap            16.5G  54.7G   116M  -
...

I had one swapping-related crash back in the 9.x days.
Now I disable both caches on the swap zvol, and I've had no swap-related issues since.
Code:
zfs get all zroot/swap
NAME        PROPERTY              VALUE                  SOURCE
zroot/swap  type                  volume                 -
...
zroot/swap  primarycache          none                   local
zroot/swap  secondarycache        none                   local
...
zroot/swap  org.freebsd:swap      on                     local
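A minimal sketch of creating a swap zvol with those same cache settings, assuming a pool named zroot and a 16 GB size (both assumptions; the property names match the output above):

```shell
# Create a 16 GB swap zvol with ARC and L2ARC caching disabled
# (pool name 'zroot' and the size are assumptions; adjust to taste)
zfs create -V 16G \
    -o org.freebsd:swap=on \
    -o primarycache=none \
    -o secondarycache=none \
    zroot/swap
swapon /dev/zvol/zroot/swap   # activate now; org.freebsd:swap=on covers boot
```

Disabling primarycache/secondarycache keeps swap pages from competing with real data for ARC and L2ARC space.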
 
Your line of questioning is a bit odd in my opinion. You say that many people use swap with ZFS, but really, ZFS has little to do with it, because swap is recommended even with UFS ;)

I'm running FreeBSD on an older PowerEdge which had 2 GB of memory, and I set up 4 GB of swap, within the ZFS pool:
Code:
peter@zefiris:/home/peter $ zfs list zroot/swap
NAME         USED  AVAIL  REFER  MOUNTPOINT
zroot/swap  8.25G  34.1G  8.08G  -
It's now 8 because some things changed over time. When it was still 4 I did occasionally run out of swap space, but that only happened when I was building large ports, such as lang/gcc8 or devel/llvm70. The solution was to add some extra swap; I used a swap file stored on a local UFS filesystem.
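For reference, a hedged sketch of adding such a swap file on UFS the FreeBSD way, via an md(4) vnode-backed device (the path /usr/swap0 and unit 99 are arbitrary examples):

```shell
# Create a 4 GB swap file and attach it as a memory disk
dd if=/dev/zero of=/usr/swap0 bs=1m count=4096
chmod 0600 /usr/swap0
mdconfig -a -t vnode -f /usr/swap0 -u 99   # appears as /dev/md99
swapon /dev/md99
# To make it permanent, an /etc/fstab line along these lines:
# md99  none  swap  sw,file=/usr/swap0,late  0  0
```

The nice part is that this can be done on a running system when a big build starts eating swap, and torn down again afterwards with swapoff and mdconfig -d.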

Still, this isn't really interesting because ZFS had little to do with it. After discovering something wrong with some of my memory banks I even ran this system with 1 GB of RAM, still utilizing a large ZFS hierarchy, and never ran into any serious issues. Not even while running an X environment.

What is much more interesting, in my opinion, is whether you'd set up swap inside your ZFS pool or as an external partition, because there is something to be said for both methods. If you add your swap to your ZFS pool while that pool is a mirror or raidz, then your swap writes also get propagated across the different vdevs, which can put a little more stress on your system.

So with that in mind it might be more beneficial to use a separate swap partition. This would also come in handy if you had to perform extensive maintenance or recovery tasks from a rescue environment: that rescue environment would be able to utilize that same swap space as well (if your ZFS pool gets damaged then it obviously wouldn't make sense to try and use its included swap space).
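A sketch of what that separate-partition setup might look like with gpart, assuming a GPT-partitioned second disk named ada1 (an assumption; double-check the device name, this writes to the disk):

```shell
# Add an 8 GB swap partition with a GPT label, then enable it
gpart add -t freebsd-swap -s 8G -l swap0 ada1
swapon /dev/gpt/swap0
# /etc/fstab entry to enable it at boot:
# /dev/gpt/swap0  none  swap  sw  0  0
```

Using the GPT label (/dev/gpt/swap0) rather than the raw device name keeps the fstab entry valid even if the disk gets renumbered.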

Of course, if you add your swap space to your ZFS pool, then you can easily resize it later when needed; you wouldn't be able to do that with an external swap partition.
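Resizing an in-pool swap zvol is essentially a one-liner, sketched here for a dataset named zroot/swap (an assumption; the swap should be deactivated while you resize it):

```shell
swapoff /dev/zvol/zroot/swap   # detach first; resizing live swap is unsafe
zfs set volsize=8G zroot/swap  # grow the zvol to the new size
swapon /dev/zvol/zroot/swap    # reattach at the new size
```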

Finally: a system doesn't really go "b00m" when it runs out of swap space. The kernel will detect this problem and kill any problematic processes to try and solve it. In my example of compiling gcc8 it simply killed the build process and that was the end of that. Everything just kept on running without issues, even my X environment (which made this problem tricky to find at first because there wasn't any mention of swap space in the build log, only when I examined dmesg did I notice the obvious).
 
I use swap on a compressed zvol. Compression still saves bandwidth here. The only drawback is that when you take a snapshot, the zvol's space is effectively reserved rather than shared copy-on-write, so you need enough free space in the pool for a full copy of the zvol.
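A sketch of such a compressed swap zvol, assuming lz4 compression and a pool named zroot (both assumptions):

```shell
# Swap zvol with compression; name and size are examples
zfs create -V 16G \
    -o compression=lz4 \
    -o org.freebsd:swap=on \
    zroot/swap
# Beware: snapshotting this zvol requires enough free pool space
# for a full copy, as described above.
```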
 
Finally: a system doesn't really go "b00m" when it runs out of swap space. The kernel will detect this problem and kill any problematic processes to try and solve it. In my example of compiling gcc8 it simply killed the build process and that was the end of that. Everything just kept on running without issues, even my X environment (which made this problem tricky to find at first because there wasn't any mention of swap space in the build log, only when I examined dmesg did I notice the obvious).
As far as I know the FreeBSD kernel does not kill processes in case of memory shortage. That is Linux-only behavior.

It's more like this:
Application: Kernel, give me memory!
Kernel: (I have plenty of free memory, so...) Application, here is your memory.
Application: Kernel, give me memory!
Kernel: (I have no free memory, so let's push some rarely used memory to swap first... well, now I have free memory...) Application, here is your memory.
Application: Kernel, give me memory!
Kernel: (I have no free memory, so let's push some rarely used memory to swap first... Oh! Swap is full!) Application, I can't give you memory!
Application: (I can't work without more memory, I quit!)
 
As far as I know the FreeBSD kernel does not kill processes in case of memory shortage. That is Linux-only behavior.
From a /var/log/dmesg backup:

Code:
Failed to fully fault in a core file segment at VA 0x80e200000 with size 0x4000000000 to be written at offset 0x416a000 for process baloo_file
pid 1136 (baloo_file), uid 1001: exited on signal 5 (core dumped)
pid 24071 (c++), uid 0, was killed: out of swap space
I can't be bothered to look for even older backups, but I have encountered situations where a multitude of processes ended up getting killed, even unrelated ones.
 
Yep. Once upon a time I saw pid 1 being the one killed. Lots of free memory after that. :)

Edit: the lower pids are now protected. You may instead see an error message saying someone tried to kill the swapper (pid 2). In that case, your system really would go boom.
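On that note, FreeBSD also lets you exempt individual processes from the out-of-memory kill with protect(1); a sketch (pid 1234 is a placeholder):

```shell
protect -p 1234      # mark an existing process as exempt from OOM kills
protect -c -p 1234   # clear the protection again
# or start a command already protected:
# protect some_daemon
```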
 