Good Morning All,
I have been experiencing some mysterious thrashes on one of my servers, and I have finally gathered enough data about the problem to possibly ask for help (it was very difficult to get data since once it started thrashing there was nothing to be done except reboot...).
The machine has 32 GB of RAM and an 8 GB swap partition, with a latest-generation Xeon 1270. It is tasked with running 5 instances of selenium+firefox in jails in an automated fashion. Initially, I had seen problems/thrashing from Firefox grabbing too much RAM (at one point I saw a firefox process with 14.9 GB of reserved memory... easily the largest memory I have ever seen a single process grab). I deduced this was due to the new Firefox Quantum versions being more RAM-heavy to improve speed, and the fact that you are not really supposed to run more than one instance of firefox at a time (the firefox launcher will prevent you from launching a second instance if it can see another; in this case it did not see the other instances because they were in separate jails). I attempted to solve the problem with rctl(8) rules to limit each jail to 4 GB of RAM. This was successful in eliminating the massive memory grabs from Firefox, but did not eliminate the thrashes.
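For reference, the kind of per-jail rule described above could be generated roughly as follows. This is only a sketch: the jail names (ff1 to ff5) are hypothetical, and the choice of the memoryuse resource with the deny action is an assumption, since the exact rules are not shown here (vmemoryuse would be the address-space variant).

```python
# Sketch: build rctl(8) rule strings that cap each jail at 4 GB.
# ASSUMPTIONS: jail names are hypothetical; "memoryuse" with "deny" is a
# guess at the rule used, not necessarily what is on the real machine.

def rctl_memory_rule(jail_name, limit="4g", resource="memoryuse", action="deny"):
    """Return a rule in rctl's subject:subject-id:resource:action=amount form."""
    return f"jail:{jail_name}:{resource}:{action}={limit}"


def rctl_add_commands(jail_names, limit="4g"):
    """Return one 'rctl -a <rule>' argument list per jail (not executed here)."""
    return [["rctl", "-a", rctl_memory_rule(name, limit)] for name in jail_names]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in rctl_add_commands([f"ff{i}" for i in range(1, 6)]):
        print(" ".join(cmd))
```

The commands are printed rather than executed so they can be checked by hand before being applied as root.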
I was able to watch this play out in top(1) and found that with the rctl limit in place, the Firefox processes never exceeded 4 GB, but when they attempted to grab memory, the "Laundry" memory would spike dramatically. I watched it climb to 25 GB with 50% of the swap used before the machine froze up in a thrash state.
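Since the machine becomes unusable once the thrash starts, one way to keep a record of the Laundry growth is to log the page-queue counters to disk at a short interval. The sketch below assumes the vm.stats.vm.v_laundry_count, v_inactive_count and v_free_count sysctls and hw.pagesize are available on this FreeBSD version; the log path and interval are placeholders.

```python
# Sketch: periodically log Laundry/Inactive/Free sizes so the data survives
# a freeze. ASSUMPTIONS: the sysctl names below exist on this system; the
# log path and the 5 second interval are placeholders.
import os
import subprocess
import time

GIB = 1024 ** 3


def pages_to_gib(pages, page_size):
    """Convert a page count to GiB."""
    return pages * page_size / GIB


def read_sysctl_int(name):
    """Read one integer sysctl value via 'sysctl -n'."""
    out = subprocess.run(["sysctl", "-n", name],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.strip())


def sample():
    """Return the current queue sizes in GiB, keyed by queue name."""
    page_size = read_sysctl_int("hw.pagesize")
    return {queue: pages_to_gib(read_sysctl_int(f"vm.stats.vm.v_{queue}_count"),
                                page_size)
            for queue in ("laundry", "inactive", "free")}


def log_forever(path, interval=5.0):
    """Append one timestamped line per interval and force it to disk."""
    with open(path, "a") as log:
        while True:
            sizes = sample()
            fields = " ".join(f"{k}={v:.2f}GiB" for k, v in sizes.items())
            log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {fields}\n")
            log.flush()
            os.fsync(log.fileno())  # make sure the line reaches the disk
            time.sleep(interval)


if __name__ == "__main__":
    log_forever("/var/log/laundry.log")
```

The fsync after every line is deliberate: the last few samples before a freeze are the interesting ones, and they would otherwise be lost in the buffer cache.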
I suspect rctl may not be the best way to handle the memory limits for the jails, or perhaps there is a way to tune the system to prioritize freeing "Laundry", although I am not really clear as to exactly what circumstances cause FreeBSD to mark a page as "Laundry" instead of "Inactive".
Unfortunately, this is extremely difficult to reproduce as it happens at the whim of Firefox going wild with the malloc calls. As always, I am most thankful for your expertise.