Hi,
I am currently developing a driver that allocates fairly large (2 MB and larger) contiguous memory chunks via contigmalloc. I have noticed that both contigmalloc and contigfree take a long time when dealing with large chunks, and that the number of cycles contigfree needs to free the memory differs significantly between FreeBSD 11.3 and 12.1.
I checked this on the same machine with both 11.3-p11 and 12.1-p7 by running contigmalloc (with and without M_NOWAIT) and contigfree in a loop 40 times. The average results, in TSC cycles, are as follows:
FreeBSD 11.3-p11
  M_NOWAIT:  contigmalloc 179112   contigfree 211302
  No flag:   contigmalloc 176541   contigfree 199462
FreeBSD 12.1-p7
  M_NOWAIT:  contigmalloc 171769   contigfree 127562
  No flag:   contigmalloc 171974   contigfree 127448
The following code was used for measuring:
Code:
static void
runTest(void)
{
	const int pageSize = 4096;
	unsigned long size;
	void *ptr = NULL;
	unsigned long long t1, t2;
	int i = 0;
	/* Running totals in TSC cycles; divided by probes at the end. */
	unsigned long long avg_malloc = 0, avg_free = 0;
	int probes = 40;

	size = 2 * 1024 * 1024;		/* 2 MB per allocation */
	for (i = 0; i < probes; i++) {
		/* Time a single contigmalloc call. */
		t1 = rdtsc();
		ptr = contigmalloc(size, M_FOO, M_NOWAIT, 0, ~0,
		    pageSize, 0);
		t2 = rdtsc();
		avg_malloc += t2 - t1;
		printf("contigmalloc cycles: %llu\n", t2 - t1);
		/* Abort the whole test if the allocation failed. */
		if (ptr == NULL)
			return;

		/* Time the matching contigfree call. */
		t1 = rdtsc();
		contigfree(ptr, size, M_FOO);
		t2 = rdtsc();
		avg_free += t2 - t1;
		printf("contigfree cycles: %llu\n\n", t2 - t1);
		uprintf("Loop complete\n");
	}
	/* Report the average cycle count per call. */
	printf("contigmalloc %llu contigfree %llu\n", avg_malloc / probes,
	    avg_free / probes);
}
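Since the loop above reports only an average over 40 probes, a single slow iteration (for example the first one) could skew the result. The following is a minimal userland sketch, not part of the attached module, for summarizing the per-iteration cycle counts that the loop already prints. The names cycle_stats and summarize_cycles are hypothetical, and the sketch assumes the samples have been copied into an array.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical summary of per-iteration TSC cycle counts. */
struct cycle_stats {
	uint64_t min;
	uint64_t max;
	double mean;
	double median;
};

/* qsort comparator for uint64_t values, ascending. */
static int
cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return ((x > y) - (x < y));
}

/*
 * Fill *out with min, max, mean and median of the n samples.
 * Returns 0 on success, -1 if n is 0 or memory allocation fails.
 */
int
summarize_cycles(const uint64_t *samples, size_t n, struct cycle_stats *out)
{
	uint64_t *sorted;
	double sum = 0.0;
	size_t i;

	if (n == 0)
		return (-1);
	sorted = malloc(n * sizeof(*sorted));
	if (sorted == NULL)
		return (-1);

	/* Sort a copy so the caller's array is left untouched. */
	memcpy(sorted, samples, n * sizeof(*sorted));
	qsort(sorted, n, sizeof(*sorted), cmp_u64);

	for (i = 0; i < n; i++)
		sum += (double)sorted[i];

	out->min = sorted[0];
	out->max = sorted[n - 1];
	out->mean = sum / (double)n;
	if (n % 2 == 1)
		out->median = (double)sorted[n / 2];
	else
		out->median = ((double)sorted[n / 2 - 1] +
		    (double)sorted[n / 2]) / 2.0;

	free(sorted);
	return (0);
}
```

If the median and the mean differ noticeably, the average is probably dominated by a few outliers rather than by the typical cost of a call.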
The measurements were taken on the following setup:
CPU: Intel(R) Atom(TM) CPU C3958 @ 2.00GHz (2000.06-MHz K8-class CPU)
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
8GB RAM (2x4GB):
Type: DDR4
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 2400 MT/s
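To put the cycle counts into wall-clock terms, they can be converted to microseconds. This is a small sketch that assumes the TSC ticks at a constant rate equal to the 2000.06 MHz reported for the CPU above; the helper names cycles_to_us and print_cycles_as_us are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Convert a TSC cycle count to microseconds.
 * tsc_mhz is the TSC frequency in MHz, i.e. cycles per microsecond.
 * Assumption: the TSC runs at a constant rate.
 */
double
cycles_to_us(uint64_t cycles, double tsc_mhz)
{
	return ((double)cycles / tsc_mhz);
}

/* Print one measurement both in cycles and in microseconds. */
void
print_cycles_as_us(const char *label, uint64_t cycles, double tsc_mhz)
{
	printf("%s: %llu cycles = %.1f us\n", label,
	    (unsigned long long)cycles, cycles_to_us(cycles, tsc_mhz));
}
```

For example, print_cycles_as_us("contigmalloc", 179112, 2000.06) would report the 11.3-p11 M_NOWAIT allocation time, which under this assumption is roughly 90 microseconds per 2 MB allocation.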
Could someone explain why contigmalloc and contigfree take this long for large allocations? Also, what could account for the faster contigfree on 12.1? I have not yet been able to identify the cause. Is the performance of contigmalloc and contigfree a known issue? I could not find any information about it on the forum.
The complete source of the module used for the measurements is attached.
Thank you in advance for any help.
Kind regards,
Julian