contigmalloc and contigfree issue

Hi,

I am currently developing a driver that allocates fairly large (2 MB and up) contiguous memory chunks via contigmalloc. I have noticed that both contigmalloc and contigfree take a long time with chunks of this size, and that the number of cycles contigfree needs to free the memory differs significantly between FreeBSD 11.3 and 12.1.

I checked this on the same machine with both 11.3-p11 and 12.1-p7 by running contigmalloc (with and without M_NOWAIT) and contigfree in a loop 40 times. The average results, in TSC cycles per call, are as follows:

Release    Flag       contigmalloc   contigfree
11.3-p11   M_NOWAIT   179112         211302
11.3-p11   No flag    176541         199462
12.1-p7    M_NOWAIT   171769         127562
12.1-p7    No flag    171974         127448


The following code was used for measuring:

Code:
static void runTest(void) {

        const int pageSize = 4096;
        unsigned long size;
        void *ptr = NULL;
        unsigned long long t1, t2;
        int i = 0;
        /* Running sums of cycles; divided by probes at the end. */
        unsigned long long avg_malloc = 0, avg_free = 0;
        int probes = 40;
        size = 2 * 1024 * 1024; /* 2 MB per allocation */

        for (i = 0; i < probes; i++) {

                /* Time one allocation in TSC cycles. */
                t1 = rdtsc();
                ptr = contigmalloc(size, M_FOO, M_NOWAIT, 0, ~(vm_paddr_t)0,
                                   pageSize, 0);
                t2 = rdtsc();

                if (ptr == NULL) {
                        /* With M_NOWAIT the allocation may fail; report it. */
                        printf("contigmalloc failed on iteration %d\n", i);
                        return;
                }

                avg_malloc += t2 - t1;

                printf("contigmalloc cycles: %llu\n", t2 - t1);

                /* Time the matching free in TSC cycles. */
                t1 = rdtsc();
                contigfree(ptr, size, M_FOO);
                t2 = rdtsc();
                avg_free += t2 - t1;
                printf("contigfree cycles: %llu\n\n", t2 - t1);
                uprintf("Loop complete\n");
        }

        printf("contigmalloc %llu contigfree %llu\n", avg_malloc / probes, avg_free / probes);
}
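
Since the test reports only the mean of 40 probes, a single slow iteration (for example the first one) can skew the result. A minimal userland sketch of how the per-iteration cycle counts could also be summarized by minimum and median; the struct and function names are hypothetical and not part of the attached module:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical summary of a set of cycle-count samples. */
struct cycle_stats {
	uint64_t min;
	uint64_t median;
	uint64_t mean;
};

/* qsort comparator for uint64_t values, ascending. */
static int
cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return ((x > y) - (x < y));
}

/* Sorts samples in place and returns min, median and mean. n must be > 0. */
static struct cycle_stats
summarize(uint64_t *samples, size_t n)
{
	struct cycle_stats s;
	uint64_t sum = 0;
	size_t i;

	assert(samples != NULL && n > 0);
	qsort(samples, n, sizeof(samples[0]), cmp_u64);
	for (i = 0; i < n; i++)
		sum += samples[i];

	s.min = samples[0];
	if (n % 2 == 1)
		s.median = samples[n / 2];
	else	/* midpoint of the two middle values, without overflow */
		s.median = samples[n / 2 - 1] +
		    (samples[n / 2] - samples[n / 2 - 1]) / 2;
	s.mean = sum / n;
	return (s);
}
```

In the test loop, each t2 - t1 value would be stored in an array of 40 entries and passed to summarize() once the loop finishes. A median well below the mean would suggest that a few expensive calls dominate the average.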

The measurements were taken on the following setup:

CPU: Intel(R) Atom(TM) CPU C3958 @ 2.00GHz (2000.06-MHz K8-class CPU)
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs

8GB RAM (2x4GB):
Type: DDR4
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 2400 MT/s
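
To put the cycle counts in perspective, they can be converted to wall-clock time using the 2000.06 MHz clock reported above. A small userland sketch, assuming the TSC ticks at that constant rate for the whole test; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* TSC frequency taken from the boot message (2000.06 MHz).
 * Assumption: the TSC runs at this constant rate during the test. */
#define TSC_HZ 2000060000.0

/* Convert a TSC cycle count to microseconds. */
static double
cycles_to_us(uint64_t cycles, double tsc_hz)
{
	assert(tsc_hz > 0.0);
	return ((double)cycles * 1e6 / tsc_hz);
}

/* Percentage by which `after` is lower than `before`. */
static double
percent_reduction(uint64_t before, uint64_t after)
{
	assert(before > 0);
	return (100.0 * ((double)before - (double)after) / (double)before);
}
```

Under that assumption, cycles_to_us(179112, TSC_HZ) gives roughly 89.6 microseconds for contigmalloc on 11.3-p11 with M_NOWAIT, and percent_reduction(211302, 127562) gives roughly 39.6% fewer cycles for contigfree on 12.1-p7 in the same configuration.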

Could someone please explain the performance of contigmalloc and contigfree described above? Also, what could be the reason for the improved contigfree performance on 12.1? I have not yet figured out the source of this behavior. Is the performance of contigmalloc and contigfree a known issue? I have not found any information about this problem on the forum.

The complete source of the module used for the measurements is attached.

Thank you in advance for any help.

Kind regards,
Julian
 


Could someone please provide some explanation of discussed performance of contigmalloc and contigfree?
I would suggest asking on the mailing lists. There are very few developers on the forums, so I doubt anyone here will be able to answer properly. The freebsd-drivers mailing list seems to be the most appropriate in this case.
 