How does FreeBSD utilize multicore processors and multi-CPU systems?

I have a Pentium D, which is dual-core; it will say:

FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)


The second CPU is launched:

SMP: AP CPU #1 Launched!


and everything looks smooth!
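
If you want to double-check the same thing from a running system (without scrolling through dmesg), these standard sysctls report it:
Code:
# logical CPUs the kernel knows about
sysctl hw.ncpu
# CPUs actually brought up by the SMP startup code
sysctl kern.smp.cpus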
 
Are you asking "does it work"? At least reasonably well. No, I have not done extensive benchmarks, nor run it on extreme hardware (like >100 cores or a dozen CPUs or highly NUMA architectures), but on run-of-the-mill single-socket Intel/AMD it works well enough for amateur usage. I'm not sure about high-performance applications.

This is from my home server:
Code:
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 hardware threads
Firmware Warning (ACPI): 32/64X length mismatch in FADT/Gpe0Block: 128/64 (20171214/tbfadt-748)
ioapic0: Changing APIC ID to 4
ioapic0 <Version 2.0> irqs 0-23 on motherboard
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
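
If you want to see how the scheduler views that "1 package x 2 cores x 2 threads" layout on the live system, kern.sched.topology_spec dumps it as a small XML tree, and top -P shows whether all four threads actually get work:
Code:
# the scheduler's view of packages/cores/SMT threads
sysctl kern.sched.topology_spec
# per-CPU load, one line per logical CPU
top -P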

Are you asking "how is it implemented"? Read Marshall Kirk McKusick et al., "The Design and Implementation of the FreeBSD Operating System", 2nd edition (the one with the black cover and a daemon on it). The first few chapters talk about it.
 
With MPS 1.4 (or newer).
Thank you for the reply! ;)

Specifications are great. At the same time, we all know how much the real-hardware implementation (motherboard + CPU) AND the software impact the result.

It has been a commonplace for the last 10+ years that software development moves faster (because the industry is rushing, and the amount of valuable data is increasing dramatically) than hardware manufacturers are able to engineer and produce motherboards and CPUs.

So, each time we need an effective and well-balanced solution, we have to "call out to everyone": the community of users of the particular software, the community of users of the particular operating system, and of course the forums/support of the hardware manufacturer. :)
 
Are you asking "does it work"? At least reasonably well. No, I have not done extensive benchmarks, nor run it on extreme hardware (like >100 cores or a dozen CPUs or highly NUMA architectures), but on run-of-the-mill single-socket Intel/AMD it works well enough for amateur usage. I'm not sure about high-performance applications.
Thank you for the kind reply!

The starting topic is only the first step. :)

Because the main question is: how does network-focused software (I'm interested specifically in
1. the pfSense firewall solution
2. the HAProxy load-balancing solution
3. the FreeNAS storage solution)
behave on multi-CPU systems (which support multi-threading and have 4-6-8-12 cores)?

This is a complex question, because each solution has a different software architecture and a different load pattern on the CPU, memory, and data bus.

What do you think about this?

Are you asking "how is it implemented"? Read Marshall Kirk McKusick et al., "The Design and Implementation of the FreeBSD Operating System", 2nd edition (the one with the black cover and a daemon on it). The first few chapters talk about it.
Thank you so much! I'll try to find it.
 
Could you please comment on managing iflib threads across several CPU cores - the last reply in this thread? (A rough sketch of the kind of knobs I mean is just below.)

How pfSense utilize multicore processors and multi-CPU systems?
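
For reference, roughly the knobs I am asking about (the interface name ix0, the unit number, and the IRQ number are only examples from my side, and I am not sure the tunable names are exactly right - please correct me):
Code:
# see which IRQs the NIC queues were assigned
vmstat -i | grep ix0
# pin one queue's interrupt (IRQ number taken from the output above) to CPU 2
cpuset -l 2 -x 264
# ask iflib for fewer RX/TX queues via loader.conf tunables
echo 'dev.ix.0.iflib.override_nrxqs=4' >> /boot/loader.conf
echo 'dev.ix.0.iflib.override_ntxqs=4' >> /boot/loader.conf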

 
Hm. Looks like it is hard to find the right answer...

Let me explain the starting question a bit more:

Which system is better for network-related operations (i.e. firewall, load balancing, gateway, proxy, media streaming, ...):
a) 1 CPU with 4-10 cores at a high frequency
b) 2-4 CPUs with 4-6 cores at a mid frequency

And how do the CPU caches, L2 (2-56 MB) and L3 (2-57 MB), impact network-related operations (in cooperation with the NIC)?
 
Which system is better for network-related operations (i.e. firewall, load balancing, gateway, proxy, media streaming, ...)
I don't have any answers, but won't it depend on the network, the bandwidth, and the amount and type of traffic?

And also what you are doing with the traffic - pushing it through as fast as possible? Or trying to analyse it and do more than just push packets?

If it's a LAN with max. 100 MB/s then it probably won't matter what machine or set-up you have - anything modern will (I think) cope with the network.
 
I don't have any answers, but won't it depend on the network, the bandwidth, and the amount and type of traffic?
Because different types of traffic involve different software chains reacting to them. For example, media-streaming packets, VPN sessions, and ICMP are processed very differently by the software sitting on top of BSD. Am I wrong?

And also what you are doing with the traffic - pushing it through as fast as possible? Or trying to analyse it and do more than just push packets?
From the networking device's point of view, the main goal is "processing packets without errors and as fast as possible".

My questions relate more to the situation where FreeBSD is used as the core of a firewall, VPN gateway, or load balancer on ordinary Intel-based servers.
If it's a LAN with max. 100 MB/s then it probably won't matter what machine or set-up you have - anything modern will (I think) cope with the network.

The speeds we should be talking about start from 10-20 Gb/s.
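
To see how FreeBSD actually fans those different protocols (ether, ip, arp, ...) out over its netisr threads, I believe this is the usual starting point; the two loader.conf lines are tunables I have seen recommended for forwarding workloads, not something I have benchmarked myself:
Code:
# per-protocol netisr queues and how they are dispatched
netstat -Q
# one netisr thread per core, pinned to its core (loader.conf, takes effect at boot)
echo 'net.isr.maxthreads=-1' >> /boot/loader.conf
echo 'net.isr.bindthreads=1' >> /boot/loader.conf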
 
Not sure you will get answers to those sorts of questions on these forums.

There's a book:

The Design and Implementation of the FreeBSD Operating System, 2nd Edition

And Netflix's work on FreeBSD and networking, e.g.:


Using FreeBSD and commodity parts, we achieve 90 Gb/s serving TLS-encrypted connections with ~55% CPU on a 16-core 2.6-GHz CPU.

Pretty sure there are other Netflix papers on working with FreeBSD and NUMA etc.
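
If anyone wants to poke at the mechanism Netflix describes (sendfile plus in-kernel TLS), I believe on FreeBSD 13+ with a KERN_TLS kernel the switch and its counters look roughly like this; verify the exact sysctl names on your release:
Code:
# enable kernel TLS offload for sockets that request it
sysctl kern.ipc.tls.enable=1
# list everything under the kern.ipc.tls node, including usage counters
sysctl kern.ipc.tls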

Have a look at https://papers.freebsd.org/ e.g.


I'm not sure if you are just trying to learn how things work or if you have a specific requirement or issue that you need to fix - maybe if you are more specific then someone can help.
 
I know this is a subjective topic, but I prefer a single-socket server board.
A second CPU does not bring linear acceleration; there is a performance hit for dual CPU.
Witness the synthetic benchmarks:
Single CPU = 11K
Same CPU, dual = 18K

But where a dual-CPU configuration can help is PCIe lanes. A typical Xeon had 40 lanes; with 2 CPUs that means 80 lanes.
For a setup requiring I/O this can be important. The newer LGA3647 Xeon has 48 lanes.
AMD EPYC has 128 lanes.

So the single EPYC/2 will smash most dual-CPU setups.

There are benefits to a single CPU. Interprocess communication kept on-die is superior.
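
Two quick ways to sanity-check those points on a given box; both commands are in the base system, though the grep is just my way of trimming the output:
Code:
# how many NUMA domains the kernel sees (1 on a single-socket board)
sysctl vm.ndomains
# device list with PCIe capability lines; look for the negotiated "link xN" width
pciconf -lc | grep -E '^[a-z]|PCI-Express'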
 
And that is why there are DUAL-SOCKET EPYC boards now.... ;) Enjoying the boundaries being pushed; let's go quantum!!!


Well, I have installed FreeBSD on some 8-32 core processors and it performs relatively well.... 4-8% CPU with very heavy media use as a desktop environment (100+ Firefox tabs, 40+ Chrome tabs, 20+ terminals, 2+ VMs). Sometimes RAM becomes an issue, but not the CPU; in my experience it barely ever breaks 10%. :rolleyes:
 
Yeah, but did you notice the benchmarks?
Single EPYC = 33K
Dual EPYC = 40K
Yikes, imagine buying 2 chips at $4K each and only getting a marginal increase....

So is this a testing flaw? PassMark is a Windows thing, so it's not representative of FreeBSD.
But I do feel that NUMA drags pretty hard.

Intel uses QPI for its core interconnect and it is quick. Going off-die is costly.
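
If the suspicion is that off-die traffic is what hurts, one thing to try (FreeBSD 12+ as far as I know) is keeping a test workload and its memory on one die with cpuset's NUMA policy; the binary, config path, and CPU range below are only placeholders:
Code:
# run a daemon pinned to domain 0's CPUs, preferring domain 0's memory
cpuset -n prefer:0 -l 0-15 /usr/local/sbin/haproxy -f /usr/local/etc/haproxy.conf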
 
Yeah, but did you notice the benchmarks?
Single EPYC = 33K
Dual EPYC = 40K
Yikes, imagine buying 2 chips at $4K each and only getting a marginal increase....
This is a nice catch, and if what the benchmark says is true, I would be MAD as hell :rude:

Honestly though, I think that is a software limitation or bottleneck (maybe WINDOWS, like you mentioned)... At the end of the day we need to put them in a production environment and let these things bleed. I like to test in real-world environments; call me a benchmark skeptic.

I like the vCore for the cloud's future; the cost is being driven into the ground... I mean, pretty soon everyone and their mother will have a VM for a computer, it will just be the most economical way. At these scales, I mean 128 vCores or 256 vCores, NO ONE in the home will use anything close to 50% of what these CPUs will be able to do. I mean yes, :-/ you could use it for crypto mining.
 
I should mention that the 7302P EPYC I posted is middle of the road: a $1000 chip, not $4000 like their champ, the 64-core EPYC 7702.

The Threadripper variants 3990X/3995WX are the only thing that tops this. Same price range.
With PCIe 4.0 and 128 lanes, EPYC really has some legs. Stomping all over its competitor.
14 AMD chips at the top of the charts; Intel's top offering sits there at $7K for a 15th-place CPU.
 
The Threadripper variants 3990X/3995WX are the only thing that tops this. Same price range.
With PCIe 4.0 and 128 lanes, EPYC really has some legs. Stomping all over its competitor.
14 AMD chips at the top of the charts; Intel's top offering sits there at $7K for a 15th-place CPU.
Yup, it's sad.... Intel lost it, I switched to AMD...

Even on GPUs they're pushing NVIDIA, and for the first time I am glad; I have literally been a slave to NVIDIA... Actually I still am, due to NVIDIA's support for FreeBSD :rolleyes:.. But at least now I give AMD a look.
 
What system is better for network-related operation (i.e. firewall, load balancing, gate, proxy, media stream,...):
OT, you might already know this... but I hope you do not intend to put the (external) packet filter (often loosely called a "firewall") on the same physical machine as other services. Don't do that. It must be on its own physical machine, dedicated solely to that purpose, with no other services on that host. In contrast, you can merge the internal PF onto the same machine as a DMZ host (with gateway services (proxy, load balancer, mail etc.) jailed or in VMs), but not the external one.
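
To make the "it does nothing else" point concrete, a filter-only box's whole pf.conf can be this small; the interface name and the ports are obviously placeholders, and this is only a sketch, not a recommended ruleset:
Code:
# /etc/pf.conf on the dedicated external filter machine
ext_if = "ix0"
set skip on lo
block in log all
pass out on $ext_if inet keep state
pass in on $ext_if inet proto tcp to port { 80 443 } keep state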
 
In contrast, you can merge the internal PF onto the same machine as a DMZ host (with gateway services (proxy, load balancer, mail etc.) jailed or in VMs), but not the external one.
Thank you for the informative reply.

Just quoting my second post in this thread:

Because the main question is: how does network-focused software (I'm interested specifically in
1. the pfSense firewall solution
2. the HAProxy load-balancing solution
3. the FreeNAS storage solution)
behave on multi-CPU systems (which support multi-threading and have 4-6-8-12 cores)?

This is a complex question, because each solution has a different software architecture and a different load pattern on the CPU, memory, and data bus.

That means we are speaking about one physical machine.

Of course, for many reasons (sustainability, redundancy, single point of failure, etc.) some functions are better kept on separate machines: firewall+router+DPI on one, balancer+SSL on another, etc.

In this thread I am just trying to get an answer about "number of CPUs and number of cores vs. clock frequency in FreeBSD for routing packets, analyzing packets, encrypting/decrypting packets, and dealing with RAID controllers to handle databases/VMs".
 
But where a dual-CPU configuration can help is PCIe lanes. A typical Xeon had 40 lanes; with 2 CPUs that means 80 lanes.
For a setup requiring I/O this can be important.

There are benefits to a single CPU. Interprocess communication kept on-die is superior.
There is another very important practical side of this choice: the financial one!

Let me explain the key points:
- the price tag on 2 "previous generation" CPUs will ALWAYS be LESS than the price of 1 "new and hottest" CPU, and will ALWAYS give you more horsepower for comparable or less money;
- the same goes for RAM: a bigger volume for a comparable price, with no big drawbacks in speed. And prices for upgrades (because you need modules of the same size) will also be lower;
- well-manufactured rack-mount servers from a reputable brand (like IBM, Dell, Fujitsu Siemens) will ALWAYS be MUCH MORE STABLE in operation;
- the same rack servers will ALWAYS have a much longer lifetime;
- the QUALITY OF POWER in these rack servers will ALWAYS be better (even without an online/line-interactive UPS). And in addition the POWER SUPPLY MODULES ARE DOUBLED (hot-swap);

Am I wrong with this?
 
And that is why there are DUAL-SOCKET EPYC boards now.... ;) Enjoying the boundaries being pushed; let's go quantum!!!


Well, I have installed FreeBSD on some 8-32 core processors and it performs relatively well.... 4-8% CPU with very heavy media use as a desktop environment (100+ Firefox tabs, 40+ Chrome tabs, 20+ terminals, 2+ VMs). Sometimes RAM becomes an issue, but not the CPU; in my experience it barely ever breaks 10%. :rolleyes:
Surfing and SysAdmin? ;)

Heh! Just install Xcode, Lightroom, Logic, Motion, Final Cut with a normal bunch of plug-ins, or Premiere, and you would be impressed by the CPU load…
 
- the same goes for RAM: a bigger volume for a comparable price, with no big drawbacks in speed. And prices for upgrades (because you need modules of the same size) will also be lower;
Many workloads run MUCH better when there is a lot of RAM available. In particular those that have a wide file system working set, and benefit from file system caching. This is something that really needs benchmarking (with your real workload and your real file system / OS) to check.

Other workloads have a smaller working set (both of the program itself, and its file system cache usage), and then faster RAM is more important, since more RAM simply doesn't help.

The fastest single machine I've ever used at work had 1/2 TB of RAM (and several dozen PCI lanes); for the workload it was amazingly powerful (and I have no idea how much it cost, but I remember that each DIMM was $3K, and it had several dozen of them). My home server has 4 GB and a 4-core Atom CPU. The latter is optimized for low power consumption, small physical space, and reasonable cost.
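
If you want to measure that file-system-caching effect on FreeBSD with ZFS (my assumption here; UFS has its own buffer-cache counters), the ARC counters give a quick first impression before you do a real benchmark:
Code:
# how much RAM the ZFS ARC currently holds
sysctl kstat.zfs.misc.arcstats.size
# cache effectiveness: hits vs. misses for your actual workload
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses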

- well-manufactured rack-mount servers from a reputable brand (like IBM, Dell, Fujitsu Siemens) will ALWAYS be MUCH MORE STABLE in operation;
- the same rack servers will ALWAYS have a much longer lifetime;
- the QUALITY OF POWER in these rack servers will ALWAYS be better (even without an online/line-interactive UPS). And in addition the POWER SUPPLY MODULES ARE DOUBLED (hot-swap);
This is actually a very important observation. Amateur computers, built using desk side cases, cheap fans (often only 1 or 2 fans), and whatever power supply is on sale at NewEgg this week, tend to be somewhat unreliable. Enterprise-class servers may have less CPU power or slower RAM, but they make up for it by having about 10 or 12 fans (each individually pretty small), dual power supplies (which can be connected to two independent power sources, like one utility power + UPS, the other a generator-protected power source), and N+1 redundancy in fans and internal power distribution. Plus if you stay within a single brand (you buy all your expansion cards from the same vendor as the rack mount computer and motherboard), they tend to have very good BIOS support, for example for fan speed control. They are pretty much indestructible, exceedingly reliable (physical uptimes of a decade are common), and can be well managed (like integrating BIOS monitoring into an alarm infrastructure). The drawback is: bought new they are very expensive, and they tend to be very noisy.
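
On the "integrating BIOS monitoring into an alarm infrastructure" point: one common way (my sketch, assuming the server's BMC speaks IPMI over the network and sysutils/ipmitool is installed; the host and credentials are placeholders) is to poll the BMC's sensors and feed the readings into whatever alerting you already run:
Code:
# fan readings from the BMC
ipmitool -I lanplus -H bmc.example.org -U admin -P secret sdr type Fan
# all sensors (temperatures, voltages, PSU status) in one listing
ipmitool -I lanplus -H bmc.example.org -U admin -P secret sensor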

Mandatory anecdote: When we got a set of new servers in the office, they had to be shipped without memory, because the server itself came from one color of money (capital investment), while the DIMMs could be bought from another color of money (as they were under $5K each, we bought them using the same pot of money used for office supplies like pencils). Which meant that someone had to volunteer to spend an afternoon in the computer room installing about a hundred DIMMs. Since I'm a fool, I did it. And one of the servers was installed at the top of the rack, so I had to use a ladder to install the DIMMs. Because getting the DIMMs up the ladder was painful, I unpacked them on the floor, then carried them up on the ladder a few at a time, without wearing an anti-static grounding strap. And managed to kill one of the DIMMs. Since we were an in-house customer, we had no warranty coverage, and my manager had to buy one extra DIMM for $3K, and he was REALLY MAD AT ME, since that looked bad on his budget. The lesson is: When dealing with expensive stuff, always wear anti-static wrist straps, connect them to the machine you're working on, and transport components in their original packaging until you're plugged in. Not sure that rule applies when using a $5 Raspberry Pi Zero though.
 
First of all, I need to say a BIG THANK YOU for your patience and such detailed answers!

I am more than sure that such detailed conversations help other "BSD geeks" and sysadmins all over the world, really!

I TOTALLY AGREE with everything you wrote!

But with a few corrections:
if you stay within a single brand (you buy all your expansion cards from the same vendor as the rack mount computer and motherboard), they tend to have very good BIOS support
I'm not so sure: I have personal experience, and plenty of reports on user forums, that this "good and well-tested BIOS support" is nothing more than "really good compatibility between the parts of a limited set".
And the quality of that set depends heavily on what the marketing department wants.
As a result, even with good and reputable brands we end up with "some of the previous bugs + some new ones". ;)

But anyway, they are the only choice if you need stability and an amazing lifetime.


The drawback is: bought new they are very expensive,
I don't agree! Look at the IBM M3/M4 series on eBay: $150-250 each (with 2x PSU, a decent RAID controller, and often good CPUs and a fair amount of RAM) + shipping.

Only a fraction of managers really know HOW MUCH service one of these 10+ year old M4 (or even M3) servers can still provide!
Meanwhile, newcomer DevOps folks just train their egos and spend the company's budget on the "AWS + recipes from the internet" combo, building a ton of VMs…
Mad new IT world…!

and they tend to be very noisy.
But we use them in a DC, or in a separate server room.
Amateurs have a garage, a basement floor, or an attic room… :)

P.S.
The anecdote made my day a little brighter! Thank you!
Of course, serious equipment needs serious mindfulness.
 