Why would `make world` slow down with dual channel RAM?

cracauer@

Developer
Slightly puzzled here. I have some laptops with 16 GB soldered and a DIMM slot. If you run them with a 16 GB DIMM you get dual channel for a total of 32 GB; if you put in a 32 GB DIMM you get a total of 48 GB but lose dual channel.

I verified that the dual-channel config has more bandwidth using the Stream benchmark:
Code:
32gb-dual-channel-1.stream:Triad:      19917.5497       0.0025       0.0024       0.0035
32gb-dual-channel-2.stream:Triad:      19974.8578       0.0024       0.0024       0.0024
48gb-single-channel-1.stream:Triad:      10825.7564       0.0045       0.0044       0.0045
48gb-single-channel-2.stream:Triad:      10530.7350       0.0046       0.0046       0.0047
So far so good.
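
For scale, here is a minimal sketch that averages the Triad rates above and prints the ratio between the two configurations. It assumes the first number on each line is Stream's "Best Rate MB/s" column, which is what the stock Stream output prints there; the dictionary keys are just labels taken from the file names.

```python
# Triad "best rate" values in MB/s, copied from the Stream output above.
triad_mb_s = {
    "32gb-dual-channel": [19917.5497, 19974.8578],
    "48gb-single-channel": [10825.7564, 10530.7350],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


dual = mean(triad_mb_s["32gb-dual-channel"])
single = mean(triad_mb_s["48gb-single-channel"])
bandwidth_ratio = dual / single

print(f"dual:   {dual:9.1f} MB/s")
print(f"single: {single:9.1f} MB/s")
print(f"ratio:  {bandwidth_ratio:.2f}x")
```

So the dual-channel combo has a bit less than twice the Triad bandwidth, which is roughly what you would expect.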

But `make buildworld buildkernel` is faster on the single-channel combo. There is plenty of free RAM; the 32 GB dual-channel combo does not run out of RAM.
Code:
32gb-dual-channel-B1.log:1:41:27 6087.66 real 91571.23 user 3175.95 sys 1556% CPU 121753/374592621 faults
32gb-dual-channel-B2.log:1:42:32 6152.17 real 92415.25 user 3237.12 sys 1554% CPU 116197/374577008 faults
32gb-dual-channel-B3.log:1:42:02 6122.60 real 91941.23 user 3336.57 sys 1556% CPU 126880/375855137 faults
32gb-dual-channel-B4.log:1:42:54 6174.30 real 92468.09 user 3671.83 sys 1557% CPU 118462/374585003 faults
48gb-single-channel-B1.log:1:36:41 5801.88 real 87072.46 user 3137.95 sys 1554% CPU 121707/374534669 faults
48gb-single-channel-B2.log:1:37:22 5842.06 real 87530.63 user 3215.07 sys 1553% CPU 116238/374487863 faults
48gb-single-channel-B3.log:1:33:24 5604.63 real 84040.38 user 3030.13 sys 1553% CPU 121721/374601905 faults
48gb-single-channel-B4.log:1:32:28 5548.32 real 82996.29 user 3049.67 sys 1550% CPU 116219/374577978 faults
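
To put a number on the gap, the sketch below averages the `real` and `user` seconds from the eight runs above and checks whether the two sets of wall-clock times overlap at all. The values are copied from the logs; the variable and function names are made up for the illustration.

```python
from statistics import mean

# (real seconds, user seconds) per run, copied from the logs above.
runs = {
    "32gb-dual-channel": [
        (6087.66, 91571.23), (6152.17, 92415.25),
        (6122.60, 91941.23), (6174.30, 92468.09),
    ],
    "48gb-single-channel": [
        (5801.88, 87072.46), (5842.06, 87530.63),
        (5604.63, 84040.38), (5548.32, 82996.29),
    ],
}


def summarize(samples):
    """Return (mean real, mean user, fastest real, slowest real)."""
    real = [r for r, _ in samples]
    user = [u for _, u in samples]
    return mean(real), mean(user), min(real), max(real)


dual_real, dual_user, dual_min, dual_max = summarize(runs["32gb-dual-channel"])
single_real, single_user, single_min, single_max = summarize(runs["48gb-single-channel"])

# How much longer the dual-channel runs take, as a fraction of single-channel.
real_slowdown = dual_real / single_real - 1
user_slowdown = dual_user / single_user - 1
ranges_overlap = dual_min <= single_max

print(f"real: dual {dual_real:.0f}s vs single {single_real:.0f}s ({real_slowdown:+.1%})")
print(f"user: dual {dual_user:.0f}s vs single {single_user:.0f}s ({user_slowdown:+.1%})")
print(f"wall-clock ranges overlap: {ranges_overlap}")
```

The dual-channel runs come out roughly 7 to 8 percent slower in both `real` and `user` time, and the slowest single-channel run is still faster than the fastest dual-channel run. The `user` time growing along with `real` hints that the extra time is spent on-CPU rather than waiting, though that alone does not say why.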

What the heck?
 
Memory timings could be different too, right?

If they are not matched with the soldered-on RAM I could see problems right there.

Mismatched timings. Maybe it defaults to the slowest timing, or worse, for compatibility.
 
I am a little shocked you can see 48GB. I know many laptops didn't go above 32GB.

But in that case you have a memory size mishmash as well as a timing one.

I am a RAM matching fanatic. Just like drives. Down to identical firmware please.

I look at the numbers after the part number. You know what I mean.
 
Looking at the settings for RAS: maybe when you add a second module you get Lockstep mode.
RAS Mode
When Independent is selected, all memory channels will operate independently. When Mirror is selected, the motherboard maintains two identical copies of all data in memory for data backup. When Lockstep is selected, the motherboard uses two areas of memory to run the same set of operations in parallel to boost performance. The options are Independent, Mirror, and Lockstep Mode.
 
No, both cases had RAM reported free. The smaller configuration is 32 GB.

If lack of RAM capacity had been a problem, the CPU busy stats would have gone down and the page fault stat up.
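
That check can be run mechanically against the log lines from the first post. The sketch below is only an illustration: the regular expression is written for the exact line format shown there, and since the thread does not say what the two numbers before `faults` mean, they are simply called `faults_a` and `faults_b` (hypothetical names).

```python
import re
from collections import defaultdict
from statistics import mean

LOG = """\
32gb-dual-channel-B1.log:1:41:27 6087.66 real 91571.23 user 3175.95 sys 1556% CPU 121753/374592621 faults
32gb-dual-channel-B2.log:1:42:32 6152.17 real 92415.25 user 3237.12 sys 1554% CPU 116197/374577008 faults
32gb-dual-channel-B3.log:1:42:02 6122.60 real 91941.23 user 3336.57 sys 1556% CPU 126880/375855137 faults
32gb-dual-channel-B4.log:1:42:54 6174.30 real 92468.09 user 3671.83 sys 1557% CPU 118462/374585003 faults
48gb-single-channel-B1.log:1:36:41 5801.88 real 87072.46 user 3137.95 sys 1554% CPU 121707/374534669 faults
48gb-single-channel-B2.log:1:37:22 5842.06 real 87530.63 user 3215.07 sys 1553% CPU 116238/374487863 faults
48gb-single-channel-B3.log:1:33:24 5604.63 real 84040.38 user 3030.13 sys 1553% CPU 121721/374601905 faults
48gb-single-channel-B4.log:1:32:28 5548.32 real 82996.29 user 3049.67 sys 1550% CPU 116219/374577978 faults
"""

# Matches "<config>-<run>.log:<h:mm:ss> ... <cpu>% CPU <a>/<b> faults".
LINE = re.compile(
    r"^(?P<config>.+)-[A-Z]\d+\.log:\S+ [\d.]+ real [\d.]+ user [\d.]+ sys "
    r"(?P<cpu>\d+)% CPU (?P<faults_a>\d+)/(?P<faults_b>\d+) faults$"
)


def per_config_means(text):
    """Return {config: (mean CPU %, mean faults_a, mean faults_b)}."""
    rows = defaultdict(list)
    for line in text.splitlines():
        m = LINE.match(line)
        if m is None:
            continue  # skip anything that is not a timing line
        rows[m["config"]].append(
            (int(m["cpu"]), int(m["faults_a"]), int(m["faults_b"]))
        )
    return {
        config: tuple(mean(column) for column in zip(*values))
        for config, values in rows.items()
    }


means = per_config_means(LOG)
for config, (cpu, faults_a, faults_b) in means.items():
    print(f"{config}: {cpu:.1f}% CPU, {faults_a:.0f}/{faults_b:.0f} faults")
```

Both configurations sit at about 1550% CPU with fault counts within a couple of percent of each other, which fits the argument that neither run was short on RAM.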
 
What about filesystem caches? Like ZFS ARC stats or tmpfs size if you're building in memory? IIRC ZFS scales its ARC according to available system memory.
 
CPU time can easily go up a bit (~5% here?) if there's more juggling between processes, e.g. due to IO latencies / less disk cache. Not sure this is a good explanation in your case, though.
 
As another interesting observation, the `make world` time goes up significantly when I properly close the laptop case with all screws:

Code:
32gb-dual-channel-B1.log:1:41:27 6087.66 real 91571.23 user 3175.95 sys 1556% CPU 121753/374592621 faults
32gb-dual-channel-B2.log:1:42:32 6152.17 real 92415.25 user 3237.12 sys 1554% CPU 116197/374577008 faults
32gb-dual-channel-B3.log:1:42:02 6122.60 real 91941.23 user 3336.57 sys 1556% CPU 126880/375855137 faults
32gb-dual-channel-B4.log:1:42:54 6174.30 real 92468.09 user 3671.83 sys 1557% CPU 118462/374585003 faults
48gb-single-channel-B1.log:1:36:41 5801.88 real 87072.46 user 3137.95 sys 1554% CPU 121707/374534669 faults
48gb-single-channel-B2.log:1:37:22 5842.06 real 87530.63 user 3215.07 sys 1553% CPU 116238/374487863 faults
48gb-single-channel-B3.log:1:33:24 5604.63 real 84040.38 user 3030.13 sys 1553% CPU 121721/374601905 faults
48gb-single-channel-B4.log:1:32:28 5548.32 real 82996.29 user 3049.67 sys 1550% CPU 116219/374577978 faults
48gb-single-channel-C1.log:1:49:35 6575.57 real 99149.79 user 3412.55 sys 1559% CPU 121820/374620500 faults
48gb-single-channel-C2.log:1:49:44 6584.74 real 99074.71 user 3478.49 sys 1557% CPU 116207/374569222 faults

The B<n> times are with loose case, the C<n> times with properly set case.
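
If heat is the suspect, one way to see it would be to log CPU clock and temperature while the build runs. This is only a sketch under assumptions: it relies on the FreeBSD sysctls `dev.cpu.0.freq` and `dev.cpu.0.temperature`, the latter only exists when a driver such as amdtemp(4) or coretemp(4) is loaded, and whether that driver supports this particular CPU is not something this thread confirms. The helper names are made up.

```python
import subprocess
import time


def read_sysctl(name):
    """Return one sysctl value as text, using `sysctl -n NAME`."""
    result = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def parse_sample(freq_text, temp_text):
    """Convert raw strings such as '1700' and '71.5C' to (MHz, degrees C)."""
    return int(freq_text), float(temp_text.rstrip("C"))


def sample(count, interval_s=5.0, reader=read_sysctl):
    """Collect `count` (MHz, degrees C) samples for CPU 0.

    `reader` is injectable so the parsing can be tried without FreeBSD.
    """
    samples = []
    for i in range(count):
        samples.append(
            parse_sample(
                reader("dev.cpu.0.freq"),
                reader("dev.cpu.0.temperature"),
            )
        )
        if i + 1 < count:
            time.sleep(interval_s)
    return samples


# Example use on the laptop while the build is running:
#     for mhz, celsius in sample(count=12, interval_s=5.0):
#         print(mhz, celsius)
```

Comparing such a log between a B run and a C run would show whether the closed case makes the clock drop.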
 
Wild guess: is that 16GB module Samsung and the 32GB one another vendor?
Samsung memory, just like their flash drives, are space heaters; I avoid them like the plague for that reason. *Especially* because their "superior" on-paper speeds quickly diminish once they reach working temperature, i.e. thermal throttling (or lots of ECC errors for memory...).
The first time I observed that with memory was when upgrading from 8x16GB Samsung to 8x32GB Micron in my home server: power draw was the same or even a few watts less, system temperatures dropped, and even 'smaller' poudriere bulk jobs (in terms of number of ports/parallel builds), which would not have filled up the 128GB of RAM either, finished considerably faster.
 
The 16 GB module is SK Hynix 2666 MHz. The other module (G.Skill Ripjaw, can't check the chips right now) and the soldered RAM are 3200 MHz.

This is still not an explanation since the Stream benchmark results report higher throughput for the 16 GB module.
 
Could it be grounding? Interference?
 
Can you name names here? Is this an HP laptop? They have a dubious record of poisoning things. Think BIOS whitelist.

Is the 16GB stick that works faster an HP part-number stick?

If the laptop manufacturer is not HP, the same applies.

Memory modules have an EEPROM, so the BIOS can tell whether a stick is an HP one. Maybe they degraded speed on OEM sticks. Look at the brand and part numbers.

Just a tinfoil-hat thing. Once bitten, twice shy.
 
No problem. It is a Thinkpad T14 gen 1 with an AMD 7 PRO 4750U CPU. Coincidentally my HP Laptop died a short while ago.

You misunderstand. The 16 GB module results in slower `make world` than the 32 GB module.

The supposedly slow 16 GB DIMM is Lenovo branded with SK Hynix chips. The 32 GB DIMM is G.Skill Ripjaw, can't check the chips right now.

Yeah, I think it is time to read the SPD info off that module. I wonder where I put the Winblows install for that laptop...
 
Shouldn't it stop working in that case? And why would it get faster, not slower, with interference?

Is there a way to read the DIMM's SPD info in FreeBSD? I don't want to drive this as far as re-activating Windows.
I read your post wrong; I thought it got faster.
I'm not sure, but could it be heat?
Did you try running memtest?
 
The 16 GB module that leads to 32 GB total with dual channel has faster Stream time and slower `make world` time.
The 32 GB module that leads to 48 GB total with single channel has slower Stream benchmark time (as expected), but faster `make world` time (not expected).

Heat plays a role; witness the increased time with the case screwed closed. But the tests in my OP were all done very carefully in a climate-controlled setting.

I did not try memtest86 yet. I don't suspect memory errors since it survived dozens of `make world` runs. But maybe memtest86 can give me the timing/latency information.
 
The T14 does support 32GB in the slot (for 48GB total). The 16GB (for 32GB total) is just the most that Lenovo will pre-configure/sell it to you with. Note that if you put a higher-capacity module into the upgradable slot than is soldered in the other slot, only part of the memory will be dual-channel (double bandwidth for reads/writes). For example, 16GB soldered + 32GB module gives you 32GB of dual-channel memory and the remaining 16GB are single-channel. In practice, this won't matter much except for workloads that rely to an unusual extent on memory bandwidth.
Hope this helps a bit
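
The arithmetic behind that split can be sketched as follows. It assumes the simple model described above, where the capacity that both sides can match runs dual-channel and the excess on the larger side runs single-channel; `channel_split` is a made-up helper name.

```python
def channel_split(soldered_gb, dimm_gb):
    """Return (dual_channel_gb, single_channel_gb) for a soldered + slot pair."""
    paired = min(soldered_gb, dimm_gb)       # capacity both sides can match
    dual = 2 * paired                        # matched region, interleaved
    single = abs(soldered_gb - dimm_gb)      # leftover on the larger side
    return dual, single


for dimm_gb in (16, 32):
    dual, single = channel_split(16, dimm_gb)
    print(f"16 GB soldered + {dimm_gb} GB DIMM: "
          f"{dual} GB dual-channel, {single} GB single-channel")
```

For the two configurations in this thread that gives 32 GB dual-channel with nothing left over, and 32 GB dual-channel plus 16 GB single-channel.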
 