ZFS raidz2 read speed is much lower with compression enabled, but I don't see CPU load

hello,

so I set up a NAS: RAIDZ2 of 8 x WD HC530 with a special vdev (Samsung PM963), a Ryzen 8400F, and 32 GB of RAM

ashift=12, recordsize=4M; other settings are mostly left at their defaults
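For reference, a rough sketch of how that setup could look on the command line (pool, device and dataset names are made up, and the special vdev is sketched as a mirror):

zpool create -o ashift=12 tank \
    raidz2 da0 da1 da2 da3 da4 da5 da6 da7 \
    special mirror nvd0 nvd1                      # metadata (and small blocks) on NVMe
zfs create -o recordsize=4M -o compression=zstd tank/photos
# recordsize above 1M may need the large-blocks feature / max_recordsize tunable raised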

I'm playing with compression types.

I clearly see reasonable CPU load during writes with compression. But I don't see it for reads (while read speed is much lower). You might say the disks are limiting the speed, but with little or no compression I get decent read speeds.

No (or almost no) compression gives 250-350 MB/s; zstd gives about 100 MB/s. But zstd with recordsize=16M gives 260 MB/s.

If the disks are able to deliver 300 MB/s, why can't they do it with compression, which should only depend on the CPU (and should actually help the disks, since there is less data to read)??? Curiosity is killing me :D

I test by copying 200 photos in raw format (10.2 GB) from ZFS to a ramdisk.
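Roughly, one test run looks like this (paths are made up; the ramdisk is a tmpfs mount, and the files must not already be cached in ARC or the result measures RAM rather than the disks):

mkdir -p /mnt/ram
mount -t tmpfs tmpfs /mnt/ram             # the ramdisk target
time cp -R /tank/photos/set01 /mnt/ram/   # ~10 GB of ARW files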

compression   recordsize (MB)   time (s)   speed (MB/s)   compressed (GB)*
off           4                 29         360            10.2
zle           4                 41         254            9.8
lz4           4                 43         242            9.7
lz4           16                22         474            9.7
zstd          4                 89         117            7.8
zstd          8                 53         197            8.1
zstd          16                39         267            7.9
gzip          4                 91         115            7.2

* the files (photos) differ between runs, so the photo set for each case varies slightly in size (and compressibility)

edit: added column with compressed folder size
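To put the table another way: the speed column is logical throughput, while the disks only have to deliver the compressed size. Roughly:

off,  recordsize=4M:  10.2 GB / 29 s ≈ 360 MB/s, logical = physical
zstd, recordsize=4M:  10.2 GB / 89 s ≈ 117 MB/s logical, but only ~7.8 GB comes off the disks, so ≈ 90 MB/s physical

So with zstd the disks are actually asked for less data per second, not more.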
 
Presumably your raw photos are not compressible, or at least not by a lot.
...
Well, they are not very compressible - zstd gets them down to 7.8 GB out of 10.2 GB. But whether or not they are compressible hardly affects read speed, as far as I can see.

Speaking of the bottleneck, it has always been either CPU or disk (well, OK, internet speed, thermal throttling or the video card too, but those aren't the case here, ha-ha).
 
I will admit that I have great difficulty debugging ZFS performance in these situations. While there are great tools to find out where you spend CPU time, there isn't much (that I am aware of) to tell you who is waiting on what when the CPU is idle, or to distinguish between the different kinds of idle.
 
RAIDz2 of 8 x WD hc530
That means 6 data disks, 2 redundancy disks.

If disks are able to deliver 300 MB/s, ...
Each disk is capable of doing 100-200 MB/s on the head. With 6 data disks, your array should be capable of much more than 300 MB/s, if your workload is intense enough. If you have enough parallelism to keep all data disks working, and ideally to have multiple IOs queued up on each disk so it can do seek optimization, you should be getting somewhere between 1/2 and 1 GB/s. Where is the bottleneck? I have no idea. As Cracauer said: Debugging and tuning storage performance is hard.
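If you want to at least see who is busy while the copy runs, something like this should show it (assuming FreeBSD and a pool called tank):

gstat -p                 # per-disk busy %, queue length and latency (GEOM providers)
zpool iostat -w tank 5   # latency histograms per vdev: disk wait vs queue wait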
 
But doesn't this mean lz4 compression ranks as the fastest?
I added the recordsize=16M speeds later just FYI; the original situation was around recordsize=4M.

Each disk is capable of doing 100-200 MB/s on the head ... you should be getting somewhere between 1/2 and 1 GB/s
The question was about the invisible bottleneck in those specific cases. But I did other kinds of tests today. For example, with dd if='3gb-mkv' of=/dev/null I easily get 1+ GB/s for any compression and block size (bs for dd) from 512K to 4M; speeds don't vary much (1-1.4 GB/s).
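That read test was essentially this (file path is made up; bs was varied between 512K and 4M):

dd if=/tank/media/3gb.mkv of=/dev/null bs=1M   # sequential read, data discarded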

Another test was writing data from /dev/random with dd (bs from 128K to 16M); it was also much the same across compressions and everything else (480-530 MB/s). I hope I didn't miss it still being written out later.
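That write test, roughly (path and size are made up; the trailing sync is there so the timing isn't just dirty data buffered in RAM):

# bs was varied from 128K to 16M between runs
time sh -c 'dd if=/dev/random of=/tank/test/rand1 bs=1M count=2048 && sync'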

Then I copied all that generated random data (4 x 2 GB) from ZFS to the ramdisk; speeds were again even at 1-1.3 GB/s, slightly favoring lz4/off and recordsize=16M (vs 4M).

The last test had two parts:
1. copying 8 GB of MP3s (plus some album-art JPGs) to the NAS over 2.5G Ethernet,
2. then reading them back off to the ramdisk.
Write speeds were all around 200 MB/s. Read speeds from ZFS to RAM ranged from 220 MB/s with recordsize=4M to 350-400 MB/s with recordsize=16M (the recordsize value mattered most here; the compressors probably gave up quickly). It was interesting. Note that files smaller than 1M (the album art) were stored on the special vdev SSD.
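The small-file routing to the SSD is the special_small_blocks property; as a sketch (dataset name made up):

zfs set special_small_blocks=1M tank/music   # blocks up to 1M land on the special vdev instead of the raidz2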

All these tests give me some vision for how to set up datasets for different kinds of data. The results are interesting, but hardly surprising.

Short conclusions so far (see the sketch below):
- big files (1 GB+) do not care much about recordsize or about the read/write block size (bs for dd) (this might matter for torrents)
- smaller files (10+ MB) love recordsize=16M
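A rough per-dataset sketch following those conclusions (dataset names are made up, and the torrent line is an assumption I haven't tested):

zfs set recordsize=1M   tank/media     # 1 GB+ files: largely indifferent to recordsize
zfs set recordsize=16M  tank/photos    # 10+ MB files: clearly faster with big records
zfs set recordsize=128K tank/torrents  # partial random writes may prefer smaller records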

I'll add that I track CPU load with btop set to a 100-200 ms refresh rate. During the read tests with those raw (ARW) photos from Sony cameras, the CPU mostly stayed at 1.6-1.9 GHz with tiny spikes of load across the cores. I believe that should be sufficient to catch the CPU idling.
 

Attachments:
  • Screenshot 2025-01-26 at 02.01.32.png
  • Screenshot 2025-01-26 at 02.04.16.png
And an update:

I replaced the special metadata vdev with an Intel Optane 1600X (not connected directly to the CPU, though). I get the same speeds reading the raw photos from ZFS to the ramdisk. I had suspected the bottleneck was the special vdev NVMe(s); it is not.
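One way to double-check where the reads actually land is to watch the per-vdev breakdown while the copy runs (pool name assumed to be tank):

zpool iostat -v tank 1   # read throughput split between the raidz2 disks and the special vdev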
 