Hi. I'm building a new system for a small data warehouse and have been testing disk performance in various zpool configurations using up to 14 drives. Every configuration seems to be performing as expected except for sequential reads across mirror sets.
Could somebody please either correct my expectations or provide some performance tips? I've used ZFS for years, but this is the first time that system performance really matters.
Here's a basic summary of my findings.
Code:
Raw/single-drive speed:
  write  180 MiB/s
  read   200 MiB/s

9x1 striped set (9 disks total, no redundancy)
  write 1440 MiB/s (160 MiB/s per data drive)
  read  1691 MiB/s (188 MiB/s per data drive)

8x RAIDz2 (8 disks total, 2 parity drives)
  write  922 MiB/s (154 MiB/s per data drive)
  read  1031 MiB/s (172 MiB/s per data drive)

4x2 mirror (8 disks total, 2 drives per mirror)
  write  613 MiB/s (152 MiB/s per mirror set - expecting 4x write gain)
  read   804 MiB/s (101 MiB/s per data drive - expecting 8x read gain)

3x3 mirror (9 disks total, 3 drives per mirror)
  write  442 MiB/s (147 MiB/s per mirror set - expecting 3x write gain)
  read   686 MiB/s ( 76 MiB/s per data drive - expecting 9x read gain)
Overall, I was impressed with how ZFS stripes reads and writes across VDEVs, and I was surprised at how well the various RAIDz configurations performed. Everything in those configurations performed as expected based on single-drive performance (allowing for some overhead and the various parity levels).
Writing to a mirrored VDEV seems acceptable at ~150 MiB/s per mirror set. However, resilvering and scrubbing both ran at >180 MiB/s, so I imagine faster sequential write speeds are theoretically attainable.
Read performance from mirrored VDEVs is disappointing. I would have expected something close to, say, >80% of the theoretical maximum, but the results show more like 55%: the 4x2 pool reads at 804 MiB/s from 8 drives that each managed ~188 MiB/s in the plain striped test (roughly 1500 MiB/s in aggregate), and the 3x3 pool does even worse.
You can see that reads from the 9-drive pool of triple mirrors are actually slower than reads from the 8-drive pool of 2-way mirrors, which seems fundamentally wrong. And comparing the 8-drive RAIDz2 pool (which has only 6 data drives) with the 8-drive 2-way mirror pool, you would expect the mirrors to outpace the RAIDz, yet the opposite is true.
It seems that, for reads, each mirror set performs roughly as if it were a single drive rather than a striped set. Monitoring reads with 'zpool iostat -v' and 'gstat' anecdotally confirms that the individual drives are under-utilized in the mirrored configurations (55-65% busy, whereas under RAIDz the per-drive utilization sits at a fairly constant 99-100%). Random seeks are definitely better with mirrors but don't seem to improve further with triple mirrors, and performance does not improve with multiple concurrent reader threads.
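The monitoring was essentially just the following ("tank" stands in for the real pool name):
Code:
# Per-vdev / per-disk throughput during a sequential read, updated every second
zpool iostat -v tank 1

# GEOM-level view of per-disk busy % and throughput
gstat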
Can anybody familiar with the internals of the FreeBSD ZFS implementation shed some light on whether this is expected behavior? It seems that a lot of read performance is being left on the table. Is there any detailed information about this?
Otherwise, are there any settings I should look at tweaking? I know I could present a set of hardware-RAID mirrors to ZFS and use those instead (that should give the expected boost, but I would prefer not to for many reasons).
The numbers are making me seriously consider RAIDz over mirrors for at least part of the database - which seems all kinds of wrong.
These values are for sequential reads/writes of data at the beginning ("fast part") of the disks. They are just very basic dd tests, with /dev/zero or /dev/random as the source for writes; I've listed only the 1 MiB block-size results. I do have more comprehensive tests, and their results are consistent with the values above.
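The commands were roughly of this form; the file path and size here are placeholders, and compression is off so /dev/zero isn't simply compressed away:
Code:
# Sequential write: 1 MiB blocks from /dev/zero (or /dev/random)
dd if=/dev/zero of=/tank/test/seqfile bs=1M count=65536

# Sequential read of the same file back to /dev/null
# (assumes the file isn't already sitting in the ARC)
dd if=/tank/test/seqfile of=/dev/null bs=1M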
I'm running a fresh install of FreeBSD 11.0 on new commodity hardware, using new 3 TB SATA HDDs (4096-byte sectors) connected to a mixture of onboard and PCIe SATA adaptors.
I have confirmed that the data was spread fairly evenly across all drives during the tests, and the read totals per drive were essentially identical. "zdb -m" confirms that the data sits at the beginning of each disk.
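That check was simply (pool name again a placeholder):
Code:
# Dump metaslab allocation per vdev; allocations concentrated in the
# lowest-offset metaslabs mean the test data is at the start of the disks
zdb -m tank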
Most system/ZFS settings were left at their defaults, other than the following (a rough sketch of how these are applied follows the list):
ahci_load="YES"
vfs.vmiodirenable
vfs.zfs.min_auto_ashift=12
ashift: 12 (confirmed via "zdb", and partition alignment also looks correct)
compression=off
atime=off
recordsize={various} - a larger recordsize does improve throughput, but only by ~10%
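For completeness, this is roughly how the above are applied ("tank/db" is a placeholder dataset name; the min_auto_ashift sysctl needs to be in place before the pool is created for it to matter):
Code:
# /boot/loader.conf
ahci_load="YES"

# New vdevs get ashift=12 (set before "zpool create")
sysctl vfs.zfs.min_auto_ashift=12

# Dataset properties
zfs set compression=off tank/db
zfs set atime=off tank/db
zfs set recordsize=1M tank/db    # one of several values tried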