UFS Slow real disk write results

On the same machine (Lenovo ThinkCentre M600 Tiny thin client, Intel N3000), with clean installs of FreeBSD 14.1 (UFS) and Debian (ext3), the disk performance results are dramatically different.

Result from dmesg (FreeBSD):

Code:
ada0 at ahcich1 bus 0 scbus1 target 0 lun 0
ada0: <SanDisk SSD U110 16GB U21B001> ACS-2 ATA SATA 3.x device
ada0: Serial Number 162937402388
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada0: Command Queueing enabled
ada0: 15272MB (31277232 512 byte sectors)

Result from diskinfo:

Code:
diskinfo -tv /dev/ada0
/dev/ada0
        512             # sectorsize
        16013942784     # mediasize in bytes (15G)
        31277232        # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        31029           # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        SanDisk SSD U110 16GB   # Disk descr.
        162937402388    # Disk ident.
        ahcich1         # Attachment
        Yes             # TRIM/UNMAP support
        0               # Rotation rate in RPM
        Not_Zoned       # Zone Mode

Seek times:
        Full stroke:      250 iter in   0.043551 sec =    0.174 msec
        Half stroke:      250 iter in   0.053353 sec =    0.213 msec
        Quarter stroke:   500 iter in   0.106147 sec =    0.212 msec
        Short forward:    400 iter in   0.072598 sec =    0.181 msec
        Short backward:   400 iter in   0.082549 sec =    0.206 msec
        Seq outer:       2048 iter in   0.235632 sec =    0.115 msec
        Seq inner:       2048 iter in   0.231310 sec =    0.113 msec

Transfer rates:
        outside:       102400 kbytes in   0.410430 sec =   249494 kbytes/sec
        middle:        102400 kbytes in   0.420552 sec =   243490 kbytes/sec
        inside:        102400 kbytes in   0.253176 sec =   404462 kbytes/sec

Result from dd (FreeBSD - 50 MB/s):

Code:
time dd if=/dev/zero of=/tmp/test.file bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes transferred in 10.349240 secs (50659564 bytes/sec)

real    0m10.359s
user    0m0.010s
sys     0m1.747s

Result from dd (Debian - 300 MB/s):

Code:
time dd if=/dev/zero of=/tmp/test.file bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 1.7096 s, 307 MB/s

real    0m1.716s
user    0m0.010s
sys     0m1.706s

You can see that the machine achieves a 300 MB/s write transfer without any problems; I ran all commands as root. On FreeBSD, however, the transfer is significantly slower. Why? What else should I check, and how can I tune the system to achieve better disk performance? Thanks!
 
I assume the machine has enough memory, probably a dozen GB. In that case, a 1/2 GB file write without a sync can go mostly or entirely to memory. The question here is what fraction of the file actually made it to disk.

If you want to benchmark the disk performance, then do your IO not to a file system, but to the raw disk (without any buffer cache).
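For example, a quick raw read test could look like this (ada0 as in the dmesg above; a raw write test would destroy data, so only do that against a scratch partition):

Code:
# read 2 GB straight from the raw device (no file system involved)
dd if=/dev/ada0 of=/dev/null bs=1M count=2048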

If you want to benchmark file system performance, you need to ask what operation is realistic and sensible. The biggest part of that question is whether you want the file to be durable, meaning whether you sync it to disk at the end or not. The operation you might want to try is dd with the conv=fsync option on both platforms (GNU dd accepts conv=fsync as well).
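A minimal sketch of that, using the same file path as in your test:

Code:
# the reported rate now includes flushing the file to disk before dd exits
time dd if=/dev/zero of=/tmp/test.file bs=1M count=500 conv=fsync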
 
You need to try a benchmark that has an fsync(2) at the end.

Or you could add the time that a sync(1) after the dd command takes.
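For example, something along these lines (same file path as your earlier dd test):

Code:
# time the write plus the flush of dirty buffers
time sh -c 'dd if=/dev/zero of=/tmp/test.file bs=1M count=500 && sync'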

Of course, if /tmp is a memory file system, you can't use it for this.

Is the drive really 16 GB only?
 
If you want to benchmark the disk performance, then do your IO not to a file system, but to the raw disk (without any buffer cache).

If you want to benchmark file system performance, you need to ask what operation is realistic and sensible. The biggest part of that question is whether you want the file to be durable, meaning whether you sync it to disk at the end or not. The operation you might want to try is dd with the conv=fsync option on both platforms (GNU dd accepts conv=fsync as well).

I am not so much interested in strictly benchmarking the drive as in real-world write times. A simple file copy (cp) gives the same poor results. The comparison with Linux was only meant to show that the hardware is OK and that the system (FreeBSD) is responsible for the slow transfer. I don't know what causes this, or whether there is any way to reduce the write times.

/tmp is a regular directory in the root directory tree on both FreeBSD and Debian.

And there you go, here are the fio results:

Code:
fio --name=write_throughput --directory=/mnt --numjobs=8 --size=500M --time_based --runtime=60s --ramp_time=2s --direct=1 --verify=0 --bs=1M --iodepth=64 --rw=write --group_reporting=1

write_throughput: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=64
...
fio-3.38
Starting 8 processes
write_throughput: Laying out IO file (1 file / 500MiB)
write_throughput: Laying out IO file (1 file / 500MiB)
write_throughput: Laying out IO file (1 file / 500MiB)
write_throughput: Laying out IO file (1 file / 500MiB)
write_throughput: Laying out IO file (1 file / 500MiB)
write_throughput: Laying out IO file (1 file / 500MiB)
write_throughput: Laying out IO file (1 file / 500MiB)
write_throughput: Laying out IO file (1 file / 500MiB)
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
Jobs: 6 (f=6): [_(1),W(5),_(1),W(1)][34.4%][w=44.0MiB/s][w=43 IOPS][eta 02m:00s]
write_throughput: (groupid=0, jobs=8): err= 0: pid=1115: Thu Nov 14 19:33:16 2024
  write: IOPS=40, BW=40.3MiB/s (42.2MB/s)(2424MiB/60207msec); 0 zone resets
    clat (usec): min=1260, max=39547k, avg=198126.74, stdev=1120487.00
     lat (usec): min=1367, max=39547k, avg=198313.45, stdev=1120488.77
    clat percentiles (usec):
     |  1.00th=[    1352],  5.00th=[    2147], 10.00th=[    3097],
     | 20.00th=[    3228], 30.00th=[    3294], 40.00th=[   10683],
     | 50.00th=[  158335], 60.00th=[  229639], 70.00th=[  254804],
     | 80.00th=[  274727], 90.00th=[  316670], 95.00th=[  387974],
     | 99.00th=[  650118], 99.50th=[  801113], 99.90th=[17112761],
     | 99.95th=[17112761], 99.99th=[17112761]
   bw (  KiB/s): min=16277, max=143347, per=100.00%, avg=52022.67, stdev=3635.48, samples=739
   iops        : min=    9, max=  139, avg=48.77, stdev= 3.57, samples=739
  lat (msec)   : 2=4.29%, 4=33.37%, 10=2.06%, 20=1.32%, 100=0.58%
  lat (msec)   : 250=26.28%, 500=29.25%, 750=2.19%, 1000=0.37%, 2000=0.08%
  lat (msec)   : >=2000=0.21%
  cpu          : usr=0.12%, sys=1.54%, ctx=5167, majf=0, minf=1
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2424,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=40.3MiB/s (42.2MB/s), 40.3MiB/s-40.3MiB/s (42.2MB/s-42.2MB/s), io=2424MiB (2542MB), run=60207-60207msec
 
On the same machine (Lenovo ThinkCentre M600 Tiny thin client, Intel N3000), with clean installs of FreeBSD 14.1 (UFS) and Debian (ext3), the disk performance results are dramatically different. [...] What else should I check, and how can I tune the system to achieve better disk performance?

My results on my laptop:

Code:
slippy# uname -a
FreeBSD slippy 15.0-CURRENT FreeBSD 15.0-CURRENT #36 komquats-n273622-9d4428ad0239: Thu Nov 14 07:29:01 PST 2024     root@slippy:/export/obj/opt/src/git-src/amd64.amd64/sys/BREAK amd64
slippy#
slippy# time dd if=/dev/zero of=/mnt/test.file bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes transferred in 0.106060 secs (4943332117 bytes/sec)

real    0m0.112s
user    0m0.003s
sys     0m0.109s
slippy#

Can you provide the output of df -h /tmp on the FreeBSD and Debian systems, please?
 
Plus, I don't know if it could cause significant differences in this case (just 500 MiB), and perhaps a developer can confirm this, but I would check the UFS block size and the ext3 block size on both systems.

A larger block size means fewer I/O operations on the disk partition.

On FreeBSD you can check this using dumpfs -s /dev/ada0p1 (assuming partition 1 is the one holding your UFS filesystem). The default block size is 32768 bytes.
On GNU/Linux I have no idea.
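For reference, the plain superblock dump shows it too; something like this should work (the partition name is just a placeholder):

Code:
dumpfs /dev/ada0p2 | grep -m1 bsize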
 
Plus, I don't know if it could cause significant differences in this case (just 500 MiB), and perhaps a developer can confirm this, but I would check the UFS block size and the ext3 block size on both systems.

A larger block size means fewer I/O operations on the disk partition.

On FreeBSD you can check this using dumpfs -s /dev/ada0p1 (assuming partition 1 is the one holding your UFS filesystem). The default block size is 32768 bytes.
On GNU/Linux I have no idea.
It's not that. Debian puts /tmp on tmpfs. My little show & tell above did the same; as you can see, it was lightning fast. I could have used /tmp, but I limit my /tmp to 300 MB. You can do the same as Debian with the following line in fstab.

tmpfs /tmp tmpfs rw,nosuid,size=300m,uid=0,gid=0,mode=1777 0 0

Solaris uses tmpfs, as did Tru64. Tru64 had a limit of 20 MB, while Solaris let you fill all of virtual memory (RAM + swap) with /tmp. The Tru64 approach was better; I had used 20 MB for years until a few years ago, when I bumped it up to 300 MB. Do the same as above, but without size=300m, to get the same behaviour as Debian and Solaris.
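That is, something like this (the same line as above, minus the size cap):

Code:
tmpfs /tmp tmpfs rw,nosuid,uid=0,gid=0,mode=1777 0 0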

The O/P shouldn't have compared apples with oranges. You can't expect a spinning disk to have the same performance as RAM.
 
Anyone have a command I could run to do a quick I/O test between Linux FS and UFS?

Ideally something small and non-destructive for UFS/FreeBSD? (I'd be testing from a Linux live USB directly against the NVMe drive and don't care about the existing internal drive contents, ext4/XFS/etc., but the UFS FreeBSD install would be real.)
 
Yes. Your workload. On your system.

There are lots of microbenchmarks. But they are typically not representative of the performance of real-world workloads. Those are usually more affected by caching, file system data layout (more for spinning rust, less for SSD), and scheduling between foreground workload and background IO tasks. And real world performance differences are affected by many other things (compiler, network subsystem, ...), and those effects can be larger than IO stack differences.

There are file system benchmark packages. The only ones I'm familiar with are closed-source internal tools. I hear about IOzone, but have not used it in a long time.
 
Anyone have a command I could run to do a quick I/O test between Linux FS and UFS?

Attached is my personal version of bonnie. You have to run it with `-s <n>`, where n is gigabytes, using a size larger than your RAM; otherwise only the first write benchmark is valid. It issues fsync(2) at the end. I tuned it to be useful to me.

On Linux you can supply a "dropthedamncaches" shell command that empties the filesystem cache; then you can benchmark with sizes smaller than RAM. On FreeBSD you would have to call umount on the filesystem; even if that fails, it will empty the cache.
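On Linux, such a cache-dropping command can be as simple as this (needs root):

Code:
sync; echo 3 > /proc/sys/vm/drop_caches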

Should probably put it on github...
 

Gentlemen, read again what I wrote above: this is a clean FreeBSD installation, and /tmp is a physical directory in the root directory tree (on both FreeBSD and Debian). But that is also irrelevant, because the slow transfers show up in file copy operations anywhere (the fio tests ran in the /mnt directory). The workload is close to zero.

The comparison with Linux was only meant to show that the hardware performs correctly under another system; I might as well have done the tests using Hiren's BootCD (that is, Windows).

On all my VMs, VPSes, and PCs, the transfer results are perfectly acceptable. Only in this case (the ThinkCentre M600) are they so poor.

That's why I'm wondering what the cause is; I suspect the hardware, or some system tuning parameters. Unfortunately, this is where my knowledge runs out, so I'm asking you ;)

Once again, the results from fio:

Code:
fio --name=write_throughput --directory=/mnt --numjobs=8 --size=500M --time_based --runtime=60s --ramp_time=2s --direct=1 --verify=0 --bs=1M --iodepth=64 --rw=write --group_reporting=1

Run status group 0 (all jobs):
  WRITE: bw=40.3MiB/s (42.2MB/s), 40.3MiB/s-40.3MiB/s (42.2MB/s-42.2MB/s), io=2424MiB (2542MB), run=60207-60207msec
 
Benchmarking write operations is fraught with "details".
If your implied goal is to compare the "efficiencies" of ext3 vs. UFS file systems, then:

Create a tarball that is about the size of the filesystem on some machine. This gives you some "structured content" to install.

For each fs in { ext3, UFS }:
 - Erase the partition on which you will be creating the filesystem. This ensures the same parts of the media are used.
 - Create an $fs filesystem on the target machine (so the same hardware and software are in play).
 - Unpack the aforementioned tarball into the filesystem on the target machine.
 - sync, then unmount the filesystem (the goal being to get the disk cache to forget about it).
 - Remount the filesystem.
 - time(1) a tar of the contents of the filesystem to /dev/null. This tells you how efficiently you can READ the filesystem.
 - Unmount the filesystem.

This still leaves you vulnerable to read-ahead caching. But write-behind caching is a lot more of a problem (using /dev/null as the target means the time required for that "write" is consistent). A rough sketch of the FreeBSD half follows below.
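Something like this for the FreeBSD/UFS side (device, mount point, and tarball path are placeholders; newfs wipes whatever is on that partition):

Code:
DEV=/dev/ada0p4          # scratch partition -- will be erased
MNT=/mnt/bench
TARBALL=/var/tmp/testdata.tar

newfs -U ${DEV}                          # fresh UFS filesystem (soft updates)
mount ${DEV} ${MNT}
time tar -xf ${TARBALL} -C ${MNT}        # write test: unpack the content
sync
umount ${MNT}                            # flush and forget cached data
mount ${DEV} ${MNT}
time tar -cf - -C ${MNT} . > /dev/null   # read test: stream everything back out
umount ${MNT}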
 
Benchmarking write operations is fraught with "details".
If your implied goal is to compare the "efficiencies" of ext3 vs. UFS file systems, then:

...

Thank you, but I keep repeating: I am not comparing UFS to EXT3! This is about the poor results of disk operations, most likely caused by the hardware and its support in FreeBSD, but I don't know how to prove that, or whether there is a way to improve these numbers.
 
It is replaceable, but what is the point, since the transfer results are this poor only under FreeBSD.
First: FreeBSD doesn't like some SSDs in some configurations. Second: this is an Intel machine.

For example: a mini-PC with a J1900 and a 120 GB SATA SSD gets a write speed of ~7 MB/s under FreeBSD, Windows, and Linux, both on the SATA port and on a USB hub (SSD in an enclosure). In the same USB enclosure on an AMD 3020e machine, the write speed is 360 MB/s.

If the SSD in your machine is replaceable, you can try another SSD to see where the problem is: in FreeBSD or in this specific hardware combination.
 
For example: a mini-PC with a J1900 and a 120 GB SATA SSD gets a write speed of ~7 MB/s under FreeBSD, Windows, and Linux

Of course, that is a situation in which only the hardware can be responsible for such results. In my case, however, it is only the system (FreeBSD) that makes the difference.
 
You can do some non-destructive tests using dd on an unmounted partition:

Code:
dd if=/dev/X of=/dev/X bs=16m

which will read the partition and write it back to the same place. On FreeBSD you can use gstat to see what the drive is doing.
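While that runs, gstat in a second terminal shows the per-device throughput and busy percentage; a minimal invocation, filtered to the drive in question, might be:

Code:
gstat -f ada0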
 