UFS Slow real disk write results

On the same machine (Lenovo ThinkCentre M600 Tiny thin client, Intel N3000), with clean installs of FreeBSD 14.1 (UFS) and Debian (ext3), the disk performance results are dramatically different.

Result from dmesg (FreeBSD):

Code:
ada0 at ahcich1 bus 0 scbus1 target 0 lun 0
ada0: <SanDisk SSD U110 16GB U21B001> ACS-2 ATA SATA 3.x device
ada0: Serial Number 162937402388
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada0: Command Queueing enabled
ada0: 15272MB (31277232 512 byte sectors)

Result from diskinfo:

Code:
diskinfo -tv /dev/ada0
/dev/ada0
        512             # sectorsize
        16013942784     # mediasize in bytes (15G)
        31277232        # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        31029           # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        SanDisk SSD U110 16GB   # Disk descr.
        162937402388    # Disk ident.
        ahcich1         # Attachment
        Yes             # TRIM/UNMAP support
        0               # Rotation rate in RPM
        Not_Zoned       # Zone Mode

Seek times:
        Full stroke:      250 iter in   0.043551 sec =    0.174 msec
        Half stroke:      250 iter in   0.053353 sec =    0.213 msec
        Quarter stroke:   500 iter in   0.106147 sec =    0.212 msec
        Short forward:    400 iter in   0.072598 sec =    0.181 msec
        Short backward:   400 iter in   0.082549 sec =    0.206 msec
        Seq outer:       2048 iter in   0.235632 sec =    0.115 msec
        Seq inner:       2048 iter in   0.231310 sec =    0.113 msec

Transfer rates:
        outside:       102400 kbytes in   0.410430 sec =   249494 kbytes/sec
        middle:        102400 kbytes in   0.420552 sec =   243490 kbytes/sec
        inside:        102400 kbytes in   0.253176 sec =   404462 kbytes/sec

Result from dd (FreeBSD - 50 MB/s):

Code:
time dd if=/dev/zero of=/tmp/test.file bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes transferred in 10.349240 secs (50659564 bytes/sec)

real    0m10.359s
user    0m0.010s
sys     0m1.747s

Result from dd (Debian - 300 MB/s):

Code:
time dd if=/dev/zero of=/tmp/test.file bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 1.7096 s, 307 MB/s

real    0m1.716s
user    0m0.010s
sys     0m1.706s

You can see that the machine achieves a 300 MB/s write transfer without any problems; I ran all commands as root. On FreeBSD, however, the transfer is significantly slower. Why? What else should I check, and how can I tune the system to achieve better disk performance? Thanks!
 
I assume the machine has enough memory, probably a dozen GB. In that case, a 1/2 GB file write without a sync can go mostly or entirely to memory. The question here is what fraction of the file actually made it to disk.

If you want to benchmark the disk performance, then do your IO not to a file system, but to the raw disk (without any buffer cache).
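For example, a quick raw read test could look like this (ada0 as in the dmesg above; a raw write test would destroy data, so only do that against a scratch partition):

Code:
# read 2 GB straight from the raw device (no file system involved)
dd if=/dev/ada0 of=/dev/null bs=1M count=2048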

If you want to benchmark file system performance, you need to ask what operation is realistic and sensible. The biggest part of that question is whether you want the file to be durable, meaning whether you sync it to disk at the end or not. The operation you might want to try is dd with the conv=fsync option on both platforms (GNU dd accepts conv=fsync as well).
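A minimal sketch of that, using the same file path as in your test:

Code:
# the reported rate now includes flushing the file to disk before dd exits
time dd if=/dev/zero of=/tmp/test.file bs=1M count=500 conv=fsync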
 
You need to try a benchmark that has an fsync(2) at the end.

Or you could add the time that a sync(1) after the dd command takes.
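For example, something along these lines (same file path as your earlier dd test):

Code:
# time the write plus the flush of dirty buffers
time sh -c 'dd if=/dev/zero of=/tmp/test.file bs=1M count=500 && sync'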

Of course, if /tmp is a memory file system, you can't use it for this.

Is the drive really 16 GB only?
 
If you want to benchmark the disk performance, then do your IO not to a file system, but to the raw disk (without any buffer cache).

If you want to benchmark file system performance, you need to ask what operation is realistic and sensible. The biggest part of that question is whether you want the file to be durable, meaning whether you sync it to disk at the end or not. The operation you might want to try is dd with the conv=fsync option on both platforms (GNU dd accepts conv=fsync as well).

I am not so much interested in strictly benchmarking the drive as in real-world write times. A simple file copy (cp) gives the same poor results. The comparison with Linux was only meant to show that the hardware is OK and that the system (FreeBSD) is responsible for the slow transfer. I don't know what causes this, or whether there is any way to reduce the write times.

/tmp is a regular directory in the root directory tree on both FreeBSD and Debian.

And there you go, here are the fio results:

Code:
fio --name=write_throughput --directory=/mnt --numjobs=8 --size=500M --time_based --runtime=60s --ramp_time=2s --direct=1 --verify=0 --bs=1M --iodepth=64 --rw=write --group_reporting=1

write_throughput: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=64
...
fio-3.38
Starting 8 processes
write_throughput: Laying out IO file (1 file / 500MiB)
write_throughput: Laying out IO file (1 file / 500MiB)
write_throughput: Laying out IO file (1 file / 500MiB)
write_throughput: Laying out IO file (1 file / 500MiB)
write_throughput: Laying out IO file (1 file / 500MiB)
write_throughput: Laying out IO file (1 file / 500MiB)
write_throughput: Laying out IO file (1 file / 500MiB)
write_throughput: Laying out IO file (1 file / 500MiB)
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
Jobs: 6 (f=6): [_(1),W(5),_(1),W(1)][34.4%][w=44.0MiB/s][w=43 IOPS][eta 02m:00s]
write_throughput: (groupid=0, jobs=8): err= 0: pid=1115: Thu Nov 14 19:33:16 2024
  write: IOPS=40, BW=40.3MiB/s (42.2MB/s)(2424MiB/60207msec); 0 zone resets
    clat (usec): min=1260, max=39547k, avg=198126.74, stdev=1120487.00
     lat (usec): min=1367, max=39547k, avg=198313.45, stdev=1120488.77
    clat percentiles (usec):
     |  1.00th=[    1352],  5.00th=[    2147], 10.00th=[    3097],
     | 20.00th=[    3228], 30.00th=[    3294], 40.00th=[   10683],
     | 50.00th=[  158335], 60.00th=[  229639], 70.00th=[  254804],
     | 80.00th=[  274727], 90.00th=[  316670], 95.00th=[  387974],
     | 99.00th=[  650118], 99.50th=[  801113], 99.90th=[17112761],
     | 99.95th=[17112761], 99.99th=[17112761]
   bw (  KiB/s): min=16277, max=143347, per=100.00%, avg=52022.67, stdev=3635.48, samples=739
   iops        : min=    9, max=  139, avg=48.77, stdev= 3.57, samples=739
  lat (msec)   : 2=4.29%, 4=33.37%, 10=2.06%, 20=1.32%, 100=0.58%
  lat (msec)   : 250=26.28%, 500=29.25%, 750=2.19%, 1000=0.37%, 2000=0.08%
  lat (msec)   : >=2000=0.21%
  cpu          : usr=0.12%, sys=1.54%, ctx=5167, majf=0, minf=1
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2424,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=40.3MiB/s (42.2MB/s), 40.3MiB/s-40.3MiB/s (42.2MB/s-42.2MB/s), io=2424MiB (2542MB), run=60207-60207msec
 
On the same machine (Lenovo ThinkCentre M600 Tiny thin client, Intel N3000), with clean installs of FreeBSD 14.1 (UFS) and Debian (ext3), the disk performance results are dramatically different. [...] What else should I check, and how can I tune the system to achieve better disk performance?

My results on my laptop:

Code:
slippy# uname -a
FreeBSD slippy 15.0-CURRENT FreeBSD 15.0-CURRENT #36 komquats-n273622-9d4428ad0239: Thu Nov 14 07:29:01 PST 2024     root@slippy:/export/obj/opt/src/git-src/amd64.amd64/sys/BREAK amd64
slippy#
slippy# time dd if=/dev/zero of=/mnt/test.file bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes transferred in 0.106060 secs (4943332117 bytes/sec)

real    0m0.112s
user    0m0.003s
sys     0m0.109s
slippy#

Can you provide the output of df -h /tmp on the FreeBSD and Debian systems, please?
 
Plus, I don't know if it could cause significant differences in this case (just 500 MiB), and perhaps a developer can confirm this, but I would check the UFS block size and the ext3 block size on both systems.

A larger block size means fewer I/O operations on the disk partition.

On FreeBSD you can check this using dumpfs -s /dev/ada0p1 (assuming partition 1 is the one holding your UFS filesystem). The default block size is 32768 bytes.
On GNU/Linux I have no idea.
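For reference, the plain superblock dump shows it too; something like this should work (the partition name is just a placeholder):

Code:
dumpfs /dev/ada0p2 | grep -m1 bsize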
 
Plus, I don't know if it could cause significant differences in this case (just 500 MiB), and perhaps a developer can confirm this, but I would check the UFS block size and the ext3 block size on both systems.

A larger block size means fewer I/O operations on the disk partition.

On FreeBSD you can check this using dumpfs -s /dev/ada0p1 (assuming partition 1 is the one holding your UFS filesystem). The default block size is 32768 bytes.
On GNU/Linux I have no idea.
It's not that. Debian puts /tmp on tmpfs. My little show & tell above did the same; as you can see, it was lightning fast. I could have used /tmp, but I limit my /tmp to 300 MB. You can do the same as Debian with the following line in fstab.

tmpfs /tmp tmpfs rw,nosuid,size=300m,uid=0,gid=0,mode=1777 0 0

Solaris uses tmpfs, as did Tru64. Tru64 had a limit of 20 MB, while Solaris let you fill all of virtual memory (RAM + swap) with /tmp. The Tru64 approach was better; I had used 20 MB for years until a few years ago, when I bumped it up to 300 MB. Do the same as above, but without size=300m, to get the same behaviour as Debian and Solaris.
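That is, something like this (the same line as above, minus the size cap):

Code:
tmpfs /tmp tmpfs rw,nosuid,uid=0,gid=0,mode=1777 0 0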

The O/P shouldn't have compared apples with oranges. You can't expect a spinning disk to have the same performance as RAM.
 
Anyone have a command I could run to do a quick I/O test between Linux FS and UFS?

Ideally something small and non-destructive for UFS/FreeBSD? (I'd be testing from a Linux live USB directly against the NVMe drive and don't care about the existing internal drive contents, ext4/XFS/etc., but the UFS FreeBSD install would be real.)
 
Yes. Your workload. On your system.

There are lots of microbenchmarks. But they are typically not representative of the performance of real-world workloads. Those are usually more affected by caching, file system data layout (more for spinning rust, less for SSD), and scheduling between foreground workload and background IO tasks. And real world performance differences are affected by many other things (compiler, network subsystem, ...), and those effects can be larger than IO stack differences.

There are file system benchmark packages. The only ones I'm familiar with are closed-source internal tools. I hear about IOzone, but have not used it in a long time.
 
Anyone have a command I could run to do a quick I/O test between Linux FS and UFS?

Attached is my personal version of bonnie. You have to run it with `-s <n>`, where n is gigabytes, using a size larger than your RAM; otherwise only the first write benchmark is valid. It issues fsync(2) at the end. I tuned it to be useful to me.

On Linux you can supply a "dropthedamncaches" shell command that empties the filesystem cache; then you can benchmark with sizes smaller than RAM. On FreeBSD you would have to call umount on the filesystem; even if that fails, it will empty the cache.
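On Linux, such a cache-dropping command can be as simple as this (needs root):

Code:
sync; echo 3 > /proc/sys/vm/drop_caches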

Should probably put it on github...
 

Gentlemen, read again what I wrote above: this is a clean FreeBSD installation, and /tmp is a physical directory in the root directory tree (on both FreeBSD and Debian). But that is also irrelevant, because the slow transfers show up in file copy operations anywhere (the fio tests ran in the /mnt directory). The workload is close to zero.

The comparison with Linux was only meant to show that the hardware performs correctly under another system; I might as well have done the tests using Hiren's BootCD (that is, Windows).

On all my VMs, VPSes, and PCs, the transfer results are perfectly acceptable. Only in this case (the ThinkCentre M600) are they so poor.

That's why I'm wondering what the cause is; I suspect the hardware, or some system tuning parameters. Unfortunately, this is where my knowledge runs out, so I'm asking you ;)

Once again, the results from fio:

Code:
fio --name=write_throughput --directory=/mnt --numjobs=8 --size=500M --time_based --runtime=60s --ramp_time=2s --direct=1 --verify=0 --bs=1M --iodepth=64 --rw=write --group_reporting=1

Run status group 0 (all jobs):
  WRITE: bw=40.3MiB/s (42.2MB/s), 40.3MiB/s-40.3MiB/s (42.2MB/s-42.2MB/s), io=2424MiB (2542MB), run=60207-60207msec
 
Benchmarking write operations is fraught with "details".
If your implied goal is to compare the "efficiencies" of ext3 vs. UFS file systems, then:

Create a tarball that is about the size of the filesystem on some machine. This gives you some "structured content" to install.

For each fs in { ext3, UFS }:
 - Erase the partition on which you will be creating the filesystem. This ensures the same parts of the media are used.
 - Create an $fs filesystem on the target machine (so the same hardware and software are in play).
 - Unpack the aforementioned tarball into the filesystem on the target machine.
 - sync, then unmount the filesystem (the goal being to get the disk cache to forget about it).
 - Remount the filesystem.
 - time(1) a tar of the contents of the filesystem to /dev/null. This tells you how efficiently you can READ the filesystem.
 - Unmount the filesystem.

This still leaves you vulnerable to read-ahead caching. But write-behind caching is a lot more of a problem (using /dev/null as the target means the time required for that "write" is consistent). A rough sketch of the FreeBSD half follows below.
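Something like this for the FreeBSD/UFS side (device, mount point, and tarball path are placeholders; newfs wipes whatever is on that partition):

Code:
DEV=/dev/ada0p4          # scratch partition -- will be erased
MNT=/mnt/bench
TARBALL=/var/tmp/testdata.tar

newfs -U ${DEV}                          # fresh UFS filesystem (soft updates)
mount ${DEV} ${MNT}
time tar -xf ${TARBALL} -C ${MNT}        # write test: unpack the content
sync
umount ${MNT}                            # flush and forget cached data
mount ${DEV} ${MNT}
time tar -cf - -C ${MNT} . > /dev/null   # read test: stream everything back out
umount ${MNT}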
 
Benchmarking write operations is fraught with "details".
If your implied goal is to compare the "efficiencies" of ext3 vs. UFS file systems, then:

...

Thank you, but I keep repeating: I am not comparing UFS to EXT3! This is about the poor results of disk operations, most likely caused by the hardware and its support in FreeBSD, but I don't know how to prove that, or whether there is a way to improve these numbers.
 
It is replaceable, but what is the point, since the transfer results are this poor only under FreeBSD.
First: FreeBSD doesn't like some SSDs in some configurations. Second: this is an Intel machine.

For example: a mini-PC with a J1900 and a 120 GB SATA SSD gets a write speed of ~7 MB/s under FreeBSD, Windows, and Linux, both on the SATA port and on a USB hub (SSD in an enclosure). In the same USB enclosure on an AMD 3020e machine, the write speed is 360 MB/s.

If the SSD in your machine is replaceable, you can try another SSD to see where the problem is: in FreeBSD or in this specific hardware combination.
 
For example: a mini-PC with a J1900 and a 120 GB SATA SSD gets a write speed of ~7 MB/s under FreeBSD, Windows, and Linux

Of course, that is a situation in which only the hardware can be responsible for such results. In my case, however, it is only the system (FreeBSD) that makes the difference.
 
You can do some non-destructive tests using dd on an unmounted partition:

Code:
dd if=/dev/X of=/dev/X bs=16m

which will read the partition and write it back to the same place. On FreeBSD you can use gstat to see what the drive is doing.
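While that runs, gstat in a second terminal shows the per-device throughput and busy percentage; a minimal invocation, filtered to the drive in question, might be:

Code:
gstat -f ada0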
 