badblocks on large device

I've always done burn-in testing on new hard drives with the badblocks utility in e2fsprogs. I've run into the following problem on very large drives. Any suggestions?

Code:
sysctl kern.geom.debugflags=0x10
badblocks -b 4096 -ws /dev/da4
badblocks: Value too large to be stored in data type invalid end block (4882956288): must be 32-bit value

Disk info:
Code:
Geom name: da4
Providers:
1. Name: da4
   Mediasize: 20000588955648 (18T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   descr: ATA ST20000NE000-3G5
   lunid: 5000c500e67273f7
   ident: xxxxxxxx
   rotationrate: 7200
   fwsectors: 63
   fwheads: 255

Running: 13.2-RELEASE FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC amd64
 
Suggestions?

Most people don't bother, but instead let the drive's internal error correction handle it. I find that keeping an eye on SMART data, using RAID for the critical stuff, and taking regular backups is adequate. My spidey sense says you're making this more work than it has to be.
 
I didn't even know the badblocks utility still existed. It is a remnant from the days when disk drives couldn't work around bad sectors internally. Back then, whenever a bad sector (or block) was found, the file system had to make sure that area of the disk was never used again, either by marking the area "unusable" (or allocated) in its internal allocation bitmap, or by creating fake files placed over that area. For the last ~20 years, all disks have been able to internally remap (a.k.a. revector) those bad blocks, so a utility like badblocks is simply not needed any longer.

On the other hand, doing burn-in testing on disk drives is actually valuable, and will reduce their long-term error rate, by weeding out defective drives early. But you really don't need to use the badblocks utility for it; you can instead use dd to write and read the disk a few times.
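For example, something along these lines would give you one write pass and one read-back pass (only a sketch: it destroys everything on the disk, the device name is just the one from your post, and the block size is a starting point you may want to raise):
Code:
dd if=/dev/random of=/dev/da4 bs=1m status=progress    # write pass, destroys all data on da4
dd if=/dev/da4 of=/dev/null bs=1m status=progress      # read-back pass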

To the original question: yes, you have found a bug in the badblocks utility; it doesn't seem to be able to handle addresses this large. Since it is part of e2fsprogs (which is a port, not part of the base operating system), the bug probably has to be fixed by the upstream maintainers. If you look at the bottom of the man page, the maintainers are listed (and they are the usual suspects of Linux file systems). My suggestion would be to open a FreeBSD PR against the port (so the port maintainer for e2fsprogs is alerted), and then ping the Linux maintainers (Remy, Ted and David) to let them know.
 
Thanks to both of you for your replies.

ralphbsz, my purpose was in fact to do a burn-in exercise on the disk. Do you have any recommendations on the proper dd commands to use for this sort of work? My initial thought is dd if=/dev/random of=/dev/da4, but I'm not sure whether this would work or how to track its progress. I have no gauge of how long it would take to write the surface of a 20TB drive without some sort of progress bar. I also suspect there may be a better way.

Thanks for your advice.
 
It used to be that /dev/random could be a bottleneck, but it reads at 450 MB/s even on my VM. So, not likely a worry.

With reasonably modern hardware, I would assume a write speed of at least 130 MB/s to a SATA disk, provided the records are long enough. So, roughly 40 hours.
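(The arithmetic, assuming that 130 MB/s holds across the whole surface: 20,000,588,955,648 bytes ÷ 130 MB/s ≈ 154,000 seconds, i.e. about 43 hours per full pass.)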

You could get an estimate with a 10 GB trial run:
Code:
time dd if=/dev/random of=/dev/da4 bs=1024k status=progress count=10240
 
It was sloth-like at 14 MB/s but I increased my
Code:
bs=64M
and it's humming along at 165 MB/s now.

Code:
root:~ # dd if=/dev/random of=/dev/da4 bs=64M status=progress
  201259483136 bytes (201 GB, 187 GiB) transferred 1222.273s, 165 MB/s

Thanks all!
 
My advice was going to be that the only necessary improvement is to do the writing in large blocks, but ...

It was sloth-like at 14 MB/s but I increased my
Code:
bs=64M
and it's humming along at 165 MB/s now.
then you beat me to it.

How long is it going to take? You can roughly pre-calculate it (take into account that the speed will drop by about half towards the end), or you can get a report every second (see above), or you can just press control-T, and dd will print a progress update.

The only other thing to do would be to spend a few hours exercising the actuator by doing lots of random IOs. That's a bit tricky, as there is no pre-cooked program I know of that does that (some of the disk benchmarking tools like fio might), so I would suggest writing a 5-line program in Python or Perl: calculate the number of 4 KiB blocks on the disk by dividing its capacity by 4096, open the device, then in a loop choose a random number between 0 and that block count, seek to 4096 × the random number, and read or write 4096 bytes.
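In case it helps as a starting point, here is a rough read-only sketch of that idea in Python (the device path and pass count are just examples; and if seeking to the end of the device doesn't report its size on your system, take the size from diskinfo(8) instead):
Code:
#!/usr/bin/env python3
# Random-seek exerciser sketch: read 4 KiB at random 4 KiB-aligned offsets.
import os
import random

DEV = "/dev/da4"     # example device name, adjust for your system
BLOCK = 4096         # 4 KiB per I/O
PASSES = 100000      # number of random reads; tune for how long you want it to run

fd = os.open(DEV, os.O_RDONLY)   # read-only, so it is non-destructive
try:
    # Seeking to the end of the disk device reports its media size.
    nblocks = os.lseek(fd, 0, os.SEEK_END) // BLOCK
    for _ in range(PASSES):
        blk = random.randrange(nblocks)          # pick a random 4 KiB block
        os.lseek(fd, blk * BLOCK, os.SEEK_SET)   # seek the actuator there
        os.read(fd, BLOCK)                       # and read it
finally:
    os.close(fd)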
 