I have a FreeBSD production system with twelve 3TB SATA drives in a JBoD triple-parity ZFS configuration spanning around 30TB. This worked well for months, through numerous reboots. However, after a power outage on Friday it has been unable to boot. It gives three or more (the number varies) `zio_read error: 97` entries, with `ZFS: i/o error - all block copies unavailable` following or interspersed as what appears to be every fourth entry (e.g. three error 97 entries, then one "all block copies unavailable" entry, with that group repeating zero or more times). The subsequent message is usually `ZFS: can not read MOS config`, followed by:
`Can't find /boot/zfsloader`
I am able to boot this machine via a live USB shell then run the following:
Bash:
# boot FreeBSD live multiuser
# make live fs rw
mount -u -o rw /
# make temp
mkdir /tmpzroot
# import zroot
zpool import -R /tmpzroot zroot
# make temp root
mkdir /tmpzroot/zzz
# mount root
mount -t zfs zroot/ROOT/default /tmpzroot/zzz
# show root
ls /tmpzroot/zzz
This makes the root file system visible under `/tmpzroot/zzz`, including all the files on the machine and the `/boot` directory containing the "missing" `/boot/zfsloader` file. I've come across this link and similar ones suggesting the cause might be the Dell's RAID controller (in spite of it running in a JBoD configuration), and tried turning on Lifecycle management and all the system diagnostics to slow the boot process, but to no effect. I've also run a `zpool scrub` from the live USB after importing the `zroot` pool; it detected no errors, and nothing changed after rebooting.
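For anyone wanting to repeat the checks I describe, here is a rough sketch of what I ran from the live USB shell after the import. The `run` wrapper, the `check_pool` function and the `DRY_RUN` / `POOL` variables are just names I made up for this post; by default the script only prints the commands, so it is safe to paste anywhere. It assumes the pool is called `zroot` and the root dataset is mounted at `/tmpzroot/zzz` as in the steps above.

```shell
#!/bin/sh
# Sketch only: assumes zroot is already imported and the root dataset
# is mounted at /tmpzroot/zzz.
# Dry-run by default: prints each command instead of executing it.
# Set DRY_RUN=0 to actually run the commands from the live USB shell.
POOL="${POOL:-zroot}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

check_pool() {
    # Start a scrub; zpool scrub returns immediately and runs in the background.
    run zpool scrub "$POOL"
    # Show scrub progress and any per-device or per-file errors.
    run zpool status -v "$POOL"
    # Confirm the "missing" loader file is present on the mounted root.
    run ls -l /tmpzroot/zzz/boot/zfsloader
}

check_pool
```

With `DRY_RUN=0` the scrub has to finish before `zpool status -v` reports a final result, so the status command may need to be re-run later.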
My question is: is there some other mechanism by which I might stall the zpool import when booting live to get it to pick it up, or otherwise is there a way to correct this?
Alternatively, as a stop-gap, is there a way in which I can take the live USB image and boot the `/boot/zfsloader` from the imported `zroot`, bringing up the HDD filesystem in the process? (I know this would be a horrible "solution," but it's a production system and I need it to be running during weekdays.)
It strikes me as beyond bizarre that this would only happen after a power outage; I didn't have any issue rebooting the machine roughly every 3-4 days over the course of 2 months, including the day before the power died. I'm inclined to believe that the linked thread might be throwing me off and that, given the ungraceful shutdown, there is some repair operation I'm unaware of that needs to run, since all the disks show good status and all the data is available, but I don't know what that might be.