Solved Unable to boot after interrupted upgrade

First off, I am aware that much of this is my own fault -- the machine in question is my personal desktop and I have backups of the important stuff. I would like to use this as a learning opportunity, rather than blowing everything away and reinstalling. I have included a description of the events leading to this and the current state below. I am happy to provide any more detail or followup. I thank you in advance for any help you can provide.

I have an AMD64 desktop machine with ZFS on root. The relevant history of the machine is as follows:
  1. Install 13.0-RELEASE using the installer with ZFS on root, with a single mirror vdev, consisting of ada2p4 and ada3p4, using GELI encryption
  2. Add second mirror vdev of ada0p1 and ada1p1, each GELI encrypted
  3. Begin upgrade to 13.1-RELEASE (up to first freebsd-update install)
  4. Be interrupted and forget I started the install process, continue using the machine for some time (days)
  5. Try upgrade again, and continue with freebsd-update install
  6. Left the computer. In the time that I was away, we had a brief power surge in a thunderstorm causing the computer to lose power. I do not know if the update command was interrupted.
  7. Power up the computer -- errors indicating 'ZFS: i/o error - all block copies unavailable' and unable to boot
  8. Boot from 13.1 install media and geli attach ada0p1 ada1p1 ada2p4 ada3p4 and import and scrub the pool (no errors, normal runtime for scrub). I am presuming that the successful import, scrub, and typical performance together mean that there is not hardware damage from the brief power outage, but I do not know how to further check this.
  9. chroot into the zpool root and finish with freebsd-update install and upgrade packages
  10. still boot errors
  11. Following instructions from some forum posts, write gptzfsboot data with gpart (accidentally to the efi partition, because I did not pay enough attention) (commands below)
  12. dd if=/boot/boot1.efi of=/dev/ada2p1; dd if=/boot/boot1.efi of=/dev/ada3p1
  13. geli configure -b ada0p1 ada1p1 ada2p4 ada3p4
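Steps 8 and 9 of the list above can be sketched as a dry run that only prints the commands. This is a minimal sketch, not the exact sequence that was typed: the pool name zroot and the dataset zroot/ROOT/default are taken from the boot prompt shown further down, while the alternate root /mnt, the -N import followed by explicit mounts, and the function name rescue_plan are assumptions made for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints the rescue commands instead of running them.
# Assumptions: pool "zroot" and root dataset "zroot/ROOT/default" (from the
# boot prompt), /mnt as a free alternate root. rescue_plan is a made-up name.
rescue_plan() {
    pool=$1; shift
    # Unlock every GELI provider that backs a vdev, not only the boot mirror.
    for prov in "$@"; do
        echo "geli attach /dev/$prov"
    done
    # Import without mounting, then mount the root dataset first (installer
    # layouts usually set it to canmount=noauto) and everything else after.
    echo "zpool import -f -N -R /mnt $pool"
    echo "zfs mount $pool/ROOT/default"
    echo "zfs mount -a"
    echo "zpool scrub $pool"
    echo "zpool status -v $pool"
    echo "chroot /mnt /bin/sh"
}

rescue_plan zroot ada0p1 ada1p1 ada2p4 ada3p4
```

Printing the plan first makes it easy to check that all four providers are listed before anything is unlocked or imported.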

Regarding item 11, the following occurred:
Code:
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada2 # note using the efi partition here
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada3
...restart and boot errors...
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada2
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada3
... restart and boot errors...
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 2 ada2 # note using the freebsd-boot partition here, not the efi
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 2 ada2

At this point, I get the following at boot:

Code:
BIOS drive C: is disk0
BIOS drive D: is disk1
...
BIOS drive I: is disk6
GELI Passphrase for disk0p4:

Upon entering the passphrase, I get two "Calculating GELI Decryption Key for ... iterations ..." messages, one for each of two disks. Then I get more than a screenful of zio_read error : 5. After the zio_read errors, I get a single ZFS: i/o error - all block copies unavailable and the boot prompt:

Code:
Calculating GELI Decryption Key for disk0p4: N iterations...
Calculating GELI Decryption Key for disk3p4: N iterations...

zio_read error : 5
zio_read error : 5
... more than a screen of this...
zio_read error : 5
ZFS: i/o error - all block copies unavailable

FreeBSD/x86 boot
Default: zfs:zroot/ROOT/default:/boot/kernel/kernel
boot: _

Below is the output of gpart show. You can ignore nvd0, da0, and da1. The first is not part of the FreeBSD system, the second is from my USB DAC, and the third is the USB drive with the 13.1-RELEASE install media.

Code:
=>        34  1000215149  nvd0  GPT  (477G)
          34        2014        - free -  (1.0M)
        2048     2097152     1  efi  (1.0G)
     2099200   998115983     2  linux-data  (476G)

=>        34  1000215149  diskid/DISK-S5JYNG0N105162L  GPT  (477G)
          34        2014                               - free -  (1.0M)
        2048     2097152                            1  efi  (1.0G)
     2099200   998115983                            2  linux-data  (476G)

=>         40  19532873648  ada0  GPT  (9.1T)
           40  19327352832     1  freebsd-zfs  (9.0T)
  19327352872    205520816        - free -  (98G)

=>         40  19532873648  ada1  GPT  (9.1T)
           40  19327352832     1  freebsd-zfs  (9.0T)
  19327352872    205520816        - free -  (98G)

=>         40  19532873648  ada2  GPT  (9.1T)
           40       409600     1  efi  (200M)
       409640         1024     2  freebsd-boot  (512K)
       410664          984        - free -  (492K)
       411648      4194304     3  freebsd-swap  (2.0G)
      4605952  19528265728     4  freebsd-zfs  (9.1T)
  19532871680         2008        - free -  (1.0M)

=>         40  19532873648  ada3  GPT  (9.1T)
           40       409600     1  efi  (200M)
       409640         1024     2  freebsd-boot  (512K)
       410664          984        - free -  (492K)
       411648      4194304     3  freebsd-swap  (2.0G)
      4605952  19528265728     4  freebsd-zfs  (9.1T)
  19532871680         2008        - free -  (1.0M)

=>         40  19532873648  diskid/DISK-98M0A041FB4G  GPT  (9.1T)
           40  19327352832                         1  freebsd-zfs  (9.0T)
  19327352872    205520816                            - free -  (98G)

=>         40  19532873648  diskid/DISK-1EHT9M1Z  GPT  (9.1T)
           40  19327352832                     1  freebsd-zfs  (9.0T)
  19327352872    205520816                        - free -  (98G)

=>         40  19532873648  diskid/DISK-X990A02VFBDG  GPT  (9.1T)
           40       409600                         1  efi  (200M)
       409640         1024                         2  freebsd-boot  (512K)
       410664          984                            - free -  (492K)
       411648      4194304                         3  freebsd-swap  (2.0G)
      4605952  19528265728                         4  freebsd-zfs  (9.1T)
  19532871680         2008                            - free -  (1.0M)

=>         40  19532873648  diskid/DISK-ZHZ5AVAQ  GPT  (9.1T)
           40       409600                     1  efi  (200M)
       409640         1024                     2  freebsd-boot  (512K)
       410664          984                        - free -  (492K)
       411648      4194304                     3  freebsd-swap  (2.0G)
      4605952  19528265728                     4  freebsd-zfs  (9.1T)
  19532871680         2008                        - free -  (1.0M)

=> 32  352  da0  MBR  (192K)
   32   31       - free -  (16K)
   63  321    1  !14  (161K)

=> 32  352  diskid/DISK-Y7XP1F113BE597  MBR  (192K)
   32   31                              - free -  (16K)
   63  321                           1  !14  (161K)

=>       1  61800447  da1  MBR  (29G)
         1     66584    1  efi  (33M)
     66585   2222800    2  freebsd  [active]  (1.1G)
   2289385  59511063       - free -  (28G)

=>      0  2222800  da1s2  BSD  (1.1G)
        0       16         - free -  (8.0K)
       16  2222784      1  freebsd-ufs  (1.1G)

=>       1  61800447  diskid/DISK-070846100BC6A036  MBR  (29G)
         1     66584                             1  efi  (33M)
     66585   2222800                             2  freebsd  [active]  (1.1G)
   2289385  59511063                                - free -  (28G)

=>      0  2222800  diskid/DISK-070846100BC6A036s2  BSD  (1.1G)
        0       16                                  - free -  (8.0K)
       16  2222784                               1  freebsd-ufs  (1.1G)
 
  1. Following instructions from some forum posts, write gptzfsboot data with gpart (accidentally to the efi partition, because I did not pay enough attention) (commands below)
  2. dd if=/boot/boot1.efi of=/dev/ada2p1; dd if=/boot/boot1.efi of=/dev/ada3p1
These are very wrong but you already figured that out. You don't appear to need them (CSM/BIOS boot appears to work) but let's fix this anyway. The efi partition needs to be formatted with FAT32, it's an actual filesystem, not some "raw" binary content like freebsd-boot. So, use newfs_msdos(8) to format it. Then mount it on /boot/efi (doesn't need to be permanent, we just need to write to it). Create a directory structure there: EFI/BOOT/ (capitalization doesn't matter, FAT is not case sensitive). Copy /boot/loader.efi to EFI/BOOT/bootx64.efi. Repeat for the other efi partition(s).
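The procedure just described can be sketched as a dry run that prints the commands for each efi partition. The function name esp_plan is made up for illustration, and the sketch deliberately omits -F so that newfs_msdos(8) picks a FAT type that fits the partition, which is a deviation from the FAT32 wording above.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# esp_plan is a hypothetical helper; the device names come from the thread.
esp_plan() {
    esp=$1                 # partition of type "efi", e.g. ada2p1
    mnt=${2:-/boot/efi}    # temporary mount point
    echo "newfs_msdos /dev/$esp"
    echo "mount -t msdosfs /dev/$esp $mnt"
    echo "mkdir -p $mnt/EFI/BOOT"
    echo "cp /boot/loader.efi $mnt/EFI/BOOT/bootx64.efi"
    echo "umount $mnt"
}

for esp in ada2p1 ada3p1; do
    esp_plan "$esp"
done
```

Using mkdir -p matters on a freshly formatted filesystem, because the EFI directory does not exist yet.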

# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 2 ada2 # note using the freebsd-boot partition here, not the efi
gptzfsboot(8) needs to be written to the freebsd-boot partition. Always double check the correct index number! And repeat this for every disk that has a freebsd-boot partition. It's unclear from which drive the system actually boots though, double check and reconfigure your boot settings. And just make sure all freebsd-boot partitions are written correctly, then it shouldn't really matter from which disk it boots.
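One way to double check the index is to read it out of the gpart show output instead of typing it by hand. The sketch below assumes the column layout shown in the first post (start, size, index, type); boot_index is a made-up helper name, and the final command is only printed, not executed.

```shell
#!/bin/sh
# Reads `gpart show <disk>` output on stdin and prints the index of every
# partition whose type column is freebsd-boot. boot_index is hypothetical.
boot_index() {
    awk '$4 == "freebsd-boot" { print $3 }'
}

# Sample copied from the gpart show output in the first post.
sample='=>         40  19532873648  ada2  GPT  (9.1T)
           40       409600     1  efi  (200M)
       409640         1024     2  freebsd-boot  (512K)
       410664          984        - free -  (492K)
       411648      4194304     3  freebsd-swap  (2.0G)
      4605952  19528265728     4  freebsd-zfs  (9.1T)'

idx=$(printf '%s\n' "$sample" | boot_index)
# Print the command for review rather than running it.
echo "gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i $idx ada2"
```

On a live system the sample would be replaced by piping gpart show ada2 into boot_index; a disk without a freebsd-boot partition yields an empty index, which is a cue to stop.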

I'm a little fuzzy on the details of a fully encrypted setup though. But the above should at least get you to a working loader(8) prompt.
 
The EFI partitions are 200M in size, and appear to be too small for FAT32, so I formatted as FAT16. Additionally, it was unclear to me if the path after mounting should be /boot/efi/boot/bootx64.efi or /boot/efi/efi/boot/bootx64.efi, so I tried both. Each yields the same error condition as in the original post (GELI prompt, a stream of 'zio_read error : 5', 'ZFS: i/o error - all block copies unavailable' and the boot prompt).
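The FAT32 refusal follows from the cluster count. A rough calculation, assuming the 8192-byte clusters (16 sectors) that newfs_msdos reports in the listing further down, and using the standard FAT thresholds of 4085 and 65525 clusters; the function names are made up for illustration.

```shell
#!/bin/sh
# Upper bound on the cluster count: it ignores reserved sectors and the FATs
# themselves, so the real figure is slightly lower (newfs_msdos said 25573).
max_clusters() {
    sectors=$1; sectors_per_cluster=$2
    echo $(( sectors / sectors_per_cluster ))
}

# The FAT variant is determined by the cluster count alone.
fat_type() {
    clusters=$1
    if [ "$clusters" -lt 4085 ]; then
        echo FAT12
    elif [ "$clusters" -lt 65525 ]; then
        echo FAT16
    else
        echo FAT32
    fi
}

c=$(max_clusters 409600 16)    # 200M efi partition, 16 sectors per cluster
echo "$c clusters -> $(fat_type "$c")"
```

With 16 sectors per cluster the 409600-sector partition holds at most 25600 clusters, well under the 65525 needed for FAT32. A smaller cluster size (newfs_msdos has a -c option for sectors per cluster) would raise the count, though that was not tried in this thread.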

Regarding boot disk:
It's unclear from which drive the system actually boots though, double check and reconfigure your boot settings. And just make sure all freebsd-boot partitions are written correctly, then it shouldn't really matter from which disk it boots.
I select my boot disk from the motherboard's boot menu. The order of disks is reliable in that menu. There are two that give me "no bootloader" errors, and two that yield the same errors I have. I do not know how the drives map to one another (i.e., I do not know which in the motherboard boot menu map to ada2 and ada3, but I know which two are ada2 and 3 and which two are ada0 and 1).

I'm a little fuzzy on the details of a fully encrypted setup though. But the above should at least get you to a working loader(8) prompt.
The ZFS partitions are encrypted, but not the efi or freebsd-boot partitions. Just stating what I think is obvious for clarity's sake.


My process to restore the efi partition:

Code:
# gpart show ada2 ada3 # the only two drives with EFI and freebsd-boot partitions, from original install
=>         40  19532873648  ada2  GPT  (9.1T)
           40       409600     1  efi  (200M)
       409640         1024     2  freebsd-boot  (512K)
       410664          984        - free -  (492K)
       411648      4194304     3  freebsd-swap  (2.0G)
      4605952  19528265728     4  freebsd-zfs  (9.1T)
  19532871680         2008        - free -  (1.0M)

=>         40  19532873648  ada3  GPT  (9.1T)
           40       409600     1  efi  (200M)
       409640         1024     2  freebsd-boot  (512K)
       410664          984        - free -  (492K)
       411648      4194304     3  freebsd-swap  (2.0G)
      4605952  19528265728     4  freebsd-zfs  (9.1T)
  19532871680         2008        - free -  (1.0M)
# newfs_msdos -F 32 /dev/ada2p1
newfs_msdos: 25573 clusters too few clusters for FAT32, need 65525
# newfs_msdos -F 16 /dev/ada2p1
/dev/ada2p1: 409360 sectors in 25585 FAT16 clusters (8192 bytes/cluster)
BytesPerSec=512 SecPerClust=16 ResSectors=1 FATs=2 RootDirEnts=512 Media=0xf0 FATsecs=100 SecPerTrack=63 Heads=16 HiddenSecs=0 HugeSectors=409600
# newfs_msdos -F 16 /dev/ada3p1
...same output...
# mount -t msdosfs /dev/ada2p1 /boot/efi
# mkdir /boot/efi/boot
# cp /boot/loader.efi /boot/efi/boot/bootx64.efi
# umount /dev/ada2p1
# mount -t msdosfs /dev/ada3p1 /boot/efi
# mkdir /boot/efi/boot
# cp /boot/loader.efi /boot/efi/boot/bootx64.efi
# umount /dev/ada3p1
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 2 ada2
partcode written to ada2p2
bootcode written to ada2
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 2 ada3
...same output, but for ada3...
# reboot

With the alternate path. This second time, I did not also do gpart bootcode .... as this definitely should not matter when I am only touching the efi partitions.

(edit to fix copy/paste error in this section. Fixing the mkdirs)
Code:
# newfs_msdos -F 16 /dev/ada2p1
/dev/ada2p1: 409360 sectors in 25585 FAT16 clusters (8192 bytes/cluster)
BytesPerSec=512 SecPerClust=16 ResSectors=1 FATs=2 RootDirEnts=512 Media=0xf0 FATsecs=100 SecPerTrack=63 Heads=16 HiddenSecs=0 HugeSectors=409600
# newfs_msdos -F 16 /dev/ada3p1
...same output...
# mount -t msdosfs /dev/ada2p1 /boot/efi
# mkdir -p /boot/efi/efi/boot
# cp /boot/loader.efi /boot/efi/efi/boot/bootx64.efi
# umount /dev/ada2p1
# mount -t msdosfs /dev/ada3p1 /boot/efi
# mkdir -p /boot/efi/efi/boot
# cp /boot/loader.efi /boot/efi/efi/boot/bootx64.efi
# umount /dev/ada3p1
# reboot
 
Additionally, it was unclear to me if the path after mounting should be /boot/efi/boot/bootx64.efi or /boot/efi/efi/boot/bootx64.efi, so I tried both.
When UEFI boots it looks for a file in EFI/BOOT/bootx64.efi on the first efi partition it finds. If you mounted the partition on /boot/efi, the complete path would be /boot/efi/EFI/BOOT/bootx64.efi.

Each yields the same error condition as in the original post
Yes, your system is CSM/BIOS booting. So it really doesn't matter what's in the efi partition. Set this correctly and configure your system to actually UEFI boot (turn CSM off). It's not necessary to do that right now, and it's not going to change or fix the errors you have later on. But getting the efi partition right helps if you do want to switch between UEFI and CSM booting.
 
Understood and thank you for your patient help thus far. I have updated my boot settings to disable CSM. Upon booting I get a slightly different prompt with GELI and still a long stream of zio_read errors. I had to snap a picture of the output after the GELI decryption, as that gets pushed off the screen nearly immediately.

What seems to be new/different having switched over to EFI is the specific error regarding /boot/lua/loader.lua.

What should I be looking at next to further troubleshoot and remediate this?

Edit: I forgot to mention that the named file is present with ls /boot/lua. I see that file and others in the /boot/lua directory, and it seems that /boot is also populated. I do not know if it has all expected files, but it is not empty. Running ls from the prompt gives a string of 'zio_read error: 5's, but does return the expected output.

Code:
Consoles: EFI Console
GELI Passphrase for disk5p4:

Calculating GELI Decryption Key for disk5p4: N iterations...
Calculating GELI Decryption Key for disk6p4: N iterations...
    Reading loader env vars from /efi/freebsd/loader.env
    Setting currdev to disk6p1:
FreeBSD/amd64 EFI loader, Revision 1.1

    Command line arguments: loader.efi
    Image base: 0xa3bf3000
    EFI version: 2.70
    EFI Firmware: American Megatrends (rev 5.15)
    Load Path: \EFI\BOOT\BOOTX64.EFI
    Load Device: PciRoot(0x2)/Pci(0x1,0x1/Pci(0x0, 0x0)/Pci(0x9,0x0)/Pci(0x0,x ##cut off
    BootCurrent: 0011
    BootOrder: 0011(*) 000e 0010 000b 0012 0006 0000
    BootInfo Path: HD(1,GPT,<uuid>,0x28,0x64000) ## cut off
Ignoring Boot0011: Only one DP found
Trying ESP: PciRoot(0x2)/Pci(0x1,0x1)/Pci(0x0,0x0)/Pci(0x9,0x0)/Pci(0x0,0x0)/Sa ## cut off
setting currdev to disk6p1
Trying: PciRoot(0x2)/Pci(0x1,0x1)/Pci(0x0,0x0)/Pci(0x9,0x0)/Pci(0x0,0x0)Sata(0 ##cut off
Setting currdev to disk6p2
Trying: PciRoot(0x2)/Pci(0x1,0x1)/Pci(0x0,0x0)/Pci(0x9,0x0)/Pci(0x0,0x0)Sata(0 ##cut off
Setting currdev to disk6p3
Trying: PciRoot(0x2)/Pci(0x1,0x1)/Pci(0x0,0x0)/Pci(0x9,0x0)/Pci(0x0,0x0)Sata(0 ##cut off
zio_read error: 5
Setting currdev to zfs:zroot/ROOT/default:
zio_read error: 5
zio_read error: 5
...many more zio_read error:5...
failed to add font: boot font ...
zio_read error: 5
zio_read error: 5
...interleaved...
zfs: i/o error - all blocks unavailable
zio_read error: 5
zio_read error: 5
...many more zio_read error: 5
zio_read error: 5
ERROR: cannot open /boot/lua/loader.lua: no such file or directory

type '?' for a list of commands, 'help' for more detailed help.
OK _
 
… 13.0-RELEASE … Following instructions from some forum posts, write gptzfsboot data with gpart …

For the future: <https://www.freebsd.org/releases/13.0R/relnotes/#boot> the note about not using gpart(8).

… second mirror vdev of ada0p1 and ada1p1, each GELI encrypted …

Are you certain that those two devices were at the same numbers (0 and 1) before you used the dd command?

dd if=/boot/boot1.efi of=/dev/ada2p1; dd if=/boot/boot1.efi of=/dev/ada3p1
 
grahamperrin@ said:
ggb said:
… second mirror vdev of ada0p1 and ada1p1, each GELI encrypted …
Are you certain that those two devices were at the same numbers (0 and 1) before you used the dd command?

Yes. The HDDs have been reliably numbered since first installation of 13.0-RELEASE several months ago. The original installation was on ada2/3 with the four installer-configured partitions you see (each). At installation time, ada0 was just a bulk storage disk used from Linux. Thus, I installed onto a ZFS mirror vdev of ada2/3. After installation, I moved everything from ada0 onto the ZFS mirror and then partitioned ada0/1 as you see above, GELI-encrypted the partitions, and added them as a second mirror vdev.

grahamperrin@ said:
… 13.0-RELEASE … Following instructions from some forum posts, write gptzfsboot data with gpart …
For the future: <https://www.freebsd.org/releases/13.0R/relnotes/#boot> the note about not using gpart(8).

The gpart bits were mistaken - I didn't realize that the instructions I was following had the freebsd-boot partition at index 1, unlike my boot drives. I believe following SirDice's instructions has cleaned up the efi and freebsd-boot partitions appropriately. My reading of the linked note you have shared indicates that gpart should not be used to create a "raw" partition for the efi partition. It is my understanding that gpart is still the correct tool for the freebsd-boot partition. Is this correct?

For better or worse, this has been a useful learning opportunity.


At this point, I am seeking further input/troubleshooting guidance. From searches, it appears that similar 'zio_read error' issues have been solved by reseating the SATA cables (although where that was a solution, it seemed to go along with performance issues I am not seeing). I'll be giving that a try sometime in the next few days.
 
Bumping this, as I have had a chance to try reseating the SATA cables (delayed due to moving). I saw similar errors in the thread linked below -- in that thread, reseating the SATA cables was a solution. This has made no difference to my boot errors, described upthread.

At this point I do not know what to do to further troubleshoot. Any feedback or guidance would be much appreciated.


 
I can definitely do that. May I ask first, though, what indicates to you that this is hardware failure? I can boot the system with install media and can attach to the zpool and operate on it without issue. There is no apparent performance difference in zfs scrub compared to before I saw issues. Similarly, performance for file operations seems unchanged based on an rsync I did to get the latest data off the machine when I first had issues.
 
Based on the information you have provided so far (or the lack of it), we don't know if this is a brand new system (which could have faulty parts) or a system with a few years of use (which can have developed hardware faults in some parts). I simply stated what the cheapest next step hardware-wise was.
You did not say whether you had tested the hardware (BTW, just booting from install media and fooling around for half an hour is not a reliable test).

If you want to test the hardware, one way to do that is to try a different version of FreeBSD. If it works on another version, then you have proved that the problem is with a specific version.
 
I apologize for the lack of clarity on usage of the machine. It was built from all new parts in 2020 and ran several different Linux distributions through 2022. I installed FreeBSD 13.0 earlier this year (I believe March timeframe, but could boot it up later and check when the zpool was created, which should get us within a couple minutes of install time) and have been using it uneventfully until the botched upgrade to 13.1 described above.

There were no hardware issues through the life of the machine, which is why I did not begin with hardware testing. I am open to it being a hardware issue, especially given that there was a power surge that may have coincided with some of the update process (details in first post of that timeline). That said, it boots and runs without issue with the Linux installation on an M.2 drive (which drive you can see in the first post). The reason I questioned the assumption of hardware failure is that it would have to be localized to the HDDs or their cables, while having affected no other components of the system.

zfs scrub completes in the same time frame (~1.5hr) as before failure. There are no SMART issues on the drives. I rsynced a couple hundred GB to another machine, and that data is all intact. So the drives *seem* to me to be good. I have not run any IO benchmarks in this failed state, but could do so. I would need some guidance on doing so (what to use and maybe a link to a quality primer), as I do not typically benchmark hardware beyond prime95 on new installs for CPU and RAM stability.
 
if booting from external media works ok then it's most likely not the hardware
i'll speculate there is something wrong with the zpool cache/hint file or the new bootcode has some compatibility problem with your hardware
before kernel boots / gptzfsbootcode / loader / efi loader will use bios/efi calls to access disks, and disk numbering/naming is/might be different than after the kernel is booted
 
Thank you for the feedback. Unfortunately, my research on zpool cache/hint files and ZFS in the boot process has led to nothing new and actionable for me. I am not sure how to continue.

With regard to disk numbering, here is what I know:
  1. In a booted system (either from disk pre-upgrade-and-errors, or with the installation media), the disks are reliably numbered as /dev/ada0-3.
  2. /dev/ada2 and /dev/ada3 are the two that FreeBSD was initially installed on, each with four partitions, as described upthread.
  3. /dev/ada0 and /dev/ada1 are the two that make up a mirror vdev I added after initial install (and before the failed upgrade). These each have one partition.
  4. When booting from disk (not installation media), I am prompted for the GELI passphrase for disk3p4. I see messages about disk3p4 and disk4p4 being decrypted (I do not, and never did see, messages about any other disks). I do not know where in the boot process this numbering comes from, but it has been reliable from the first install.
    1. If I boot from disk with the installation media USB plugged in, I see a GELI prompt and messages about disk4p4 and disk5p4 -- the installation media USB drive becomes disk1
  5. If I let boot proceed from disk, I eventually get my stream of zio_read errors, and then get dropped to the boot prompt, as described above.
    1. I can see my filesystem, though it is incompletely mounted. E.g., /usr/home/ is there, but my home directory is not mounted
    2. lszfs shows all datasets in my zpool (including my home directory's dataset)
    3. It complains about the file I mentioned above, yet that file appears in the output of ls. If I try more /boot/lua/loader.lua, I get the same error: "can't open '/boot/lua/loader.lua': no such file or directory", despite that very same file appearing in the listing.

What should I be doing to troubleshoot the bootcode or the zpool cache or hints file?
 
If you suspect the installation is botched (a power surge during the install could lead to corrupted files) the simplest fix is to re-do the installation.
 
Thank you for the feedback. I appreciate that this thread would be unnecessary if I simply reinstall from scratch, and I will probably be doing so if I cannot resolve within the next couple weeks.

As I mentioned at the top of the thread, I want to treat this as a learning opportunity. I have already learned much more in troubleshooting and thanks to everyone in this thread about the boot process than multiple reads through the handbook ever taught me. I want to try to recover in-place if feasible. This is a luxury as I have other machines available for work and this is just my dev machine. I have already dealt with all critical data on the system -- I do keep backups; and with booting the install media, I was able to take a fully-up-to-date backup.

I do understand that helping with my admittedly unnecessary efforts here might not be a priority, and I appreciate the guidance and attention that this thread has gotten.
 
I have made great progress! I am now in a reboot loop, which will take some more troubleshooting, but wanted to drop an update here in the meantime. Something I mentioned above was really bugging me:
When booting from disk (not installation media), I am prompted for the GELI passphrase for disk3p4. I see messages about disk3p4 and disk4p4 being decrypted (I do not, and never did see, messages about any other disks). I do not know where in the boot process this numbering comes from, but it has been reliable from the first install.
This felt wrong. Long story short, I booted into the installation media once more and

Code:
# geli attach /dev/ada0p1 /dev/ada1p1 /dev/ada2p4 /dev/ada3p4
# geli list | less
####above showed that ada0p1 and ada1p1 had BOOT flag, but not GELIBOOT
# geli configure -g ada0p1
# geli configure -g ada1p1
# geli list | less
####above now shows GELIBOOT on all four GELI partitions
# reboot ## into boot from disk

Now, I am prompted for the password for DISK1P1 for GELI and I see all four partitions decrypted before proceeding.
At this point it goes through a normal startup process and reboots somewhere soon after initializing USB devices (I see a bunch of 'uhub' scroll by in the console). It does this for single-user mode as well.

So I am not out of the woods yet, but I see some different trees.
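The flag check done by eye above can be sketched as a small filter over geli list output. The layout assumed here ("Geom name:" and "Flags:" lines) is an assumption about geli list's formatting, the sample text is illustrative rather than copied from the machine, and missing_geliboot is a made-up name; the commands are only printed.

```shell
#!/bin/sh
# Reads `geli list` output on stdin and prints a `geli configure -g` command
# for every provider whose Flags line lacks GELIBOOT.
# Assumed layout: "Geom name: <prov>.eli" followed later by "Flags: ...".
missing_geliboot() {
    awk '
        /^Geom name:/ { name = $3; sub(/\.eli$/, "", name) }
        /^Flags:/ && !/GELIBOOT/ { print "geli configure -g " name }
    '
}

# Illustrative sample, not real output from the machine in this thread.
sample='Geom name: ada0p1.eli
Flags: BOOT
Geom name: ada2p4.eli
Flags: BOOT, GELIBOOT'

printf '%s\n' "$sample" | missing_geliboot
```

The -b flag asks for the passphrase while the kernel is booting, whereas -g is what lets the boot loader itself unlock the provider, which is why a vdev with only BOOT set stays unreadable to the loader.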
 
And the reboot loop was caused by changes I had made to /boot/loader.conf while experimenting with the ZFS default root, specifically to vfs.root.mountfrom. Removing that setting (which I do not typically specify) got me booting successfully.
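A quick way to spot a leftover override like that is to search the file for active (uncommented) vfs.root.mountfrom lines. The sketch below runs against a temporary sample file whose contents are invented for illustration; find_mountfrom is a made-up name, and on a real system the argument would be /boot/loader.conf.

```shell
#!/bin/sh
# Prints line number and content of every uncommented vfs.root.mountfrom line.
find_mountfrom() {
    grep -n '^[[:space:]]*vfs\.root\.mountfrom' "$1"
}

# Invented sample contents, not the loader.conf from this thread.
conf=$(mktemp)
cat > "$conf" <<'EOF'
zfs_load="YES"
geom_eli_load="YES"
#vfs.root.mountfrom="zfs:zroot/ROOT/old"
vfs.root.mountfrom="zfs:zroot/ROOT/example"
EOF

find_mountfrom "$conf"
rm -f "$conf"
```

Only the fourth line of the sample is reported; the commented-out line is ignored.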

So, the whole thing summarized:
  1. Installed 13.0 -- known good hardware, onto a mirror vdev, with the ZFS partitions GELI-encrypted, with efi and freebsd-boot partitions on each drive (ada2 and ada3)
  2. Add second vdev after the fact (ada0 and ada1).
    1. Used only -b for decryption at boot, but needed -g flag
    2. Issue was masked until upgrade, because all the files critical for boot lived in the first vdev (from installation time), and could be accessed with only that vdev's drives decrypted
  3. Upgrade to 13.1, power failure during upgrade
  4. ZFS errors out the wazoo (maybe some actual interrupted upgrade -- not sure if the interruption was ever a causal factor with hindsight) during boot
  5. Neglectful / rash attempts at fixing, trashing my EFI partitions
  6. Boot from install media and confirmed functioning filesystem, also continued freebsd-update in chroot
  7. Fix EFI partition with help in this thread
  8. ZFS errors out the wazoo
  9. geli configure -g ada0p1; geli configure -g ada1p1
  10. Fix /boot/loader.conf
  11. Success
Right now it is in the middle of pkg-static upgrade -f.

Thanks to everyone for the patience and help in troubleshooting and fixing this issue!
 
For the future: <https://www.freebsd.org/releases/13.0R/relnotes/#boot> the note about not using gpart(8).
And the handbook still tells us to use it:
[Screenshot attachment: 1722995884163.png]


I have to upgrade my zroot pool:
[Screenshot attachment: 1722996028779.png]


But I don't have any of the files used in any example:
[Screenshot attachment: 1722996102257.png]


How do I properly update the bootcode in 14.1? Do I do it with gpart or not? If not, what tool do I use and how? And what about the sequence: do I update the bootcode before or after upgrading the zpool? (If it is before, a message about it should appear in the output of # zpool status.)

Could someone kindly advise, please?
 
The problem the original poster had experienced after updating/upgrading his system was most likely caused by the 2TB limitation of his computer's firmware (BIOS/UEFI) combined with the reliance of gptzfsboot/loader on firmware interfaces for reading from disk devices.
The disks were 9.1TB in size — much bigger than 2TB and only the first 2TB could be read during boot attempts.

This problem is observed after a new boot environment is created (freebsd-update automatically creates one) and the system is rebooted. Some or all blocks allocated for the new boot environment happen to be lying beyond the 2TB boundary. The typical error messages include zio_read errors and all block copies unavailable.

Here's a link to another forum discussion about this problem: https://forums.freebsd.org/threads/...han-2tb-without-the-right-bios-setting.93101/
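For readers wondering where a 2TB figure would come from: the usual explanation is a 32-bit sector address combined with 512-byte sectors. The arithmetic below is a sketch under that assumption only; it says nothing about whether a particular firmware is affected. The partition numbers are ada2p4 from the gpart show output in the first post, and sectors_beyond_limit is a made-up name.

```shell
#!/bin/sh
SECTOR_BYTES=512
LIMIT_SECTORS=4294967296    # 2^32 sectors, the most a 32-bit LBA can address

# Prints how many sectors of a partition (start, size) lie past the limit.
sectors_beyond_limit() {
    start=$1; size=$2
    end=$(( start + size ))
    if [ "$end" -le "$LIMIT_SECTORS" ]; then
        echo 0
    elif [ "$start" -ge "$LIMIT_SECTORS" ]; then
        echo "$size"
    else
        echo $(( end - LIMIT_SECTORS ))
    fi
}

echo "limit: $(( LIMIT_SECTORS * SECTOR_BYTES )) bytes"
beyond=$(sectors_beyond_limit 4605952 19528265728)    # ada2p4: start, size
echo "ada2p4: $beyond sectors ($(( beyond * 100 / 19528265728 ))%) past the limit"
```

The limit works out to 2199023255552 bytes (2 TiB), and roughly 78% of ada2p4 sits above it, so on a firmware with such a ceiling most of the pool would be out of reach for the loader.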
 
It very explicitly was not to do with the size of the disk. If you read through to my solution, you'll see that the problem was that the second VDEV (added *after* zpool creation) was not configured correctly for GELI decryption at boot. Until `freebsd-update`, all files necessary for boot were on the first VDEV only (by virtue of it having been the only one when they were written during installation).

The issue was that updating to 13.1 caused some of these files to be on the second VDEV. But boot time decryption was only configured for one VDEV. It turns out that ZFS doesn't like trying to read data from an encrypted drive.

The reason that the issue showed up only during boot from zpool, and not from installation media is that I was decrypting all drives via GELI commands from the installation media boot environment, so that the zpool was fine when accessed from that environment.

The solution was to correct the GELI configuration so that all drives participating in VDEVs are decrypted at boot.
 
I am willing to accept that encryption misconfiguration prevented ZFS from accessing some of the blocks it needed for a successful boot but I'd like to have a positive confirmation that the 2TB limitation doesn't exist on your system.

Can you do a simple experiment?
  1. Provided that the disk ada0 still has plenty of space after the ZFS partition, create a new partition there with a FAT16 or FAT32 filesystem.
  2. Put some files on the new filesystem.
  3. Reboot
  4. At the loader prompt execute the command set currdev=disk0p1: (replace disk0p1 with the correct value found in the output of lsdev) Setting currdev is the loader's equivalent of mounting a filesystem.
  5. Use the ls command to confirm that the files you put there are visible to the loader.
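The experiment above can be sketched as a dry run that prints the preparation commands. Everything specific here is an assumption made for illustration: the partition index 2, the ms-basic-data type, /mnt as mount point, /COPYRIGHT as an arbitrary test file, and the helper name loader_test_plan.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# With no -s, gpart add uses the free space available, which on ada0 is the
# 98G region at the very end of the disk.
loader_test_plan() {
    disk=$1; index=$2; mnt=${3:-/mnt}
    echo "gpart add -t ms-basic-data -i $index $disk"
    echo "newfs_msdos -F 32 /dev/${disk}p${index}"
    echo "mount -t msdosfs /dev/${disk}p${index} $mnt"
    echo "cp /COPYRIGHT $mnt/"
    echo "umount $mnt"
}

loader_test_plan ada0 2

# Afterwards, at the loader prompt (not a shell):
#   lsdev                    find the loader's name for the new partition
#   set currdev=diskNp2:     N is whatever lsdev reported
#   ls                       the copied file should be listed
```

Because the new partition starts about 9TiB into the disk, a successful ls at the loader prompt shows that the firmware can read beyond 2TiB.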
 
What hardware are you investigating for this issue? The thread you linked referred to specific hardware with specific motherboard firmware settings for a compatibility mode. There is nothing in this two year old thread to indicate that I am on the same or similar hardware.

The reason that there's no indication of hardware specifics is because hardware was agreed upon as an unlikely causal factor. This hypothesis (hardware was not causal) was borne out when this thread was concluded and solved based on software configuration.

The thread you linked indicates ZFS read errors only when sufficient data is written, such that specific files' contents are pushed past 2TiB in disk location. The fix in this thread would not move files around or cause data to be rewritten below the 2TiB threshold in question in that other, unrelated thread.

Additionally, boot environments do not allocate space. A BE is a ZFS snapshot (strictly, a clone of one), which shares its blocks with the original until they are modified. freebsd-update will write to a whole bunch of files, which, as ZFS is a COW FS, causes new blocks to be written, and those new blocks will obviously be in different locations than the originals; it is possible that these writes shift the physical location of file data past 2TiB.

zio_read errors simply indicate that ZFS cannot access blocks. There are many potential causes for a block being inaccessible. In this two year old thread, the confirmed cause was misconfiguration of GELI. This thread and the thread you linked both reference another thread. The OP of that other thread identified a hardware cause: they fixed their problem by re-seating SATA cables. The thread you linked identified that some specific UEFI firmware has configuration settings which affect reading disks larger than 2TiB during boot. These represent three distinct causes of the same zio_read error.

As I said, my issue here was a misconfiguration of GELI. My motherboard's firmware does not have a setting that conflicts with UEFI boot of disks larger than 2TiB. Files in an ms-basic-data partition with a FAT32 filesystem 9TiB into a disk are readable at the loader prompt without issue on my machine.
 