ZFS Boot Environment Weirdness

Just upgraded a system from 13.1 to 13.2.

All straightforward:

Code:
freebsd-update -r 13.2-RELEASE upgrade   # fetch the 13.2 bits and merge configs
freebsd-update install                   # install the new kernel
shutdown -r now                          # reboot onto the new kernel
freebsd-update install                   # install the new userland
pkg update
pkg upgrade                              # bring packages in line with 13.2
shutdown -r now

The system booted as normal, prompting for the GELI encryption passphrase. FreeBSD loader. Boot.

It's at this point that things went very, very strange. After the kernel booted I started having failures in the rc scripts and was dropped out to sh, and I found myself in a system with a hostname that wasn't the local machine! At this stage I was wondering whether I'd slipped into a parallel universe. Checking the ZFS pool, I noted that zroot appeared to be on `ada0` rather than the NVMe disks. When I issued a `zpool status` I could see the zroot pool I was expecting, but it was not what had been booted.
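
A quick way to confirm what had actually been mounted as root (a suggestion in hindsight rather than something I ran at the time; it assumes the half-booted system will still execute commands):

Code:
# which dataset is mounted as / right now?
mount | head -n 1
# which root the loader told the kernel to mount
kenv vfs.root.mountfrom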

This should have been the correct zroot:
Code:
  pool: zroot
 state: ONLINE
config:

        NAME          STATE     READ WRITE CKSUM
        zroot         ONLINE       0     0     0
          nvd0p4.eli  ONLINE       0     0     0
          nvd1p4.eli  ONLINE       0     0     0

errors: No known data errors

No idea why this wasn't booted. I clearly hit this disk first, since I was prompted for the GELI passphrase, but after that I seem to have booted from a disk plugged into this system from another host that was never wiped...
Somehow the system had booted "something" as zroot from ada0, which is part of another pool used for storage.

Code:
  pool: ssdpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:00 with 0 errors on Tue May  3 23:24:34 2022
config:

        NAME        STATE     READ WRITE CKSUM
        ssdpool     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da7     ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
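
In hindsight, `zdb -l` would have shown whatever stale ZFS label was still sitting on a recycled disk. The device below is only my guess at the culprit; point it at any suspect disk or partition:

Code:
# dump any leftover ZFS label(s) on the suspect device
zdb -l /dev/ada0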

I rebooted and escaped out to the loader prompt. Issuing `lszfs zroot` I could see the filesystems I expected from the correct system, such as `ROOT`, `bastille`, `vm` and `poudriere`. I checked the boot environments and found the following:

Code:
root@pegasus:~ # bectl list -a
BE/Dataset/Snapshot                                                    Active Mountpoint Space Created

13.1-RELEASE-p2_2023-01-21_162440
  zroot/ROOT/13.1-RELEASE-p2_2023-01-21_162440                         -      -          8K    2023-01-21 16:24
    zroot/ROOT/13.2-RELEASE-p3_2023-09-07_182150@2023-01-21-16:24:41-0 -      -          6.24G 2023-01-21 16:24

13.1-RELEASE-p5_2023-04-16_095139
  zroot/ROOT/13.1-RELEASE-p5_2023-04-16_095139                         -      -          8K    2023-04-16 09:51
    zroot/ROOT/13.2-RELEASE-p3_2023-09-07_182150@2023-04-16-09:51:39-0 -      -          5.80G 2023-04-16 09:51

13.1-RELEASE-p7_2023-09-07_181826
  zroot/ROOT/13.1-RELEASE-p7_2023-09-07_181826                         -      -          8K    2023-09-07 18:18
    zroot/ROOT/13.2-RELEASE-p3_2023-09-07_182150@2023-09-07-18:18:26-0 -      -          7.74M 2023-09-07 18:18

13.2-RELEASE-p3_2023-09-07_182150
  zroot/ROOT/13.2-RELEASE-p3_2023-09-07_182150                         NR     /          53.6G 2023-09-07 18:21

default
  zroot/ROOT/default                                                   -      -          7.32G 2022-03-25 12:24
    zroot/ROOT/13.2-RELEASE-p3_2023-09-07_182150@2023-09-07-18:21:50-0 -      -          2.50M 2023-09-07 18:21

I was unaware of any boot environments being on this system. (Or perhaps I've suffered some memory loss and don't remember configuring them myself.) I issued a `bectl activate 13.2-RELEASE-p3_2023-09-07_182150` and a reboot, and the system booted correctly.
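
For anyone in the same spot, the activation plus a sanity check looks like this; in `bectl list` the Active column shows N for the BE running now and R for the one that boots next:

Code:
bectl activate 13.2-RELEASE-p3_2023-09-07_182150
bectl list   # the activated BE should now show R (or NR) under Active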

What I don't know, and what I really need help with, is what state this system is currently in with regard to future upgrades and boot environments, and how to tidy this up. Clearly I want (and should) be booting `default`, but I'm not familiar with boot environments, so I would very much appreciate some help sorting this mess out. Ideally I want to get back to booting `default` and to clear out the remaining boot environments.

Thanks in advance.
 
Quick update. I tried to clean up the boot environments as follows (the `-o` flag tells `bectl destroy` to also destroy the origin snapshot each BE was cloned from):

Code:
root@pegasus:~ # bectl destroy -o zroot/ROOT/13.1-RELEASE-p2_2023-01-21_162440
root@pegasus:~ # bectl list -a
BE/Dataset/Snapshot                                                    Active Mountpoint Space Created

13.1-RELEASE-p5_2023-04-16_095139
  zroot/ROOT/13.1-RELEASE-p5_2023-04-16_095139                         -      -          8K    2023-04-16 09:51
    zroot/ROOT/13.2-RELEASE-p3_2023-09-07_182150@2023-04-16-09:51:39-0 -      -          9.93G 2023-04-16 09:51

13.1-RELEASE-p7_2023-09-07_181826
  zroot/ROOT/13.1-RELEASE-p7_2023-09-07_181826                         -      -          8K    2023-09-07 18:18
    zroot/ROOT/13.2-RELEASE-p3_2023-09-07_182150@2023-09-07-18:18:26-0 -      -          7.74M 2023-09-07 18:18

13.2-RELEASE-p3_2023-09-07_182150
  zroot/ROOT/13.2-RELEASE-p3_2023-09-07_182150                         NR     /          47.4G 2023-09-07 18:21

default
  zroot/ROOT/default                                                   -      -          7.32G 2022-03-25 12:24
    zroot/ROOT/13.2-RELEASE-p3_2023-09-07_182150@2023-09-07-18:21:50-0 -      -          2.51M 2023-09-07 18:21

root@pegasus:~ # bectl destroy -o zroot/ROOT/13.1-RELEASE-p5_2023-04-16_095139
root@pegasus:~ # bectl list -a
BE/Dataset/Snapshot                                                    Active Mountpoint Space Created

13.1-RELEASE-p7_2023-09-07_181826
  zroot/ROOT/13.1-RELEASE-p7_2023-09-07_181826                         -      -          8K    2023-09-07 18:18
    zroot/ROOT/13.2-RELEASE-p3_2023-09-07_182150@2023-09-07-18:18:26-0 -      -          29.6M 2023-09-07 18:18

13.2-RELEASE-p3_2023-09-07_182150
  zroot/ROOT/13.2-RELEASE-p3_2023-09-07_182150                         NR     /          37.4G 2023-09-07 18:21

default
  zroot/ROOT/default                                                   -      -          7.32G 2022-03-25 12:24
    zroot/ROOT/13.2-RELEASE-p3_2023-09-07_182150@2023-09-07-18:21:50-0 -      -          2.51M 2023-09-07 18:21
root@pegasus:~ #

root@pegasus:~ # bectl destroy -o zroot/ROOT/13.1-RELEASE-p7_2023-09-07_181826
root@pegasus:~ # bectl list -a
BE/Dataset/Snapshot                                                    Active Mountpoint Space Created

13.2-RELEASE-p3_2023-09-07_182150
  zroot/ROOT/13.2-RELEASE-p3_2023-09-07_182150                         NR     /          37.4G 2023-09-07 18:21

default
  zroot/ROOT/default                                                   -      -          7.32G 2022-03-25 12:24
    zroot/ROOT/13.2-RELEASE-p3_2023-09-07_182150@2023-09-07-18:21:50-0 -      -          11.8M 2023-09-07 18:21

root@pegasus:~ # bectl destroy -o default
root@pegasus:~ # bectl list -a
BE/Dataset/Snapshot                            Active Mountpoint Space Created

13.2-RELEASE-p3_2023-09-07_182150
  zroot/ROOT/13.2-RELEASE-p3_2023-09-07_182150 NR     /          37.4G 2023-09-07 18:21

root@pegasus:~ # bectl rename 13.2-RELEASE-p3_2023-09-07_182150 default
root@pegasus:~ # bectl list -a
BE/Dataset/Snapshot  Active Mountpoint Space Created

default
  zroot/ROOT/default NR     /          37.4G 2023-09-07 18:21
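
One check I didn't think to do at this point (a suggestion, not something from my session) is whether the pool's bootfs property followed the rename, since that is the dataset the loader consults:

Code:
zpool get bootfs zroot   # expect VALUE to be zroot/ROOT/default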

When the system reboots it's back to the same broken state, where it boots the zroot picked up from the other disk. Again, dropping out to the loader prompt and doing `lszfs zroot`, I can see the correct zroot layout, but I can't seem to boot it now that the boot environments are blown away.
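
A sketch of what should work from the loader prompt in this state, pointing currdev at the BE dataset by hand (I haven't verified this exact sequence on this box):

Code:
OK unload
OK set currdev=zfs:zroot/ROOT/default:
OK boot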
 
Looks like this might be solved now. In the end I just booted the system with the FreeBSD memstick image. I could see that, in this scenario, the zroot pool that came up was the incorrect one, containing only ada1p3. I forced creation of a new pool on that partition and then destroyed it:

Code:
# clobber the stale label on the rogue partition with a throwaway pool
zpool create -f killme ada1p3
zpool destroy killme

Another reboot and the system booted correctly.
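
The create-and-destroy dance worked, but for reference there is a dedicated command for exactly this; from the memstick shell something like the following should clear the rogue labels directly:

Code:
zpool labelclear -f /dev/ada1p3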

This was a wild ride for me and I hope this information helps someone; if nothing else it provides a funny WTF thread. Moral of the story: don't blindly upgrade a system without checking its current state first.

I may be back.... =)
 