Solved: Degraded ZFS pool reports wrong disk information

Hello, I am testing a ZFS mirror installation on a server, but when the mirror becomes degraded the disk information reported by zpool status seems incorrect.
It's a fresh installation of 13.0-RELEASE.
The partition layout generated by the installer for both disks, ada0 and ada1, is:
Code:
root@odissey:~ # gpart show ada0
=>       40  976773088  ada0  GPT  (466G)
         40       1024     1  freebsd-boot  (512K)
       1064        984        - free -  (492K)
       2048    4194304     2  freebsd-swap  (2.0G)
    4196352  972576768     3  freebsd-zfs  (464G)
  976773120          8        - free -  (4.0K)
Code:
root@odissey:~ # gpart show ada1
=>       40  976773088  ada1  GPT  (466G)
         40       1024     1  freebsd-boot  (512K)
       1064        984        - free -  (492K)
       2048    4194304     2  freebsd-swap  (2.0G)
    4196352  972576768     3  freebsd-zfs  (464G)
  976773120          8        - free -  (4.0K)
After installation the pool information is:
Code:
root@odissey:~ # zpool status
  pool: zroot
 state: ONLINE
config:

    NAME        STATE     READ WRITE CKSUM
    zroot       ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        ada0p3  ONLINE       0     0     0
        ada1p3  ONLINE       0     0     0

errors: No known data errors
If I remove the disk ada0 I get:
Code:
root@odissey:~ # zpool status
  pool: zroot
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
config:

    NAME        STATE     READ WRITE CKSUM
    zroot       DEGRADED     0     0     0
      mirror-0  DEGRADED     0     0     0
        ada0p3  FAULTED      0     0     0  corrupted data
        ada0p3  ONLINE       0     0     0

errors: No known data errors

The link reported by zpool status seems to contain correct information.
Why is zpool status reporting ada0p3 as both ONLINE and FAULTED simultaneously?
Why is it not showing ada1p3 as ONLINE and ada0p3 as FAULTED?

Best regards.
 
I think the reason is that device names are remapped when a disk is missing. My first stage was two healthy disks:
Code:
root@odissey:~ # camcontrol devlist
<WDC WDS500G1B0A-00H9H0 X41100WD>  at scbus0 target 0 lun 0 (ada0,pass0)
<Samsung SSD 850 EVO 500GB EMT02B6Q>  at scbus1 target 0 lun 0 (ada1,pass1)
<AHCI SGPIO Enclosure 2.00 0001>   at scbus6 target 0 lun 0 (ses0,pass2)
<HP iLO LUN 00 Media 0 2.09>       at scbus7 target 0 lun 0 (da0,pass3)
I disconnect the Western Digital disk:
Code:
root@odissey:~ # camcontrol devlist
<Samsung SSD 850 EVO 500GB EMT02B6Q>  at scbus1 target 0 lun 0 (ada0,pass0)
<AHCI SGPIO Enclosure 2.00 0001>   at scbus6 target 0 lun 0 (ses0,pass1)
<HP iLO LUN 00 Media 0 2.09>       at scbus7 target 0 lun 0 (da0,pass2)

As you can see, the Samsung disk has been remapped to ada0.
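
If you want to be sure which physical disk backs which vdev regardless of the adaX name, zpool can also show the member GUIDs, which never change. A quick sketch (pool and partition names taken from the output above):
Code:
# Show vdev GUIDs instead of device names.
zpool status -g zroot

# Dump the ZFS label stored on a partition; it includes the vdev guid
# and the device path recorded when the pool was created.
zdb -l /dev/ada0p3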
 
That is completely normal, expected, and correct. Disks with names of the style "ada0, ada1, ..." (Linux for example uses "sda, sdb, ...") are named in the order they are discovered. If the device that was named ada0 is not there today, then the device that was ada1 yesterday will become ada0 today.

ZFS itself doesn't care; it looks at all the disks it can find and uses them (or doesn't) as appropriate. The human sysadmin might get confused if they don't know that.

Ways around it? The best one is: use gpart to give human-readable string labels to your partitions, which are stored in the GPT on the disk. Then ZFS will display those. For example:
Code:
  pool: home
 state: ONLINE
  scan: scrub repaired 0 in 0 days 02:57:32 with 0 errors on Mon May  2 05:42:32 2022
config:

        NAME               STATE     READ WRITE CKSUM
        home               ONLINE       0     0     0
          mirror-0         ONLINE       0     0     0
            gpt/hd14_home  ONLINE       0     0     0
            gpt/hd16_home  ONLINE       0     0     0
The two hard disks are called HD (for hard disk), 14 and 16 (for the year I bought them), and then the partitions on them are labelled "home". If you open the case of my computer (please don't!), you will find a big sticky label attached to the disk, and it says "HD14" on it.
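
For completeness, a rough sketch of how labels like those get created; the partition index and device names are just an example matching the layout shown earlier in this thread:
Code:
# Put a GPT label on the freebsd-zfs partition of each disk; the labels
# then show up as /dev/gpt/hd14_home and /dev/gpt/hd16_home.
gpart modify -i 3 -l hd14_home ada0
gpart modify -i 3 -l hd16_home ada1

# Build the pool from the label nodes instead of adaXp3,
# and zpool status will print the gpt/ names.
zpool create home mirror gpt/hd14_home gpt/hd16_home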
 
It's good advice to use labels, but does the FreeBSD installer allow it, or should I boot the installer ISO to a shell prompt, create the labels, and then start the installation wizard?
 
It's not going to matter in the current situation; it's just something to keep in mind next time you do an installation.

For now you just need to keep in mind that if one of those drives fails, their device names can and will shift around. ZFS isn't going to care about it; it'll find the drives regardless of their actual device names. It's just a bit confusing for us humans when you have to replace that drive.

This is even worse when dealing with hot-swap disks. The new disk you slide in might be added as ada3 but after a reboot could end up as ada0 (due to the order in which they're connected). I've made the mistake of trying to add the wrong disk this way numerous times. So now I check and double-check (even going as far as noting the drive's serial number, which you can check with smartctl(8)).
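
Something along these lines to match a device name to a physical drive by serial number; smartctl comes from the sysutils/smartmontools package, and camcontrol(8) from the base system gives the same information:
Code:
# Model and serial number of the drive currently known as ada0.
smartctl -i /dev/ada0

# Same idea with base-system tools only.
camcontrol identify ada0 | grep -i serial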
 
It's good advice to use labels, but does the FreeBSD installer allow it, or should I boot the installer ISO to a shell prompt, create the labels, and then start the installation wizard?
You can do that manually after the fact with gpart. The man page is sufficient; I think there used to be a "todo help file" that explained how.
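
Roughly like this, as a sketch only (the label names are made up, and because zroot is the running root pool the export/import step has to be done booted from the install media or another system, not from the pool itself):
Code:
# Add GPT labels to the existing freebsd-zfs partitions (index 3 in this thread's layout).
gpart modify -i 3 -l disk0_zroot ada0
gpart modify -i 3 -l disk1_zroot ada1

# Re-import the pool by label so zpool status shows the gpt/ names.
# -R keeps its datasets out of the live system's / while booted from media.
zpool export zroot
zpool import -d /dev/gpt -R /mnt zroot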
 