My zpool exploded

tarkhil · Dec 27, 2021

Several hours ago my zpool suddenly exploded. The server rebooted and could not boot.

Code:

root@:~ # zdb -AAA -F -d -e iile-boot
zdb: can't open 'iile-boot': Integrity check failed

Code:

root@:~ # zpool import
   pool: iile-boot
     id: 4380822407036168996
  state: FAULTED
status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
        devices and try again.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-3C
 config:

        iile-boot            FAULTED  corrupted data
          mirror-0           FAULTED  corrupted data
            gpt/iile-boot-1  UNAVAIL  cannot open
            gpt/iile-boot-0  ONLINE

   pool: iile
     id: 4721818964728306628
  state: FAULTED
status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
        devices and try again.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-3C
 config:

        iile                     FAULTED  corrupted data
          mirror-0               DEGRADED
            8445171921478463808  UNAVAIL  cannot open
            gpt/iile-0           ONLINE

Code:

root@:~ # zdb -AAA -e iile-boot

Configuration for import:
        vdev_children: 1
        version: 5000
        pool_guid: 4380822407036168996
        name: 'iile-boot'
        state: 0
        vdev_tree:
            type: 'root'
            id: 0
            guid: 4380822407036168996
            children[0]:
                type: 'mirror'
                id: 0
                guid: 15675021958327973475
                whole_disk: 0
                metaslab_array: 256
                metaslab_shift: 32
                ashift: 9
                asize: 536866193408
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 1991294491525726088
                    path: '/dev/gpt/iile-boot-1'
                    whole_disk: 1
                    DTL: 2866
                    create_txg: 4
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 6740896146295478304
                    whole_disk: 1
                    DTL: 2172
                    create_txg: 4
                    path: '/dev/gpt/iile-boot-0'
        load-policy:
            load-request-txg: 18446744073709551615
            load-rewind-policy: 2
(very long time)
zdb: can't open 'iile-boot': Integrity check failed

ZFS_DBGMSG(zdb) START:
spa.c:5998:spa_import(): spa_import: importing iile-boot
spa_misc.c:411:spa_load_note(): spa_load(iile-boot, config trusted): LOADING
vdev.c:131:vdev_dbgmsg(): disk vdev '/dev/gpt/iile-boot-0': best uberblock found for spa iile-boot. txg 3110443
spa_misc.c:411:spa_load_note(): spa_load(iile-boot, config untrusted): using uberblock with txg=3110443
vdev.c:136:vdev_dbgmsg(): mirror-0 vdev (guid 15675021958327973475): metaslab_init failed [error=97]
vdev.c:136:vdev_dbgmsg(): mirror-0 vdev (guid 15675021958327973475): vdev_load: metaslab_init failed [error=97]
spa_misc.c:396:spa_load_failed(): spa_load(iile-boot, config trusted): FAILED: vdev_load failed [error=97]
spa_misc.c:411:spa_load_note(): spa_load(iile-boot, config trusted): UNLOADING
ZFS_DBGMSG(zdb) END

I'm currently saving images of the pool disks, but is there any chance of recovery? What should I try? Or did the kernel manage to destroy all the data, leaving nothing?

UPD.

Code:

root@:~ # sysctl vfs.zfs.spa.load_verify_metadata=0
vfs.zfs.spa.load_verify_metadata: 1 -> 0
root@:~ # sysctl vfs.zfs.spa.load_verify_data=0
vfs.zfs.spa.load_verify_data: 1 -> 0

root@:~ # zpool import -f -R /mnt -o readonly -N iile-boot
internal error: cannot import 'iile-boot': Integrity check failed
Abort (core dumped)

root@:~ # tail /var/log/messages
Dec 27 13:41:52  ZFS[2167]: pool I/O failure, zpool=iile-boot error=97
Dec 27 13:41:52  ZFS[2171]: vdev problem, zpool=iile-boot path= type=ereport.fs.zfs.vdev.corrupt_data
Dec 27 13:41:52  ZFS[2175]: failed to load zpool iile-boot
Dec 27 13:41:54  ZFS[2183]: pool I/O failure, zpool=iile-boot error=97
Dec 27 13:41:54  ZFS[2187]: vdev problem, zpool=iile-boot path= type=ereport.fs.zfs.vdev.corrupt_data
Dec 27 13:41:54  ZFS[2191]: failed to load zpool iile-boot
Dec 27 13:41:54  ZFS[2199]: pool I/O failure, zpool=iile-boot error=97
Dec 27 13:41:54  ZFS[2203]: vdev problem, zpool=iile-boot path= type=ereport.fs.zfs.vdev.corrupt_data
Dec 27 13:41:54  ZFS[2207]: failed to load zpool iile-boot
Dec 27 13:41:54  kernel: pid 2131 (zpool), jid 0, uid 0: exited on signal 6 (core dumped)

UPD2. Looks like I've hit https://github.com/openzfs/zfs/issues/12559 and this issue is not in kernel yet. So beware of zstd!

tarkhil · Dec 28, 2021

An attempt to recover with

 zpool import -R /mnt -o readonly -f -N -FX  iile

resulted in a panic. However, zdb -l shows some uberblocks that look quite alive. Looking for help from ZFS developers.

Cath O'Deray · Dec 28, 2021

zfs --version
sysrc -f /boot/loader.conf zfs_load openzfs_load

Also, maybe not directly relevant at the moment, but always good to know:

uname -aKU

diizzy · Dec 28, 2021

One of your drives seems dead?

Cath O'Deray · Dec 28, 2021

tarkhil said:
… like I've hit https://github.com/openzfs/zfs/issues/12559 and this issue is not in kernel yet. …

From GitHub there:

… fixed in PR #12177. …

– Livelist logic should handle dedup blkptrs #12177

<https://cgit.freebsd.org/src/commit/?id=86b5f4c121885001a472b2c5acf9cb25c81685c9> (2021-06-07)

Livelist logic should handle dedup blkptrs

<https://github.com/freebsd/freebsd-src/commit/86b5f4c121885001a472b2c5acf9cb25c81685c9> shows the commit in main.

<https://forums.freebsd.org/posts/536019> tarkhil mentioned 13.0RC1, forum searches find no mention of STABLE or CURRENT so let's assume:

13.0-RELEASE in this case
patch level not yet known

<https://bokut.in/freebsd-patch-level-table/#releng/13.0>

ralphbsz · Dec 28, 2021

Recover the data? Theoretically perhaps, practically difficult. Not without developers with internals knowledge. That becomes a tradeoff between the value of the data, and the value of the time of a developer.

How did this happen? What kernel / ZFS version were you running when it happened? Did you have dedup and/or compression enabled? Were you creating/destroying datasets? This kind of information is useful for two reasons: (a) to help people identify what could be the root cause (which bug, which hardware failure), which might help you avoid the problem in the future; (b) to help others decide how to run their systems. For example, if you found this problem in FreeBSD version 314-PI, then I might decide to instead run version 2718-E. Or I might avoid using "squish" compression, and use "stomp" instead.

blind0ne · Dec 28, 2021

Really, this is why I always hear "don't use software RAID"; after posts like this, the "scare" is much greater. On the other hand, I've heard of a few projects that have been running ZFS for a few decades or less and are fine, as far as I "checked".
I don't know how lucky or smart you have to be to run both variants. I hope the OP won't lose data.

Cath O'Deray · Dec 29, 2021

ralphbsz said:
… avoid the problem …

There's discussion of avoidance in two or more of the issues in GitHub.

tarkhil · Dec 29, 2021

diizzy said:
One of your drives seems dead?

It died long ago; that's not THE problem.

ralphbsz said:
Recover the data? Theoretically perhaps, practically difficult. Not without developers with internals knowledge. That becomes a tradeoff between the value of the data, and the value of the time of a developer.

How did this happen? What kernel / ZFS version were you running when it happened? Did you have dedup and/or compression enabled? Were you creating/destroying datasets? This kind of information is useful for two reasons: (a) to help people identify what could be the root cause (which bug, which hardware failure), which might help you avoid the problem in the future; (b) to help others decide how to run their systems. For example, if you found this problem in FreeBSD version 314-PI, then I might decide to instead run version 2718-E. Or I might avoid using "squish" compression, and use "stomp" instead.

FreeBSD 13.0, no dedup, compression enabled; I can't recall whether it was lz4 or zstd. I've thought of trying FreeBSD 12.2 to read the pool, yes.

blind0ne said:
Really, this is why I always hear "don't use software RAID"; after posts like this, the "scare" is much greater. On the other hand, I've heard of a few projects that have been running ZFS for a few decades or less and are fine, as far as I "checked".
I don't know how lucky or smart you have to be to run both variants. I hope the OP won't lose data.

The problem is not software RAID. ZFS somehow exploded. I was able to import one pool, but that was the boot one, so the data on it is of very little value.

grahamperrin said:
There's discussion of avoidance in two or more of the issues in GitHub.

Avoiding the problem is the best solution. Can I rent your time machine?

tarkhil · Dec 29, 2021

/usr/local/sbin/zdb -u -l /dev/vtbd1p3 > /root/uber.txt

shows some pretty nice uberblocks. After the panic, the pool seems to be a bit more alive.

Now I'll try the txgs listed in the zdb -u output.

Cath O'Deray · Dec 29, 2021

tarkhil said:
Can I rent your time machine?

;-) the avoidance advice was intended for other readers.

grahamperrin said:
zfs --version
sysrc -f /boot/loader.conf zfs_load openzfs_load
uname -aKU

Compare with <https://forums.freebsd.org/threads/...s-freebsd-13-0-release-use.81726/#post-548577>

tarkhil said:
… Looking for help from ZFS developers.

For data recovery in your case, I'm no expert, but I reckon:

current data should be recoverable, with or without an import – excluding data that was, or should have been, intentionally destroyed; zfs-destroy(8)
snapshot content might be trickier – gut feeling.

tarkhil · Dec 29, 2021

grahamperrin said:
;-) the avoidance advice was intended for other readers.

Compare with <https://forums.freebsd.org/threads/...s-freebsd-13-0-release-use.81726/#post-548577>

For data recovery in your case, I'm no expert, but I reckon:

current data should be recoverable, with or without an import – excluding data that was, or should have been, intentionally destroyed; zfs-destroy(8)

snapshot content might be trickier – gut feeling.

 

root@gw:/usr/ports # zfs --version
zfs-2.0.0-FreeBSD_gf11b09dec
zfs-kmod-v2021121500-zfs_f291fa658
root@gw:/usr/ports # sysrc -f /boot/loader.conf zfs_load openzfs_load
sysrc: unknown variable 'zfs_load'
openzfs_load: YES
root@gw:/usr/ports # uname -aKU
FreeBSD gw.iile.ru 13.0-RELEASE FreeBSD 13.0-RELEASE #0 releng/13.0-n244733-ea31abc261f: Fri Apr  9 04:24:09 UTC 2021     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64 1300139 1300139

I don't care about the zfs destroy'ed data; it was meant to be destroyed. I'll continue experimenting; I have a copy of the data and the New Year holidays anyway.

tarkhil · Dec 29, 2021

Okay, after some panics and attempts, zdb shows plenty of data, files, directories.

But

/usr/local/sbin/zpool import -o readonly -R /mnt -f -N iile

produces this in the logs:

 

Dec 29 15:42:04 recover kernel: vtbd1: hard error cmd=write 1048577080-1048577095
Dec 29 15:42:04 recover kernel: vtbd1: hard error cmd=write 11721044024-11721044039
Dec 29 15:42:04 recover kernel: vtbd1: hard error cmd=write 11721044536-11721044551
Dec 29 15:42:04 recover ZFS[780]: vdev I/O failure, zpool=iile path=/dev/gpt/iile-0 offset=270336 size=8192 error=5
Dec 29 15:42:04 recover ZFS[784]: vdev I/O failure, zpool=iile path=/dev/gpt/iile-0 offset=5464303345664 size=8192 error=5
Dec 29 15:42:04 recover ZFS[788]: vdev I/O failure, zpool=iile path=/dev/gpt/iile-0 offset=5464303607808 size=8192 error=5
Dec 29 15:42:04 recover ZFS[792]: vdev probe failure, zpool=iile path=/dev/gpt/iile-0
Dec 29 15:42:04 recover ZFS[796]: vdev state changed, pool_guid=4721818964728306628 vdev_guid=1211236719488683795
Dec 29 15:42:04 recover ZFS[800]: vdev problem, zpool=iile path= type=ereport.fs.zfs.vdev.no_replicas
Dec 29 15:42:04 recover ZFS[804]: failed to load zpool iile
Dec 29 15:42:04 recover ZFS[808]: failed to load zpool iile
Dec 29 15:42:04 recover ZFS[812]: failed to load zpool iile

The disk is pretty healthy; dd copied it without a problem.

 

# /usr/local/sbin/zpool import -o readonly -R /mnt -f -F -N iile

Dec 29 15:45:21 recover kernel: vtbd1: hard error cmd=write 11721044024-11721044039
Dec 29 15:45:21 recover kernel: vtbd1: hard error cmd=write 11721044536-11721044551
Dec 29 15:45:21 recover kernel: vtbd1: hard error cmd=write 1048577080-1048577095
Dec 29 15:45:21 recover ZFS[842]: vdev state changed, pool_guid=4721818964728306628 vdev_guid=1211236719488683795

Is it possible (not "in theory", but with some description) to read files with zdb?

Cath O'Deray · Dec 29, 2021

Thanks,

tarkhil said:
ea31abc261f

<{link removed}> was a few months ago, if that's your ~~boot~~ usual boot pool I should recommend updating the base OS.

Cath O'Deray · Dec 29, 2021

If you're confident that there's not a hardware issue, have you tried an extreme rewind?

(Proceed with caution; I haven't done so for years.)

tarkhil said:
Is it possible (not "in theory", but with some description) to read files with zdb?

Does <{link removed}> or some other combination of those search phrases lead to anything relevant? (I took a quick look at a handful, nothing immediately promising.)

If you have not already seen it: {link removed} (2018-03-14)

Besides being able to display the new debug information, zdb has another new feature …

– some discussion of extreme rewinds, although that's not how I found the post.

<{link removed}> or zdb(8).

tarkhil · Dec 29, 2021

grahamperrin said:
If you're confident that there's not a hardware issue, have you tried an extreme rewind?

(Proceed with caution; I haven't done so for years.)

Does <https://www.google.com/search?q=zdb+recover+import+site:reddit.com/r/zfs/&tbs=li:1#unfucked> or some other combination of those search phrases lead to anything relevant? (I took a quick look at a handful, nothing immediately promising.)

If you have not already seen it: Turbocharging ZFS Data Recovery | Delphix (2018-03-14)

– some discussion of extreme rewinds, although that's not how I found the post.

<https://openzfs.github.io/openzfs-docs/man/8/zdb.8.html> or zdb(8).

I've copied the data off the disk and am now running
/usr/local/sbin/zpool import -R /mnt -o readonly -f -N -FX iile

on the latest OpenZFS from ports. It has been running for 3.5 hours: no panic, no failure, no visible result (yet). Anyway, the image is on ZFS and snapshotted, so I can always roll back and start over.

Cath O'Deray · Dec 29, 2021

Thanks, do I understand this correctly?

zfs-2.1.99-1 and zfs-kmod-v2021121500-zfs_f291fa658 were enough to import (read only) with an -X extreme rewind, after the inferior versions failed to import with (outdated) 13.0-RELEASE #0 releng/13.0-n244733-ea31abc261f

(Sorry, I'm being slightly lazy with my readings of your notes, without looking properly at the manual pages!)

tarkhil · Dec 30, 2021

grahamperrin said:
Thanks, do I understand this correctly?

zfs-2.1.99-1 and zfs-kmod-v2021121500-zfs_f291fa658 were enough to import (read only) with an -X extreme rewind, after the inferior versions failed to import with (outdated) 13.0-RELEASE #0 releng/13.0-n244733-ea31abc261f

(Sorry, I'm being slightly lazy with my readings of your notes, without looking properly at the manual pages!)

Yes, but unfortunately only for one, unimportant pool. The second one crashed after several hours of importing; now I'm trying to rewind to the oldest transaction found by zdb.

Cath O'Deray · Dec 30, 2021

tarkhil said:
… Looking for help from ZFS developers. …

I pinged a couple of chat areas (Discourse and IRC), where those experts are likeliest to hang out, with reference to your most recent post:

{link removed}
{link removed}

tarkhil said:
… now I'm trying to rewind to the oldest transaction found by zdb.

I'll edit responses from IRC and Discord, as they arrive, into this post. (Don't want to flood this topic.) Here, pasted with permission:

Recent zdb can just copy out whole files, …

zdb -r

it can try, anyway. obviously subject to the same issues as any other reading of the pool.

if they want to not wait hours for the import, they could just set vfs.zfs.spa.load_verify_metadata=0 and do it, as they tried before.

I'm also curious because I've never actually seen "integrity check failed", so I'm guessing that's a FBSD-specific error code
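
For other readers, here is a hedged sketch of what the zdb -r suggestion might look like. It assumes the OpenZFS 2.1 form `zdb -r dataset path destination`, plus `-e` for a pool that is not imported. The dataset name `iile/data`, both file paths, and the helper name `zdb_copy_cmd` are hypothetical. The helper only prints the command line, so it can be checked against zdb(8) for the installed version before anything is run.

```shell
#!/bin/sh
# zdb_copy_cmd: hypothetical helper that only PRINTS a zdb -r command line.
# Assumed form (verify in zdb(8) for your version):
#   zdb -e -r <dataset> <path> <destination>
#   -e  operate on an exported (not imported) pool
#   -r  copy <path>, relative to the dataset root, to <destination>
zdb_copy_cmd() {
    dataset=$1   # e.g. iile/data (hypothetical dataset name)
    path=$2      # file path inside the dataset, relative to its root
    dest=$3      # where to write the copy, on a healthy filesystem
    printf 'zdb -e -r %s %s %s\n' "$dataset" "$path" "$dest"
}

# Print one candidate command (all three arguments are made-up examples).
zdb_copy_cmd iile/data home/user/notes.txt /root/rescued/notes.txt
```

As noted in the chat reply, this is subject to the same issues as any other read of the pool, so it is best tried against a copy of the disk image.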

tarkhil · Dec 30, 2021

Ultimately, it helped. I don't know how much data I've lost and actually don't care; I have the most important data intact (or I hope to find it intact after copying it from the read-only pool).
So, zdb -u -l ...
gives a list of possible txgs, and
/usr/local/sbin/zpool import -R /mnt -o readonly -f -N -FX -T txg pool
after some time returns a seemingly working pool.
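
For anyone repeating this, the txg-listing step can be sketched as a small shell helper. This is only a sketch: it assumes the uberblock entries in saved zdb -u -l output contain lines of the form `txg = <number>`, and the function name `list_txgs` is made up for illustration.

```shell
#!/bin/sh
# list_txgs: hypothetical helper, not part of ZFS.
# Prints the unique txg numbers found in saved `zdb -u -l` output, newest first.
# Assumption: each uberblock entry has a line of the form "txg = <number>".
# Reads the named files, or standard input when no file is given.
list_txgs() {
    awk '$1 == "txg" && $2 == "=" && $3 ~ /^[0-9]+$/ { print $3 }' "$@" |
        sort -rn | uniq
}

# Example, using the file saved earlier in the thread:
#   list_txgs /root/uber.txt
# Each printed number is a candidate value for -T in the import command.
```

Trying the newest txg first and working backwards loses the least data, at the cost of more attempts if the recent transactions are the damaged ones.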

And again I think that ZFS data protection was inspired by the defence of Stalingrad.

Cath O'Deray · Dec 30, 2021

Code:

 ________________________________
< cowsay cow c o w copy on write >
 --------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||