Some background info on my issue:
I saw the same hardware issue during the restore operations as well: my machine just did a hard reset out of nowhere, no panic, no alerts, just a reset. It's probably a power supply issue, or the start of some kind of fault in my motherboard. As I already wrote, the initial state was that the system booted, tried to import zroot, panicked, rebooted, and was back to square one; at that point a zpool import was not possible.
Importing the zroot pool (while it still worked), getting reset at some random point, and repeating this several times took a toll on the pool. My part in the fault was that I did not know this was happening, so some kind of watchdog that sends a mail when the system boots, or when the last reboot was not initiated by a user, could have avoided this.
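Even something as simple as a cron @reboot entry that mails on every boot would probably have been enough to catch it; a rough sketch (the address is just a placeholder and it assumes the box can actually deliver mail):
Code:
# hypothetical /etc/crontab entry: mail me whenever the machine comes up
@reboot  root  echo "$(hostname) booted at $(date)" | mail -s "boot notice: $(hostname)" admin@example.com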
I did some research on zdb and looked into rescuing zfs pools in the last couple of days, and hopefully that was enough to bring back my data.
I have already copied my most valuable stuff to safety, and I'm running an all-inclusive sync to another pool just to keep me sane. I'm decommissioning the system that had this issue, but I'll probably keep the disks because they operate just fine and show no issues in smartctl.
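The sync itself can be as simple as a recursive send/receive; a rough sketch with made-up names (the snapshot has to already exist, since a readonly pool can't take new ones):
Code:
# "zroot@latest" and the destination pool "backup" are placeholder names
zfs send -R zroot@latest | zfs receive -u -F backup/zroot-copy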
The rescue process:
I have a standard ZFS auto installation, which has a freebsd-zfs type partition on ada#p3.
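To double check which partition and GPT label actually hold the pool, gpart can show both:
Code:
gpart show ada0      # partition types; the freebsd-zfs one should be p3
gpart show -l ada0   # GPT labels, e.g. gpt/zfs1 / gpt/zfs2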
I exported the uberblocks to a text file:
zdb -ul /dev/ada0p3 > /tmp/uberblocks.txt
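A quick way to scan the dump for candidate txgs is to pull out just the txg and timestamp fields:
Code:
grep -E 'txg|timestamp' /tmp/uberblocks.txt | less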
I looked into the timestamps; there were mostly two, some from 5:22 AM and some from 3:29 AM. Because I had had no big disk operations in the last day, I went with a transaction group from 3:29 AM, since I thought that the farther it was written from the moment of the issue, the healthier the pool would be. The time difference probably matters, because it took almost a day to roll the pool back to that point. So I picked a txg and used it with the pool import (-T txg#), and it started the process. My command was:
zpool import -N -o readonly=on -f -R /mnt -F -T 21347495 zroot
-N : import without mounting
-o readonly=on : pool will be imported in readonly mode
-f : force import; needed if there is a sign that the pool is still in use or there was a crash so the export was not done cleanly, and also needed if devices are missing from the pool, AFAIK
-R /mnt : altroot; also disables the cachefile
-F : recovery mode for a non-importable pool; ZFS will try to discard the last few transactions
-T : I found no entry for this switch in the manual, but according to some articles and mailing list threads ZFS will try to roll back to the given transaction group
I monitored disk usage with gstat, and most of the time the disk was at 100%. No other ZFS-related command went through, which is understandable (I tried to create a new volume on another pool just for kicks). The whole thing took around 16 or 17 hours for me on a 3 TB disk.
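For reference, something like this is enough to watch just the disk doing the work:
Code:
gstat -p            # only physical providers (the raw disks)
gstat -f '^ada0'    # or filter to just the disk being rolled back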
And voilà, my pool was there in read-only mode (it's degraded because I only plugged in half of the mirror, so if I mess up I still have a chance):
Code:
root@back1:~ # zpool status
  pool: zroot
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 0 days 06:57:25 with 0 errors on Sun Aug  2 10:10:54 2020
config:

        NAME                      STATE     READ WRITE CKSUM
        zroot                     DEGRADED     0     0     0
          mirror-0                DEGRADED     0     0     0
            gpt/zfs2              ONLINE       0     0     0
            15526455252259167746  UNAVAIL      0     0     0  was /dev/gpt/zfs1

errors: No known data errors
I'm pretty impressed with how this came out; I thought my pool was toast... I have had multiple disk failures since I started using ZFS (usually one every 2 or 3 years), and I just popped in new, bigger disks, resilvered the mirror, and it was good to go. I had never had a catastrophic failure like this before. I've learned my lesson: I will categorize my data and do some kind of offsite or cloud backup for the stuff I value the most. I will also monitor my machine for any strange hardware issues.