Solved GPT invalid or corrupt partition table on reboot

This might be a silly question, but ... I'm on older hardware and my hard drives are no different. I recently replaced two 500GB drives that appeared to be failing, as disk access had slowed tremendously. Upon further inspection, the raw error rate had skyrocketed.

The drive in question did NOT appear to have a significantly higher raw error rate. I checked that just yesterday because disk operations did seem to have slowed. It seemed pretty close to what it was when I installed FreeBSD on the disk (I modified my install / restore scripts to also take a snapshot of smartctl data so I could identify problems earlier and hopefully recover before I end up in the spot I am in now). Anyway, I rebooted this morning to apply package updates (I create new Boot Environments for any type of update) and am not able to boot up. I swapped out for a 2 week old backup and tried to pull my files off of that disk, but either there is no partition table or GPT is complaining it is invalid or corrupt. I did not make a backup of the partition table.


1. Would it be wise to make a backup of the partition table?
2. Is it possible to recover the GPT partition table? While I have backups (from yesterday), not all of my work was committed to git, I have some uncommitted changes that I'd like to recover if possible.
3. Would it be useful to check the partition table prior to rebooting to verify it is still intact? I suppose that since I can't even see the partition table and am seeing CAM errors, the drive is dead. I wonder if these errors were in the logs the night before and I failed to notice.

usb_msc_auto_quirk: UQ_MSC_NO_GETMAXLUN set for USB mass storage device Seagate FA GoFlex Desk (0x0bc2:0x5070)
usb_msc_auto_quirk: UQ_MSC_NO_PREVENT_ALLOW set for USB mass storage device Seagate FA GoFlex Desk (0x0bc2:0x5070)
ugen0.2: <Seagate FA GoFlex Desk> at usbus0
umass0 on uhub1
umass0: <Seagate FA GoFlex Desk, class 0/0, rev 2.00/1.55, addr 2> on usbus0
umass0: SCSI over Bulk-Only; quirks = 0x8100
umass0:3:0: Attached to scbus3
da0 at umass-sim0 bus 0 scbus3 target 0 lun 0
da0: <Seagate FA GoFlex Desk 0155> Fixed Direct Access SPC-2 SCSI device
da0: Serial Number 2HC015KJ
da0: 40.000MB/s transfers
da0: 476940MB (976773167 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>
GEOM: da0: corrupt or invalid GPT detected.
GEOM: da0: GPT rejected -- may not be recoverable.
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 80 00 00 10 00
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
(da0:umass-sim0:0:0:0): Retrying command (per sense data)
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 80 00 00 01 00
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
(da0:umass-sim0:0:0:0): Retrying command (per sense data)

I looked at https://forums.freebsd.org/threads/solved-gpt-table-corrupt.46162/

I tried gpart recover da0, but I get
gpart: arg0 'da0': Invalid argument

I suspect the drive has bigger issues and am out of luck.
 
I think to recover you need to create a backup copy first, then use that file in the restore.

gpart backup da0 > da0.gpart
then
gpart restore da0 < da0.gpart

But I think those CAM errors indicate bigger issues as you say.
 
1. Would it be wise to make a backup of the partition table?
I think to recover you need to create a backup copy first
Partition table backups are created periodically by default and stored under /var/backups. New backups are created only if there is a change in the partition table (including device name changes and partition label changes).

Code:
 # ls -l /var/backups
...
-rw-r--r--  1 root wheel       391 Feb 28  2024 gpart.nda0.bak
-rw-r--r--  1 root wheel       386 Feb 22  2024 gpart.nda0.bak2
-rw-r--r--  1 root wheel       380 Apr  5  2023 gpart.nvd0.bak
-rw-r--r--  1 root wheel       341 Aug  3  2022 gpart.nvd0.bak2

/etc/defaults/periodic.conf
Code:
# 221.backup-gpart
if [ $(sysctl -n security.jail.jailed) = 0 ]; then
        # Backup partition table/boot partition/MBR
        daily_backup_gpart_enable="YES"
else
        daily_backup_gpart_enable="NO"
fi

Default
Code:
 % sysctl  security.jail.jailed
security.jail.jailed: 0

/etc/periodic/daily/221.backup-gpart
Code:
## If there is a global system configuration file, suck it in.
##
if [ -r /etc/defaults/periodic.conf ]
then
        . /etc/defaults/periodic.conf
        source_periodic_confs
fi

bak_dir=/var/backups
 
Ahh. That helps, but in the OP's position, I'm guessing those auto-created backups are not available because they would be on da0.
But knowing they are autocreated by periodic, one could get into the habit of copying them to a flash drive.
 
That helps, but in the OP's position, I'm guessing those auto-created backups are not available because they would be on da0.
and as you said
I think those CAM errors indicate bigger issues
What I would try is another USB cable, then removing the disk from the enclosure and plugging it into another enclosure, or into one of the motherboard's storage controller interfaces.

Once I had a short circuit on a (cheap) power outlet bar which fried something in an external USB disk enclosure. I thought the disk was gone, but swapping the enclosure for a working one brought the disk back. I can't say anything about error messages; this happened on a LibreELEC media server.

knowing they are autocreated by periodic, one could get into the habit of copying them to a flash drive.
In any case, not only the partition table backups but the whole of /var/backups should be copied. If there is a (cron job) backup script for off-device, off-machine backups, /var/backups should be included in it as well.
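
A minimal sketch of that habit, assuming the flash drive is already mounted; the mount point /mnt/usb and the function name copy_backups are made up for illustration and are not part of the base system. It copies everything under the source directory into a dated subdirectory on the destination.

```shell
#!/bin/sh
# Copy the periodic backup directory to removable media.
# Assumptions (not from the thread): the destination is already mounted,
# and one dated subdirectory per host and day is wanted.

copy_backups() (
    src=$1    # normally /var/backups
    dest=$2   # e.g. /mnt/usb/backups (hypothetical mount point)

    [ -d "$src" ] || { echo "no such directory: $src" >&2; exit 1; }

    target="$dest/$(uname -n)-$(date +%Y%m%d)"
    mkdir -p "$target" || exit 1

    # -R recurses, -p preserves modes and timestamps.
    cp -Rp "$src"/. "$target"/ || exit 1

    # Print where the copy landed.
    echo "$target"
)

# Example: copy_backups /var/backups /mnt/usb/backups
```

The function body runs in a subshell, so a failure returns a nonzero status without ending the calling script.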
 
Removing the disk from the enclosure and plugging it into another enclosure, or into one of the motherboard's storage controller interfaces.
Yup, or use a USB to SATA adapter and see what shows up. If the device shows up with partitions under /dev/da0, then look at gpart show da0.
Then consider fsck before gpart recover.
Maybe run a sysutils/smartmontools short test first to see if the drive errors out.
I have a few USB to SATA adapters; some pass SMART through and some do not.
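
Those checks could be strung together roughly like this. It is only a sketch: check_disk is a hypothetical helper name, and whether "-d sat" is needed depends on the adapter, so the device type is an optional second argument.

```shell
#!/bin/sh
# Read-only checks on a disk that has just been attached.
# Assumption: smartctl comes from sysutils/smartmontools and the USB
# bridge passes SMART through (some do not).

GPART=${GPART:-gpart}            # overridable, e.g. GPART=echo for a dry run
SMARTCTL=${SMARTCTL:-smartctl}

check_disk() (
    dev=$1                # e.g. da0
    dtype=${2:+-d $2}     # optional smartctl device type, e.g. sat

    # Partition table as GEOM sees it; a damaged GPT is flagged [CORRUPT].
    $GPART show "$dev" || exit 1

    # Health summary and attributes. smartctl's exit status is a bit mask,
    # so a nonzero value is not treated as fatal here.
    $SMARTCTL $dtype -a "/dev/$dev"

    # Start a short self-test; it runs inside the drive and returns at once.
    $SMARTCTL $dtype -t short "/dev/$dev"
)

# Example: check_disk da0 sat
# A few minutes later: smartctl -d sat -l selftest /dev/da0
```

If gpart show reports that the geom does not exist at all, as in the dmesg output earlier in the thread, the function stops before touching SMART.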
 
The reason I ask about the GoFlex device is that you need to be wary of any particular scheme a manufacturer might put on a disk.

I have worked on Western Digital Books, and with those you can tear the drive out and work on it directly.

What I worry about is dual disk RAID type NAS devices.
Those might have a special scheme on them for RAID where you cannot simply work on them outside of the device.
Usually these have a Web Interface and you need to treat these differently.
 
I will try another enclosure; if that fails, I will unplug my media drive and put this one in there.

EDIT:
My other enclosure was apparently able to read the partition table, and I'm trying to mount ... Whenever I rebooted with this drive in, the strange thing was that my root partition could not be found. Strange, I am able to see it with ZFS ...

I guess this is resolved; so far, I am able to recover my files. I will mark this drive as toast and scrap it. I suppose after 12.5 years the drive had had enough ...
 
Is this Go-Flex a single drive unit? FreeBSD is installed on it, correct?
Yes, and no. I took the drive out of the enclosure so I could use any disk with it and while that served me well for a few years, I think I shouldn't do that. Whenever I do my installs, I do them on internal drives. I use this for recovery or to perform an offline sync of drives.

The other process I do which I think works well is offline ZFS replication ... I add a device to a pool to mirror it and thus create a backup. Then, once it is resilvered, I offline the drive, remove it and put it in a fire box. I don't change my media that frequently, when I do, I resilver the drives. I keep at least 2 cold backups.
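
Roughly, that cycle looks like the sketch below. The pool and device names are placeholders, cold_mirror is a hypothetical helper, and "zpool wait" is assumed to be available (OpenZFS 2.0 or later); on older systems, poll "zpool status" until the resilver finishes instead.

```shell
#!/bin/sh
# Sketch of the attach / resilver / offline cycle for a cold mirror copy.
# All names passed in are placeholders chosen by the caller.

ZPOOL=${ZPOOL:-zpool}   # overridable, e.g. ZPOOL=echo for a dry run

cold_mirror() (
    pool=$1      # e.g. tank
    existing=$2  # device already in the pool, e.g. ada0p3
    spare=$3     # backup device to mirror onto, e.g. da0p3

    # Turn the existing vdev into a mirror that includes the spare.
    $ZPOOL attach "$pool" "$existing" "$spare" || exit 1

    # Block until the resilver has completed.
    $ZPOOL wait -t resilver "$pool" || exit 1

    # Take the spare offline so it can be pulled and stored.
    $ZPOOL offline "$pool" "$spare" || exit 1
)

# Example: cold_mirror tank ada0p3 da0p3
```

On a later refresh the spare is already part of the mirror, so "zpool online" followed by the same wait and offline steps would replace the attach.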
 
I think to recover you need to create a backup copy first, then use that file in the restore.

gpart backup da0 > da0.gpart
then
gpart restore da0 < da0.gpart

But I think those CAM errors indicate bigger issues as you say.
So, I will modify my install process. Right now, I back up the geli metadata and dump smartctl data. I will also back up the gpart data so that if I encounter this issue in the future, I *might* have a chance to recover from it easily.
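
Something along these lines might do for the install script. It is a sketch only: snapshot_disk, the output path and the file names are illustrative, and smartctl is assumed to be installed from sysutils/smartmontools.

```shell
#!/bin/sh
# Save per-disk metadata at install time so it can be kept off the disk.
# Assumption: the output directory lives on a different device.

GPART=${GPART:-gpart}            # overridable for a dry run
SMARTCTL=${SMARTCTL:-smartctl}

snapshot_disk() (
    dev=$1   # e.g. ada0
    out=$2   # e.g. /mnt/usb/disk-meta (hypothetical path)

    mkdir -p "$out" || exit 1

    # Text dump of the partition table, usable later with "gpart restore".
    $GPART backup "$dev" > "$out/$dev.gpart" || exit 1

    # SMART attributes as a baseline. smartctl's exit status is a bit mask,
    # so a nonzero value here is not treated as fatal.
    $SMARTCTL -a "/dev/$dev" > "$out/$dev.smart"

    # Print the path of the partition table dump.
    echo "$out/$dev.gpart"
)

# Example: snapshot_disk ada0 /mnt/usb/disk-meta
```

The geli metadata backup already in the install process can sit next to these files.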
 
For the record, I did not see any CAM errors on the drive in the enclosure that is working, I only saw that with the one enclosure. So, then, this is all that is needed to recover the partition table:

GPT 128
1 efi 40 532480 efiboot0
2 freebsd-boot 532520 1024 gptboot0
3 freebsd-zfs 534528 976238592 zfs0
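
Before feeding a dump like this back to gpart, it may be worth checking that every partition still fits on the target disk. The sketch below assumes 512-byte sectors and the usual 128-entry GPT, so the backup header and entry array take the last 33 sectors; check_table is a hypothetical helper name.

```shell
#!/bin/sh
# Check that every partition in a "gpart backup" dump fits on a disk with
# the given number of sectors.

check_table() {
    # $1 = dump file, $2 = total sectors on the disk
    awk -v total="$2" '
        BEGIN { bad = 0; usable = total - 34 }   # last usable LBA
        NR == 1 { next }                         # scheme line, e.g. "GPT 128"
        NF >= 4 {
            last = $3 + $4 - 1                   # start + size - 1
            status = (last <= usable) ? "ok" : "TOO BIG"
            printf "%s %s last=%.0f %s\n", $1, $2, last, status
            if (last > usable) bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Example: check_table da0.gpart 976773167
```

With the 976773167 sectors reported for da0 in the dmesg output, the last usable LBA is 976773133 and the freebsd-zfs partition ends at 976773119, so the table fits. The dump itself would then be applied with gpart restore, for example "gpart restore -l da0 < da0.gpart" to bring the labels back as well.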

I put the drive in a backup and attempted to boot back up. I was able to get the exact same error message:

vfs.root.mountfrom=zfs:z_500.6/ROOT/SOME_BE

Prior to that, I see zfs was not found and subsequently it cannot mount the zfs volume.

I came across this:

I do my own custom kernel so that I can do traffic shaping, so I wonder if perhaps I must have done a partial install and thus the kernel and its modules were not completely installed? When it boots up, FreeBSD prompts me to unlock the drive, but fails to mount root.
 