ZFS Fixing partition alignment on a ZFS boot disk?

Code:
~ gpart show nda1
=>      34  62914493  nda1  GPT  (30G)
        34       345     1  freebsd-boot  (173K)
       379     66584     2  efi  (33M)
     66963   2097152     3  freebsd-swap  (1.0G)
   2164115  60748397     4  freebsd-zfs  (29G)
  62912512      2015        - free -  (1.0M)

I don't care about the first two partitions. I do care about swap and the zfs partition.

I was contemplating gpart backup, making edits, then gpart restore onto a new disk, then dd the contents across. But I don't think ZFS would take kindly to an unannounced lift-and-shift onto new LBAs.
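
Something like this is what I had in mind, assuming the spare disk would show up as nda2 (just a sketch, untested):

Code:
# dump the current partition table to a text file
gpart backup nda1 > nda1.gpt

# edit nda1.gpt so the freebsd-zfs entry starts on a 1m-aligned LBA,
# then recreate that layout on the spare disk
gpart restore -F nda2 < nda1.gpt

# ...and this is the unannounced lift-and-shift part
dd if=/dev/nda1p4 of=/dev/nda2p4 bs=1m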

Are zfs send and zfs recv the answer? Or perhaps create a mirror vdev from the old+new partitions, boot from the new device, then break the mirror, at which point ZFS complains about the missing device for the rest of eternity?

This is a rather messy problem...
 

if you move the partition towards the beginning of the disk and are mega-brave then you can dd it directly without a spare disk
you still have to do it by booting from external media
the theory is like
dd if=/dev/nda1p4 bs=1m | dd of=/dev/nda1 seek=1056 bs=1m
the 2nd dd overwrites what the 1st has already read

a power failure would foobar everything hard

then you fix the partition table so the zfs partition starts at 1056m
the swap partition would be shrunk by a bit less than 1mb

don't try this at home
 
Never dd ZFS providers! For one, it circumvents all of ZFS's integrity checks and self-healing capabilities and may damage the vdev or even the whole pool (e.g. due to single-bit errors in ZFS metadata during the dd - been there, wasn't funny). Second, it is horribly inefficient compared to a proper ZFS resilver.

Add a new disk (image) to the VM, create the GPT table with properly aligned partitions and add the bootcode/UEFI partition etc., then zpool attach(8) the new ZFS partition to the existing vdev and let it resilver. After resilvering, zpool detach(8) the old, misaligned provider. Shut down the VM, remove the old disk/image from the VM configuration, start the VM, collect underpants, then profit.
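
A minimal sketch of that sequence, assuming the pool is called zroot, the old provider is nda1p4 and the new disk shows up as nda2 (adjust names to your setup; the bootcode/ESP/swap steps are left out here):

Code:
# partition the new disk with a 1m-aligned zfs partition
gpart create -s gpt nda2
gpart add -a 1m -t freebsd-zfs -l zfs1 nda2

# attach the new partition to the existing single-disk vdev, turning it into a mirror
zpool attach zroot nda1p4 nda2p1

# wait for the resilver to finish, then drop the misaligned provider
zpool status zroot
zpool detach zroot nda1p4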


That being said - depending on what kind of disk image the hypervisor is using, the partition alignment is completely irrelevant anyway...
 
then you fix the partition table so the zfs partition starts at 1056m

Out of curiosity, how do you fix the partition table? Is there a tool for that?
I ask because it's somewhat complex to do by hand.

I imagine:

- Copy the fourth entry into the first slot.
- Change the partition's starting and ending LBA in that entry.
- Zero out the other entries (it's not clear whether we get rid of the swap partition or not).
- Recompute the CRC32 checksum of the table.
- Copy this new table to the backup area at the end of the disk and correct the checksum in the corresponding header.
 
I can write you a short how-to on migrating the zpool from one disk to another using send/receive. As this is a VM, why are you using ZFS on it, and what do you expect to gain from aligning the partitions on the virtual disk?
 
Out of curiosity, how do you fix the partition table? Is there a tool for that?
I ask because it's somewhat complex to do by hand.

I imagine:

- Copy the fourth entry into the first slot.
- Change the partition's starting and ending LBA in that entry.
- Zero out the other entries (it's not clear whether we get rid of the swap partition or not).
- Recompute the CRC32 checksum of the table.
- Copy this new table to the backup area at the end of the disk and correct the checksum in the corresponding header.
you can just delete and recreate with another starting block using gpart
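
roughly like this, reusing the numbers from the first post and assuming the data has already been dd'd down to the 1056m boundary (double-check everything against your own gpart show first):

Code:
# drop the old freebsd-zfs entry -- this only rewrites the table, not the data
gpart delete -i 4 nda1

# shrink swap so it ends at the new boundary (2162688 = 1056m in 512-byte blocks)
gpart resize -i 3 -s 2095725 nda1

# recreate the zfs partition starting at the aligned LBA
gpart add -i 4 -b 2162688 -t freebsd-zfs nda1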
 
I just had the bright idea to set up a new installation from scratch, mount my old zroot to the new install, and copy over my scripts, confs, tunables, and whatnot. DEAR LORD what a trainwreck when two zroot pools are present. ZFS is bratty and I hate it a little right now. There's got to be a clean way to do this...
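
What I apparently should have done is import the old pool by its numeric id, under a different name and an altroot, so the two zroots can coexist. A rough sketch (the id below is made up):

Code:
# list importable pools and note the numeric id of the old zroot
zpool import

# import it under a different name and an altroot so it can't clash with the running pool
zpool import -f -R /mnt 1234567890123456789 zroot_old

# boot environment datasets are canmount=noauto, so mount the old root explicitly
zfs mount zroot_old/ROOT/default

# ...copy rc.conf, loader.conf, scripts etc. from under /mnt, then let go of the old pool
zpool export zroot_old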
 
In the past I moved partitions because of alignment nags. They never got any faster. I guess I don't really understand all the bits involved. Maybe it isn't an exact science.
 
too many translation layers in between. if the disk is native 4k but emulates 512b sectors, a 16k fs block can end up spanning 5 physical blocks instead of 4 when the alignment is not optimal
same with ssd write/erase zones, which are 32k or something
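
toy example of the 16k case, with made-up offsets and plain sh arithmetic:

Code:
# physical 4k sectors touched by a 16k block starting at 512-byte LBA 7 (misaligned) vs LBA 8 (aligned)
echo $(( (7*512 + 16*1024 - 1) / 4096 - (7*512) / 4096 + 1 ))   # -> 5
echo $(( (8*512 + 16*1024 - 1) / 4096 - (8*512) / 4096 + 1 ))   # -> 4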
 
Here's how you can migrate your ZFS zroot to a new disk (da1). First you will need a USB stick with the same FreeBSD version as the currently running OS.
Boot from the FreeBSD installer, select Live System, then log in as root (no password).

# The currently running da0 disk, which we are going to migrate to the new disk (da1)
Code:
# gpart show
=>       40  266338224  da0  GPT  (127G)
         40     532480    1  efi  (260M)
     532520       1024    2  freebsd-boot  (512K)
     533544        984       - free -  (492K)
     534528    4194304    3  freebsd-swap  (2.0G)
    4728832  261607424    4  freebsd-zfs  (125G)
  266336256       2008       - free -  (1.0M)

# camcontrol devlist
<Msft Virtual Disk 1.0> at scbus0 target 0 lun 0 (pass0,da0)
<Msft Virtual Disk 1.0> at scbus0 target 0 lun 1 (pass1,da1)

# Create a new partitioning scheme on the new disk
gpart create -s gpt da1

# Add a new efi system partition (ESP)
gpart add -a 4k -l efiboot0 -t efi -s 260M da1

# Format the ESP
newfs_msdos da1p1

# Add new Boot partition for Legacy boot (BIOS)
gpart add -a 4k -l gptboot0 -t freebsd-boot -s 512k da1

# Add the protective master boot record and bootcode
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 2 da1

# Create new swap partition
gpart add -a 1m -l swap0 -t freebsd-swap -s 2G da1

# Create new ZFS partition to the rest of the disk space
gpart add -a 1m -l zfs0 -t freebsd-zfs da1

# mount the ESP partition
mount_msdosfs /dev/da1p1 /mnt

# Create the directories and copy the efi loader in the ESP
mkdir -p /mnt/efi/boot
mkdir -p /mnt/efi/freebsd
cp /boot/loader.efi /mnt/efi/boot/bootx64.efi
cp /boot/loader.efi /mnt/efi/boot/loader.efi

# Create the new UEFI boot variable and unmount the ESP
efibootmgr -a -c -l /mnt/efi/boot/loader.efi -L FreeBSD-14
umount /mnt

# Create mountpoint for zroot and zroot_new
mkdir /tmp/zroot
mkdir /tmp/zroot_new

# Create the new ZFS pool on the new disk (zroot_new)
zpool create -o altroot=/tmp/zroot_new -O compress=lz4 -O atime=off -m none -f zroot_new da1p4

# Import the original zroot
zpool import -R /tmp/zroot zroot

# Create a snapshot and send it to zroot_new on the other disk.
zfs snapshot -r zroot@migration
zfs send -vR zroot@migration | zfs receive -Fdu zroot_new

# Export zroot and zroot_new, then re-import zroot_new under the new name zroot
zpool export zroot
zpool export zroot_new
zpool import -R /tmp/zroot zroot_new zroot

# Set the default boot dataset
zpool set bootfs=zroot/ROOT/default zroot

# Clean up the snapshots created for the migration
zfs list -t snapshot -H -o name | grep migration | xargs -n1 zfs destroy

# export the pool
zpool export zroot

# Shut down and remove the old disk
shutdown -p now
# After the reboot, select FreeBSD-14 from the UEFI boot menu and, if everything works, clean up the old UEFI record using efibootmgr
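# (Sketch of that cleanup - the boot entry number below is only an example;
#  find the real stale entry in the efibootmgr -v output first)
efibootmgr -v
efibootmgr -B -b 0003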
 
In the past I moved partitions because of alignment nags. They never got any faster. I guess I don't really understand all the bits involved. Maybe it isn't an exact science.
Back in the day when disks' geometry mattered and LBA wasn't a thing, alignment was a thing. Today it doesn't matter. I ignore the nags.
 
Back in the day when disks' geometry mattered and LBA wasn't a thing, alignment was a thing. Today it doesn't matter. I ignore the nags.

On flash drives it really doesn't matter - the ancient concept of blocks and sectors doesn't apply any more. Those drives just pretend to be structured like magnetic drums from 70 years ago, but their actual IO patterns are completely different and managed by their firmware - so they absolutely don't care whether your IO lands 512 bytes earlier or later in that fictional mapping, because your 512-byte or even 4k chunks are comically small for them anyway.

Roughly the same applies for VMs sitting on non-raw disk images that have their own internal data structure and possibly even some compression going on...
 
On flash drives it really doesn't matter - the ancient concept of blocks and sectors doesn't apply any more. Those drives just pretend to be structured like magnetic drums from 70 years ago, but their actual IO patterns are completely different and managed by their firmware - so they absolutely don't care whether your IO lands 512 bytes earlier or later in that fictional mapping, because your 512-byte or even 4k chunks are comically small for them anyway.

Nice that you restated what I just said. Thank you for reinforcing that.

Roughly the same applies for VMs sitting on non-raw disk images that have their own internal data structure and possibly even some compression going on...
Ditto.
 