Design of a huge future storage with FreeBSD and ZFS
I would like to improve this design with your help and recommendations...
Thanks in advance for your time and help!
I will update this post as design improvements are made.
---------------------------------------------------------------------------------------------------
Section Hardware:
Dell PowerEdge R440
2x Intel® Xeon® Silver 4112 2.6G
6x 16GB RDIMM, 2666MT/s RDIMMs
QLogic FastLinQ 41112 Dual Port 10GbE SFP+ Adapter, PCIe Low Profile
Dell PERC H330 RAID Controller
2x 240GB SSD SATA Mixed Use 6Gbps 512e 2.5in Hot plug (Hardware RAID1 -> da0)
Dell SAS 12Gbps Host Bus Adapter External Controller (more info 1, 2, 3, 4, 5, 6)
Dell PowerVault ME484 JBOD, HBA
42x (NL-SAS, 3.5-inch, 7.2K, 10TB)
-> da1, da2, da3, ... da42
---------------------------------------------------------------------------------------------------
Section FreeBSD:
PowerEdge R440 boot from USB
F11 = Boot Manager
One-shot BIOS Boot Menu
[Hard drive] Disk connected to front USB 2: DataTraveler 2.0
Root-on-ZFS Automatic Partitioning
Binary package
3. Escape to loader prompt
Load the appropriate driver for 'Dell PERC H330 RAID Controller' for installation
OK set hw.mfi.mrsas_enable="1"
OK boot
Install defaults
Manual Configuration
< Yes >
Have the drivers loaded at every boot
# echo 'hw.mfi.mrsas_enable="1"' >> /boot/device.hints
# echo 'mrsas_load="YES"' >> /boot/loader.conf
# echo 'if_qlnxe_load="YES"' >> /boot/loader.conf
# shutdown -r now
# freebsd-version
Code:
12.1-RELEASE
# freebsd-update fetch
# freebsd-update install
# shutdown -r now
# freebsd-version
Code:
12.1-RELEASE-p1
View the partitions of da0
# gpart show da0
Code:
=> 40 467664816 da0 GPT (223G)
40 1024 1 freebsd-boot (512K)
1064 984 - free - (492K)
2048 134217728 2 freebsd-swap (64G)
134219776 333443072 3 freebsd-zfs (159G)
467662848 2008 - free - (1.0M)
Because the driver configuration for the 10GbE network cards has some quirks, I include the parameters here
# cat /etc/rc.conf
Code:
# QLogic FastLinQ 41112 Dual Port 10GbE SFP+ Adapter, PCIe Low Profile
# 31.7.2. Failover Mode
#
ifconfig_ql0="up"
ifconfig_ql1="up"
cloned_interfaces="lagg0"
#
# IPv4
ifconfig_lagg0="laggproto failover laggport ql0 laggport ql1 172.16.3.31/16"
defaultrouter="172.16.1.1"
#
# IPv6
ifconfig_lagg0_ipv6="inet6 2001:470:1f2b:be::31/64"
ipv6_defaultrouter="2001:0470:1f2b:be::1"
# ifconfig lagg0
Code:
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO>
ether 34:80:0d:5d:d4:a6
inet 172.16.3.31 netmask 0xffff0000 broadcast 172.16.255.255
inet6 fe80::3680:dff:fe5d:d4a6%lagg0 prefixlen 64 scopeid 0x6
inet6 2001:470:1f2b:be::31 prefixlen 64
laggproto failover lagghash l2,l3,l4
laggport: ql0 flags=5<MASTER,ACTIVE>
laggport: ql1 flags=0<>
groups: lagg
media: Ethernet autoselect
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
Update:
With 96GB of swap, FreeBSD reports this warning at boot
'WARNING: reducing swap size to maximum of 65536MB per unit'
The final swap size is 64GB
FreeBSD 12.1 comes with 'vfs.zfs.min_auto_ashift=12' by default
Check the value
# sysctl vfs.zfs.min_auto_ashift
Code:
vfs.zfs.min_auto_ashift: 12
-------------------------------------------------
Only required for FreeBSD <= 12.0
Check the value
# sysctl vfs.zfs.min_auto_ashift
Code:
vfs.zfs.min_auto_ashift: 9
Set 'vfs.zfs.min_auto_ashift=12' before creating the pool (recommendation from gkontos).
Change the value from 9 to 12
# sysctl vfs.zfs.min_auto_ashift=12
Code:
vfs.zfs.min_auto_ashift: 9 -> 12
Make the change permanent
# echo 'vfs.zfs.min_auto_ashift="12"' >> /etc/sysctl.conf
-------------------------------------------------
---------------------------------------------------------------------------------------------------
Section Disks distribution:
See what disks FreeBSD detects:
# egrep 'da[0-9]|cd[0-9]' /var/run/dmesg.boot
Code:
...
da0: <DELL PERC H330 Adp 4.30> Fixed Direct Access SPC-3 SCSI device
...
da1: <SEAGATE ST10000NM0256 TT55> Fixed Direct Access SPC-4 SCSI device
...
da2: <SEAGATE ST10000NM0256 TT55> Fixed Direct Access SPC-4 SCSI device
...
da42: <SEAGATE ST10000NM0256 TT55> Fixed Direct Access SPC-4 SCSI device
...
Strangely, FreeBSD reports devices up to da84 (the maximum capacity of the ME484); perhaps the ME484 presents empty slots to the HBA as disks (this requires more investigation).
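One way to investigate (a sketch, assuming the da numbering above): diskinfo(8) should fail for slots with no real drive behind them, so a quick loop shows which devices actually respond:

```shell
# Probe each da device; diskinfo should fail for slots that do not
# answer like a real disk (adjust the range to your device count).
for d in /dev/da[0-9] /dev/da[0-9][0-9]; do
  [ -e "$d" ] || continue
  if diskinfo "$d" > /dev/null 2>&1; then
    echo "$d: responds"
  else
    echo "$d: no usable media"
  fi
done
```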
Dell PERC H330 RAID Controller
# grep PERC /var/run/dmesg.boot
Code:
da0: <DELL PERC H330 Adp 4.30> Fixed Direct Access SPC-3 SCSI device
Dell SAS 12Gbps Host Bus Adapter External Controller
# grep LSI /var/run/dmesg.boot
Code:
mpr0: <Avago Technologies (LSI) SAS3008> port 0xc000-0xc0ff mem 0xe1100000-0xe110ffff,0xe1000000-0xe10fffff irq 88 at device 0.0 numa-domain 1 on pci12
mpr0: Firmware: 16.00.08.00, Driver: 23.00.00.00-fbsd
Operating System (Hardware RAID1)
da0 -> FreeBSD 12.1 amd64
Pool storage.
da1, da2, ... da21 -> vdev (first vdev)
da22, da23, ... da42 -> vdev (second vdev)
Pool storage diagram.
42x HDD 10TB, 2 striped 21-disk raidz3 (raid7) vdevs, ~ 289TB
{ da1, da2, ... da21 } { da22, da23, ... da42 }
..............|....................................|
..........vdev (raidz3).................vdev (raidz3)
..............|....................................|
-------------------------------------------------------------
| ZFS Pool 289TB approx.
-------------------------------------------------------------
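As a sanity check on the ~289TB figure (back-of-the-envelope math, not from the original design): each 21-disk raidz3 vdev stores data on 18 disks, and a "10TB" drive is about 9.1 TiB. ZFS reports somewhat less than this estimate because of raidz allocation padding and reserved space:

```shell
# Rough usable-capacity estimate for 2 vdevs of 21x raidz3 (3 parity each).
# Ignores raidz padding and metadata overhead, so ZFS will report less.
awk 'BEGIN {
  tib_per_drive = 10e12 / 2^40   # 10 TB (decimal) -> TiB
  data_drives   = 2 * (21 - 3)   # 2 vdevs, 3 parity drives each
  printf "~%.0f TiB of raw data capacity\n", data_drives * tib_per_drive
}'
```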
Setting up disks with FreeBSD:
Delete a previous GPT partition scheme.
Later versions of gpart(8) have a -F (force) option for destroy that makes things quicker.
# gpart destroy -F da1
# gpart destroy -F da2
...
# gpart destroy -F da42
Create GPT partition scheme.
# gpart create -s GPT da1
# gpart create -s GPT da2
...
# gpart create -s GPT da42
Proper sector alignment on 4K sector drives or SSDs.
# gpart add -t freebsd-zfs -b 1M -l da1 da1
# gpart add -t freebsd-zfs -b 1M -l da2 da2
...
# gpart add -t freebsd-zfs -b 1M -l da42 da42
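The 3 x 42 near-identical gpart commands above can be generated with a small sh loop. This sketch only prints the commands (remove the echo to execute them for real), and assumes the da1..da42 numbering used here:

```shell
# Dry-run generator for the per-disk gpart commands; remove the "echo"
# to execute. Assumes da1..da42 hold no data worth keeping.
for i in $(seq 1 42); do
  echo gpart destroy -F "da$i"
  echo gpart create -s GPT "da$i"
  echo gpart add -t freebsd-zfs -b 1M -l "da$i" "da$i"
done
```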
View the partitions
# gpart show
Code:
=> 40 467664816 da0 GPT (223G)
40 1024 1 freebsd-boot (512K)
1064 984 - free - (492K)
2048 134217728 2 freebsd-swap (64G)
134219776 333443072 3 freebsd-zfs (159G)
467662848 2008 - free - (1.0M)
=> 40 19134414768 da1 GPT (8.9T)
40 2008 - free - (1.0M)
2048 19134412760 1 freebsd-zfs (8.9T)
=> 40 19134414768 da2 GPT (8.9T)
40 2008 - free - (1.0M)
2048 19134412760 1 freebsd-zfs (8.9T)
...
=> 40 19134414768 da42 GPT (8.9T)
40 2008 - free - (1.0M)
2048 19134412760 1 freebsd-zfs (8.9T)
View the partitions labels
# gpart show -l
Code:
=> 40 467664816 da0 GPT (223G)
40 1024 1 gptboot0 (512K)
1064 984 - free - (492K)
2048 134217728 2 swap0 (64G)
134219776 333443072 3 zfs0 (159G)
467662848 2008 - free - (1.0M)
=> 40 19134414768 da1 GPT (8.9T)
40 2008 - free - (1.0M)
2048 19134412760 1 da1 (8.9T)
=> 40 19134414768 da2 GPT (8.9T)
40 2008 - free - (1.0M)
2048 19134412760 1 da2 (8.9T)
...
=> 40 19134414768 da42 GPT (8.9T)
40 2008 - free - (1.0M)
2048 19134412760 1 da42 (8.9T)
Create the pool 'storage' with two raidz3 (raid7) vdevs.
Using GPT labels (the recommended approach).
# zpool create storage \
raidz3 \
gpt/da1 gpt/da2 gpt/da3 gpt/da4 gpt/da5 gpt/da6 gpt/da7 \
gpt/da8 gpt/da9 gpt/da10 gpt/da11 gpt/da12 gpt/da13 gpt/da14 \
gpt/da15 gpt/da16 gpt/da17 gpt/da18 gpt/da19 gpt/da20 gpt/da21 \
raidz3 \
gpt/da22 gpt/da23 gpt/da24 gpt/da25 gpt/da26 gpt/da27 gpt/da28 \
gpt/da29 gpt/da30 gpt/da31 gpt/da32 gpt/da33 gpt/da34 gpt/da35 \
gpt/da36 gpt/da37 gpt/da38 gpt/da39 gpt/da40 gpt/da41 gpt/da42
See the pool status.
# zpool status storage
Code:
pool: storage
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
storage ONLINE 0 0 0
raidz3-0 ONLINE 0 0 0
gpt/da1 ONLINE 0 0 0
gpt/da2 ONLINE 0 0 0
gpt/da3 ONLINE 0 0 0
gpt/da4 ONLINE 0 0 0
gpt/da5 ONLINE 0 0 0
gpt/da6 ONLINE 0 0 0
gpt/da7 ONLINE 0 0 0
gpt/da8 ONLINE 0 0 0
gpt/da9 ONLINE 0 0 0
gpt/da10 ONLINE 0 0 0
gpt/da11 ONLINE 0 0 0
gpt/da12 ONLINE 0 0 0
gpt/da13 ONLINE 0 0 0
gpt/da14 ONLINE 0 0 0
gpt/da15 ONLINE 0 0 0
gpt/da16 ONLINE 0 0 0
gpt/da17 ONLINE 0 0 0
gpt/da18 ONLINE 0 0 0
gpt/da19 ONLINE 0 0 0
gpt/da20 ONLINE 0 0 0
gpt/da21 ONLINE 0 0 0
raidz3-1 ONLINE 0 0 0
gpt/da22 ONLINE 0 0 0
gpt/da23 ONLINE 0 0 0
gpt/da24 ONLINE 0 0 0
gpt/da25 ONLINE 0 0 0
gpt/da26 ONLINE 0 0 0
gpt/da27 ONLINE 0 0 0
gpt/da28 ONLINE 0 0 0
gpt/da29 ONLINE 0 0 0
gpt/da30 ONLINE 0 0 0
gpt/da31 ONLINE 0 0 0
gpt/da32 ONLINE 0 0 0
gpt/da33 ONLINE 0 0 0
gpt/da34 ONLINE 0 0 0
gpt/da35 ONLINE 0 0 0
gpt/da36 ONLINE 0 0 0
gpt/da37 ONLINE 0 0 0
gpt/da38 ONLINE 0 0 0
gpt/da39 ONLINE 0 0 0
gpt/da40 ONLINE 0 0 0
gpt/da41 ONLINE 0 0 0
gpt/da42 ONLINE 0 0 0
errors: No known data errors
See the pool list
# zpool list storage
Code:
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
storage 374T 3.88M 374T - - 0% 0% 1.00x ONLINE -
See the pool mounted.
# df -h | egrep 'Filesystem|storage '
Code:
Filesystem Size Used Avail Capacity Mounted on
storage 289T 282K 289T 0% /storage
Creation of ZFS datasets (file systems).
# zfs create storage/datasetname
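One possible refinement (my suggestion, not part of the original design): dataset properties are inherited, so enabling compression once on the pool's root dataset covers every dataset created afterwards:

```shell
# Enable lz4 once at the pool root; all child datasets inherit it.
zfs set compression=lz4 storage
zfs create storage/datasetname
# Verify the inherited property on every dataset.
zfs get -r compression storage
```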
Bonnie++ benchmarks
# bonnie++ -u root -r 1024 -s 98304 -d /storage -f -b -n 1 -c 4
Code:
Version 1.98 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
FreeBSD 96G::4 655m 81 398m 90 623m 99 726.4 28
Bonnie++ benchmark summary
w=655MB/s, rw=398MB/s, r=623MB/s
---------------------------------------------------------------------------------------------------
Section Pool expansion:
Useful as an example for a future expansion: the ME484 supports up to 84 HDDs per enclosure, and up to 3 more enclosures can be added for a maximum of 4 (84 HDDs x 4 enclosures), adding more vdevs.
Delete a previous GPT partition scheme.
Later versions of gpart(8) have a -F (force) option for destroy that makes things quicker.
# gpart destroy -F da43
# gpart destroy -F da44
...
# gpart destroy -F da63
Create GPT partition scheme.
# gpart create -s GPT da43
# gpart create -s GPT da44
...
# gpart create -s GPT da63
Proper sector alignment on 4K sector drives or SSDs.
# gpart add -t freebsd-zfs -b 1M -l da43 da43
# gpart add -t freebsd-zfs -b 1M -l da44 da44
...
# gpart add -t freebsd-zfs -b 1M -l da63 da63
Adding a new raidz3 (raid7) vdev to the existing pool 'storage'.
Using GPT labels (the recommended approach).
# zpool add storage \
raidz3 \
gpt/da43 gpt/da44 gpt/da45 gpt/da46 gpt/da47 gpt/da48 gpt/da49 \
gpt/da50 gpt/da51 gpt/da52 gpt/da53 gpt/da54 gpt/da55 gpt/da56 \
gpt/da57 gpt/da58 gpt/da59 gpt/da60 gpt/da61 gpt/da62 gpt/da63
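Because a raidz vdev cannot be removed from the pool once added, it may be worth previewing the result first: zpool add accepts -n, which prints the configuration that would result without modifying the pool.

```shell
# Dry run (-n): show the resulting pool layout without changing anything.
zpool add -n storage \
    raidz3 \
    gpt/da43 gpt/da44 gpt/da45 gpt/da46 gpt/da47 gpt/da48 gpt/da49 \
    gpt/da50 gpt/da51 gpt/da52 gpt/da53 gpt/da54 gpt/da55 gpt/da56 \
    gpt/da57 gpt/da58 gpt/da59 gpt/da60 gpt/da61 gpt/da62 gpt/da63
```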
See the pool mounted and the new size.
# df -h | egrep 'Filesystem|storage '
Code:
Filesystem Size Used Avail Capacity Mounted on
storage 435T 281K 435T 0% /storage
---------------------------------------------------------------------------------------------------
Section Tips:
Other useful command examples.
Export a pool that is not in use.
# zpool export storage
Import a pool for use.
# zpool import storage
A built-in monitoring system can display pool I/O statistics in real time.
Press Ctrl+C to stop this continuous monitoring.
# zpool iostat [-v] [pool] ... [interval [count]]
# zpool iostat -v storage 5 100
ZFS features and other useful command examples.
To create a dataset on this pool with compression enabled.
# zfs create storage/compressed
# zfs set compression=lz4 storage/compressed
Compression can be disabled with.
# zfs set compression=off storage/compressed
To unmount a file system.
# zfs umount storage/compressed
To re-mount the file system.
# zfs mount storage/compressed
Dataset space usage can be viewed with.
# zfs list storage
The name of a dataset can be changed with.
# zfs rename storage/oldname storage/newname
---------------------------------------------------------------------------------------------------
Section Upgrading FreeBSD:
Useful as an example for a future FreeBSD upgrade, for example from 12.1-RELEASE to 12.2-RELEASE.
Upgrade FreeBSD.
# freebsd-update fetch
# freebsd-update install
# freebsd-update upgrade -r 12.2-RELEASE
# freebsd-update install
# shutdown -r now
# freebsd-update install
# pkg-static upgrade -f
# freebsd-update install
# shutdown -r now
Check the upgrade of FreeBSD.
# freebsd-version
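Note that freebsd-version without flags reports only the userland version; to confirm that both kernel and userland reached the new release:

```shell
# -k reports the installed kernel version, -u the userland version;
# both should show the new release after the final reboot.
freebsd-version -k -u
```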
Upgrade the pool 'zroot' (Root-on-ZFS); since it is the boot pool, also update the boot code.
# zpool upgrade zroot
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0
Upgrade the pool called 'storage'.
# zpool upgrade storage
Check the upgrade status of all pools.
# zpool upgrade
---------------------------------------------------------------------------------------------------
Section Documentation:
Chapter 22. The Z File System (ZFS) - FreeBSD Handbook (www.freebsd.org)
QLogic FastLinQ 41112 Dual Port 10GbE SFP+ Adapter - FreeBSD Forums thread (forums.freebsd.org)