Hello folks,
Yesterday I wrote a Twitter thread with an example of how to deploy VNET jails in a ZFS environment.
Here it is again for this forum.
This is a guide to deploying a VNET jail on a FreeBSD 13.0 server with ZFS and a populated /usr/src. We start by preparing the file tree. I use /l/prison (in zpool/prison) as the starting point.
I assume /usr/src is in zpool/usr/src and /usr/ports is in zpool/usr/ports. We need snapshots of both:
Code:
# zfs snapshot zpool/usr/src@jail-template
# zfs snapshot zpool/usr/ports@jail-template
We prepare a jail template as described in jail(8):
Code:
# zfs create zpool/prison/template
# cd /usr/src
# make world DESTDIR=/l/prison/template
# make distribution DESTDIR=/l/prison/template
# zfs snapshot zpool/prison/template@jail-template
Now we create the ZFS datasets for the jail. I put these steps in a simple shell script that makes heavy use of zfs clone:
Code:
:
target=myvnetjail
source="zpool/prison/template@jail-template"
t="zpool/prison/$target"
path="/l/prison/$target"
zfs clone -o exec=on -o setuid=on -o compression=off $source $t
cd $path || exit 1
tar cvf /tmp/$target.$$ var
chflags -R noschg var usr
rm -rf usr var
zfs create -o mountpoint="$path/var" -o exec=off -o setuid=off -o compression=off "${t}/var"
zfs create -o exec=off -o setuid=off -o compression=lz4 "${t}/var/mail"
zfs create -o exec=off -o setuid=off -o compression=lz4 "${t}/var/log"
zfs create -o exec=off -o setuid=off -o compression=off "${t}/var/run"
zfs create -o exec=off -o setuid=off -o compression=lz4 "${t}/var/tmp"
zfs create -o exec=off -o setuid=off -o compression=lz4 "${t}/var/db"
zfs create -o exec=off -o setuid=off -o compression=lz4 "${t}/var/db/pkg"
zfs create -o exec=off -o setuid=off -o compression=off "${t}/var/db/portsnap"
zfs create -o exec=off -o setuid=off -o compression=off -o readonly=on "${t}/var/empty"
zfs create -o exec=off -o setuid=off -o compression=off "${t}/var/local"
zfs create -o exec=off -o setuid=off -o compression=lz4 "${t}/var/spool"
zfs create -o exec=off -o setuid=off -o compression=lz4 "${t}/var/local/spool"
zfs create -o exec=off -o setuid=off -o compression=lz4 "${t}/var/local/log"
zfs create -o exec=on -o setuid=on -o compression=off -o mountpoint=${path}/l "${t}/localdisk"
zfs create -o exec=on -o setuid=on -o compression=off "${t}/localdisk/home"
zfs create -o exec=on -o setuid=on -o compression=off "${t}/localdisk/local"
zfs create -o exec=on -o setuid=off -o compression=lz4 "${t}/tmp"
cd $path || exit 1
tar xvpf /tmp/$target.$$
rm /tmp/$target.$$
#
zfs clone -o exec=on -o setuid=on -o compression=off $source ${t}/usr
# Stop if a cd fails, otherwise rm and mv would run in the wrong directory.
cd ${path}/usr || exit 1
chflags -R noschg lib libexec sbin var
rm -r .cshrc .profile COPYRIGHT bin boot dev etc lib libexec media mnt net proc rescue root sbin tmp var
cd usr || exit 1
mv * ..
cd ..
rmdir usr
#
zfs create -o exec=on -o setuid=on -o compression=off "${t}/usr/local"
zfs clone -o exec=on -o setuid=off -o compression=lz4 zpool/usr/ports@jail-template ${t}/usr/ports
zfs clone -o exec=on -o setuid=off -o compression=lz4 zpool/usr/src@jail-template ${t}/usr/src
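The long run of zfs create lines in the script above could also be driven from a small table. What follows is only a sketch and a dry run: it prints the commands instead of executing them, covers just a few of the var children, and the function name print_var_datasets and the table format are my own assumptions, not part of the original script.

```sh
#!/bin/sh
# Dry-run sketch: print zfs create commands instead of running them.
# Table format (an assumption of this sketch): "<dataset suffix> <compression>".
print_var_datasets() {
    t=$1    # parent dataset, e.g. zpool/prison/myvnetjail
    while read -r suffix comp; do
        # Skip blank lines and comment lines in the table.
        case $suffix in ''|'#'*) continue ;; esac
        echo "zfs create -o exec=off -o setuid=off -o compression=$comp $t/$suffix"
    done <<'EOF'
var/mail   lz4
var/log    lz4
var/run    off
var/tmp    lz4
EOF
}

print_var_datasets zpool/prison/myvnetjail
```

Once the printed commands look right, the output can be piped to sh. The parent var dataset with its explicit mountpoint still has to be created first, as in the script above.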
Now we prepare the host environment. I put local add-ons in /l/local:
Code:
# cp -p /usr/src/share/examples/jails/jib /l/local/sbin
If you want to use netgraph(4), look at the jng script instead. I use if_bridge(4) here because of some issues with netgraph. I apply a few small patches to jib:
Code:
--- /usr/src/share/examples/jails/jib 2021-09-12 03:13:47.057333000 +0200
+++ /l/local/sbin/jib 2021-06-09 14:30:57.000000000 +0200
@@ -215,11 +215,25 @@
# the MAC address will be recalculated to a new, similarly
# unique value preventing conflict.
#
+ # ARNE ## OM ## START #
+ # That's wrong. The jails of good-hope and odin get the same
+ # MAC address!
+ # Why?
+ # arne@trajan:~-<10> echo -n gh | sum
+ # 32923 1
+ # arne@trajan:~-<11> echo -n od | sum
+ # 32923 1
+ # ARNE ## OM ## END #
+ #
+ #
__iface_devid=$( ifconfig $__iface ether | awk '/ether/,$0=$2' )
# ??:??:??:II:II:II
__new_devid=${__iface_devid#??:??:??} # => :II:II:II
# => :SS:SS:II:II:II
- __num=$( set -- `echo -n "$__name" | sum` && echo $1 )
+ # ARNE ## OM ## START #
+ # __num=$( set -- `echo -n "$__name" | sum` && echo $1 )
+ __num=$( set -- `echo -n "$__name" | cksum` && echo $1 )
+ # ARNE ## OM ## END #
__new_devid=$( printf :%02x:%02x \
$(( $__num >> 8 & 255 )) $(( $__num & 255 )) )$__new_devid
# => P:SS:SS:II:II:II
@@ -307,6 +321,10 @@
# Create a new interface to the bridge
new=$( ifconfig epair create ) || return
+ # ARNE # OM # START #
+ mtu=$( ifconfig $iface$bridge | head -1 | sed -e 's/^.*mtu //') || return
+ ifconfig $new mtu $mtu || return
+ # ARNE # OM # END #
ifconfig "$iface$bridge" addm $new || return
# Rename the new interface
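To see why the first patch matters, here is a small sketch that mimics how jib turns a jail name into the two "SS:SS" bytes of the MAC address, once with sum and once with cksum. The function name mac_suffix is hypothetical and the leading-zero stripping is my addition; only the shift-and-mask arithmetic is taken from jib. With sum, the names gh and od both give checksum 32923 (as in the patch comment), so both end up as 80:9b.

```sh
#!/bin/sh
# Sketch: derive the "SS:SS" MAC bytes from a jail name the way jib does,
# using the checksum tool given as first argument (sum or cksum).
mac_suffix() {
    tool=$1 name=$2
    # Both tools print the checksum as the first field of their output.
    set -- $(printf %s "$name" | "$tool")
    # Strip leading zeros so the shell does not read the number as octal.
    num=${1#"${1%%[!0]*}"}
    printf '%02x:%02x\n' $(( ${num:-0} >> 8 & 255 )) $(( ${num:-0} & 255 ))
}

for name in gh od; do
    echo "$name: sum=$(mac_suffix sum "$name") cksum=$(mac_suffix cksum "$name")"
done
```

Only the low 16 bits of the checksum are used, so collisions remain possible with cksum as well; they are just far less likely for short, similar names.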
Of course, if_bridge(4) must be compiled into the kernel or available as a kernel module.
Code:
# chmod 755 /l/local/sbin/jib
Now the most important part: /etc/jail.conf. Here is an example for a fairly ordinary jail.
Code:
# ATTENTION if you have firewall code in the kernel (default drop):
# raising the securelevel here means that the firewall can no longer
# be configured (== opened) from inside the jail!
# securelevel 3 only works with Netgraph or in traditional (non-vnet) jails
# according to this
myjail {
host.hostname = "myjail.example.com";
path = "/l/prison/myvnetjail";
devfs_ruleset = "7";
securelevel = 0;
vnet = "new";
vnet.interface = e0b_myjail, e1b_myjail;
exec.fib = "8";
exec.system_user = "root";
exec.jail_user = "root";
exec.consolelog = "/var/local/log/jails/myjail_console.log";
exec.clean;
exec.prestart += "/l/local/sbin/jib addm myjail mynet0 mynet1";
exec.start = "/bin/sh /etc/rc";
exec.stop = "/bin/sh /etc/rc.shutdown";
exec.poststop += "/l/local/sbin/jib destroy myjail";
mount.devfs;
enforce_statfs = "1";
persist;
}
Code:
# mkdir /var/local/log/jails
# touch /var/local/log/jails/myjail_console.log
# chmod 600 /var/local/log/jails/myjail_console.log
In this example I grant the jail access to two network interfaces of the host. No preparation is necessary on the host side. The host doesn't even need access to these networks.
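The interface names in vnet.interface follow from the jib call in exec.prestart. As a sketch, assuming jib names the jail-side epair ends e0b_<name>, e1b_<name>, ... in the order the host interfaces are passed to "jib addm" (this matches the jail.conf above), the list can be generated like this. The helper name vnet_interfaces is hypothetical.

```sh
#!/bin/sh
# Sketch: print the jail-side interface names for jail.conf's vnet.interface.
# Assumption: jib creates e<N>b_<name> for the N-th interface given to "addm".
vnet_interfaces() {
    name=$1 count=$2
    i=0 list=
    while [ "$i" -lt "$count" ]; do
        # Append ", " only when the list already has an entry.
        list="${list:+$list, }e${i}b_${name}"
        i=$((i + 1))
    done
    echo "$list"
}

vnet_interfaces myjail 2    # prints: e0b_myjail, e1b_myjail
```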
Example rc.conf entry (host side):
Code:
ifconfig_mce0_name="mynet0"
ifconfig_mce1_name="mynet1"
ifconfig_mynet0="-lro -tso4 -tso6 -vlanhwtso mtu 9000 up"
ifconfig_mynet1="-lro -tso4 -tso6 -vlanhwtso mtu 9000 up"
You have to switch off some fancy hardware offload features on certain Ethernet controllers (as in my example) to get VNET to work, but in most cases you won't have problems. Mellanox BlueField controllers need some additional steps, but that is out of scope here.
Of course, the host can also set an IP address on mynet0/1 and use it for its own network connectivity.
As you can see in my example, I use FIBs here. You have to set the number of FIBs in /boot/loader.conf and/or /etc/sysctl.conf high enough to cover the FIB numbers you use. Since FreeBSD 13 you ALSO have to set net.fibs in the jail's own sysctl.conf (/l/prison/myvnetjail/etc/sysctl.conf)!
In my example:
/boot/loader.conf:
[...]
net.fibs="16"
net.add_addr_allfibs="0"
[...]
/etc/sysctl.conf AND /l/prison/myvnetjail/etc/sysctl.conf:
[...]
net.fibs=16
net.add_addr_allfibs=0
[...]
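A quick sanity check for the numbers above, as a sketch: FIBs are numbered from 0, so with net.fibs=16 the usable FIB numbers are 0 to 15, and exec.fib = 8 from my jail.conf fits. The function name fib_ok is hypothetical.

```sh
#!/bin/sh
# Sketch: succeed when FIB number $1 is usable with net.fibs=$2.
# FIBs are numbered from 0, so valid numbers are 0 .. net.fibs-1.
fib_ok() {
    fib=$1 fibs=$2
    [ "$fib" -ge 0 ] && [ "$fib" -lt "$fibs" ]
}

if fib_ok 8 16; then
    echo "FIB 8 fits into net.fibs=16"
fi
```

On the host, the second argument could come from the running system, for example fib_ok 8 "$(sysctl -n net.fibs)".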
As I use pf in the kernel I need access to pf from inside the jail:
Code:
# cp -p /etc/defaults/devfs.rules /etc/devfs.rules
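Add something like this to /etc/devfs.rules. The ruleset number (here 7) must match devfs_ruleset in /etc/jail.conf:
Code: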
# Devices usually found in a jail.
#
[devfsrules_jail=4]
add include $devfsrules_hide_all
add include $devfsrules_unhide_basic
add include $devfsrules_unhide_login
add path fuse unhide
add path zfs unhide
# Jail with Berkeley Packet Filter (DHCP client and server)
#
[devfsrules_jail_bpf=5]
add include $devfsrules_jail
add path 'bpf*' unhide
# Jail with Berkeley Packet Filter (DHCP client and server) and pf
#
[devfsrules_jail_bpf_pf=6]
add include $devfsrules_jail_bpf
add path pf unhide
add path pflog unhide
add path pfsync unhide
# Jail with pf
#
[devfsrules_jail_pf=7]
add include $devfsrules_jail
add path pf unhide
add path pflog unhide
add path pfsync unhide
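Since jail.conf refers to the ruleset by number (devfs_ruleset = "7"), it is easy to end up with a number that no ruleset defines. A small sketch to check that a "[name=N]" header with the wanted number exists; the function name has_ruleset is hypothetical, and the sample input is a shortened copy of the rules above.

```sh
#!/bin/sh
# Sketch: succeed if stdin contains a "[name=N]" ruleset header with number $1.
has_ruleset() {
    grep -Eq '^\[[A-Za-z0-9_]+='"$1"'][[:space:]]*$'
}

# Demo with a shortened devfs.rules on stdin.
has_ruleset 7 <<'EOF' && echo "ruleset 7 is defined"
[devfsrules_jail=4]
add include $devfsrules_hide_all
[devfsrules_jail_pf=7]
add path pf unhide
EOF
```

On the host this would be run as has_ruleset 7 < /etc/devfs.rules.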
Now we have to customize the jail:
Code:
# chroot /l/prison/myvnetjail /bin/sh
# cd etc
# vipw
# vi /etc/group /etc/resolv.conf /etc/sysctl.conf /etc/make.conf # .... as you like
# cp -p /usr/share/zoneinfo/Europe/Berlin /etc/localtime # Whatever you need
# cap_mkdb /etc/login.conf
# cap_mkdb -f /usr/share/misc/termcap /etc/termcap
# vi /etc/rc.conf
With ipfw and pf in the kernel and a default-drop policy, you need at least something like this in the jail's /etc/rc.conf:
Code:
hostname="myjail.example.com"
#
ifconfig_e0b_myjail="inet 192.0.2.2 netmask 255.255.255.0 mtu 9000"
ifconfig_e1b_myjail="inet 198.51.100.2 netmask 255.255.255.0 mtu 9000"
#
# We need this because we use net.add_addr_allfibs="0" !
static_routes="lo0ifroute e0bifroute e1bifroute"
route_lo0ifroute="-host 127.0.0.1 -iface lo0"
route_e0bifroute="-net 192.0.2.0/24 -iface e0b_myjail"
route_e1bifroute="-net 198.51.100.0/24 -iface e1b_myjail"
#
ipv6_activate_all_interfaces="NO"
defaultrouter="198.51.100.1"
gateway_enable="NO"
#
firewall_enable="YES" # Set to YES to enable firewall functionality
firewall_script="/etc/rc.firewall" # Which script to run to set up the firewall
firewall_type="OPEN" # Firewall type (see /etc/rc.firewall)
#
pf_enable="YES"
#
sshd_enable="YES"
#
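Because net.add_addr_allfibs is 0, the interface routes have to be spelled out by hand, and the network in each route_* line must match the address and netmask of the interface. A sketch of that calculation in plain sh; the helper name network_of is hypothetical.

```sh
#!/bin/sh
# Sketch: compute the IPv4 network address from an address and a netmask.
# The body runs in a subshell so the IFS change does not leak out.
network_of() (
    IFS=.
    # Split both dotted quads into $1..$4 (address) and $5..$8 (netmask).
    set -- $1 $2
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)

network_of 192.0.2.2 255.255.255.0      # prints: 192.0.2.0
network_of 198.51.100.2 255.255.255.0   # prints: 198.51.100.0
```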
Don't forget a "pass all" rule in the jail's /etc/pf.conf.
Now something strange: copy the host's /boot/kernel into the jail:
Code:
# cp -pr /boot/kernel /l/prison/myvnetjail/boot
Why? All ports that depend on perl will break, because the build cannot configure the DTrace parts without the kernel.
Now add this to your host's /etc/rc.conf:
Code:
jail_enable="YES"
jail_confwarn="YES"
jail_parallel_start="NO"
jail_list="myjail"
jail_reverse_stop="YES"
and type "service jail start" as root. jls and friends won't show the jail's IP address. This is correct for VNET jails.
One last addition: don't even think about using NFS, client or server side, anywhere near VNET jails. The NFS guys never ported the NFS kernel code to VNET, and this is a real pain in the ass because it makes some very important use cases in my professional environment difficult.