jails Passing Through an SR-IOV VF Network Interface to a Jail: Which jail managers can do the job?

cb000 · Nov 18, 2024

I am currently experimenting with jails. My test machine has a Mellanox ConnectX-4 NIC. Following the Nvidia procedure and forum posts, I upgraded the firmware and successfully enabled SR-IOV and created VFs. Now I have several mceX interfaces: mce0 and mce1 are the PFs, and mce2 and mce3 are the VFs.

Code:

root@freebsd0:~ # cat /etc/iovctl.conf
PF {
        device : "mlx5_core0";
        num_vfs : 2,
}

DEFAULT {
        passthrough : false;
}

VF-0 {
        mac-addr : "aa:88:44:00:02:01";
}

VF-1 {
        mac-addr : "aa:88:44:00:02:02";
}
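A minimal sketch, assuming POSIX sh on the host: the iovctl.conf above can be generated from the PF device name plus a list of MAC addresses, one VF section per address. The function name gen_iovctl_conf is hypothetical; the keys (device, num_vfs, passthrough, mac-addr) are the ones already used in the file above.

```sh
#!/bin/sh
# Sketch: print an iovctl.conf(5) for one PF with one VF-N section per MAC.
# Hypothetical helper; the PF device (e.g. "mlx5_core0") and the MACs come
# from the caller. passthrough stays false so the VFs show up as host
# interfaces (mce2, mce3 here) that can then be handed to a VNET jail.
gen_iovctl_conf() {
    pf_dev=$1
    shift
    # Remaining arguments are MAC addresses; their count is num_vfs.
    printf 'PF {\n\tdevice : "%s";\n\tnum_vfs : %d;\n}\n\n' "$pf_dev" "$#"
    printf 'DEFAULT {\n\tpassthrough : false;\n}\n'
    i=0
    for mac in "$@"; do
        printf '\nVF-%d {\n\tmac-addr : "%s";\n}\n' "$i" "$mac"
        i=$((i + 1))
    done
}
```

For example, `gen_iovctl_conf mlx5_core0 aa:88:44:00:02:01 aa:88:44:00:02:02 > /etc/iovctl.conf`, then create the VFs with `iovctl -C -f /etc/iovctl.conf` (or list the file in `iovctl_files` in rc.conf so it is applied at boot).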

Following the handbook, I created a native thick jail with the mce2 interface, and it worked as expected. The jail does not have an IP by default, and the VF is controlled by the jail, so it does not appear on the host anymore.

Code:

root@freebsd0:~ # cat /etc/jail.conf
exec.start += "/bin/sh /etc/rc";
exec.stop = "/bin/sh /etc/rc.shutdown";
exec.clean;
mount.devfs;

classic {

        # STARTUP/LOGGING
        exec.consolelog = "/var/log/jail_console_${name}.log";

        host.hostname = "${name}";
        path = "/usr/local/jails/containers/${name}";
        vnet;
        #vnet.interface = "mce2.160";
        vnet.interface = "mce2";
        devfs_ruleset="7";
        allow.raw_sockets;
}

root@freebsd0:~ # jexec -u root classic

root@classic:/ # cat /etc/rc.conf
ifconfig_mce2="mtu 9000 UP"
vlans_mce2="160"
ifconfig_mce2_160="SYNCDHCP"

root@classic:/ # ifconfig
lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0xa
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
mce2: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 9000
        options=7eef07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,NV,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,HWRXTSTMP,MEXTPG,TXTLS4,TXTLS6,VXLAN_HWCSUM,VXLAN_HWTSO,RXTLS4,RXTLS6>
        ether aa:88:44:00:02:01
        media: Ethernet 10GBase-CR1 <full-duplex,rxpause,txpause>
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
mce2.160: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 9000
        options=1c680703<RXCSUM,TXCSUM,TSO4,TSO6,LRO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,MEXTPG,TXTLS4,TXTLS6>
        ether aa:88:44:00:02:01
        inet xxx.xxx.xxx.xxx netmask 0xffffff00 broadcast xxx.xxx.xxx.255
        groups: vlan
        vlan: 123 vlanproto: 802.1q vlanpcp: 0 parent interface: mce2
        media: Ethernet 10GBase-CR1 <full-duplex,rxpause,txpause>
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
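To cut down on typing, the jail.conf stanza and the in-jail rc.conf lines above can be produced by two small shell functions. This is a sketch, not a jail manager: the function names are hypothetical, the path layout is copied from the config above, and the global exec.start/exec.stop/mount.devfs lines are assumed to stay at the top of /etc/jail.conf. Appending further interfaces with `+=` assumes vnet.interface is treated as a list, as jail.conf(5) allows for list parameters.

```sh
#!/bin/sh
# Hypothetical helpers for a jail that owns whole interfaces (SR-IOV VFs).

# Print a jail.conf(5) stanza; every listed interface is moved into the
# jail via vnet.interface, so no epair or bridge is involved.
vf_jail_stanza() {
    name=$1 ruleset=$2
    shift 2
    printf '%s {\n' "$name"
    # ${name} is left literal on purpose: jail.conf expands it per jail.
    printf '\thost.hostname = "${name}";\n'
    printf '\tpath = "/usr/local/jails/containers/${name}";\n'
    printf '\tvnet;\n'
    op='='
    for ifc in "$@"; do
        # First interface uses "=", later ones append with "+=".
        printf '\tvnet.interface %s "%s";\n' "$op" "$ifc"
        op='+='
    done
    printf '\tdevfs_ruleset = "%s";\n' "$ruleset"
    printf '\tallow.raw_sockets;\n}\n'
}

# Print rc.conf(5) lines for inside the jail: set the MTU on the VF, create
# one VLAN on it and run DHCP on that VLAN. rc.conf variable names use "_"
# where the interface name has "." (mce2.160 -> ifconfig_mce2_160).
vf_jail_rcconf() {
    ifc=$1 vlan=$2 mtu=${3:-1500}
    printf 'ifconfig_%s="mtu %s UP"\n' "$ifc" "$mtu"
    printf 'vlans_%s="%s"\n' "$ifc" "$vlan"
    printf 'ifconfig_%s_%s="SYNCDHCP"\n' "$ifc" "$vlan"
}
```

For example, `vf_jail_stanza classic 7 mce2 >> /etc/jail.conf` and `vf_jail_rcconf mce2 160 9000 >> /usr/local/jails/containers/classic/etc/rc.conf`, followed by `service jail start classic`.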

However, since I need to type multiple commands to create a jail, I am looking for a suitable jail manager that can do the same job using a formatted command or config file. Unfortunately, I haven’t found a way to do this with BastilleBSD or CBSD.

For BastilleBSD, I used the following command to create a jail. There is no option to pass the NIC through to the jail, and I must set an IP or use DHCP to create the jail.

Code:

root@freebsd0:~ # bastille create -T bjail 14.1-RELEASE DHCP mce3

For CBSD, I tried the jconstruct-tui method to create a jail. It lets me leave the IP unset, but it seems that I must create a VNET epair bridge in the Networking section.

Any suggestions are welcome.

cb000 · Nov 19, 2024

Today I tried AppJail. It can create a thin jail using appjail quick test vnet=mce3 start, but I don't know how to create a thick jail with only the passed-through interface visible and /dev/bpf unhidden for DHCP.

sko · Nov 19, 2024

I added support for vnet interfaces to sysutils/iocell, but just like several other PRs they have been sitting for over half a year now.

I have already created a (local, on my build hosts) port 'iocell-devel' which uses my repo as upstream and has several other fixes/improvements/additions. I wanted to try getting it added to the ports tree, maybe under another name since it basically is a fork by now, but haven't found the time yet... (iocell by design doesn't use the (modifiable) default values if an option is unset for a jail; I still want to fix this behavior before I present it as a 'new version'/fork/whatever...)

One just adds the interface(s) to the vnet_interfaces jail property via iocell set vnet_interfaces="mce2.8,mce3.5" jailname and those are simply handed over as 'vnet.interface' parameter to the jail command.

Ole · Nov 20, 2024

cb000 said:
For CBSD, I tried the jconstruct-tui method to create a jail. It allows me not to set an IP, but it seems that I must create a vnet epair bridge in Networking.

If you want to assign an interface, then in CBSD it looks like this:

Code:

ifconfig igb0.160 create
ifconfig igb0.160 up
cbsd jcreate jname=test vnet=1 interface=igb0.160 allow_raw_sockets=1 ip4_addr=REALDHCP devfs_ruleset=5

or via CBSDfile:

Code:

jail_test()
{
   vnet=1
   interface="igb0.160"
   allow_raw_sockets=1
   ip4_addr=REALDHCP
   devfs_ruleset=5
}

then: `cbsd up`.

As for the TUI, go to the `jailnic` options via `cbsd jconfig` -> `nic1` -> `nic_parent`.

cb000 · Nov 20, 2024

Ole said:
If you want to assign an interface, then in CBSD it looks like this:

Code:

ifconfig igb0.160 create
ifconfig igb0.160 up
cbsd jcreate jname=test vnet=1 interface=igb0.160 allow_raw_sockets=1 ip4_addr=REALDHCP devfs_ruleset=5

Thank you for your reply, Ole.

I ran cbsd jcreate jname=test vnet=1 interface=mce3 allow_raw_sockets=1 devfs_ruleset=7
but it is not what I want. It creates an epair and a bridge, and inside the jail I see eth0 instead of mce3. What I want is to avoid epair entirely and just pass the VF interface through to the jail, to reduce the CPU overhead of virtual networking.

DtxdF · Nov 20, 2024

cb000 said:
Today I tried Appjail. It can create a thin jail using appjail quick test vnet=mce3 start, but I don’t know how to create a thick jail with only the passthrough interface visible and the bpf unhidden for DHCP.

* DHCP: https://appjail.readthedocs.io/en/latest/networking/DHCP-and-SLAAC/ [1]
* For a thick jail, simply set the option `type=thick`.

If the feature you mention will create an interface on your host, simply pass the `vnet=interface` option like any other interface [2].

[1] The documentation refers to the use of devfs.rules(5), but appjail(1) can dynamically manage your devices: https://appjail.readthedocs.io/en/latest/DEVFS/
[2] Of course, remember that using VNET will make your interface disappear from your host.
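On the bpf question: dhclient(8) needs /dev/bpf* inside the jail, so the jail's devfs ruleset has to unhide it. A minimal sketch that prints such a ruleset, assuming ruleset number 7 is free and that the stock devfsrules_jail_vnet ruleset exists in /etc/defaults/devfs.rules on your release (check that file); the ruleset name devfsrules_jail_vnet_dhcp and the function name are hypothetical.

```sh
#!/bin/sh
# Sketch: print a devfs.rules(5) ruleset for VNET jails that run a DHCP
# client. It builds on the stock VNET jail ruleset and additionally
# unhides the bpf devices that dhclient(8) opens.
dhcp_devfs_ruleset() {
    num=${1:-7}
    # \$ keeps the include reference literal in the generated file.
    cat <<EOF
[devfsrules_jail_vnet_dhcp=$num]
add include \$devfsrules_jail_vnet
add path 'bpf*' unhide
EOF
}
```

Append it with `dhcp_devfs_ruleset 7 >> /etc/devfs.rules`, run `service devfs restart`, and reference it from the jail with `devfs_ruleset = "7";` (or `devfs_ruleset=7` in the managers above).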

Snail · Nov 20, 2024

I don't use any manager; I just use variables in /etc/jail.conf:

/etc/jail.conf

Code:

# interface name on the host ( the vf interfaces are named iavfX in my case)
$if       = "iavf$ifnum";
$jail_if = "$if";

exec.prestart      = "ifconfig $if up mtu $mtu -tso4 -tso6 -lro -vlanhwtso";

exec.start = "dhclient $jail_if";
exec.start += "/bin/sh /etc/rc";

# configuration of the VMs (setting interface num)
jenkins {
  $ifnum = "1";
}

git {
  $ifnum = "2";
}

# etc...

Reading this years later, I think $if and $jail_if are redundant, leftovers from testing.
To summarize: $ifnum is set per jail and referenced in the global settings.

I hope this was clear.

cb000 · Nov 21, 2024

DtxdF said:
* DHCP: https://appjail.readthedocs.io/en/latest/networking/DHCP-and-SLAAC/ [1]
* For a thickjail, simply set the option `type=thick`.

If the feature you mention will create an interface on your host, simply pass the `vnet=interface` option like any other interface [2].

Thank you, DtxdF. Setting mount_devfs makes devfs_ruleset take effect: appjail quick test type=thin vnet=mce3 mount_devfs devfs_ruleset=7 start

However, this does not work as expected: appjail jail create test type=thin vnet=mce3 mount_devfs devfs_ruleset=7 start

I would like to know the difference between appjail quick and appjail jail create. When using appjail jail create with the same arguments, I can see all interfaces in the jail. It seems that this type of jail is called a "Typical Jail (Legacy?)". How do I create a jail using appjail jail create that works the same as appjail quick?

Additionally, how do I mount storage using AppJail? I read the File System Management page but found it a bit confusing, as it does not mention mounting with nullfs, which I assume is what the general mount step uses.

I tried appjail fstab jail test set -d /zroot/test -m /mnt/test -t nullfs but I cannot see /mnt/test in the test jail.

Snail said:
I do not use any manager, just been using variables in /etc/jail.conf:

/etc/jail.conf

Code:

exec.prestart = "ifconfig $if up mtu $mtu -tso4 -tso6 -lro -vlanhwtso"; etc...

To Snail, I understand your code uses $if to define which VF the jail will use, but I would like to know why you set -tso4 -tso6 -lro -vlanhwtso in the interface configuration. These options disable offloading. Shouldn't offloading be enabled to improve performance?

cb000 · Nov 21, 2024

I forgot to restart the jail for the changes to take effect. The mount point functions now.

sko · Nov 21, 2024

cb000 said:
To Snail, I understand your code uses $if to define which VF the jail will use, but I would like to know why you set -tso4 -tso6 -lro -vlanhwtso in the interface configuration. This option should disable the offload. Shouldn't it be set to enable for improving performance?

Intel interfaces completely reset when you add/remove virtual interfaces to/from the bridge that interface is attached to. Same goes for adding/removing vlan interfaces to the device.
I.e. *everything* that is using that physical interface will go down for a few seconds.

IIRC this is due to how Intel handles offloading in the firmware: upon changes to the interface, the driver/firmware needs to be reset to ensure it holds the correct (virtual) interface information. (There is/was an ancient thread on the mailing lists where this behavior was explained in more detail.)

So to prevent the interface from going dark every time you start/stop/restart a jail, you have to disable offloading.
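In jail.conf terms the workaround is just a different exec.prestart line, as in Snail's config. A small sketch that prints it with or without the offload flags; the function name and the 1500 default MTU are assumptions, and whether Mellanox VFs need this at all is not established in this thread (the explanation above is about Intel NICs).

```sh
#!/bin/sh
# Sketch: print an exec.prestart line that brings the host-side VF up.
# mode "no-offload" adds the workaround flags from Snail's config:
# -tso4/-tso6 (TCP segmentation offload), -lro (large receive offload)
# and -vlanhwtso (TSO on VLANs). Any other mode leaves offloads as they are.
vf_prestart() {
    ifc=$1 mtu=${2:-1500} mode=${3:-keep-offload}
    cmd="ifconfig $ifc up mtu $mtu"
    if [ "$mode" = no-offload ]; then
        cmd="$cmd -tso4 -tso6 -lro -vlanhwtso"
    fi
    printf 'exec.prestart = "%s";\n' "$cmd"
}
```

The trade-off is the one described above: disabling offloads costs some CPU and throughput, in exchange for not having the whole port reset whenever a jail starts or stops.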

JohnK · Nov 21, 2024

cb000 said:
However, since I need to type multiple commands to create a jail, I am looking for a suitable jail manager that can do the same job using a formatted command or config file.

cb000, I am creating a series of "setup scripts" to set up jails and whatnot. I tried a few jail managers too, and most were well beyond my immediate needs. Since I needed to practice my shell scripting to create the 'setup scripts' anyway, I decided to automate the jail setup steps as well. I wouldn't call my effort a jail manager by any stretch (tools like CBSD, Bastille and Pot are far more professional), but my simple tool lets me keep directories for all my 'setups' and 'jail configs' and spin up jails with one command. It won't fit your need right out of the box, but the makefile, manpage, etc. are all there. And if you happen to be a scripting genius or a full-stack engineer, I could use a quick pass over my code (I'm still learning /bin/sh). I placed an absolute ton of comments in the code, so if you need to modify it, everything should be there.

J

Shell Post in thread 'Simple jail making script : jcreate'

Nov 7, 2024

I have made two updates to this jail setup tool:

Added the optional ability to install packages from the host system with a 'jail.packages' variable. This allows the jail to be slimmed (extra unnecessary stuff removed from within) thus moving the jail maintenance steps to host system. This also should allow for easier copying of existing jail templates (like the plugins or scripts others have written, which may keep a separate package list). See the “emby example” in the example directory.
Added a check for existing userland (container). This should allow for updates to be run...

cb000 · Nov 21, 2024

sko said:
IIRC this is due to how intel handles the offloading in the firmware, so upon changes to the interface, the driver/firmware needs to be reset to ensure it contains the correct (virtual) interface informations. (There is/was an ancient thread on the mailing lists about this behavior where it was explained in more detail)

So to prevent the interface from going dark every time you start/stop/restart a jail, you have to disable offloading.

Understood.

Also, I tried iocell v2.2.0 from this link. Is it a fork of the abandoned iocell? I see that the version still reports iocell 2.1.2 (2017-06-17). I tried to create a jail using iocell create tag=test vnet=mce3, but the behavior is like BastilleBSD / CBSD: I can see all interfaces inside the jail.

cb000 · Nov 21, 2024

Thank you, JohnK. Although there are many powerful tools, I am also considering building my own personal scripts for my usage. As I am still a beginner in FreeBSD, I will follow the handbook and gain more experience with using jails. Once I have more experience with my use cases, I will generalize them and start building my tools.

JohnK · Nov 21, 2024

That tool I linked to has a few really simple examples (with setup scripts that are a tiny bit more complicated than necessary). It lets me keep a directory of setup scripts in an organized manner (names, packages, READMEs), so when I come back to them 6 months later, I can figure things out (organization was the whole point). Good luck. I know the user victort is also working on "template setup scripts" (he has a lot more than I do), so you are not alone.

sko · Nov 21, 2024

cb000 said:
Understand.

Besides, I tried iocell v2.2.0 from this link. Is it a fork of the abandoned iocell? I see that the version is still iocell 2.1.2 (2017-06-17). I tried to create a jail using iocell create tag=test vnet=mce3, but the behavior is like BastilleBSD / CBSD: I can see all interfaces inside the jail.

That's the stale port. The problem with the iocell version number is that it is hard-coded and wasn't bumped with all updates. My port uses the GitHub commit date as the version number (e.g. g20241118) to avoid this. It's currently done via a very crude sed 's/whatevertag/newtag/' on the main iocell script, but there must be a prettier way via the makefile...

The repo of my fork is at https://github.com/rostwald/iocell and the branch I'm merging my changes to and using in production is the 'develop' branch. I once started to also merge to the 'master' branch, but originally I wanted to keep that in sync with upstream so I might have to clean that up at some point.
Feel free to try out the develop branch, but the usual warnings apply: there might be dragons. I'm running that code in production on a bit more than a dozen hosts (ranging from ~5 to 35 jails each), but I most certainly can't think of every edge case to test for, so in some rare/special cases something might go wrong. I usually make sure to feed everything I build enough garbage or malicious input to see if it breaks, but still...

Regarding the syntax: it should be iocell create [...] vnet_interfaces=mce3 jailname for passed-through vnet interfaces, or iocell create [...] interfaces=vnet0:hostbridge jailname for 'standard' vnet/epair interfaces.

Basically, any jail manager that lets you add custom parameters that are handed directly to the jail(8) command it generates from all its settings is capable of passing vnet interfaces to the jail: just add the 'vnet.interface=mlxen0.160' parameter as an additional option and you're done.
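As an illustration of what such a manager ends up running, this sketch prints a jail(8) -c command line that moves one interface into a new VNET jail. The function name is hypothetical; name, path, host.hostname, vnet, vnet.interface and persist are standard jail(8) parameters. It deliberately handles a single interface; check jail(8) on your release before relying on repeated vnet.interface arguments.

```sh
#!/bin/sh
# Sketch: print the jail(8) command that creates a persistent VNET jail and
# moves one host interface (e.g. an SR-IOV VF) into it.
jail_create_cmd() {
    name=$1 path=$2 ifc=$3
    if [ -z "$ifc" ]; then
        printf 'usage: jail_create_cmd name path interface\n' >&2
        return 1
    fi
    printf 'jail -c name=%s path=%s host.hostname=%s vnet vnet.interface=%s persist\n' \
        "$name" "$path" "$name" "$ifc"
}
```

Running the printed command creates a persistent jail that owns mce3 (no rc(8) startup happens, since exec.start is not set); `jail -r test` removes the jail and the interface returns to the host.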

Sidenote:
The major problem with passing through physical interfaces is that jail startup fails if the interface doesn't exist (yet). This isn't an issue with virtual interfaces (if the physical interface is absent, it just isn't attached to the bridge, but the jail still starts and gets attached to the bridge), but if you pass through a physical interface, it *must* be present.
I just began to implement allowing interfaces to be absent during jail startup by adding a '?' at the end of the interface name. In my case several identically configured PF gateways running in jails *might* have an ue0 interface as a backup uplink - but if that interface isn't present, the gateways still have to be started or that branch is pretty much dead. (the ue0 interface has to be handled by the jail itself; adding it to a bridge and going the 'default' way doesn't work because mobile networks are a bloody mess...)
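The '?' convention can be sketched in plain sh: interfaces ending in '?' are passed only if present, all others are mandatory. This is a hypothetical illustration of the idea, not the code from the iocell fork; the existence check uses ifconfig(8) by default and can be swapped through the IF_EXISTS variable so it can be exercised without real hardware.

```sh
#!/bin/sh
# Sketch: resolve a list of interface specs into the interfaces to pass to
# a jail. "ue0?" means optional (skipped if absent); "mce3" means required
# (the function fails if it is absent).
: "${IF_EXISTS:=if_exists_ifconfig}"

# Default existence test: ifconfig succeeds only for existing interfaces.
if_exists_ifconfig() {
    ifconfig "$1" >/dev/null 2>&1
}

# Print usable interfaces one per line; return 1 if a required one is missing.
resolve_vnet_ifaces() {
    for spec in "$@"; do
        case $spec in
            *\?)
                ifc=${spec%\?}
                if "$IF_EXISTS" "$ifc"; then
                    printf '%s\n' "$ifc"
                fi
                ;;
            *)
                if "$IF_EXISTS" "$spec"; then
                    printf '%s\n' "$spec"
                else
                    printf 'required interface %s is missing\n' "$spec" >&2
                    return 1
                fi
                ;;
        esac
    done
}
```

For example, `resolve_vnet_ifaces mce3 'ue0?'` prints mce3, plus ue0 only if that device is attached; a wrapper would then turn each printed name into a vnet.interface parameter.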

DtxdF · Nov 21, 2024

cb000 said:
Thank you, DtxdF. Setting mount_devfs enables the devfs_ruleset effects: appjail quick test type=thin vnet=mce3 mount_devfs devfs_ruleset=7 start

However, this does not work as expected: appjail jail create test type=thin vnet=mce3 mount_devfs devfs_ruleset=7 start

I would like to know the difference between appjail quick and appjail jail create. When using appjail jail create with the same arguments, I can see all interfaces in the jail. It seems that this type of jail is called a "Typical Jail (Legacy?)". How do I create a jail using appjail jail create that works the same as appjail quick?

Additionally, how do I mount storage using AppJail? I read the File System Management Page but found it a bit confusing as it does not mention mounting with nullfs, which I assume is used in the general mount step.

I tried appjail fstab jail test set -d /zroot/test -m /mnt/test -t nullfs but I cannot see /mnt/test in the test jail.

To Snail, I understand your code uses $if to define which VF the jail will use, but I would like to know why you set -tso4 -tso6 -lro -vlanhwtso in the interface configuration. This option should disable the offload. Shouldn't it be set to enable for improving performance?

appjail-jail(1) create is not the same as appjail-quick(1) as the man pages say:

Create a new jail. This subcommand only has the responsibility of creating a jail; It is highly recommended to use appjail-quick(1) unless you know what you are doing.

And yes, nullfs is the default filesystem type and the reason why the mount point is not mounted after adding it to your fstab(5) is explained in the documentation (see the debian example in the "File System Management" section). But you can mount it at runtime without restarting the jail.

By the way, if you read the man pages on man.freebsd.org, note that only 13.4-RELEASE is updated:

* https://man.freebsd.org/cgi/man.cgi....4-RELEASE+and+Ports&arch=default&format=html

Just to confirm, can you get an IP via DHCP after creating a jail with the VNET option set? With the interface created by the SR-IOV feature, of course.

DtxdF · Nov 21, 2024

cb000 said:
Thank you, JohnK. Although there are many powerful tools, I am also considering building my own personal scripts for my usage. As I am still a beginner in FreeBSD, I will follow the handbook and gain more experience with using jails. Once I have more experience with my use cases, I will generalize them and start building my tools.

Developing a new jail manager is interesting and can teach you a lot. That is not a problem if you want to learn much more. But I think tools like AppJail, pot and CBSD give you more than just jail creation: they also give you automation, and maybe that is what you need.

* AppJail has Makejails.
* Pot has flavours.
* CBSD has CBSDfiles.

I recommend learning only the basics, that is, jail(8) and jail.conf(5), and then jumping to one of these three managers.

sko · Nov 22, 2024

DtxdF said:
I recommend you to learn only the basics, that is, jail(8) and jail.conf(5) and jump to one of the three managers.

this.
I can highly recommend "FreeBSD Mastery: Jails" by Michael W. Lucas for that. It really covers everything you might ever need to know about jails and is fun to read.

Regarding the automation capabilities of some jail managers: I also tried that, but TBH if I'm using automation anyway, I'd use jail(8) and the native tools and configuration directly. Things like CBSD are nice if you start completely fresh, don't want to know too much about the innards and never have any "edge cases" or some kind of (network) environment or configuration that those "do-it-all" managers can't support, because most of them have strict requirements and make many assumptions about your infrastructure and host configuration.

IMHO jail managers (the "smaller" ones) shine when you administer your jails manually or use a mix of manual and scripted setup. They streamline and simplify a lot of things, and first and foremost they take away the burden of needing enough self-discipline to do the configuration the same way and in the same style for every jail, every time, even in a hurry. I.e. they keep config files from becoming a mess by taking over that part.

Ole · Nov 22, 2024

cb000 said:
but it is not what I want. I see it creates an epair bridge.

Got it. That sounds like a feature request; it should be added.

Thanks for the case. ( 2024-11-23 update: added in upcoming CBSD 14.2.0 )

DtxdF said:
Developing a new jail manager is simply interesting and can give you more information about more things

I agree with you. I have been reading advice like "you should use <INSERT_NAME> manager" and "Avoid jail/vm managers! Use plain /usr/sbin/jail!" for more than 15 years. But few people write about a third option. Since the user base is very small and there is therefore no commercial benefit, many start writing their own manager as research (at least I can speak for CBSD), and sometimes as a solution to, and demonstration of, some alternative concepts and visions. Thanks to managers, new users can see: "Aha! It can be used like this!" and write their own wrapper. As a rule, if you do not plan to publish it and attract other users, there is no need to try to please everyone and make it universal. If someone does not want to write their own and finds an existing manager convenient, that is a nice bonus for the authors. However, keep in mind that all managers are, in fact, one-person projects, and sooner or later that person will get tired of it.

JohnK · Nov 22, 2024

Speaking for myself:
For my jail container setup scripts, I really only wanted to build "template scripts" that let new users set up jails, e.g. a "plex jail", "nextcloud", "gitlab", etc. I started using a few jail managers, but getting my template scripts loaded was more of a pain than I wanted (I needed an easier way to test scripts). My concept is/was that these scripts are entirely separate from the jail manager used to build the container(s). I know my concept is fairly low tech (build the jail, fetch the template script, run the template script), but it seemed to me that each jail manager had its own "template script setup/formula", which I thought would just be a problem later, so I opted to launch the setup template scripts from within the jail itself (i.e. separate the 'template' from the 'jail manager'). Basically, I think it would be great if I could contribute in some way; these jail manager authors have spent a ton of time and effort, and it would be great if we could combine our efforts and knowledge on how to set up and configure jails for specific things people want, like plex, *cloud, gitsomething, etc.

Ole · Nov 23, 2024

JohnK said:
it would be great if we can sort of combine our efforts and knowledge on how to set up and configure jails for specific things people want like, plex, *cloud, gitsomething, etc..

Unfortunately, the discussion has long gone beyond the original topic. Why don't we discuss this in some mailing list or a separate topic? I'm interested in this and would join from the CBSD side ;-)

JohnK · Nov 23, 2024

I get the impression I may have stepped on a bee hive but I will try to organize my thoughts to create a thread that will afford (you) smart people a chance to pick apart my ideas.

I just sent a personal message but for anyone else interested in my crude concept here is a brief outline of my thoughts.

I believe a setup script run from within the jail is easier to write and the most useful. For example, the same setup script can also be run on a host's base system, so it is useful for both "new system" and "jail" setups.

EXAMPLE - To set up a "plex jail" we can break the setup down to:
1. setup sshd (for admin ssh login)
2. setup admin user (~/.ssh/authorized_keys file for upload)
3. install "mdnsresponder"
4. configure "mdnsresponder"
5. install "plex"
6. configure "plex"
...

Each step should essentially be a separate script which can be loaded and run from *within* a jail in order. Keeping these scripts separate will also allow these scripts to be run on a new system, mfsbsd, jail, etc.. These scripts could also be fetched from the internet or checked out from repository with `gitup`, `fetch`, or `git`.

Obviously, some steps above are optional (like setting up admin access), and I'm still trying to find a good way of dealing with this without creating a series of prompts.
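The ordered-steps idea can be sketched as a tiny runner that executes numbered scripts from one directory in lexical order and stops at the first failure. The NN-name.sh naming scheme, the function name and the directory are hypothetical; since it only needs /bin/sh, the same runner works on a fresh host or inside a jail.

```sh
#!/bin/sh
# Sketch: run setup steps such as 10-sshd.sh, 20-admin-user.sh,
# 30-mdnsresponder.sh ... in lexical order; stop at the first failing step.
run_steps() {
    dir=$1
    for step in "$dir"/[0-9]*.sh; do
        # If nothing matches, the glob stays literal; skip that case.
        [ -e "$step" ] || continue
        printf '==> %s\n' "${step##*/}"
        if ! /bin/sh "$step"; then
            printf 'step %s failed; stopping\n' "${step##*/}" >&2
            return 1
        fi
    done
    return 0
}
```

For example, inside the jail (after `jexec -u root plex`), source the function and run `run_steps /root/setup/plex`. Optional steps can simply be left out of the directory, which avoids prompting.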

nocsi · Nov 24, 2024

OCI jails are going to solve a lot of the problems people have with provisioning jails. For one, it's a standard, and one that's immutable and vettable. If you need more configuration after building the jail, go with the Vagrant-replacement approach: use Cloud-Init to configure the rest of the jail at runtime. We shouldn't keep coming up with new jail managers when FreeBSD is on the cusp of finally being able to run an image standard that every other OS is working with.