[systemd-devel] Question about generators and adding new units in the middle of a transaction

Manuel Amador rudd-o at rudd-o.com
Sat Nov 12 22:53:36 PST 2011


The problem I am trying to solve is this: we want to support on-boot mounting 
of datasets in multiple pools before local-fs.target, some of which are 
detected **late** (after cryptsetup / LVM).

Technically, we could do this today.  You can set a ZFS dataset to 
'mountpoint=legacy' and place it in /etc/fstab.  However, this is NOT the 
correct way to mount ZFS datasets: it dispenses with a LOT of the advantages 
that ZFS provides, and it won't work if the dataset is in a pool that has not 
been imported early (perhaps because its device files are missing at that 
point).

A little background:

Officially, ZFS datasets are normally mounted with the command zfs mount -a or 
when a pool is imported, *without* registering them in fstab.  This causes 
problems for us in, e.g., this scenario (list of filesystems follows):

/             ext4
/var          zfs   <- not in fstab
/var/lib      ext4
/var/lib/rpm  zfs   <- not in fstab either

As you can see, we could try letting the system mount stuff as it normally 
would, in which case /var/lib would not mount because there is no /var mounted 
at that time, thus there is no /var/lib mountpoint.

We could also do zfs mount -a very early in the mount process, in which case 
/var/lib/rpm would not mount because /var/lib would not be mounted, thus the 
mountpoint /var/lib/rpm would be absent.

These are all dumb examples, but there are legitimate use cases where this is 
a problem (/var on zfs, for example).  And none of those strategies above 
work, because we're trying to support interleaved mounting of filesystems 
(yes, you can have many filesystems in ZFS, and you are encouraged to exploit 
that for many reasons that are beyond this e-mail).
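
To make the ordering problem concrete, here is a minimal Python sketch 
(illustrative only, not part of our generator).  It takes the non-root 
filesystems from the list above and checks which mounts would be attempted 
before a parent mountpoint exists under each strategy.  The function names 
are made up for this example.

```python
from pathlib import PurePosixPath

# Non-root filesystems from the example above: (mountpoint, fstype).
# "/" is assumed to be mounted already.
MOUNTS = [
    ("/var", "zfs"),
    ("/var/lib", "ext4"),
    ("/var/lib/rpm", "zfs"),
]


def misordered(sequence):
    """Return mountpoints attempted before one of their parent mounts."""
    all_points = {mp for mp, _ in sequence}
    seen = set()
    bad = []
    for mp, _ in sequence:
        ancestors = {str(p) for p in PurePosixPath(mp).parents}
        # A parent that is itself in the list but has not been mounted yet.
        if (ancestors & all_points) - seen:
            bad.append(mp)
        seen.add(mp)
    return bad


def interleaved(mounts):
    """Order mounts by path depth so every parent precedes its children."""
    return sorted(mounts, key=lambda m: (len(PurePosixPath(m[0]).parts), m[0]))


# Strategy 1: everything in fstab first, then the ZFS datasets.
fstab_first = [m for m in MOUNTS if m[1] != "zfs"] + \
              [m for m in MOUNTS if m[1] == "zfs"]

# Strategy 2: "zfs mount -a" very early, then fstab.
zfs_first = [m for m in MOUNTS if m[1] == "zfs"] + \
            [m for m in MOUNTS if m[1] != "zfs"]

if __name__ == "__main__":
    print("fstab first:", misordered(fstab_first))
    print("zfs first:  ", misordered(zfs_first))
    print("interleaved:", misordered(interleaved(MOUNTS)))
```

Only the interleaved order, which ignores the filesystem type and looks 
purely at the path hierarchy, leaves nothing misordered.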

So what we are trying to do is bring ZFS datasets into systemd as first-class 
elements, just like fstab filesystems are.

So what we've done is this: when the system starts up, we can tell ZFS to 
prevent automounting of any datasets, then generate unit files for each 
dataset (exactly like systemd would generate for /etc/fstab filesystems), then 
rely on systemd to mount the filesystems in the right order.

And that's exactly what we have done, of course.  Our generator generates unit 
files for ZFS, and then systemd mounts everything correctly, in parallel, just 
peachy.  It's fuckenawesome.
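
For readers who have not seen a generator, here is a rough Python sketch of 
the idea.  It is NOT our actual generator (that one is in the pull request 
quoted below); the pool name "tank", the function names, the simplified path 
escaping and the exact unit contents are all assumptions made for 
illustration.  A real generator would get its dataset list from ZFS and its 
output directory from systemd rather than from hard-coded values, and may 
need mount options that are omitted here.

```python
import os
import tempfile


def escape_path(path):
    """Simplified take on systemd's path escaping for unit names."""
    trimmed = path.strip("/")
    if not trimmed:
        return "-"  # the root directory is spelled "-" in unit names
    out = []
    for i, ch in enumerate(trimmed):
        plain = ch.isascii() and (ch.isalnum() or ch in ":_")
        if ch == "/":
            out.append("-")
        elif plain or (ch == "." and i > 0):
            out.append(ch)
        else:
            # Everything else, including a literal "-", becomes \xXX.
            out.extend("\\x%02x" % b for b in ch.encode("utf-8"))
    return "".join(out)


def mount_unit(dataset, mountpoint):
    """Return (unit file name, unit file text) for one dataset."""
    name = escape_path(mountpoint) + ".mount"
    body = "\n".join([
        "[Unit]",
        "Description=ZFS dataset %s" % dataset,
        "Before=local-fs.target",
        "",
        "[Mount]",
        "What=%s" % dataset,
        "Where=%s" % mountpoint,
        "Type=zfs",
        "",
    ])
    return name, body


def write_units(datasets, unit_dir):
    """Write one mount unit per dataset and hook it into local-fs.target."""
    wants_dir = os.path.join(unit_dir, "local-fs.target.wants")
    os.makedirs(wants_dir, exist_ok=True)
    written = []
    for dataset, mountpoint in datasets:
        name, body = mount_unit(dataset, mountpoint)
        with open(os.path.join(unit_dir, name), "w") as fh:
            fh.write(body)
        link = os.path.join(wants_dir, name)
        if not os.path.lexists(link):
            os.symlink(os.path.join("..", name), link)
        written.append(name)
    return written


if __name__ == "__main__":
    # Hypothetical datasets; a scratch directory stands in for the
    # directory systemd would hand to a generator.
    demo = [("tank/var", "/var"), ("tank/rpm", "/var/lib/rpm")]
    with tempfile.TemporaryDirectory() as unit_dir:
        for unit in write_units(demo, unit_dir):
            print(unit)
```

The sketch does not sort anything itself: ordering the resulting mounts 
against the ones from /etc/fstab is left to systemd, as described above.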

BUT...

This only works if the ZFS pool was loaded very early (say, in the initramfs, 
which is the case if you're booting with root=zfs, or if the devices are 
available for importing the pool at the time we run the generator).

This, of course, won't work if the devices aren't available, right?  Which 
they aren't at the time that the generator runs.

So what we are trying to do is see if we can, during the boot process, have 
**another** stage (an opportunity to run the generator again) when the block 
devices have fully initialized (I think it's after udev-settle, correct me if 
I am wrong, PLEASE).  We would run the generator again, discovering all pools 
forrealz, and then call systemctl daemon-reload to inform systemd that the 
transaction has changed, and that we need to mount a number of filesystems in 
addition to the ones generated in the initial systemd-zfs-generator run.

But this doesn't work.  systemctl daemon-reload won't work from within the 
generator (it hangs), and I suspect it won't work from within the unit files 
themselves.

So, can this be done?  We're not trying to turn systemd into a volume manager 
-- we are merely trying to get the filesystems mounted on boot, in parallel, 
in such an order that they won't conflict with the ones in /etc/fstab.  If 
this can't be done, why?  And what would be the alternative???


Oh, in addition to that, remount-rootfs.service fails on ZFS because mount -o 
remount,rw is (a) moot with ZFS on / and (b) not supported.  We tried to correctly 
override remount-rootfs.service in the generator, but systemd wouldn't load 
our override.

Thanks in advance.  Systemd rocks!

On Friday, November 04, 2011 14:44:09 Mirco Tischler wrote:
> 2011/11/4 Manuel Amador <rudd-o at rudd-o.com>:
> > I am developing systemd support for ZFS:
> > 
> > https://github.com/zfsonlinux/zfs/pull/435/files
> > 
> > as you can see, I create the units early on bootup using a generator (a
> > mechanism that is entirely undocumented, tsk).
> > 
> > Then systemd proceeds with normal system startup.
> > 
> > The whole point is to be able to mount file systems of other types on
> > top of ZFS file systems, and then ZFS file systems on top of that.
> >  This work lets this scenario work properly:
> > 
> > / zfs
> > /blah ext4
> > /blah/blahblah zfs
> > 
> > But, here is a problem.  This works fine and dandy when ZFS has loaded
> > the pools at boot through dracut or something, but will most assuredly
> > fail if ZFS is not the root file system, as nothing will load the ZFS
> > module.
> > 
> > We have some udev mechanisms at the moment to ensure that actually
> > happens (loading of the zfs modules, importing of all pools).
> > 
> > Good and dandy so far.
> > 
> > Now, this will happen during udev settle.  What I want is to generate
> > more units when pools are discovered and their file systems require to
> > be mounted automatically.  That is, I need to re-run the generator and
> > generate new units, and then tell systemd to daemon-reload.
> > 
> > But systemd is in the middle of a transaction, serving the unit local-
> > fs.target.  And, as you can imagine, the file systems that were
> > discovered late, must be linked as wants of local-fs.target.
> > 
> > So my question is: what happens if I systemctl daemon-reload DURING the
> > transaction that brings the system up?  Will systemd pick up the new
> > units and add them as wants of local-fs.target?
> > 
> > ideal process:
> > 
> > root fs is mounted
> > starting local-fs.target
> > starting block device discovery
> > block dev discovered, import pool in block dev
> > oh, we found new file systems!
> > generate units for those
> > daemon-reload to add the new units as wants for local-fs.target
> > start all of these new units
> > and then, only then, local-fs.target will reach started state.
> > 
> > Is this even possible??
> 
> Hi
> I know very little about ZFS so please excuse my ignorance, but I
> don't understand the problem you are trying to solve. Systemd parses
> /etc/fstab already and creates the mount units. And the necessary
> modules should be loaded automatically on mount, just like with all
> the other filesystems. And udev works nicely with systemd to announce
> new block devs. What differs in ZFS that this isn't working for you?
> 
> Thanks
> Mirco

