[systemd-devel] Question about generators and adding new units in the middle of a transaction
Manuel Amador
rudd-o at rudd-o.com
Sat Nov 12 22:53:36 PST 2011
The problem I am trying to solve is this: we want to support on-boot mounting
of datasets in multiple pools before local-fs.target, some of which are
detected **late** (after cryptsetup / LVM).
Technically, we could do this today. You can set a ZFS dataset to
'mountpoint=legacy' and place it in /etc/fstab. However, this is NOT the
correct way to mount ZFS datasets: it dispenses with a LOT of the advantages
that ZFS provides, and it won't work if the dataset is in a pool that has not
been imported early (perhaps because its device files are missing at that
point).
A little background:
Officially, ZFS datasets are normally mounted with the command zfs mount -a or
when a pool is imported, *without* registering them in fstab. This causes
problems for us in, e.g., this scenario (list of filesystems follows):
    /             ext4
    /var          zfs   <- not in fstab
    /var/lib      ext4
    /var/lib/rpm  zfs   <- not in fstab either
As you can see, we could try to let the system mount things as it normally
would, in which case /var/lib would not mount because /var is not mounted at
that time, and thus the /var/lib mountpoint does not exist.
We could also do zfs mount -a very early in the mount process, in which case
/var/lib/rpm would not mount because /var/lib would not be mounted, thus the
mountpoint /var/lib/rpm would be absent.
These are all dumb examples, but there are legitimate use cases where this is
a problem (/var on zfs, for example). And none of those strategies above
work, because we're trying to support interleaved mounting of filesystems
(yes, you can have many filesystems in ZFS, and you are encouraged to exploit
that for many reasons that are beyond this e-mail).
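To make the ordering problem concrete, here is a minimal sketch in Python that models the four filesystems listed above. It assumes a deliberately simplified rule: a mountpoint directory only exists once the filesystem containing it has been mounted. The names FILESYSTEMS, parent_mount, simulate and depth_order are made up for this illustration and are not part of ZFS or systemd.

```python
from pathlib import PurePosixPath

# The example layout from this mail: mountpoint -> filesystem type.
FILESYSTEMS = {
    "/": "ext4",
    "/var": "zfs",
    "/var/lib": "ext4",
    "/var/lib/rpm": "zfs",
}


def parent_mount(path, all_paths):
    """Return the nearest ancestor of `path` that is itself a mountpoint."""
    for ancestor in PurePosixPath(path).parents:
        if str(ancestor) in all_paths:
            return str(ancestor)
    return None


def simulate(order, all_paths=FILESYSTEMS):
    """Mount in the given order; return the mountpoints that fail.

    Simplifying assumption: a mount fails when its parent filesystem is
    not mounted yet, because the mountpoint directory is then missing.
    """
    mounted = set()
    failed = []
    for path in order:
        parent = parent_mount(path, all_paths)
        if parent is None or parent in mounted:
            mounted.add(path)
        else:
            failed.append(path)
    return failed


def depth_order(all_paths):
    """Interleave filesystem types by sorting on path depth only."""
    return sorted(all_paths, key=lambda p: len(PurePosixPath(p).parts))
```

In this model, mounting the fstab entries first and the ZFS datasets afterwards loses /var/lib, mounting the ZFS datasets first loses /var/lib/rpm, and only the depth-sorted order from depth_order(FILESYSTEMS), which interleaves ext4 and zfs, mounts everything.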
So what we are trying to do is bring ZFS datasets into systemd as first-class
elements, just like fstab filesystems are.
So what we've done is this: when the system starts up, we can tell ZFS to
prevent automounting of any datasets, then generate unit files for each
dataset (exactly like systemd would generate for /etc/fstab filesystems), then
rely on systemd to mount the filesystems in the right order.
And that's exactly what we have done, of course. Our generator generates unit
files for ZFS, and then systemd mounts everything correctly, in parallel, just
peachy. It's fuckenawesome.
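For readers who have not seen a generator, the following is a rough sketch in Python of the kind of mount unit text one could emit per dataset. It is not the code from our pull request. The helper names unit_name and render_mount_unit and the dataset name tank/var are hypothetical, the path-to-unit-name mapping is simplified to plain alphanumeric path components (systemd's real escaping handles more characters), and any mount options ZFS may need are deliberately left out.

```python
import re


def unit_name(mountpoint):
    """Map a mountpoint to a mount unit name (simplified escaping).

    Only path components made of letters, digits and underscores are
    accepted here; anything else would need systemd's full escaping,
    which this sketch does not implement.
    """
    if mountpoint == "/":
        return "-.mount"
    parts = [p for p in mountpoint.split("/") if p]
    for part in parts:
        if not re.fullmatch(r"[A-Za-z0-9_]+", part):
            raise ValueError(f"component needs escaping: {part!r}")
    return "-".join(parts) + ".mount"


def render_mount_unit(dataset, mountpoint):
    """Return the text of a mount unit for one ZFS dataset.

    Assumption: ordering against local-fs.target is expressed with
    Before=, mirroring what we want for fstab-style mounts.
    """
    return (
        "[Unit]\n"
        f"Description=ZFS dataset {dataset}\n"
        "Before=local-fs.target\n"
        "\n"
        "[Mount]\n"
        f"What={dataset}\n"
        f"Where={mountpoint}\n"
        "Type=zfs\n"
    )
```

For example, render_mount_unit("tank/var", "/var") would be written to a file named unit_name("/var"), that is var.mount, in the generator's output directory, and then linked as a want of local-fs.target so that systemd orders it together with the fstab mounts.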
BUT...
This only works if the ZFS pool was loaded very early (say, in the initramfs,
which is the case if you're booting with root=zfs, or if the devices are
available for importing the pool at the time we run the generator).
This, of course, won't work if the devices aren't available, right? Which
they aren't at the time that the generator runs.
So what we are trying to do is see if we can, during the boot process, have
**another** stage (an opportunity to run the generator again) when the block
devices have fully initialized (I think it's after udev-settle, correct me if
I am wrong, PLEASE). We would run the generator again, discovering all pools
forrealz, and then call systemctl daemon-reload to inform systemd that the
transaction has changed, and that we need to mount a number of filesystems in
addition to the ones generated in the initial systemd-zfs-generator run.
But this doesn't work. systemctl daemon-reload won't work from within the
generator (it hangs), and I suspect it won't work from within the unit files
themselves.
So, can this be done? We're not trying to turn systemd into a volume manager
-- we are merely trying to get the filesystems mounted on boot, in parallel,
in such an order that they won't conflict with the ones in /etc/fstab. If
this can't be done, why? And what would be the alternative???
Oh, in addition to that, remount-rootfs.service fails on ZFS because mount -o
remount,rw is (a) moot with ZFS on / and (b) not supported. We tried to
override remount-rootfs.service from the generator, but systemd wouldn't load
our override.
Thanks in advance. Systemd rocks!
On Friday, November 04, 2011 14:44:09 Mirco Tischler wrote:
> 2011/11/4 Manuel Amador <rudd-o at rudd-o.com>:
> > I am developing systemd support for ZFS:
> >
> > https://github.com/zfsonlinux/zfs/pull/435/files
> >
> > as you can see, I create the units early on bootup using a generator (a
> > mechanism that is entirely undocumented, tsk).
> >
> > Then systemd proceeds with normal system startup.
> >
> > The whole point is to be able to mount file systems of other types on
> > top of ZFS file systems, and then ZFS file systems on top of that.
> > This work lets this scenario work properly:
> >
> > / zfs
> > /blah ext4
> > /blah/blahblah zfs
> >
> > But, here is a problem. This works fine and dandy when ZFS has loaded
> > the pools at boot through dracut or something, but will most assuredly
> > fail if ZFS is not the root file system, as nothing will load the ZFS
> > module.
> >
> > We have some udev mechanisms at the moment to ensure that actually
> > happens (loading of the zfs modules, importing of all pools).
> >
> > Good and dandy so far.
> >
> > Now, this will happen during udev settle. What I want is to generate
> > more units when pools are discovered and their file systems require to
> > be mounted automatically. That is, I need to re-run the generator and
> > generate new units, and then tell systemd to daemon-reload.
> >
> > But systemd is in the middle of a transaction, serving the unit
> > local-fs.target. And, as you can imagine, the file systems that were
> > discovered late must be linked as wants of local-fs.target.
> >
> > So my question is: what happens if I systemctl daemon-reload DURING the
> > transaction that brings the system up? Will systemd pick up the new
> > units and add them as wants of local-fs.target?
> >
> > ideal process:
> >
> > root fs is mounted
> > starting local-fs.target
> > starting block device discovery
> > block dev discovered, import pool in block dev
> > oh, we found new file systems!
> > generate units for those
> > daemon-reload to add the new units as wants for local-fs.target
> > start all of these new units
> > and then, only then, local-fs.target will reach started state.
> >
> > Is this even possible??
>
> Hi
> I know very little about ZFS so please excuse my ignorance, but I
> don't understand the problem you are trying to solve. Systemd parses
> /etc/fstab already and creates the mount units. And the necessary
> modules should be loaded automatically on mount, just like with all
> the other filesystems. And udev works nicely with systemd to announce
> new block devs. What differs in ZFS that this isn't working for you?
>
> Thanks
> Mirco