[systemd-devel] systemd-tmpfiles subvolume handling vs. changing default btrfs root
Lennart Poettering
lennart at poettering.net
Fri Jul 13 14:34:46 UTC 2018
On Fri, 29.06.18 21:04, Ignaz Forster (iforster at suse.de) wrote:
> Reordered the quotes below for better reading flow.
>
> On 28.06.2018 at 10:52, Lennart Poettering wrote:
> > > > But quite frankly I don't grok the problem at hand, i.e. what you are
> > > > trying to do, even.
> > >
> > > Was this explanation any better?
> >
> > Not really, still. What I don't grok is what precisely a "system snapshot"
> > in suse terms is actually supposed to entail. Is it supposed to
> > contain only the vendor RPMs, i.e. only /usr?
>
> That's the general idea, yes.*
>
> Everything which contains variable or user data (i.e. which is not supposed
> to be rolled back, like databases or files created by the user) will be put
> onto its own subvolume or partition.
>
> For reference, here's what this looks like on openSUSE Leap 15 again:
> ID   parent  top level  path
> ---  ------  ---------  ----
> 257  5       5          <FS_TREE>/@
> 258  257     257        <FS_TREE>/@/var
> 259  257     257        <FS_TREE>/@/usr/local
> 260  257     257        <FS_TREE>/@/tmp
> 261  257     257        <FS_TREE>/@/srv
> 262  257     257        <FS_TREE>/@/root
> 263  257     257        <FS_TREE>/@/opt
> 264  257     257        <FS_TREE>/@/home
> 265  257     257        <FS_TREE>/@/boot/grub2/x86_64-efi
> 266  257     257        <FS_TREE>/@/boot/grub2/i386-pc
> 267  257     257        <FS_TREE>/@/.snapshots
> 411  267     267        <FS_TREE>/@/.snapshots/138/snapshot
> 412  267     267        <FS_TREE>/@/.snapshots/139/snapshot
>
>
> *) Some packages will still use /bin, /lib and the like, and those will be
> part of the snapshot; on the other hand distribution RPMs may also contain
> files or directories in e.g. /var, which will not be part of the snapshot.
> Because of that I'd prefer the term "static / read-only / unmodifiable part
> of the root file system" instead of "vendor RPMs".
>
> > or everything except
> > /home, /srv, /var, /tmp?
>
> Everything except the directories listed above, because those contain
> variable data which one usually doesn't want to reset just because e.g. a
> new kernel doesn't boot.
> That won't prevent the user from creating his own snapshots of these
> subvolumes of course.
>
> > > > systemd will never create disassociated subvolumes for you.
> > >
> > > That's the problem - it will create subvolumes which will just disappear
> > > from the system when switching to the next snapshot.
> >
> > Well, no. If snapshots are done recursively they wouldn't; they would
> > be switched at the same time.
>
> I think it's not relevant for this discussion: you have repeatedly talked
> about recursive snapshots now, but as far as I'm aware btrfs is not
> capable of doing that. I've found a patchset on
> https://www.spinics.net/lists/linux-btrfs/msg29205.html, but it seems the
> relevant parts for snapshot creation weren't merged upstream.
>
> So how are those recursive btrfs snapshots supposed to work?
So, systemd's btrfs code supports doing recursive snapshots (which is
exposed through "machinectl clone" or "systemd-nspawn
--ephemeral"). If the upstream btrfs tools don't support them, please
work with the btrfs developers to fix that. There's nothing too magic
about them; it's a pity that this isn't supported yet.
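For illustration, here is a toy model in Python of what "recursive" means here. A plain btrfs snapshot stops at subvolume boundaries, so a nested subvolume shows up as an empty directory in the copy; a recursive snapshot then replaces each such placeholder with a snapshot of the child, level by level. The `Subvol` class and both functions are hypothetical and only model the behaviour in memory; this is not systemd's btrfs code and issues no ioctls.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Subvol:
    """Toy btrfs subvolume: regular files plus nested subvolumes (hypothetical model)."""
    files: dict[str, str] = field(default_factory=dict)
    children: dict[str, Subvol] = field(default_factory=dict)  # relative path -> nested subvolume
    empty_dirs: set[str] = field(default_factory=set)          # e.g. placeholders left by a snapshot


def snapshot(src: Subvol) -> Subvol:
    """Plain snapshot: files are copied, nested subvolumes turn into empty directories."""
    return Subvol(files=dict(src.files),
                  empty_dirs=set(src.empty_dirs) | set(src.children))


def snapshot_recursive(src: Subvol) -> Subvol:
    """Recursive snapshot: plain snapshot, then swap each placeholder for a snapshot of the child."""
    dst = snapshot(src)
    for path, child in src.children.items():
        dst.empty_dirs.discard(path)                      # remove the empty placeholder directory ...
        dst.children[path] = snapshot_recursive(child)    # ... and snapshot the child into its place
    return dst


root = Subvol(files={"etc/os-release": "example"},
              children={"var": Subvol(files={"lib/db": "data"})})
flat = snapshot(root)
deep = snapshot_recursive(root)
print(sorted(flat.empty_dirs), sorted(flat.children))  # ['var'] []
print(sorted(deep.empty_dirs), sorted(deep.children))  # [] ['var']
```

With the plain snapshot, /var of the copy is just an empty directory; with the recursive one, the copy carries its own snapshot of /var, so switching to it switches both together.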
> > tmpfiles won't create any subvolumes for you, except if they are
> > missing. tmpfiles can't guess the complex mappings you applied to your
> > tree; it can't know that you don't want to allow recursive snapshots
> > but rather place all subvolumes in the same dir and bind mount them. Also,
> > if I understand correctly, the way suse sets this up always *requires*
> > additions to fstab for any subvol created, which is clearly out of
> > focus for tmpfiles.
>
> I agree that it's next to impossible to programmatically find out what a
> user intended to do with a specific layout.
> However, in my opinion it would be preferable to create a working, though
> maybe not optimal, configuration rather than a configuration which is
> known to break in several cases (independent of the distribution).
>
> Instead of adding fstab entries (which I also have misgivings about), an
> alternative may be to create a mount unit. But yes, something would
> have to be done to mount those subvolumes on boot.
I am very much convinced that tmpfiles should not change mount
configuration. It's a tool to adjust file system objects on disk, and
it should remain that.
I think the much nicer approach is the one I suggested, i.e. where
subvol trees are always cloned in full, recursively, and only /usr (and
whatever else shall be disconnected from them) is mounted into each
tree.
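Whichever layout is used, the mounts themselves live in fstab or in mount units, not in tmpfiles. For the mount-unit route suggested above, each unit has to be named after its escaped mount point. The Python sketch below approximates that naming rule (roughly what `systemd-escape --path --suffix=mount` produces) and emits a minimal unit; the function names, the `What=` device path and the `subvol=@/var` option value are illustrative assumptions, not taken from any distribution.

```python
import string

# Characters kept as-is in unit names; everything else is hex-escaped as \xNN.
_SAFE = frozenset(string.ascii_letters + string.digits + ":_.")


def path_to_unit_name(path: str, suffix: str = ".mount") -> str:
    """Simplified sketch of `systemd-escape --path --suffix=mount PATH` (hypothetical helper).

    Empty components are dropped; "." and ".." components are not validated here.
    """
    parts = [p for p in path.split("/") if p]
    if not parts:
        return "-" + suffix  # the root directory maps to "-.mount"
    out = []
    for pos, ch in enumerate("/".join(parts)):
        if ch == "/":
            out.append("-")
        elif ch in _SAFE and not (ch == "." and pos == 0):
            out.append(ch)
        else:
            # '-', a leading '.', and any other character are escaped byte by byte.
            out.extend(f"\\x{b:02x}" for b in ch.encode("utf-8"))
    return "".join(out) + suffix


def mount_unit(what: str, where: str, subvol: str) -> str:
    """Text of a minimal .mount unit for one btrfs subvolume (hypothetical helper)."""
    return (
        "[Unit]\n"
        f"Description=btrfs subvolume {subvol}\n"
        "\n"
        "[Mount]\n"
        f"What={what}\n"
        f"Where={where}\n"
        "Type=btrfs\n"
        f"Options=subvol={subvol}\n"
        "\n"
        "[Install]\n"
        "WantedBy=local-fs.target\n"
    )


print(path_to_unit_name("/var"))          # var.mount
print(path_to_unit_name("/.snapshots"))   # \x2esnapshots.mount
print(mount_unit("/dev/disk/by-uuid/EXAMPLE-UUID", "/var", "@/var"))
```

A mount unit's file name has to correspond to its `Where=` path, which is why the escaping matters; the generated text would then be installed as e.g. var.mount.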
> I'm wondering if just refusing to create a subvolume on a snapshot would be
> another option... That way the problem would be given back to the user or
> distribution.
My recommendation: if you really want to go with the design you
proposed, go ahead, but make sure you create the bind mounts early
enough; tmpfiles won't change them then. After all, tmpfiles will only
make changes if something is missing here; it will never convert
anything that already exists into a subvol.
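The rule tmpfiles applies to `v` lines can be summarised as a small decision function. This is a sketch based on the tmpfiles.d(5) description of `v` (create a subvolume only if the path does not exist yet, the file system is btrfs and / is itself a subvolume; otherwise fall back to a plain directory, like `d`). The enum and function names are hypothetical, and details such as mode/ownership adjustment and quota groups are left out.

```python
from enum import Enum


class Action(Enum):
    KEEP_EXISTING = "path exists; nothing is converted"
    SUBVOLUME = "create a btrfs subvolume"
    DIRECTORY = "create a plain directory, like 'd'"


def tmpfiles_v_action(path_exists: bool, fs_is_btrfs: bool,
                      root_is_subvolume: bool) -> Action:
    """Hypothetical model of the documented decision for a tmpfiles.d 'v' line."""
    if path_exists:
        # A directory, subvolume or mount point is already there: it is left
        # as it is (mode/ownership adjustments are not modelled here).
        return Action.KEEP_EXISTING
    if fs_is_btrfs and root_is_subvolume:
        # Missing path on btrfs with "/" being a subvolume: the new subvolume
        # ends up nested in whatever subvolume currently contains the path.
        return Action.SUBVOLUME
    # Documented fallback for 'v': behave like 'd'.
    return Action.DIRECTORY


# Bind mount set up early enough: the path exists, nothing is converted.
print(tmpfiles_v_action(True, True, True))    # Action.KEEP_EXISTING
# Booted from a snapshot whose "/" is a subvolume, path missing:
print(tmpfiles_v_action(False, True, True))   # Action.SUBVOLUME
```

The second case is the one described earlier in the thread: the new subvolume is nested in the currently booted snapshot and is therefore not visible after switching to another one.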
> > > > The assumption systemd-tmpfiles makes is always that the subvolumes
> > > > it implicitly creates for you if they are missing are associated
> > > > with the subvolume they are created below, and that this means they
> > > > are snapshotted, removed and otherwise managed along with them.
> > >
> > > Keeping this logic more or less assumes that snapshots will always be used
> > > as static backups and pattern 3 from above must not be used.
> >
> > I don't see that at all. I mean, this all depends on how you want to
> > associate /var with /. My assumption is that they belong together, but
> > I figure that's not what you have in mind? You want to keep using the
> > same /var even though you switch back and forth to different /?
>
> Exactly - viewing them as separate entities after installation has proven to
> work very reliably for us and is documented accordingly.
> As said above, the reasoning behind this is that you usually don't want to
> lose e.g. all accumulated database changes just because you have to revert
> the system state due to a failed package update.
Which is great, but then simplify your logic, and invert your model:
do a single subvol per OS installation, and include everything in it,
with the single exception of /usr (and make sure /bin and /sbin are
symlinks pointing into it). Then fix btrfs tools to do recursive
snapshots properly, and things are vastly simpler: snapshots can be
made with a single command, and /etc/fstab only needs to carry a
single mount expression for /usr and nothing else.
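To make the fstab argument concrete, a small Python sketch can list which subvolumes are not nested below the booted root and therefore need an explicit mount entry. The Leap 15 data is the listing quoted above; "wanted" here simply means the subvolumes one expects to see in the running system (for Leap, the direct children of @, including .snapshots). The inverted layout, its IDs and names are hypothetical.

```python
# Leap 15 listing quoted above: (ID, parent ID, path below <FS_TREE>).
LEAP15 = [
    (257, 5, "@"), (258, 257, "@/var"), (259, 257, "@/usr/local"),
    (260, 257, "@/tmp"), (261, 257, "@/srv"), (262, 257, "@/root"),
    (263, 257, "@/opt"), (264, 257, "@/home"),
    (265, 257, "@/boot/grub2/x86_64-efi"), (266, 257, "@/boot/grub2/i386-pc"),
    (267, 257, "@/.snapshots"),
    (411, 267, "@/.snapshots/138/snapshot"), (412, 267, "@/.snapshots/139/snapshot"),
]


def extra_mounts(layout, booted_id, wanted_ids):
    """Paths of wanted subvolumes that are not nested below the booted root subvolume."""
    parent = {sid: pid for sid, pid, _ in layout}
    path = {sid: p for sid, _, p in layout}

    def nested_in(sid, ancestor):
        # Walk up the parent chain until it leaves the listing (top level, ID 5).
        while sid in parent:
            sid = parent[sid]
            if sid == ancestor:
                return True
        return False

    return sorted(path[s] for s in wanted_ids if not nested_in(s, booted_id))


# Booted from snapshot 139 (ID 412): every subvolume below @ needs its own mount.
leap_wanted = [sid for sid, pid, _ in LEAP15 if pid == 257]
print(len(extra_mounts(LEAP15, 412, leap_wanted)))  # 10

# Hypothetical inverted layout: one subvolume per installation, /usr kept separately.
INVERTED = [(500, 5, "os"), (501, 500, "os/var"), (502, 500, "os/home"), (503, 5, "usr")]
print(extra_mounts(INVERTED, 500, [501, 502, 503]))  # ['usr']
```

In the inverted layout the nested subvolumes travel with the (recursively snapshotted) installation, so /usr is the only subvolume left that needs a mount entry.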
Lennart
--
Lennart Poettering, Red Hat