[systemd-devel] Possible systemd segfault switching from 216 to 219 in fedora upgrade

Fri Mar 13 14:59:24 PDT 2015

On Tue, 2015-03-10 at 17:21 +0100, Lennart Poettering wrote:
> My recommendation would be to use the offline updates logic we have in
> systemd already:
> 
> http://www.freedesktop.org/wiki/Software/systemd/SystemUpdates/
> 
> systemd has been implementing this for quite a while, at least for all
> systems fedup should care for. In this scheme you mark the system for
> upgrades, you reboot using the old kernel/initrd. This will now enter
> the special upgrade mode, where you make your changes, and then you
> reboot again with the new kernel.

Yeah, I really wanted that to work, but... when you're replacing
literally the entire system *while* part of it is running, stuff gets a
little weird.

All my efforts to make major-version upgrades work from the
system-updates hook ended in weird crashes due to libraries being
replaced out from under running binaries, etc.

At this point people usually say: "Hmm, maybe you need a minimal system
you can chroot into and run the upgrade from?"

...and that's exactly what the upgrade.img initramfs is! We just need to
be able to switch-root from your running system into an upgrade image,
*WITH* your disks mounted in it.

I don't really like the new->old->new switchroot stuff, but I haven't
got a better solution at the moment.

But: if we could use something like "systemd-nspawn" to:

1) start your old system in a container,
2) let it mount its disks,
3) copy/bind/move those mounts back out to the host somehow,

then we wouldn't need to do the double-switchroot. I couldn't find a way
to make that work a couple of years ago, but maybe it's something that
we could figure out now?

> > Analysis of that makes it clear it's the mkdir_p_label function that
> > causes libselinux.so to do a type lookup on the path to segfault (at a
> > strcmp in selinux_sub) which then bubbles back up as an underlying
> > issue in this case. 
> 
> Do you have a full backtrace for this?
> 
> Which mkdir_p_label() invocation is this?

It's the mkdir_p_label() in switch_root(). It runs after
mac_selinux_finish(), which releases the label handle but doesn't set
label_hnd=NULL, so the later lookup crashes with a use-after-free.
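
To make the pattern concrete, here's a minimal sketch of the fix idea.
This is NOT the actual systemd code or my patch: struct label_handle,
label_init(), label_finish() and label_lookup() are hypothetical
stand-ins. Only the name label_hnd and the "clear the pointer after
freeing it" idea come from the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the labeling handle (not the libselinux type). */
struct label_handle {
    char *prefix;               /* paths under this prefix get a label */
};

/* Global handle, named after the variable mentioned in the mail. */
static struct label_handle *label_hnd = NULL;

/* Allocate the global handle. Returns 0 on success, -1 on failure. */
int label_init(const char *prefix) {
    assert(prefix);

    struct label_handle *h = malloc(sizeof *h);
    if (!h)
        return -1;

    size_t n = strlen(prefix) + 1;
    h->prefix = malloc(n);
    if (!h->prefix) {
        free(h);
        return -1;
    }
    memcpy(h->prefix, prefix, n);

    label_hnd = h;
    return 0;
}

/* Release the handle AND clear the global pointer. */
void label_finish(void) {
    if (!label_hnd)
        return;

    free(label_hnd->prefix);
    free(label_hnd);
    label_hnd = NULL;           /* omit this and later lookups touch freed memory */
}

/* True if labeling is active and path starts with the handle's prefix.
 * After label_finish() the NULL check makes this a harmless no-op. */
bool label_lookup(const char *path) {
    assert(path);

    if (!label_hnd)
        return false;

    return strncmp(path, label_hnd->prefix, strlen(label_hnd->prefix)) == 0;
}
```

With the pointer cleared, calling label_lookup() after label_finish()
just returns false instead of dereferencing a dangling handle, and a
second label_finish() does nothing.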

This only happens if you're doing switch-root *from* a system with
SELinux policy loaded, which explains why nobody else saw it.

(See https://bugzilla.redhat.com/show_bug.cgi?id=1185604 for the
backtrace or my other mail for a patch.)

-w