[systemd-devel] Possible systemd segfault switching from 216 to 219 in fedora upgrade

Sun Mar 8 15:32:12 PDT 2015

On Thu, 05.03.15 22:07, James Hogarth (james.hogarth at gmail.com) wrote:

> > Tried to put together a reduced testcase via a yum installroot style
> > container to switch-root into to see what that behaviour is like and
> > do see a segfault - not certain if this is the same being seen during
> > the fedup switch-root though...
> >
> > Any ideas to get a better grasp on this?
> 
> So it's actually slightly more complicated than I had originally
> thought (thanks #fedora-qa) after a brief chat with wwoods.
> 
> The path taken in the process is that the initrd used by fedup is
> built from the newer Fedora release (i.e. in the present testing this
> contains systemd-219).
> 
> This starts up and then carries out a switch-root to the actual system
> which in this case has systemd-216.

We don't really support downgrades. The reexec stuff should work fine
for upgrades, but downgrades are nothing we could even remotely test,
or even expect to work. fedup really shouldn't do that.

> The reason for this is to simplify finding out where mount points are
> for a clean upgrade - it's been felt the easiest way is to just 'ask'
> the actual system to do this.
> 
> After the mount points are all up switch-root is used to switch back
> to the initrd setup so that the upgrades can be carried out on the
> non-running system... so we have a systemd-216 to 219 transition here.
> 
> This naturally means that the serialization/deserialization needs to
> be forwards *and* backwards compatible between 216 and 219 for this to
> work.

Yeah, but no. Allowing upgrades is one thing, allowing downgrades is a
completely different one, and nothing we want to support.

> From the logs that I've pulled (see the various attachments in
> https://bugzilla.redhat.com/show_bug.cgi?id=1185604 for them) it would
> appear the 219 -> 216 process is fine but then switching back from 216
> -> 219 is failing with the associated segfault.
> 
> There appear to be a couple of options here:
> 
> 1) Try to get a workable reduced test case or better debugging from
> the 216 -> 219 transition to work out why that is failing.
> 2) Have some sort of generator or call or similar that allows the
> systemd-newer in the initrd to parse the unit files and fstab of the
> installed system and carry out any mounting itself rather than using
> switch-root to the installed system and asking it to do so. This would
> then eliminate the jumping backwards and forwards between systemd
> versions during the upgrade process.

I am not really sure I follow here...

> Any thoughts on either of these options to try to get a way
> forwards... or is there any additional debugging or diagnostics that I
> can provide to help?

Well, it might be possible to get a coredump out of the thing by
disabling the core_pattern stuff, first booting into init=/bin/sh,
then setting RLIMIT_CORE with ulimit in the shell, and then exec'ing
systemd with the raised limit. Then use gdb to extract the stack
trace from it?
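
Roughly, and untested on this particular setup, that sequence could be
sketched as the shell functions below. Assumptions: the machine was
booted with init=/bin/sh on the kernel command line, /proc is already
mounted, and the systemd binary lives at /usr/lib/systemd/systemd. The
function names and the core file path are placeholders, not anything
systemd or fedup provides.

```shell
#!/bin/sh
# Sketch only. Meant to be run from the shell started via init=/bin/sh.
# SYSTEMD_BIN is an assumed path; adjust it for the system at hand.
SYSTEMD_BIN="${SYSTEMD_BIN:-/usr/lib/systemd/systemd}"

# Make the kernel write a plain file named "core" instead of piping the
# dump to a helper program (assumes /proc is mounted and writable).
disable_core_pattern_pipe() {
    echo core > /proc/sys/kernel/core_pattern
}

# Raise the soft RLIMIT_CORE up to whatever the hard limit permits.
raise_core_limit() {
    ulimit -S -c "$(ulimit -H -c)"
}

# Replace this shell with systemd. Because exec keeps the PID, systemd
# still runs as PID 1, and resource limits are inherited across exec.
start_systemd() {
    exec "$SYSTEMD_BIN"
}

# Intended order (nothing is run automatically when this file is sourced):
#   disable_core_pattern_pipe && raise_core_limit && start_systemd
#
# Later, once a core file exists (its location is not known in advance):
#   gdb -batch -ex bt "$SYSTEMD_BIN" /path/to/core
```

Keeping the steps as separate functions makes it easy to run them one
at a time from the emergency shell and see which one fails, if any.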

Lennart

-- 
Lennart Poettering, Red Hat