[systemd-devel] Possible systemd segfault switching from 216 to 219 in fedora upgrade

Thu Mar 5 14:07:24 PST 2015

On 5 March 2015 at 17:07, James Hogarth <james.hogarth at gmail.com> wrote:
> On 5 March 2015 at 15:10, Lennart Poettering <lennart at poettering.net> wrote:
>>
>>
>> Right before switch rooting systemd will kill all remaining processes
>> of the initrd, including the strace, hence the strace logs aren't that
>> useful either, they end before the transition.
>>
>> Please boot with "systemd.log_level=debug systemd.log_target=kmsg" on
>> the kernel cmdline, which ensures the logs go to the kernel log
>> buffer. And then please provide the output this generates here.
>>
>> Also see:
>>
>> http://freedesktop.org/wiki/Software/systemd/Debugging/
>>
>>
>
> Thanks Lennart - good point about the strace being killed before the
> interesting bit...
>
> This makes it annoyingly tricky to see what is happening as the
> systemd-219 binary gets loaded...
>
> Screenshot has been attached to the bug:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1185604#c22
>
> Doesn't show much though - just a SEGV reported by pid1 that
> immediately results in execution being halted.
>
> Tried to put together a reduced testcase via a yum installroot style
> container to switch-root into to see what that behaviour is like and
> do see a segfault - not certain if this is the same being seen during
> the fedup switch-root though...
>
> Any ideas to get a better grasp on this?
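As a quick check that the debug options quoted above actually reached the kernel, the running command line can be read from /proc/cmdline and parsed. The following is a minimal Python sketch, and the helper names are hypothetical. It splits only on whitespace, so it ignores the kernel's double-quote handling for values that contain spaces.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {key: value}; bare flags map to None."""
    opts = {}
    for token in cmdline.split():
        # Split on the first '=' only, so root=UUID=... keeps its full value.
        key, sep, value = token.partition("=")
        opts[key] = value if sep else None
    return opts


def has_debug_logging(cmdline: str) -> bool:
    """True if both options Lennart suggested are present with the right values."""
    opts = parse_cmdline(cmdline)
    return (opts.get("systemd.log_level") == "debug"
            and opts.get("systemd.log_target") == "kmsg")


def read_cmdline(path: str = "/proc/cmdline") -> str:
    # On a running Linux system the active command line lives in /proc/cmdline.
    with open(path) as f:
        return f.read()


# Example with an illustrative (made-up) command line:
sample = "BOOT_IMAGE=/vmlinuz ro systemd.log_level=debug systemd.log_target=kmsg"
print(has_debug_logging(sample))  # True
```

On the affected machine, `has_debug_logging(read_cmdline())` would be the call to make.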

After a brief chat with wwoods (thanks #fedora-qa), it turns out this is
slightly more complicated than I had originally thought.

The process works like this: the initrd used by fedup is built from the
newer Fedora release (i.e. in the present testing it contains
systemd-219).

This starts up and then carries out a switch-root into the installed
system, which in this case has systemd-216.

The reason for this is to simplify working out where the mount points
are for a clean upgrade; the feeling has been that the easiest way is
simply to 'ask' the installed system to set them up.

Once the mount points are all up, switch-root is used to switch back
to the initrd environment so that the upgrade can be carried out on the
non-running system... so we also have a systemd-216 to 219 transition
here.

This naturally means that the serialization/deserialization needs to
be forwards *and* backwards compatible between 216 and 219 for this to
work.
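To make the compatibility requirement concrete, here is a toy Python model of key=value state serialization across the 219 -> 216 -> 219 hops. It is only an illustration under stated assumptions. The field names, defaults and per-version field sets are hypothetical, and this is not systemd's actual serialization format or code. The design choice it shows is the one that makes both directions work: a reader skips keys it does not recognise and falls back to a default for keys that are absent.

```python
# Hypothetical: which state fields each version knows how to (de)serialize.
KNOWN_FIELDS = {
    216: {"job-counter", "taint"},
    219: {"job-counter", "taint", "new-in-219"},
}

# Hypothetical defaults used when a field is missing from the serialized state.
DEFAULTS = {"job-counter": "0", "taint": "no", "new-in-219": "none"}


def serialize(version, state):
    """Write only the fields this version knows about, one key=value per line."""
    return "\n".join(f"{k}={state[k]}"
                     for k in sorted(KNOWN_FIELDS[version]) if k in state)


def deserialize(version, text):
    """Read key=value lines; skip unknown keys, default missing ones."""
    known = KNOWN_FIELDS[version]
    state = {k: DEFAULTS[k] for k in known}
    skipped = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore malformed lines rather than failing
        if key in known:
            state[key] = value
        else:
            skipped.append(key)  # forward compatibility: tolerate newer keys
    return state, skipped


def round_trip(versions, state):
    """Simulate successive switch-roots, e.g. [219, 216, 219]."""
    for old, new in zip(versions, versions[1:]):
        state, _ = deserialize(new, serialize(old, state))
    return state
```

In this model the 219-only field comes back as its default after passing through 216. Code in the newer version that assumed such state was always present after deserialization is one plausible way a 216 -> 219 hop could go wrong, though nothing here confirms that is what happens in this bug.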

From the logs that I've pulled (see the various attachments in
https://bugzilla.redhat.com/show_bug.cgi?id=1185604), it appears that
the 219 -> 216 transition is fine, but switching back from
216 -> 219 fails with the associated segfault.

There appear to be a couple of options here:

1) Try to get a workable reduced test case or better debugging from
the 216 -> 219 transition to work out why that is failing.
2) Have some sort of generator, call or similar that allows the newer
systemd in the initrd to parse the unit files and fstab of the
installed system and carry out any mounting itself, rather than
switch-rooting into the installed system and asking it to do so. This
would eliminate the jumping backwards and forwards between systemd
versions during the upgrade process.
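For a rough idea of what the fstab half of option 2 involves, here is a minimal Python sketch that parses an fstab and orders the mounts so that parents come before children. The function names are hypothetical and this is not how systemd-fstab-generator works. It ignores unit files, noauto/nofail, x-systemd.* options and bind mounts, and treating lines with fewer than four fields as malformed is an assumption of this sketch.

```python
import re
from typing import List, NamedTuple


class FstabEntry(NamedTuple):
    spec: str            # device, UUID=..., LABEL=..., etc.
    mountpoint: str
    fstype: str
    options: List[str]
    freq: int
    passno: int


def _unescape(field: str) -> str:
    # fstab encodes spaces and similar characters as octal escapes, e.g. \040.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_fstab(text: str) -> List[FstabEntry]:
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # assumption: treat short lines as malformed
        # The dump and fsck-pass fields are optional and default to 0.
        freq = int(fields[4]) if len(fields) > 4 else 0
        passno = int(fields[5]) if len(fields) > 5 else 0
        entries.append(FstabEntry(
            spec=_unescape(fields[0]),
            mountpoint=_unescape(fields[1]),
            fstype=fields[2],
            options=fields[3].split(","),
            freq=freq,
            passno=passno,
        ))
    return entries


def mount_order(entries: List[FstabEntry]) -> List[FstabEntry]:
    """Drop swap and order by path depth so /var is mounted before /var/log."""
    mounts = [e for e in entries if e.fstype != "swap"]
    return sorted(mounts,
                  key=lambda e: len([p for p in e.mountpoint.split("/") if p]))
```

Ordering by path depth is the simplest rule that guarantees a parent directory is mounted before anything beneath it. A real implementation would more likely express this as dependencies between mounts.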

Any thoughts on either of these options as a way forward? Or is there
any additional debugging or diagnostic information I can provide to
help?

Cheers,

James