[systemd-devel] [PATCH] fstab-generator: introduce rd.weak_sysroot to bypass failures in sysroot.mount

Wed Jul 31 06:56:01 PDT 2013

On Wed, Jul 31, 2013 at 09:44:01AM -0400, Vivek Goyal wrote:
> On Wed, Jul 31, 2013 at 12:19:06PM +0200, Harald Hoyer wrote:
> > On 07/30/2013 09:14 PM, Vivek Goyal wrote:
> > > On Wed, Jul 31, 2013 at 12:46:22AM +0800, WANG Chao wrote:
> > >> On 07/31/13 at 12:32am, WANG Chao wrote:
> > >>> On 07/30/13 at 03:46pm, Zbigniew Jędrzejewski-Szmek wrote:
> > >>>> On Tue, Jul 30, 2013 at 09:43:16AM -0400, Vivek Goyal wrote:
> > >>>>> [CC harald]
> > >>>>>
> > >>>>> Not sure if this is right way to do or not but I will give more
> > >>>>> background about the issue.
> > >>>>>
> > >>>>> This assumption seems to be built into initramfs and systemd that root
> > >>>>> should always be mountable. If one can't mount root, it is a fatal
> > >>>>> failure.
> > >>>>>
> > >>>>> But in case of kdump initramfs, this assumption is no more valid. Core
> > >>>>> might be being saved to a target which is not root (say over ssh). And
> > >>>>> even if mounting root fails, it is ok.
> > >>>>>
> > >>>>> So we kind of need a mode (possibly driven by command line option) where
> > >>>>> if mounting root failed, it is ok and continue with mounting other targets
> > >>>>> and kdump module will then handle errors.
> > >>>> Maybe rootflags=nofail could be used as this flag?
> > >>>
> > >>> rootflags=nofail works. Thanks.
> > >>>
> > >>> Although it behaves a little differently from my approach, I prefer
> > >>> using this one over adding another cmdline param.
> > >>
> > >> I just found that the nofail option only works when the mount device
> > >> doesn't exist.
> > >>
> > >> What if the filesystem is corrupted? sysroot.mount will fail and
> > >> initrd-root-fs.target will never be reached.
> > > 
> > > Right.
> > > 
> > > In a kdump environment, the default of dropping into a shell does not
> > > make sense for most users. If a server crashes somewhere and we are
> > > not able to take a dump due to some failure, most users would like the
> > > system to reboot automatically so that services come back online.
> > > 
> > > I see that right now rd.action_on_fail is parsed by emergency.service and
> > > the service does not start if this parameter is specified.
> > > 
> > > Can't we interpret this parameter a little differently? That is, this
> > > parameter would modify the OnFailure= behavior.
> > > 
> > > So by default OnFailure= is emergency.service which is equivalent to
> > > a shell.
> > > 
> > > A user can force change of behavior by specifying command line.
> > > 
> > > rd.action_on_failure=shell (OnFailure=emergency.service)
> > > rd.action_on_failure=reboot (OnFailure=reboot)
> > > rd.action_on_failure=continue (OnFailure=continue)
> > > 
> > > Now action_on_failure=continue will effectively pretend that the unit start
> > > was successful and go ahead with starting the next unit. This might be a
> > > little contentious though, as other dependent units will fail in unknown ways.
> > > 
> > > Now by default kdump can use rd.action_on_failure=continue and try to
> > > save the dump. If it can't due to previous failures, then it will reboot
> > > the system anyway.
> > > 
> > > Also if emergency.service stops parsing rd.action_on_failure, then the kdump
> > > module will be able to start emergency.service too when it sees there
> > > is a problem. Right now when the kdump module tries to start emergency.service
> > > it fails because it looks at the action_on_fail parameter (another issue Bao is
> > > trying to solve).
> > > 
> > > Thanks
> > > Vivek
> > > 
> > 
> > Why not install your own version of emergency.service in the kdump dracut
> > module, which parses rd.action_on_failure and acts accordingly? Or replace
> > emergency.service in the dracut cmdline hook according to rd.action_on_failure.
> 
> That is doable but I think there is still one more issue. What happens
> to the rest of the systemd services? Once a service fails, systemd will
> recognize it as a failure and start emergency.service. Once
> emergency.service exits, what will systemd do? Will it continue to
> start other services which are dependent on the failed service?
> 
> I would guess it will not, because dependent services are supposed
> to be started only if the previous service started successfully.
> 
> If that's the case, then just replacing emergency.service is not a
> solution. In fact, I think that's how things are currently working.
> emergency.service does not start if action_on_fail=continue is specified
> on the command line.
> 
> ConditionKernelCommandLine=!action_on_fail=continue
> 
> So the core of the problem here is that systemd needs to be aware that
> the user wants to continue starting other services despite the fact that
> a previous service failed. And using the action_on_failure= command line
> option to trigger the change of behavior is one way of doing it.

Ok, I found the commit that makes emergency.service not run when
action_on_fail=continue is specified:

commit dcae873414ff643e1de790f256e414923e2aef8b
Author: Harald Hoyer <harald at redhat.com>
Date:   Thu May 30 11:14:39 2013 +0200

    systemd/emergency.service: do not run for action_on_fail=continue

    same as for dracut-emergency.service

Apart from the issue of other services not starting, we are facing another
issue: the kdump module wants to start a bash shell upon failure by
starting emergency.service, and that now fails because we are booted
with action_on_fail=continue.

If we make systemd aware of action_on_fail=continue, then we can take
this check out of emergency.service and the problem will be solved.
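
To make the idea concrete, here is a rough model of the lookup I have in
mind, written in Python only to show the logic. It is not systemd or
dracut code. The function name, the accepted parameter spellings and the
"last valid occurrence wins" rule are my assumptions; the three values
and what they map to are the ones proposed earlier in this thread.

```python
# Sketch only: models the proposed mapping, not actual systemd/dracut code.

# Values proposed earlier in the thread; the right-hand side is a label
# for the intended OnFailure= behavior, not a verified unit name.
ACTIONS = {
    "shell": "emergency.service",  # default: drop into a shell
    "reboot": "reboot",            # reboot instead of waiting in a shell
    "continue": "continue",        # treat the failure as non-fatal
}

# Assumption: accept both spellings that appear in this thread.
PARAM_NAMES = ("rd.action_on_fail", "action_on_fail")


def on_failure_action(cmdline: str, default: str = "shell") -> str:
    """Return the failure behavior selected by a kernel command line."""
    choice = default
    for word in cmdline.split():
        key, sep, value = word.partition("=")
        if sep and key in PARAM_NAMES and value in ACTIONS:
            choice = value  # assumption: the last valid occurrence wins
    return ACTIONS[choice]
```

With something like this in one place, emergency.service itself would
not need to look at the command line at all, so the kdump module could
still start it explicitly.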

I guess another option could be to modify emergency.service on the fly
(remove ConditionKernelCommandLine=!action_on_fail=continue),
reload the systemd configuration, and then start emergency.service. If this
works, it will take care of the second problem but not the first one.
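
The editing step of that workaround could look roughly like the sketch
below. Again this is Python just to illustrate the transformation; the
helper name is hypothetical and I have not tried this in a kdump
initramfs.

```python
# Sketch only: removes the condition line from the text of a unit file.

CONDITION = "ConditionKernelCommandLine=!action_on_fail=continue"


def strip_condition(unit_text: str) -> str:
    """Return unit_text without the action_on_fail=continue condition."""
    kept = [line for line in unit_text.splitlines()
            if line.strip() != CONDITION]
    return "\n".join(kept) + "\n"
```

The result would then have to be written somewhere systemd reads units
from, followed by "systemctl daemon-reload" and "systemctl start
emergency.service".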

Thanks
Vivek