[systemd-devel] [PATCH] fstab-generator: introduce rd.weak_sysroot to bypass failures in sysroot.mount

Wed Jul 31 06:44:01 PDT 2013

On Wed, Jul 31, 2013 at 12:19:06PM +0200, Harald Hoyer wrote:
> On 07/30/2013 09:14 PM, Vivek Goyal wrote:
> > On Wed, Jul 31, 2013 at 12:46:22AM +0800, WANG Chao wrote:
> >> On 07/31/13 at 12:32am, WANG Chao wrote:
> >>> On 07/30/13 at 03:46pm, Zbigniew Jędrzejewski-Szmek wrote:
> >>>> On Tue, Jul 30, 2013 at 09:43:16AM -0400, Vivek Goyal wrote:
> >>>>> [CC harald]
> >>>>>
> > >>>>> Not sure if this is the right way to do it or not, but I will give more
> > >>>>> background about the issue.
> >>>>>
> > >>>>> There seems to be an assumption built into the initramfs and systemd that
> > >>>>> root should always be mountable. If root can't be mounted, it is a fatal
> > >>>>> failure.
> >>>>>
> > >>>>> But in the case of the kdump initramfs, this assumption is no longer valid.
> > >>>>> The core might be saved to a target which is not root (say, over ssh), so
> > >>>>> even if mounting root fails, it is ok.
> >>>>>
> > >>>>> So we kind of need a mode (possibly driven by a command line option) where,
> > >>>>> if mounting root fails, it is ok to continue mounting other targets,
> > >>>>> and the kdump module will then handle errors.
> > >>>> Maybe rootflags=nofail could be used as this flag?
> >>>
> >>> rootflags=nofail works. Thanks.
> >>>
> > >>> Although it behaves a little differently from my approach, I prefer to
> > >>> use this one rather than adding another cmdline param.
> >>
> > >> I just found that the nofail option only works when the mount device doesn't exist.
> >>
> > >> What if the filesystem is corrupted? sysroot.mount will fail and
> > >> initrd-root-fs.target will never be reached.
> > 
> > Right.
> > 
> > In a kdump environment, for most users the default of dropping into a
> > shell does not make sense. If a server crashes somewhere and we are
> > not able to take a dump due to some failure, most users would like the
> > system to reboot automatically so that services are back online.
> > 
> > I see that right now rd.action_on_fail is parsed by emergency.service, and
> > the service does not start if this parameter is specified.
> > 
> > Can't we interpret this parameter a little differently? That is, this
> > parameter would modify the OnFailure= behavior.
> > 
> > So by default, OnFailure= is emergency.service, which is equivalent to
> > a shell.
> > 
> > A user can force a change of behavior by specifying it on the command line:
> > 
> > rd.action_on_failure=shell (OnFailure=emergency.service)
> > rd.action_on_failure=reboot (OnFailure=reboot)
> > rd.action_on_failure=continue (OnFailure=continue)
> > 
> > Now action_on_failure=continue would effectively pretend that the unit start
> > was successful and go ahead with starting the next unit. This might be a little
> > contentious though, as other dependent units will fail in unknown ways.
> > 
> > Then by default kdump can use rd.action_on_failure=continue and try to
> > save the dump. If it can't due to previous failures, it will reboot the
> > system anyway.
> > 
> > Also, if emergency.service stops parsing rd.action_on_failure, then the kdump
> > module will be able to start emergency.service too when it sees there
> > is a problem. Right now, when the kdump module tries to start emergency.service,
> > it fails because it looks at the action_on_fail parameter (another issue Bao is
> > trying to solve).
> > 
> > Thanks
> > Vivek
> > 
> 
> Why not install your own version of emergency.service in the kdump dracut
> module, which parses rd.action_on_failure and acts accordingly. Or replace
> emergency.service in the dracut cmdline hook according to rd.action_on_failure.
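
A minimal sketch of the dispatch Harald describes, assuming a hook that reads the kernel command line and picks the failure action. The parameter name rd.action_on_failure and its three values come from the proposal above; the function name action_on_failure, the "last occurrence wins" rule, and the fallback to the shell for unknown values are assumptions of this sketch, not existing dracut or systemd behavior.

```python
# Hypothetical sketch: map rd.action_on_failure= (a parameter proposed in this
# thread, not an existing option) to what should happen when a unit fails.
from typing import Optional

# Values follow the proposal; None means "run nothing, keep going".
ACTIONS = {
    "shell": "emergency.service",
    "reboot": "reboot",
    "continue": None,
}


def action_on_failure(cmdline: str) -> Optional[str]:
    """Return the unit/action to run on failure, or None to continue."""
    value = "shell"  # default: drop into the emergency shell
    for word in cmdline.split():  # simplified: ignores quoted arguments
        key, sep, val = word.partition("=")
        if key == "rd.action_on_failure" and sep:
            value = val  # in this sketch the last occurrence wins
    # Unknown values fall back to the shell (an assumption of this sketch).
    return ACTIONS.get(value, ACTIONS["shell"])


if __name__ == "__main__":
    print(action_on_failure("root=/dev/sda1 rd.action_on_failure=continue"))
```

With no rd.action_on_failure= on the command line, the sketch returns "emergency.service", matching the current default of dropping into a shell.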

That is doable, but I think there is still one more issue: what happens
to the rest of the systemd services? Once a service fails, systemd will
recognize it as a failure and start emergency.service. Once
emergency.service exits, what will systemd do? Will it continue to
start other services which depend on the failed service?

My guess is that they will not start, because dependent services are supposed
to be started only if the previous service started successfully.

If that's the case, then just replacing emergency.service is not a
solution. In fact, I think that's how things currently work:
emergency.service does not start if action_on_fail=continue is specified
on the command line.

ConditionKernelCommandLine=!action_on_fail=continue
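
For reference, a simplified model of how that condition is evaluated, following the documented semantics of ConditionKernelCommandLine=: an assignment such as action_on_fail=continue must match a command-line word exactly, a bare word also matches the left-hand side of an assignment, and a leading "!" negates the check. Quoting and other edge cases are ignored, and the function name is hypothetical.

```python
# Simplified model of systemd's ConditionKernelCommandLine= check
# (hypothetical helper; quoting and other edge cases are ignored).


def condition_kernel_command_line(condition: str, cmdline: str) -> bool:
    """Return True if the unit's condition passes for this command line."""
    negate = condition.startswith("!")
    needle = condition[1:] if negate else condition
    words = cmdline.split()
    if "=" in needle:
        # An assignment must match exactly, on both sides of the "=".
        found = needle in words
    else:
        # A bare word matches itself or the left-hand side of an assignment.
        found = any(w == needle or w.startswith(needle + "=") for w in words)
    return found != negate  # a leading "!" inverts the result


if __name__ == "__main__":
    cond = "!action_on_fail=continue"
    print(condition_kernel_command_line(cond, "root=/dev/sda1 action_on_fail=continue"))
    print(condition_kernel_command_line(cond, "root=/dev/sda1"))
```

With action_on_fail=continue present the condition is False, so emergency.service is skipped; without it the condition is True and the shell is started as usual.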

So the core of the problem here is that systemd needs to be aware that the
user wants to continue starting other services even though a
previous service failed. Using the action_on_failure= command line option to
trigger this change of behavior is one way of doing it.
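
A toy model of the behavior guessed at above, assuming Requires=-style dependencies where a unit is skipped when any requirement did not come up, and a hypothetical "continue" policy that ignores such failures. This is not systemd's actual job engine; the unit name kdump-capture.service is invented for illustration.

```python
# Toy model (not systemd's real job engine) of starting units in dependency
# order when one of them fails, under two failure policies.
from graphlib import TopologicalSorter  # Python 3.9+


def start_units(requires, failing, policy="shell"):
    """Return {unit: "ok" | "failed" | "skipped"}.

    requires maps each unit to the units it requires; failing is the set of
    units whose own start fails. With policy "continue", a failed
    requirement no longer blocks the units that depend on it.
    """
    status = {}
    # static_order() yields every requirement before the units needing it.
    for unit in TopologicalSorter(requires).static_order():
        blocked = any(status[dep] != "ok" for dep in requires.get(unit, ()))
        if blocked and policy != "continue":
            status[unit] = "skipped"  # a requirement did not come up
        elif unit in failing:
            status[unit] = "failed"
        else:
            status[unit] = "ok"
    return status


if __name__ == "__main__":
    # kdump-capture.service is a hypothetical unit name.
    requires = {
        "initrd-root-fs.target": ["sysroot.mount"],
        "kdump-capture.service": ["initrd-root-fs.target"],
    }
    print(start_units(requires, {"sysroot.mount"}))
    print(start_units(requires, {"sysroot.mount"}, policy="continue"))
```

With the default policy, everything after the failed sysroot.mount is skipped; with "continue", kdump-capture.service is still started, which is the outcome the kdump case needs. The model does not capture the risk raised earlier in the thread that such units may then fail in unknown ways.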

Thanks
Vivek