[systemd-devel] Antw: [EXT] Infinite loop at startup on var fsck failure

Vito Caputo vcaputo at pengaru.com
Wed Feb 26 18:38:37 UTC 2020


On Wed, Feb 26, 2020 at 10:39:50AM +0100, Michael Biebl wrote:
> On Wed, 26 Feb 2020 at 10:13, Ulrich Windl
> <Ulrich.Windl at rz.uni-regensburg.de> wrote:
> >
> > >>> Vito Caputo <vcaputo at pengaru.com> wrote on 25.02.2020 at 01:01 in
> > message
> > <7343_1582589314_5E546582_7343_4690_1_20200225000143.nowls5peec5sxg7v at shells.gnueneration.com>:
> > > Hello list,
> > >
> > > Today I experienced an unclean shutdown due to battery dying unexpectedly,
> > > and it left my /var in a state requiring a manual fsck to repair errors.
> >
> > I wonder: Shouldn't an fsck just be a journal replay these days? For ext >= 3
> > this should be quite fast. ReiserFS was rather slow several years ago (it
> > replayed too much IMHO), but I haven't used it in the last five years.
> >
> > >
> > > The normal startup process failed and dropped me to a rescue shell after
> > > asking for my root password.  But I was unable to immediately run fsck
> > > manually, because systemd was endlessly trying to fsck /var.
> >
> > That's not a problem of fsck.
> 
> 
> I suspect that the real problem is that fsck failed to fix the file
> system, so as a result, systemd tried repeatedly to start the fsck job
> for /var, as var.mount was pulled in as a dependency (e.g. for
> journald).

That's what seemed to be occurring, ad infinitum.

In this particular instance, at least it wasn't due to hardware
errors, and the constant barrage of disk accesses did little more
than flash the disk status light on my ThinkPad and prevent a
manual fsck while I tried to figure out how to correctly calm
things down.

But it doesn't seem particularly helpful for the failed fsck to
keep getting restarted.  If there were actual hardware errors,
this behavior could be exacerbating them during the
initial investigation stage.  If it were triggering bus resets
and timeouts, as I've experienced in the past with spinning rust
on the SATA bus, the system could have been very difficult and
time-consuming to interact with.

IMHO the failed fsck should not be retried automatically at all.
Fail the fsck more permanently, log something in the journal
about it with some hints as to what might be the appropriate next
step, and leave the system quiescent while it waits for the root
password for recovery...
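
To make the suggestion concrete, here is a minimal sketch of the "fail
once, then stay quiet" policy.  It is a toy model in Python, not systemd
code: OneShotCheck, request() and reset() are hypothetical names, and
treating every nonzero exit status as a failure is a simplification
(fsck's exit status is really a bitmask, where 4 means errors were left
uncorrected).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class OneShotCheck:
    """Hypothetical model of a check that is not retried after it fails."""
    name: str
    run: Callable[[], int]  # stand-in for running fsck; returns exit status
    attempts: int = 0
    failed: bool = False
    log: List[str] = field(default_factory=list)

    def request(self) -> bool:
        """Called every time a dependent asks for the check to be done."""
        if self.failed:
            # Failure is latched: leave the disk alone until an admin resets.
            return False
        self.attempts += 1
        status = self.run()
        if status != 0:  # simplification: any nonzero status is a failure
            self.failed = True
            self.log.append(
                f"{self.name}: failed with status {status}, not retrying; "
                "repair manually from the rescue shell"
            )
            return False
        return True

    def reset(self) -> None:
        """Explicit admin action that permits one more attempt."""
        self.failed = False


# A check that always leaves errors uncorrected, requested five times
# by dependents, only touches the disk once.
check = OneShotCheck("fsck /var", run=lambda: 4)
results = [check.request() for _ in range(5)]
print(check.attempts, len(check.log))
```

The point of the latch is that repeated requests from dependents cost
nothing after the first failure, so the disk stays idle and the rescue
shell stays usable until someone explicitly calls reset().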

Regards,
Vito Caputo

