[systemd-devel] Second (erroneous) check of rootfs?

Sun Jan 11 03:48:56 PST 2015

Hi,
11.01.2015 1:31, Chris Murphy:
> Yeah it's a bit messy and I really think to some degree this should be
> bounced back to the ext developers and say "how do you envision this
> working" because doing the right thing for ext4 really burdens
> multiple other processes: systemd of course, but also dracut, and then
> util-linux is pulled in since that's where generic fsck is, which
> then calls e2fsck. It's at this point really overly complicated.

IMO there is nothing wrong with ext4 itself. It is reliable and simple. 
It does not tend to break spontaneously, or because mount-count reached 
some higher number, or because of extended periods of uptime, or even 
because of abrupt power-off. It only breaks when something _external_ 
really hurts it, like failed hardware, failed raid, mad user, or some 
invalid fsck.

I suppose this traditional (historical) technique of maintaining 
mount-count, running fsck at boot time before remount r/w, etc, should 
not be so much attributed specifically to ext filesystem. Most probably 
it existed long before even ext2 appeared. However, 15 years ago I was 
already wondering about the motivation for running a full fsck depending on 
mount-count. What's the point really?

At present it seems even more pointless because of the btrfs/xfs/... thing, 
and also because in case of trouble one can use a bootable usb stick 
with a whole lot of tools, or easily disconnect the drive in question 
and plug it into some similar box, or even prepare some write-protected 
media with a self-contained emergency system attached inside the box in 
question for immediate access, etc etc etc.

So generally I'd agree that it would be good to critically reconsider 
the validity of antique techniques for the current environment.

Thank you,
Nikolai

>
> Windows and OS X by default at boot time do not do fsck. They defer to
> the kernel mount code replaying the journal and only if there's
> journal inconsistency does this trigger fsck. So I feel like somehow
> there needs to be a deference to not doing fsck unless asked, and
> maybe that requires the kdbus stuff to get finished first
> (speculation)? That way only when kernel code says, yeah no that's not
> gonna work, it can then communicate a need for an offline fsck. And
> hypothetically it's possible in such a case to unmount root and do a
> prescribed (recommended) offline fsck regardless of what filesystem it
> is, and if that exits successfully, resume normal boot. And just do
> away with this mount ro and remount rw, and all the stupid fake fsck's
> floating around.
>
> Chris Murphy
>
>