[systemd-devel] more verbose debug info than systemd.log_level=debug?

Sat Apr 8 17:28:00 UTC 2017

On Tue, Apr 4, 2017 at 11:55 AM, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> 03.04.2017 07:56, Chris Murphy wrote:
>> On Thu, Mar 30, 2017 at 6:07 AM, Michael Chapman <mike at very.puzzling.org> wrote:
>>
>>> I am not a filesystem developer (IANAFD?), but I'm pretty sure they're going
>>> to say "the metadata _is_ synced, it's in the journal". And it's hard to
>>> argue that. After all, the filesystem will be perfectly valid the next time
>>> it is mounted, after the journal has been replayed, and it will contain all
>>> data written prior to the sync call. It did exactly what the manpage says it
>>> does.
>>
>> That's their position.
>>
>> Also, the same file system dirtiness and journal replay is needed on
>> ext4. The sample size is too small to say categorically that the same
>> problem can't happen on ext4 in the same situation. Maybe the grub.cfg
>> is readable, but maybe the kernel isn't, or the initramfs, or
>> something else.
>>
>
> Yes, I have seen the same on ext4 which prompted me to play with journal
> replay code. Unfortunately I do not know how to reliably trigger this
> condition.

I can reliably trigger a dirty ext4 or XFS file system 100% of the
time with all recent Fedora installations when doing an offline
update. What's very non-deterministic is how this dirtiness will
manifest. Filesystem folks basically live in an alternate reality
where the farther in time a file system is from mkfs time, the more
non-deterministically the file system behaves. *shrug*

>
>>
>>> The problem here seems to be that GRUB is an incomplete XFS implementation,
>>> one which doesn't know about XFS journalling. It may be a good argument XFS
>>> shouldn't be used for /boot... but the issue can really arise with just
>>> about any other journalled filesystem, like Ext3/4.
>>
>> I wondered about it at the start, and asked about it on the XFS list
>> in the first post about the problem. The developers nearly died
>> laughing at the idea of doing journal replay in 640KiB of memory. They
>> said categorically it's not possible.
>>
>
> grub2 is not limited to 640KiB. Actually it will actively avoid using
> low memory. It switches to protected mode as the very first thing and
> can use up to 4GiB (and even this probably can be lifted on 64 bit
> platform). The real problem is the fact that grub is read-only so every
> time you access a file on a journaled partition it will need to replay
> the journal again from scratch. This will likely be painfully slow (I
> remember that grub legacy on reiser needed a couple of minutes to read
> the kernel and much more to read the initrd, and that was when both were
> smaller than now).

OK well that makes more sense; but yeah it still sounds like journal
replay is a non-starter. The entire fs metadata would have to be read
into memory to create something like a RAM-based rw snapshot which is
backed by the ro disk version as origin, and then the log would be
played against the RAM snapshot. That could be faster than constantly
replaying the journal from scratch for each file access. But still, it
sounds overly complicated.
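
To make the snapshot idea concrete, here is a minimal Python sketch of
what I have in mind. It is purely illustrative and is not how GRUB, XFS
or ext4 actually work: the class names are made up, the journal is
assumed to be a simple list of (block number, new contents) records, and
the overlay holds only the blocks the log touches rather than all of the
fs metadata. The point is just that the log gets played once into RAM,
the disk is never written, and every later read checks RAM first.

```python
class ReadOnlyDisk:
    """Hypothetical stand-in for the on-disk (ro) file system blocks."""

    def __init__(self, blocks):
        self._blocks = dict(blocks)
        self.reads = 0  # count how often we actually go to "disk"

    def read(self, blockno):
        self.reads += 1
        return self._blocks[blockno]


class RamOverlay:
    """RAM-backed rw snapshot with the ro disk as its origin."""

    def __init__(self, disk):
        self.disk = disk
        self.dirty = {}  # blockno -> contents after journal replay

    def replay(self, journal):
        # Play the log once, in order; later records win over earlier ones.
        # Assumption: each record is a whole-block (blockno, data) pair.
        for blockno, data in journal:
            self.dirty[blockno] = data

    def read(self, blockno):
        # Replayed blocks are served from RAM, everything else from disk.
        if blockno in self.dirty:
            return self.dirty[blockno]
        return self.disk.read(blockno)


# Made-up example data: three blocks on disk, three pending log records.
disk = ReadOnlyDisk({0: b"superblock", 1: b"old inode", 2: b"old dir"})
journal = [(1, b"new inode"), (2, b"tmp dir"), (2, b"new dir")]

overlay = RamOverlay(disk)
overlay.replay(journal)  # done once, not once per file access
```

After the single replay, reading block 1 or 2 through the overlay
returns the post-replay contents without touching the disk, and block 0
falls through to the unmodified origin.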

I think this qualifies as "Doctor, it hurts when I do this." And the
doctor says, "So don't do that." And I'm referring to Plymouth
exempting itself from kill while also not running from initramfs. So
I'll kindly make the case with Plymouth folks to stop pressing this
particular hurt-me button.

But hey, pretty cool bug. It's not often you find such an old bug
that is so easily reproducible and yet, near as I can tell, only one
person was hitting it until I tried to reproduce it.

-- 
Chris Murphy