[systemd-devel] Service output missing from journal?

Wed Jul 6 01:11:12 UTC 2022

On Tue, 5 Jul 2022 at 00:29, Lennart Poettering <lennart at poettering.net> wrote:
>
On Mo, 04.07.22 23:15, Michael Biebl (mbiebl at gmail.com) wrote:

> Am Mo., 4. Juli 2022 um 19:36 Uhr schrieb Lennart Poettering
> <lennart at poettering.net>:
> >
> > eOn So, 03.07.22 19:29, Uwe Geuder (systemd-devel-ugeuder at snkmail.com) wrote:
> >
> > > Hi!
> > >
> > > When I run the command given below on a current Fedora CoreOS system
> > > (systemd 250 (v250.6-1.fc36)) I get a result I absolute cannot understand.
> > > Can anybody help me with what is wrong there?
> > >
> > > $ systemd-run --user sh -c 'while true; do echo foo; df -h /var/log/journal/; echo $?; sleep 3; done'
> > > Running as unit: run-r9a155474889b4d40a1ac119823bdc2bf.service
> > > $ journalctl --user -f -u run-r9a155474889b4d40a1ac119823bdc2bf
> > > [ ... similar lines redacted ... ]
> > > Jul 03 15:25:08 ip-172-31-8-116 sh[366900]: 0
> > > Jul 03 15:25:11 ip-172-31-8-116 sh[366900]: foo
> > > Jul 03 15:25:11 ip-172-31-8-116 sh[366900]: 0
> > > Jul 03 15:25:14 ip-172-31-8-116 sh[366900]: foo
> > > Jul 03 15:25:14 ip-172-31-8-116 sh[366900]: 0
> > > Jul 03 15:25:17 ip-172-31-8-116 sh[366900]: foo
> > > Jul 03 15:25:17 ip-172-31-8-116 sh[366900]: 0
> > > Jul 03 15:25:20 ip-172-31-8-116 sh[366900]: foo
> > > Jul 03 15:25:20 ip-172-31-8-116 sh[366941]: Filesystem      Size  Used Avail Use% Mounted on
> > > Jul 03 15:25:20 ip-172-31-8-116 sh[366941]: /dev/nvme0n1p4  9.5G  8.1G  1.5G  86% /var
> > > Jul 03 15:25:20 ip-172-31-8-116 sh[366900]: 0
> > > Jul 03 15:25:23 ip-172-31-8-116 sh[366900]: foo
> > > Jul 03 15:25:23 ip-172-31-8-116 sh[366900]: 0
> > > Jul 03 15:25:26 ip-172-31-8-116 sh[366900]: foo
> > > Jul 03 15:25:26 ip-172-31-8-116 sh[366900]: 0
> > > Jul 03 15:25:29 ip-172-31-8-116 sh[366900]: foo
> > > Jul 03 15:25:29 ip-172-31-8-116 sh[366900]: 0
> > > [...]
> > >
> > > So the output from the df command appears in the journal pretty rarely,
> > > appearingly at random intervals. When I run the same loop on the
> > > command line the output occurs every time.
> > >
> > > The problem was originally noted in a somewhat loaded system. However,
> > > above reproducer (including the 2 echo commands and a shorter sleep)
> > > shows the same problem even on an idling machine.
> >
> > https://github.com/systemd/systemd/issues/2913
>
> I thought about this as well, but in this case the service is still
> running. So I'm not sure if #2913 applies here.

The service is, but the "df" process exits extremely quickly, before
we can figure out what it belongs to. See the PIDs where it works,
they are different from your shell script's PID, because they are
short-lived child processes.

Thanks for your answers!

I thought I could quickly prove Lennart's explanataion correct by making
the 2 echo commands their own processes, too. They should not be
long-running ones either.

$ systemd-run --user sh -c 'while true; do /bin/echo foo; df -h /var/log/journal/; /bin/echo $?; sleep 3; done'

Interestingly enough the result looks much better:

Jul 05 23:14:07 ip-172-31-13-133 sh[2944]: foo
Jul 05 23:14:07 ip-172-31-13-133 sh[2945]: Filesystem      Size  Used Avail Use% Mounted on
Jul 05 23:14:07 ip-172-31-13-133 sh[2945]: /dev/nvme0n1p4  9.5G  2.2G  7.4G  23% /var
Jul 05 23:14:07 ip-172-31-13-133 sh[2946]: 0
Jul 05 23:14:10 ip-172-31-13-133 sh[2948]: foo
Jul 05 23:14:10 ip-172-31-13-133 sh[2949]: Filesystem      Size  Used Avail Use% Mounted on
Jul 05 23:14:10 ip-172-31-13-133 sh[2949]: /dev/nvme0n1p4  9.5G  2.2G  7.4G  23% /var
Jul 05 23:14:10 ip-172-31-13-133 sh[2950]: 0
Jul 05 23:14:13 ip-172-31-13-133 sh[2954]: Filesystem      Size  Used Avail Use% Mounted on
Jul 05 23:14:13 ip-172-31-13-133 sh[2954]: /dev/nvme0n1p4  9.5G  2.2G  7.4G  23% /var
Jul 05 23:14:13 ip-172-31-13-133 sh[2955]: 0
Jul 05 23:14:16 ip-172-31-13-133 sh[2957]: foo
Jul 05 23:14:16 ip-172-31-13-133 sh[2958]: Filesystem      Size  Used Avail Use% Mounted on
Jul 05 23:14:16 ip-172-31-13-133 sh[2958]: /dev/nvme0n1p4  9.5G  2.2G  7.4G  23% /var
Jul 05 23:14:16 ip-172-31-13-133 sh[2959]: 0
Jul 05 23:14:19 ip-172-31-13-133 sh[2961]: foo
Jul 05 23:14:19 ip-172-31-13-133 sh[2962]: Filesystem      Size  Used Avail Use% Mounted on
Jul 05 23:14:19 ip-172-31-13-133 sh[2962]: /dev/nvme0n1p4  9.5G  2.2G  7.4G  23% /var
Jul 05 23:14:19 ip-172-31-13-133 sh[2963]: 0
Jul 05 23:14:22 ip-172-31-13-133 sh[2966]: foo
Jul 05 23:14:22 ip-172-31-13-133 sh[2967]: Filesystem      Size  Used Avail Use% Mounted on
Jul 05 23:14:22 ip-172-31-13-133 sh[2967]: /dev/nvme0n1p4  9.5G  2.2G  7.4G  23% /var
Jul 05 23:14:22 ip-172-31-13-133 sh[2968]: 0
Jul 05 23:14:25 ip-172-31-13-133 sh[2970]: foo
Jul 05 23:14:25 ip-172-31-13-133 sh[2971]: Filesystem      Size  Used Avail Use% Mounted on
Jul 05 23:14:25 ip-172-31-13-133 sh[2971]: /dev/nvme0n1p4  9.5G  2.2G  7.4G  23% /var
Jul 05 23:14:25 ip-172-31-13-133 sh[2972]: 0
Jul 05 23:14:28 ip-172-31-13-133 sh[2974]: foo
Jul 05 23:14:28 ip-172-31-13-133 sh[2975]: Filesystem      Size  Used Avail Use% Mounted on
Jul 05 23:14:28 ip-172-31-13-133 sh[2975]: /dev/nvme0n1p4  9.5G  2.2G  7.4G  23% /var
Jul 05 23:14:31 ip-172-31-13-133 sh[2978]: foo
Jul 05 23:14:31 ip-172-31-13-133 sh[2979]: Filesystem      Size  Used Avail Use% Mounted on
Jul 05 23:14:31 ip-172-31-13-133 sh[2979]: /dev/nvme0n1p4  9.5G  2.2G  7.4G  23% /var
Jul 05 23:14:31 ip-172-31-13-133 sh[2980]: 0
Jul 05 23:14:34 ip-172-31-13-133 sh[2984]: 0
Jul 05 23:14:37 ip-172-31-13-133 sh[2987]: foo
Jul 05 23:14:37 ip-172-31-13-133 sh[2988]: Filesystem      Size  Used Avail Use% Mounted on
Jul 05 23:14:37 ip-172-31-13-133 sh[2988]: /dev/nvme0n1p4  9.5G  2.2G  7.4G  23% /var
Jul 05 23:14:37 ip-172-31-13-133 sh[2989]: 0

In the original form some 9/10 df outputs are missing, now any of the 3
short running processes seems to be missing only "rarely". Of course
when looking carefully you see that it's unreliable, probably racy as
you write. And I thought the big promise of systemd was no more racy
scripts. Never interpreted it like that that **I** should not write
scripts and give them to systemd :)

But seriously, given the fact that we have several shell scripts running
as services, do you have any recommendation how we can log their output
reliably? Always piping the whole script to some filter that does not
buffer???

Occasionally I have said "the log is lying" because something just
didn't make sense, but I have never seen it so systematically that
entries are missing. But after having learned that output from the end
of the lifetime of a process might be missing, from now on I will always
have a bad feeling when investigating logs. I guess it's not limited to
short-lived processes. Isn't the difference just that for a short-lived
one all output can be missing. Or does the race disappear once one line
from a process has been successfully logged already?

Reading https://github.com/systemd/systemd/issues/2913 once again: So
the whole problem (race) is only about the metadata? The messages are
not lost, just not attributed. So

$ journalctl --user -u <unit>

ends up producing non-trustworthy output. But reading the whole log
or running

$ journalctl --user --grep /dev/nvme0n1p4

(in this example) works just fine.

Well, I guess I can live with that. One just needs to understand the
limitations.

Is there a simple way to search just for those messages could not be
attributed? That might help if there is no good --grep to search for and
one takes the risk of running the unsafe -u variant.

Should the documentation mention that -u can be unreliable?

Regards,

Uwe

Uwe Geuder
Neuro Event Labs Oy
Tampere, Finland
uwe.gxuder at neuroeventlabs.com (bot check: fix 1 obvious typo)