[systemd-devel] Thoughts about storing unit/job statistics

Thu Nov 28 08:32:13 UTC 2019

On Wed, 27.11.19 14:26, Philip Withnall (philip at tecnocode.co.uk) wrote:

> > > If I were to implement this as a separate daemon, it would need to be
> > > active all the time, listening to UnitNew/UnitRemoved/JobNew/JobRemoved
> > > signals from systemd. That seems like a waste of a process. Let’s call
> > > this problem 0.
> >
> > This data is already collected and written to the journal anyway if
> > you turn on the various XYZAccounting= properties for your
> > services. Then use the invocation ID of the service and the
> > MESSAGE_ID=ae8f7b866b0347b9af31fe1c80b127c0 match in journalctl to
> > find these.
>
> We’re interested in the wall clock time that a unit/scope was active,
> not the CPU time, so I suspect we’d have to add another message along
> the same lines.

Just add another structured field to the existing message. The message
already contains IO/CPU/IP/… stats, hence adding more time stats
definitely makes sense.
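
To illustrate the journal lookup discussed above, here is a minimal
sketch that builds the journalctl match and parses its JSON output. It
assumes that the resource message carries the fields INVOCATION_ID=,
UNIT= and CPU_USAGE_NSEC=; the helper names (journalctl_args,
parse_stats) are made up for this example. Any additional field added
to the message later could be requested by name through the `fields`
parameter.

```python
import json

# Message ID quoted earlier in this thread.
RESOURCES_MESSAGE_ID = "ae8f7b866b0347b9af31fe1c80b127c0"


def journalctl_args(invocation_id):
    """Command line matching the resource message of one invocation.

    Assumption: the message is tagged with INVOCATION_ID=. Matches on
    different fields are ANDed by journalctl.
    """
    return [
        "journalctl", "-o", "json",
        "MESSAGE_ID=" + RESOURCES_MESSAGE_ID,
        "INVOCATION_ID=" + invocation_id,
    ]


def parse_stats(lines, fields=("CPU_USAGE_NSEC",)):
    """Extract the requested numeric fields from `journalctl -o json` lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("MESSAGE_ID") != RESOURCES_MESSAGE_ID:
            continue
        record = {"unit": entry.get("UNIT")}
        for name in fields:
            if name in entry:
                # Field values arrive as strings in the JSON output.
                record[name] = int(entry[name])
        records.append(record)
    return records
```

The output of `subprocess.run(journalctl_args(...), capture_output=True,
text=True).stdout.splitlines()` could be fed straight into parse_stats().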

> > > One approach would be to store this data in the journal, but
> > > (problems 1-3):
> > >  1. We can’t control how long the journal data is around for, or even
> > >     if it’s set to persist.
> >
> > You can pull the data from the journal at your own pace, always
> > keeping the cursor you last read from around so that you don't lose
> > messages.
>
> Yes and no. The distro or admin could set the journal up to be non-
> persistent, in which case we’d need to pull the data from it before
> `systemd-journald` stops. That could work as long as we could make sure
> the pulls happen at all the right times.

Well, if people want to shoot themselves in the foot they can of
course, not sure why you should care...
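
The "keep the cursor around" approach quoted above could look roughly
like the following sketch. It relies on journalctl's --after-cursor=
option and on the __CURSOR field present in `-o json` output; the
helper names and the idea of storing the cursor in a plain file are
assumptions made for this example only.

```python
import json
import os


def load_cursor(path):
    """Return the last saved cursor, or None on the first run."""
    try:
        with open(path) as f:
            return f.read().strip() or None
    except FileNotFoundError:
        return None


def save_cursor(path, cursor):
    """Write the cursor atomically so a crash never leaves a torn file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(cursor)
    os.replace(tmp, path)


def journalctl_args(cursor):
    """Command line that resumes right after the saved cursor, if any."""
    args = ["journalctl", "-o", "json"]
    if cursor:
        args.append("--after-cursor=" + cursor)
    return args


def consume(lines, path, handle):
    """Process each JSON entry, then remember its cursor."""
    for line in lines:
        entry = json.loads(line)
        handle(entry)
        # Saved only after handling, so an interrupted run re-reads the
        # entry instead of skipping it.
        save_cursor(path, entry["__CURSOR"])
```

Saving the cursor only after an entry has been handled means an entry
may be seen twice after a crash, but is never lost.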

> > A long-standing TODO item in systemd was to have some form of metrics
> > collector, that may be turned on and that writes a time-keyed ring
> > buffer of metrics collected per service to disk, from the data
> > collected via cgroup attributes, bpf and so on. But so far no one has
> > found the time to do it. It probably should be decoupled from PID 1 in
> > some form, so that PID 1 only pings it whenever a new cgroup shall be
> > watched but the collecting/writing of the data points is done entirely
> > separate from it.
>
> Would the idea with that be that it uses the journal, or not? Is there
> a task in GitHub for it?

In my current thinking it would be similar to journald in many ways,
but not be journald, since the data is differently structured
(i.e. not keyed by arbitrary fields but keyed by time, just time-based
ring buffers). The idea would be to maintain ring buffers in /run/ and
/var/ similar to how journald does it, and have "systemd-metricsd"
pull at its own pace metrics from the various cgroups/bpf tables/… and
write them to these buffers. Apps could then mmap them and pull the
data out either instantly (if they are located in /run/) or after
substantial latency (if they are located in /var/) depending on the
use case.
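
As a rough illustration of what "keyed by time" could mean, here is a
toy fixed-size ring buffer over a flat byte array. The layout (a
two-word header followed by timestamp/value records) and the TimeRing
name are invented for this sketch and are not a proposed file format;
a real implementation would back the buffer with an mmap'ed file in
/run/ or /var/ rather than a bytearray.

```python
import struct

# Hypothetical layout, little-endian 64-bit words:
#   header: next write slot, number of valid records
#   record: timestamp in usec, metric value
HEADER = struct.Struct("<QQ")
RECORD = struct.Struct("<QQ")


class TimeRing:
    def __init__(self, capacity):
        self.capacity = capacity
        # Zero-filled buffer; the header starts out as (0, 0).
        self.buf = bytearray(HEADER.size + capacity * RECORD.size)

    def _offset(self, slot):
        return HEADER.size + slot * RECORD.size

    def append(self, timestamp_usec, value):
        """Store one data point, overwriting the oldest when full."""
        head, count = HEADER.unpack_from(self.buf, 0)
        RECORD.pack_into(self.buf, self._offset(head), timestamp_usec, value)
        head = (head + 1) % self.capacity
        count = min(count + 1, self.capacity)
        HEADER.pack_into(self.buf, 0, head, count)

    def since(self, timestamp_usec):
        """Return (timestamp, value) pairs at or after the given time,
        oldest first."""
        head, count = HEADER.unpack_from(self.buf, 0)
        start = (head - count) % self.capacity
        out = []
        for i in range(count):
            slot = (start + i) % self.capacity
            ts, value = RECORD.unpack_from(self.buf, self._offset(slot))
            if ts >= timestamp_usec:
                out.append((ts, value))
        return out
```

Because every record has the same size, a reader can locate any slot by
arithmetic alone, which is what would make mmap-based access cheap.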

Ideally we wouldn't even come up with our own file format for these
ring buffers, and just use what is already established, but afaiu
there's no established standard for time series ring buffer files so
far, hence I figure we need to come up with our own. I mean, after all
the intention is not to process this data ourselves but have other
tools do that.

As for a GitHub task: there's #10229.

Lennart

--
Lennart Poettering, Berlin