[systemd-devel] Thoughts about storing unit/job statistics

Wed Nov 27 14:26:07 UTC 2019

Hey,

On Tue, 2019-11-19 at 10:26 +0100, Lennart Poettering wrote:
> On Fr, 08.11.19 11:15, Philip Withnall (philip at tecnocode.co.uk)
> wrote:
> 
> > Hello all,
> > 
> > As part of work on a GNOME feature for monitoring how often the user
> > uses applications (for example, to let them know that they spent 4
> > hours in the Slack app, or 17 hours playing games), I’m trying to
> > work out the best way to store data like that.
> > 
> > If we assume the system is using systemd user sessions, then an
> > application being run is actually a unit being started, and so the
> > data we want to store is actually the duration of each unit run.
> > 
> > A related issue is that of storing network usage data per unit, to
> > allow the user to see which apps have been using the most data over
> > the last (say) month.
> 
> This data can already be tracked by systemd for you, just set
> IPAccounting=yes for the service. Alas this only works for system
> services currently, since the bpf/cgroup2 logic this relies on is
> only accessible to privileged processes. (Fixing that would probably
> mean introducing a tiny privileged service that takes a cgroup fd as
> input, and then installs the correct bpf program and bpf table into
> it, returning the fds for these two objects. Happy to take a patch
> for that.)

Yes, the plan was already to use IPAccounting=yes, although I hadn’t
realised it only worked for system services. That’s good to know, and
will be a bit of a stumbling block we’ll have to fix before adding
support for network data. The usage duration data is what we want to
focus on first, though, so the BPF helper can wait.

> > If I were to implement this as a separate daemon, it would need to
> > be active all the time, listening to
> > UnitNew/UnitRemoved/JobNew/JobRemoved signals from systemd. That
> > seems like a waste of a process. Let’s call this problem 0.
> 
> This data is already collected and written to the journal anyway if
> you turn on the various XYZAccounting= properties for your
> services. Then use the invocation ID of the service and the
> MESSAGE_ID=ae8f7b866b0347b9af31fe1c80b127c0 match in journalctl to
> find these.

We’re interested in the wall clock time that a unit/scope was active,
not the CPU time, so I suspect we’d have to add another message along
the same lines.
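
To make the distinction concrete: a unit already exposes
ActiveEnterTimestampMonotonic and ActiveExitTimestampMonotonic
properties (in microseconds), so the wall clock duration of the most
recent run can be derived from `systemctl --user show` output. A rough
sketch, with hypothetical helper names; it only covers the last run and
only while the unit is still loaded, which is why a journal message
would still be needed:

```python
def parse_show_output(text):
    """Parse the KEY=VALUE lines printed by `systemctl show`."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def active_duration_usec(props):
    """Wall clock duration of the last completed run, in microseconds.

    Returns None if the unit never became active, or if it has not
    left the active state since it last entered it.
    """
    enter = int(props.get("ActiveEnterTimestampMonotonic", "0"))
    exit_ = int(props.get("ActiveExitTimestampMonotonic", "0"))
    if enter == 0 or exit_ < enter:
        return None
    return exit_ - enter
```

The input would come from something like `systemctl --user show -p
ActiveEnterTimestampMonotonic -p ActiveExitTimestampMonotonic UNIT`.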

> > One approach would be to store this data in the journal, but
> > (problems 1-3):
> >  1. We can’t control how long the journal data is around for, or
> > even if it’s set to persist.
> 
> You can pull the data from the journal at your own pace, always
> keeping the cursor you last read from around so that you don't lose
> messages.

Yes and no. The distro or admin could set the journal up to be non-
persistent, in which case we’d need to pull the data from it before
`systemd-journald` stops. That could work as long as we could make sure
the pulls happen at all the right times.
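
For reference, the cursor-based pull could look roughly like the
following. This is only a sketch: the function names are made up, and
where the cursor is stored between pulls is left to the caller. It
relies on `journalctl -o json` emitting one JSON object per line with a
`__CURSOR` field, and on `--after-cursor=` to resume from a saved
position.

```python
import json

# Message ID quoted earlier in this thread.
RESOURCES_MESSAGE_ID = "ae8f7b866b0347b9af31fe1c80b127c0"


def journalctl_argv(cursor=None):
    """Build a journalctl command line that resumes after `cursor`."""
    argv = ["journalctl", "--user", "--no-pager", "-o", "json"]
    if cursor is not None:
        argv.append(f"--after-cursor={cursor}")
    argv.append(f"MESSAGE_ID={RESOURCES_MESSAGE_ID}")
    return argv


def consume(lines, cursor=None):
    """Parse JSON lines from journalctl.

    Returns the parsed entries and the cursor of the last entry seen,
    or the cursor passed in if there were no new entries.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        entries.append(entry)
        cursor = entry.get("__CURSOR", cursor)
    return entries, cursor
```

The caller would run the command (for example with `subprocess.run`),
feed its output lines to `consume()`, and save the returned cursor for
the next pull.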

> > So I have two questions:
> >  1. Does this seem like the kind of functionality which should go
> > into the journal, if it was modified to address problems 1-3 above?
> >  1a. If not, do you have any suggestions for how to implement it so
> > that problem 0 above is not an issue, i.e. we don’t have to keep a
> > daemon running all the time just to record a small chunk of data
> > once every few minutes?
> >  2. Does this seem like a subset of a larger bit of functionality,
> > storing metrics about units and jobs for later analysis, which
> > might be interesting to non-desktop users of systemd?
> 
> A long-standing TODO item in systemd was to have some form of metrics
> collector, that may be turned on and that writes a time-keyed ring
> buffer of metrics collected per service to disk, from the data
> collected via cgroup attributes, bpf and so on. But so far no one has
> found the time to do it. It probably should be decoupled from PID 1
> in some form, so that PID 1 only pings it whenever a new cgroup shall
> be watched but the collecting/writing of the data points is done
> entirely separate from it.

Would the idea with that be that it uses the journal, or not? Is there
a task in GitHub for it?
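
To check I understand the shape of it, here is how I picture a
time-keyed ring buffer of per-unit samples, in memory only. The class
and method names are hypothetical, and the on-disk format (journal or
otherwise) is exactly the open question:

```python
from collections import deque


class MetricsRing:
    """Fixed-capacity buffer of (timestamp_usec, unit, metrics) samples.

    Once the buffer is full, appending a sample drops the oldest one.
    """

    def __init__(self, capacity):
        self._samples = deque(maxlen=capacity)

    def append(self, timestamp_usec, unit, metrics):
        # Copy the metrics so later changes by the caller do not
        # alter the stored sample.
        self._samples.append((timestamp_usec, unit, dict(metrics)))

    def since(self, timestamp_usec):
        """Return the samples taken at or after `timestamp_usec`."""
        return [s for s in self._samples if s[0] >= timestamp_usec]
```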

Philip