[PATCH v1] dynamic_debug: add support for logs destination

Daniel Vetter daniel at ffwll.ch
Wed Oct 11 09:42:24 UTC 2023


On Wed, Oct 11, 2023 at 11:48:16AM +0300, Pekka Paalanen wrote:
> On Tue, 10 Oct 2023 10:06:02 -0600
> jim.cromie at gmail.com wrote:
> 
> > since I name-dropped you all,
> 
> Hi everyone,
> 
> I'm really happy to see this topic being developed! I've practically
> forgot about it myself, but the need for it has not diminished at all.
> 
> I didn't understand much of the conversation, so I'll just reiterate
> what I would use it for, as a Wayland compositor developer.
> 
> I added a few more cc's to get better coverage of DRM and Wayland
> compositor developers.
> 
> > On Tue, Oct 10, 2023 at 10:01 AM <jim.cromie at gmail.com> wrote:
> > >
> > > On Mon, Oct 9, 2023 at 4:47 PM Łukasz Bartosik <lb at semihalf.com> wrote:  
> 
> ...
> 
> > > > I don't have a real life use case to configure different trace
> > > > instance for each callsite.
> > > > I just tried to be as much flexible as possible.
> > > >  
> > >
> > > Ive come around to agree - I looked back at some old threads
> > > (that I was a part of, and barely remembered :-}
> > >
> > > At least Sean Paul, Lyude, Simon Ser, Pekka Paalanen
> > > have expressed a desire for a "flight-recorder"
> > > it'd be hard to say now that 2-3 such buffers would always be enough,
> > > esp as theres a performance reason for having your own.
> 
> A Wayland compositor has roughly three important things where the kernel
> debugs might come in handy:
> - input
> - DRM KMS
> - DRM GPU rendering
> 
> DRM KMS is the one I've been thinking of in the flight recorder context
> the most, because KMS hardware varies a lot, and there is plenty of
> room for both KMS drivers and KMS userspace to go wrong. The usual
> result is your display doesn't work, so the system is practically
> unusable to the end user. In the wild, the simplest or maybe the only
> way out of that may be a reboot, maybe an automated one (e.g. digital
> signage). In order to debug such problems, we would need both
> compositor logs and the relevant kernel debug messages.
> 
> For example, Weston already has a flight recorder framework of its own,
> so we have the compositor debug logs. It would be useful to get the
> selected kernel debug logs in the same place. It could be used for
> automated or semi-manual bug reporting, for example, making the
> administrator or end user life much easier reporting issues.
> 
> Since this is usually a production environment, and the Wayland
> compositor runs without root privileges, we need something that works
> with that. We would likely want the kernel debug messages in the
> compositor to combine and order them properly with the compositor debug
> messages.
> 
> It's quite likely that developers would like to pick and choose which
> kernel debug messages might be interesting enough to record, to avoid
> excessive log flooding. The flight recorder in Weston is fixed size to
> avoid running out of memory or disk space. I can also see that Weston
> could have debugging options that affect which kernel debug messages it
> subscribes to. We can have a reasonable default setup that allows us to
> pinpoint the problem area and figure out most problems, and if needed,
> we could ask the administrator pass another debug option to Weston. It
> helps if there is just one place to configure everything about the
> compositor.
> 
> This implies that it would be really nice to have userspace subscriber
> specific debug message streams from the kernel, or a good way to filter
> the messages we want. A Wayland compositor would not be interested in
> file system or wireless debugs for example, but another system
> component might be. There is also a security aspect of which component is
> allowed to see which messages in case they could contain anything
> sensitive (input debug could contain typed passwords).
> 
> Configuring the kernel debug message selection for our debug message
> stream can and probably should require elevated privileges, and we can
> likely solve that in userspace with a daemon or such, to allow the
> Wayland compositor to run as a regular user.
> 
> Thinking of desktop systems, and especially physically multi-seat systems:
> - there can be multiple different Wayland compositors running simultaneously
> - each of them may want debug messages only from a specific DRM KMS
>   device instance, and not from others
> - each of them may have a different idea of which debug messages are important
> - because DRM KMS leasing is a thing, different compositor instances
>   could be using the same DRM KMS device instance simultaneously; since
>   this is specific to DRM KMS, and it should be harmless to get a
>   little too much DRM KMS debug (that is, from the whole device instead
>   of just the leased parts), it may not be worth to consider splitting
>   debug message streams this far.
> 
> If userspace is offered some standardised fields in kernel debug
> structures, then userspace could do some filtering on its own too, but I
> guess it would be better to filter at the source and not need that.
> 
> There is also an anti-goal. The kernel debug message contents are
> specifically not machine-parsable. I very much do not want to impose
> debug strings as ABI, they are for human (and AI?) readers only.
> 
> 
> As a summary, here are the most important requirements first:
> - usable in production as a normal thing to enable always by default
> - final delivery to unprivileged userspace process

I think this is the one that's trickiest, and I don't fully understand why
you need it. The issues I'm seeing:

- logs tend to leak a lot of kernel internal state that's useful for
  attacks. There's measures for the worst (like obfuscating kernel
  pointers by hashing them), so there's always going to be a difference
  here between what full sysadmin can get and what unpriviledged userspace
  can get. And there's always a risk we miss something that we should
  obfuscate but didn't.

- handing this to userspace increases the risks it becomes uapi. Who's
  going to stop compositors from sussing out the reason an atomic commit
  failed from the logs if they can get them easily, and these logs contain
  very interesting information about why something failed?

  Much better if journald or a crash handler assemebles all the different
  flight recorder logs and packages it into a bug report so that the
  compositor cannot ever get at these directly. Yeah this needs some OS
  support with a dbus request or similar so that the compositor can ask
  for a crash dump with everything relevant to its session.

- the idea of an in-kernel flight recorder is that it's really fast. The
  entire tracing infra is built such that recording an event is really
  quick, but printing it is not - the entire string formatting is delayed
  to when userspace reads the buffers. If you constantly push the log
  messages to userspace we toss the advantage of the low-overhead
  in-kernel flight recorder. If you push logs to dmesg there's a
  substantial measureable overhead which you don't really want in
  production, and your requirement would impose the same.

- I'm not sure how this is supposed to mesh with userspace log aggregators
  like journald when every compositor has it's own flight recorder on top.
  Feels a bit like a solution that ignores the entire os stack and assumes
  that the only pieces we can touch are the kernel and the compositor to
  get to such a flight recorder.

  You might object that events will get out-of-order if you merge multiple
  logs after the fact, but that's the case anyway if we use the tracing
  framework since that's always a ringbuffer within the kernel and not
  synchronous. The only thing we could do is allow userspace to push
  markers into that ringbuffer, which is easily done by adding more debug
  output lines (heck we could even add a logging cookie to certain ioctl
  when userspace really cares about knowing exact ordering of it's
  requests with the stuff the kernel does).

- If you really want direct deliver to userspace I guess we could do
  something where sessiond opens the flight recorder fd for you, sets it
  all up and passes it to the compositor. But I'm really not a big fan of
  sending the full kms dbg spam to compositors to freely digest in real
  time.

> - per debug-print selection of messages (finer or coarser, categories
>   within a kernel sub-system could be enough)
> - per originating device (driver instance) selection of messages

The dyndbg stuff can do all that already, which is why I'm so much in
favour of relying on that framework.

> - all selections tailored separately for each userspace subscriber
> (- per open device file description selection of messages)

Again this feels like a userspace problem. Sessions could register what
kind of info they need for their session, and something like journald can
figure out how to record it all.

If you want the kernel to keep separate flight recorders I guess we could
add that, but I don't think it currently exists for the dyndbg stuff at
least. Maybe a flight recorder v2 feature, once the basics are in.

> That's my idea of it. It is interesting to see how far the requirements
> can be reasonably realised.

I think aside from the "make it available directly to unpriviledged
userspace" everything sounds reasonable and doable.

More on the process side of things, I think Jim is very much looking for
acks and tested-by by people who are interested in better drm logging
infra. That should help that things are moving in a direction that's
actually useful, even when it's not yet entirely complete.

Cheers, Sima
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the dri-devel mailing list