[PATCH] drm: Funnel drm logs to tracepoints

Tue Oct 15 11:18:17 UTC 2019

Quoting Sean Paul (2019-10-10 23:48:08)
> From: Sean Paul <seanpaul at chromium.org>
> 
> *Record scratch* You read that subject correctly, I bet you're wondering
> how we got here. At least hear me out before you flame :-)
> 
> For a long while now, we (ChromeOS) have been struggling to get any
> value out of user feedback reports of display failures (notably external
> displays not working). The problem is that all logging, even fatal
> errors (well, fatal in the sense that a display won't light up) are
> logged at DEBUG log level. So in order to extract these logs, you need
> to be able to turn on logging, and reproduce the issue with debug
> enabled. Unfortunately, this isn't really something we can ask CrOS users
> to do. I spoke with airlied about this and RHEL has similar issues.
> 
> This is the point where you ask me, "So Sean, why don't you just enable
> DRM_UT_BLAH?". Good question! Here are the reasons in ascending order of
> severity:
>  1- People aren't consistent with their categories, so we'd have to
>     enable a bunch to get proper coverage
>  2- We don't want to overwhelm syslog with drm spam, others have to use
>     it too
>  3- Console logging is slow
> 
> Hopefully you're with me so far. I have a problem that has no existing
> solution. What I really want is a ringbuffer of the most recent logs
> (in the categories I'm interested in) exposed via debugfs so I can
> extract it when users file feedback.

A nitpick: tracepoints no longer live in debugfs but in tracefs. I'm
told the reason is that production environments want to use them and
expect them to be stable.

> It just so happens that there is something which does _exactly_ this! I
> can dump the most recent logs into tracepoints, turn them on and off
> depending on which category I want, and pull them from debugfs on demand.
> 
> "What about trace_printk()?" You'll say. It doesn't give us the control we
> get from using tracepoints and it's not meant to be left sprinkled around
> in code.
> 
> So that is how we got here, now it's time for you to tell me why this is
> a horrible idea :-)

Playing devil's advocate: how long until all our debugging messages
become kernel ABI?

In the context of the DRM subsystem level unified tracing
(Message-Id: 20190121232040.26114-1-chris at chris-wilson.co.uk)
we already struggled to find the sweet spot of exposing only
information we can maintain long term.

I can imagine this derailing in a direction where
the userspace debugging information of interest is extracted
from the kernel debug messages. When that message format
changes and breaks the userspace tool, you probably know
the drill.

I like the base idea, but implementation through tracepoints
has great potential to become a maintenance nightmare. So maybe
something actually in debugfs might be the right solution?
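
To make the "ringbuffer of recent logs, filtered by category" idea a bit
more concrete, here is a rough user-space sketch in plain C. It is not
kernel code and not a proposed implementation: the category bits, the
struct and the ring_log()/ring_dump() helpers are all hypothetical names
made up for illustration, and a real version would need locking and a
debugfs file to read the buffer out.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical category bits; these are NOT the real DRM_UT_* values. */
enum log_cat { CAT_CORE = 1u << 0, CAT_KMS = 1u << 1, CAT_DRIVER = 1u << 2 };

#define RING_SLOTS 4	/* deliberately tiny so wrap-around is easy to see */
#define MSG_LEN    64

struct log_ring {
	unsigned int enabled;	/* bitmask of categories to record */
	unsigned int head;	/* messages stored so far (overflow ignored) */
	char msg[RING_SLOTS][MSG_LEN];
};

/* Store a message if its category is enabled; the oldest slot is reused. */
static void ring_log(struct log_ring *r, unsigned int cat, const char *fmt, ...)
{
	va_list ap;

	if (!(r->enabled & cat))
		return;

	va_start(ap, fmt);
	vsnprintf(r->msg[r->head % RING_SLOTS], MSG_LEN, fmt, ap);
	va_end(ap);
	r->head++;
}

/* Copy the retained messages into out, oldest first; returns the count. */
static unsigned int ring_dump(const struct log_ring *r, char out[][MSG_LEN])
{
	unsigned int n = r->head < RING_SLOTS ? r->head : RING_SLOTS;
	unsigned int start = r->head - n;
	unsigned int i;

	for (i = 0; i < n; i++)
		memcpy(out[i], r->msg[(start + i) % RING_SLOTS], MSG_LEN);
	return n;
}
```

With only CAT_KMS and CAT_DRIVER enabled, a CAT_CORE message is dropped,
and after six CAT_KMS messages ring_dump() returns just the last four.
The point of keeping the dump as opaque text behind a debugfs file is
that nobody gets to treat the message format as stable ABI.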

Regards, Joonas