[PATCH] drm: Funnel drm logs to tracepoints

Wed Dec 4 13:52:20 UTC 2019

On Wed, Dec 04, 2019 at 02:02:51PM +0100, Daniel Vetter wrote:
> On Wed, Dec 4, 2019 at 10:54 AM Pekka Paalanen <ppaalanen at gmail.com> wrote:
> >
> > On Wed, 4 Dec 2019 10:14:11 +0100
> > Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> >
> > > On Wed, Dec 4, 2019 at 8:33 AM Pekka Paalanen <ppaalanen at gmail.com> wrote:
> > > >
> > > > On Tue, 3 Dec 2019 22:20:14 +0100
> > > > Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> > > >
> > > > > On Tue, Dec 3, 2019 at 8:10 PM Sean Paul <sean at poorly.run> wrote:
> > > > > >
> > > > > > On Thu, Oct 17, 2019 at 10:22:16AM +0300, Pekka Paalanen wrote:
> > > > > > > On Wed, 16 Oct 2019 15:23:45 +0200
> > > > > > > Thomas Zimmermann <tzimmermann at suse.de> wrote:
> > > > > > >
> > > > > > > > Hi
> > > > > > > >
> > > > > > > > Am 16.10.19 um 15:05 schrieb Pekka Paalanen:
> > > > > > >
> > > > > > > > > specifically be available in production. So a new file in some fs
> > > > > > > > > somewhere it should be, and userspace in production can read it at will
> > > > > > > > > to attach to a bug report.
> > > > > > > > >
> > > > > > > > > Those semantics, "only use this content for attaching into a bug
> > > > > > > > > report" should be made very clear in the UAPI.
> > > > > > > >
> > > > > > > > Has this ever worked? As soon as a userspace program starts depending on
> > > > > > > > the content of this file, it becomes kabi. From the incidents I know,
> > > > > > > > Linus has always been quite strict about this. Even for broken interfaces.
> > > > > > >
> > > > > > > The kernel log content is not kabi, is it? I've seen it change plenty
> > > > > > > during the years. This would be just another similar log with free-form
> > > > > > > text.
> > > > > > >
> > > > > >
> > > > > > Ok, so given the more structured version of this set [1] was not well received,
> > > > > > are we all comfortable going with the freeform approach in this version?
> > > > >
> > > > > Imo yes. It's still uabi, so someone will have regrets about it. But
> > > > > given that dmesg has been around forever, and causes rather little
> > > > > breakage, I think we should be fairly ok.
> > > > >
> > > > > I still think that figuring out the drm_dev logging bikeshed might be
> > > > > good, while we noodle around in here.

Yeah, we might want to take a closer look at how logs are categorized (for
instance, it might be nice to be able to separate DP aux dumps from DP training
failures, and to distinguish atomic test failures from state dumps). All the
plumbing is in place to put a message in multiple log categories, since the log
category is a bitfield.
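
To make the bitfield point concrete, here is a minimal sketch of how a
message tagged with several categories could be matched against an enabled
mask. The enum names, bit positions and the log_should_emit() helper are
hypothetical and only for illustration; they are not the identifiers used in
the patch or in drm_print.h.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical category bits (assumption, not the kernel's real values).
 * One bit per category, so a single message can carry several at once.
 */
enum log_category {
	LOG_CAT_KMS         = 1u << 0,
	LOG_CAT_DP_AUX      = 1u << 1,
	LOG_CAT_DP_TRAINING = 1u << 2,
	LOG_CAT_ATOMIC_TEST = 1u << 3,
	LOG_CAT_STATE_DUMP  = 1u << 4,
};

/*
 * Return true if at least one of the message's categories is enabled.
 * A message must carry at least one category bit.
 */
static bool log_should_emit(unsigned int enabled_mask,
			    unsigned int msg_categories)
{
	assert(msg_categories != 0);
	return (enabled_mask & msg_categories) != 0;
}
```

With this layout, a DP aux dump tagged LOG_CAT_KMS | LOG_CAT_DP_AUX would be
emitted when either bit is enabled, and stays quiet when only
LOG_CAT_DP_TRAINING is on.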

> > > >
> > > > Hi,
> > > >
> > > > one more wacky idea: have a flight recorder buffer(s) in the kernel,
> > > > but do not expose them as is to userspace. Instead, create a trigger
> > > > somewhere (/proc?) that causes the flight recorder buffers to be
> > > > flushed into dmesg. That way the amount of new UABI is reduced to just
> > > > the trigger. Obviously this spams dmesg and would need the rights to
> > > > access dmesg to actually collect the logs. I'm not sure if that's good
> > > > or bad, but it would re-use dmesg.
> > >
> > > That's roughly how we ended up here, since trace buffer dumping is
> > > something that exists already (you can e.g. dump it on an Oops too, we
> > > do that in our CI with a few i915 tracepoints enabled). I think at
> > > that point a section in drm-uapi.rst explaining what you
> > > should/shouldn't do with these tracepoints is about as dmesg.
> >
> > Sorry?
> >
> > Is there already a kernel flight recorder buffer (a ring buffer
> > continuously overwritten with DRM debug messages) and some userspace
> > trigger to flush it out after the fact, without any preparations
> > needed beforehand from userspace?
> 
> Nope. But what you describe is why we ended up with tracepoints, since
> those already do flight recorder with on-demand dumping. Engineering
> our own thing when there's one already there seems a bit silly, and
> the debug log stuff might also be useful in some gpu trace
> visualizers. You're not going to get that with some dmesg thing.
> 
> > I specifically mean not like "you need to enable a thousand tracepoints
> > manually from userspace and they don't have any stable names so you
> > can't even do that". The whole point is to have a single one-bit stable
> > UAPI: "flush out" and nothing else, into dmesg (maybe as debug level
> > prints). Not any tracepoint enabling hassle that userspace would need
> > to take care of, since it cannot.
> 
> I thought the idea is you only enable that one drm debug tracepoint.
> Not thousands of them.

Correct. You can enable _all_ log categories by writing 1 to the top-level
drm_print enable file. Alternatively, you can enable categories selectively by
writing 1 to that category's enable file (e.g. drm_dbg_kms/enable). Finer
granularity is available through a trace events filter, but that would be
inadvisable since we don't want anyone relying on these messages.
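
For illustration, a rough sketch of what "writing a 1" could look like from a
userspace helper. The helper name enable_trace_category() is made up, and the
path in the usage note is an assumption: it depends on where tracefs is
mounted on the system and on the event names the final patch ends up with.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "1" to a trace event enable file.
 * enable_path is supplied by the caller; nothing about its location is
 * assumed here. Returns 0 on success, -1 on failure.
 */
static int enable_trace_category(const char *enable_path)
{
	FILE *f;
	int ok;

	assert(enable_path != NULL);

	f = fopen(enable_path, "w");
	if (!f)
		return -1;

	ok = (fputs("1", f) >= 0);
	/* fclose() flushes the write, so its result matters too. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Assuming tracefs is mounted at /sys/kernel/tracing, a call might look like
enable_trace_category("/sys/kernel/tracing/events/drm_print/enable") for
everything, or the same with drm_dbg_kms/enable appended under drm_print for a
single category. Writing there normally requires root.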

> 
> > Preferably as much tied to debugfs as dmesg is: not at all.
> 
> tracepoints moved out of dmesg too.

Well, copied out; they're still available in debugfs.

Sean

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Sean Paul, Software Engineer, Google / Chromium OS