[PATCH v3] drm: Funnel drm logs to tracepoints

Pekka Paalanen ppaalanen at gmail.com
Mon Dec 16 08:55:44 UTC 2019


On Fri, 13 Dec 2019 11:54:33 -0500
Sean Paul <sean at poorly.run> wrote:

> On Fri, Dec 13, 2019 at 01:34:46PM +0200, Pekka Paalanen wrote:
> > On Thu, 12 Dec 2019 15:32:35 -0500
> > Sean Paul <sean at poorly.run> wrote:
> >   
> > > From: Sean Paul <seanpaul at chromium.org>
> > > 
> > > For a long while now, we (ChromeOS) have been struggling getting any
> > > value out of user feedback reports of display failures (notably external
> > > displays not working). The problem is that all logging, even fatal
> > > errors (well, fatal in the sense that a display won't light up) are
> > > logged at DEBUG log level. So in order to extract these logs, users need
> > > to be able to turn on logging, and reproduce the issue with debug
> > > enabled. Unfortunately, this isn't really something we can ask CrOS users
> > > to do. I spoke with airlied about this and RHEL has similar issues. After
> > > a few more people piped up on previous versions of this patch, it is a
> > > Real Issue.
> > > 
> > > So why don't we just enable DRM_UT_BLAH? Here are the reasons in
> > > ascending order of severity:
> > >  1- People aren't consistent with their categories, so we'd have to
> > >     enable a bunch to get proper coverage
> > >  2- We don't want to overwhelm syslog with drm spam, others have to use
> > >     it too
> > >  3- Console logging is slow
> > > 
> > > So what we really want is a ringbuffer of the most recent logs
> > > (filtered by categories we're interested in) exposed via debugfs so the
> > > logs can be extracted when users file feedback.
> > > 
> > > It just so happens that there is something which does _exactly_ this!
> > > This patch dumps drm logs into tracepoints, which allows us to turn tracing
> > > on and off depending on which category is useful, and pull them from
> > > tracefs on demand.
> > > 
> > > What about trace_printk()? It doesn't give us the control we get from using
> > > tracepoints and it's not meant to be left sprinkled around in code.
> > > 
> > > Cc: David Airlie <airlied at gmail.com>
> > > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > > Cc: Pekka Paalanen <ppaalanen at gmail.com>
> > > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > > Cc: Thomas Zimmermann <tzimmermann at suse.de>
> > > Cc: Rob Clark <robdclark at gmail.com>
> > > Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > > Acked-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> > > Signed-off-by: Sean Paul <seanpaul at chromium.org>
> > > Link: https://patchwork.freedesktop.org/patch/msgid/20191010204823.195540-1-sean@poorly.run #v1
> > > 
> > > Changes in v2:
> > > - Went with a completely different approach: https://lists.freedesktop.org/archives/dri-devel/2019-November/243230.html
> > > 
> > > Changes in v3:
> > > - Changed commit message to be a bit less RFC-y
> > > - Make class_drm_category_log an actual trace class
> > > ---
> > > 
> > > Even though we don't want it to be, this is UAPI. So here's some userspace
> > > code which uses it:
> > > https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1965562
> > > 
> > > 
> > >  drivers/gpu/drm/drm_print.c      | 143 ++++++++++++++++++++++++++-----
> > >  include/trace/events/drm_print.h | 116 +++++++++++++++++++++++++
> > >  2 files changed, 239 insertions(+), 20 deletions(-)
> > >  create mode 100644 include/trace/events/drm_print.h  
> > 
> > Hi,
> > 
> > reading the userspace patch is very enlightening, thanks.
> > 
> > It uses debugfs, and it uses the generic tracing UAPI. When all
> > distributions will enable this debug logging like you do in your
> > userspace patch (I really want that to be the end result, since
> > otherwise we are back to asking people to manually enable debug and then
> > reproduce the failure), does that scale?
> > 
> > What if V4L2 is the next one deciding they need a similar logging
> > framework to debug camera issues? If the trace log is already flooded
> > with DRM messages, it will be useless for them?
> > 
> > Or maybe someone else wants their piece and flood it even more
> > aggressively than DRM, making the DRM messages disappear before they
> > can be saved?
> > 
> > Is there a way to pull out messages
> > from /sys/kernel/debug/tracing/trace and filter them on reading instead
> > of on writing?  
> 
> Hi Pekka,
> Yep, there's also a trace_pipe output from tracefs. So you could pipe the output
> through a classifier in userspace and split off different subsystems.

Hi,

IOW, one needs to run a userspace daemon to read the pipe, filter it,
and keep a flight recorder buffer, instead of the kernel keeping it.
Right?

I suppose keeping the flight recorder in userspace makes it much more
flexible with things like dynamic sizing and remotes and whatnot. Is
this where you would like to aim for, for general purpose desktop
distributions to run one at all times?

> I think if this type of patch proliferates this could be a problem. In that case,
> we'd probably have to become better citizens and reclassify some of our log
> messages such that we're not needlessly spamming the trace buffer.

How will you do that in a way that people who have already taken to bug
reporting logging do not scream "UAPI broke, the logs are much less
useful now"? :-)

Although I might worry more about people screaming "my tracing broke"
if the buffers fill up with DRM chatter in the first place. After all,
the aim is to have this enabled in desktop distributions by default,
and if distributions get bug reports forcing them to disable it by
default, then we're back to square "please enable debugging and
reproduce the issue so we have something to look at" again.

FWIW, I'm very much in favour of your overall goal, but I'm worried it
might not be sustainable or practical in the long term outside of very
specialized distributions. Then again, I have no alternative
suggestions that would have the same level of code and UAPI re-use.

> I'm not too worried about this since we have a number of tools at our disposal.

What tools would those be?


Thanks,
pq
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20191216/f3219f18/attachment.sig>


More information about the dri-devel mailing list