[PATCH] drm: Funnel drm logs to tracepoints

Sean Paul sean at poorly.run
Tue Oct 15 19:11:56 UTC 2019


On Tue, Oct 15, 2019 at 02:07:00PM +0200, Thomas Zimmermann wrote:
> Hi
> 

Hi Thomas (and Daniel/Joonas/Rob),
Thanks for your feedback

> Am 10.10.19 um 22:48 schrieb Sean Paul:
> > From: Sean Paul <seanpaul at chromium.org>
> > 
> > *Record scratch* You read that subject correctly, I bet you're wondering
> > how we got here. At least hear me out before you flame :-)
> > 
> > For a long while now, we (ChromeOS) have been struggling getting any
> > value out of user feedback reports of display failures (notably external
> > displays not working). The problem is that all logging, even fatal
> > errors (well, fatal in the sense that a display won't light up) are
> > logged at DEBUG log level. So in order to extract these logs, you need
> > to be able to turn on logging, and reproduce the issue with debug
> > enabled. Unfortunately, this isn't really something we can ask CrOS users
> > I spoke with airlied about this and RHEL has similar issues.
> > 
> > This is the point where you ask me, "So Sean, why don't you just enable
> > DRM_UT_BLAH?". Good question! Here are the reasons in ascending order of
> > severity:
> >  1- People aren't consistent with their categories, so we'd have to
> >     enable a bunch to get proper coverage
> >  2- We don't want to overwhelm syslog with drm spam, others have to use
> >     it too
> >  3- Console logging is slow
> > 
> > Hopefully you're with me so far. I have a problem that has no existing
> > solution. What I really want is a ringbuffer of the most recent logs
> > (in the categories I'm interested in) exposed via debugfs so I can
> > extract it when users file feedback.
> 
> For bug reports, I don't want categories or anything else that users can
> switch on/off. All I'd want is a simple way of retrieving the last ~100
> messages from DRM (ala: "please attach the content of the file at
> /sys/debug...").
> 

Yeah, this is basically what I'm looking for (although I'd filter out a few
categories that are particularly noisy and bump N to as big as I can get away
with).

> > It just so happens that there is something which does _exactly_ this! I
> > can dump the most recent logs into tracepoints, turn them on and off
> > depending on which category I want, and pull them from debugfs on demand.
> > 
> > "What about trace_printk()?" You'll say. It doesn't give us the control we
> > get from using tracepoints and it's not meant to be left sprinkled around
> > in code.
> > 
> > So that is how we got here, now it's time for you to tell me why this is
> > a horrible idea :-)
> 
> Tracepoints are considered stable uapi, right? As a distro person (SUSE)
> I don't want us to have to maintain debugging messages forever.
> 

This is something I wasn't aware of (that's what reviews are for, right?!). So
this idea is perhaps dead in the water. I could probably integrate with
ringbuffer at a lower level and expose a drm-specific debugfs node without _too_
much extra goo, but [ab]using tracepoints seemed pretty natural. The
other benefit to this approach is that backporting it into distro kernels is
relatively easy.

> 
> The problem itself doesn't seem related to DRM. Do other subsystems have
> similar requirements?

I think other subsystems are chattier than ours (ie: when an error occurs they
actually log something). Some of our drivers in drm are more helpful than others
as far as complaining when something unexpected happens. The core in particular
is really conservative about issuing errors (which makes sense since a lot of
the failures are expected (atomic testing is messy), only the compositor really
knows which ones are noteworthy).

I've been holding onto this patch for a couple weeks now because although it
does what I want (and it seems others want), it does feel pretty greasy.

One half-baked idea I had (when I was thinking about what to do after this
particular approach was inevitably nak'd :) would be to audit some of the more
important messages and re-categorize them (or rather make use of category being
a bit field to add a new category). The new category would be much lower volume
and could be turned on in production. It would hopefully glean a little more use
from the console logs. Examples of uses would be dp link training messages,
modeset failures, bandwidth errors, etc). The downside to this is that it's a
ton of work, lots of churn, hard to enforce as new driver support is added, and
a complete horror-show to backport. So, not sure about embarking down that path.

I'm desperate for suggestions, if anyone has better ideas!

Sean

PS- Something I didn't address is how this approach differs from just enabling
    dynamic debug. We could probably print out the drm category in
    drm_[dev_]printk and then use a filter on dynamic debug to enable a single
    category. However we would still be using the [slow] console and most
    categories would overwhelm the console with drm spam. So not a good fit for
    what I'm after.

> 
> Best regards
> Thomas
> 
> > Cc: David Airlie <airlied at gmail.com>
> > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > Signed-off-by: Sean Paul <seanpaul at chromium.org>
> > ---
> >  drivers/gpu/drm/drm_print.c      | 143 ++++++++++++++++++++++++++-----
> >  include/trace/events/drm_print.h | 116 +++++++++++++++++++++++++
> >  2 files changed, 239 insertions(+), 20 deletions(-)
> >  create mode 100644 include/trace/events/drm_print.h
> > 
> > diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c
> > index 9a25d73c155c..f591292811aa 100644
> > --- a/drivers/gpu/drm/drm_print.c
> > +++ b/drivers/gpu/drm/drm_print.c
> > @@ -27,11 +27,15 @@
> >  
> >  #include <stdarg.h>
> >  
> > +#include <linux/bitops.h>
> >  #include <linux/io.h>
> >  #include <linux/moduleparam.h>
> >  #include <linux/seq_file.h>
> >  #include <linux/slab.h>
> >  
> > +#define CREATE_TRACE_POINTS
> > +#include <trace/events/drm_print.h>
> > +
> >  #include <drm/drm.h>
> >  #include <drm/drm_drv.h>
> >  #include <drm/drm_print.h>
> > @@ -241,10 +245,10 @@ void drm_dev_printk(const struct device *dev, const char *level,
> >  	struct va_format vaf;
> >  	va_list args;
> >  
> > -	va_start(args, format);
> >  	vaf.fmt = format;
> >  	vaf.va = &args;
> >  
> > +	va_start(args, format);
> >  	if (dev)
> >  		dev_printk(level, dev, "[" DRM_NAME ":%ps] %pV",
> >  			   __builtin_return_address(0), &vaf);
> > @@ -253,49 +257,145 @@ void drm_dev_printk(const struct device *dev, const char *level,
> >  		       level, __builtin_return_address(0), &vaf);
> >  
> >  	va_end(args);
> > +
> > +	va_start(args, format);
> > +	trace_drm_log(level, dev, &vaf);
> > +	va_end(args);
> >  }
> >  EXPORT_SYMBOL(drm_dev_printk);
> >  
> > +static unsigned int drm_trace_enabled(unsigned int category)
> > +{
> > +	unsigned int bit;
> > +
> > +	for_each_set_bit(bit, (unsigned long*)&category, sizeof(category) * 8) {
> > +		switch (BIT(bit)) {
> > +		case DRM_UT_NONE:
> > +			return trace_drm_dbg_none_enabled();
> > +		case DRM_UT_CORE:
> > +			return trace_drm_dbg_core_enabled();
> > +		case DRM_UT_DRIVER:
> > +			return trace_drm_dbg_driver_enabled();
> > +		case DRM_UT_KMS:
> > +			return trace_drm_dbg_kms_enabled();
> > +		case DRM_UT_PRIME:
> > +			return trace_drm_dbg_prime_enabled();
> > +		case DRM_UT_ATOMIC:
> > +			return trace_drm_dbg_atomic_enabled();
> > +		case DRM_UT_VBL:
> > +			return trace_drm_dbg_vbl_enabled();
> > +		case DRM_UT_STATE:
> > +			return trace_drm_dbg_state_enabled();
> > +		case DRM_UT_LEASE:
> > +			return trace_drm_dbg_lease_enabled();
> > +		case DRM_UT_DP:
> > +			return trace_drm_dbg_dp_enabled();
> > +		default:
> > +			return trace_drm_dbg_unknown_enabled();
> > +		}
> > +	}
> > +	return false;
> > +}
> > +
> > +static void drm_do_trace(const struct device *dev, unsigned int category,
> > +			 struct va_format *vaf)
> > +{
> > +	WARN_ON(hweight32(category) > 1);
> > +
> > +	switch (category) {
> > +	case DRM_UT_NONE:
> > +		trace_drm_dbg_none(dev, vaf);
> > +		break;
> > +	case DRM_UT_CORE:
> > +		trace_drm_dbg_core(dev, vaf);
> > +		break;
> > +	case DRM_UT_DRIVER:
> > +		trace_drm_dbg_driver(dev, vaf);
> > +		break;
> > +	case DRM_UT_KMS:
> > +		trace_drm_dbg_kms(dev, vaf);
> > +		break;
> > +	case DRM_UT_PRIME:
> > +		trace_drm_dbg_prime(dev, vaf);
> > +		break;
> > +	case DRM_UT_ATOMIC:
> > +		trace_drm_dbg_atomic(dev, vaf);
> > +		break;
> > +	case DRM_UT_VBL:
> > +		trace_drm_dbg_vbl(dev, vaf);
> > +		break;
> > +	case DRM_UT_STATE:
> > +		trace_drm_dbg_state(dev, vaf);
> > +		break;
> > +	case DRM_UT_LEASE:
> > +		trace_drm_dbg_lease(dev, vaf);
> > +		break;
> > +	case DRM_UT_DP:
> > +		trace_drm_dbg_dp(dev, vaf);
> > +		break;
> > +	default:
> > +		trace_drm_dbg_unknown(dev, vaf);
> > +		break;
> > +	}
> > +}
> > +
> >  void drm_dev_dbg(const struct device *dev, unsigned int category,
> >  		 const char *format, ...)
> >  {
> >  	struct va_format vaf;
> > +	unsigned int bit;
> >  	va_list args;
> >  
> > -	if (!drm_debug_enabled(category))
> > -		return;
> > -
> > -	va_start(args, format);
> >  	vaf.fmt = format;
> >  	vaf.va = &args;
> >  
> > -	if (dev)
> > -		dev_printk(KERN_DEBUG, dev, "[" DRM_NAME ":%ps] %pV",
> > -			   __builtin_return_address(0), &vaf);
> > -	else
> > -		printk(KERN_DEBUG "[" DRM_NAME ":%ps] %pV",
> > -		       __builtin_return_address(0), &vaf);
> > +	if (drm_debug_enabled(category)) {
> > +		va_start(args, format);
> > +		if (dev)
> > +			dev_printk(KERN_DEBUG, dev, "[" DRM_NAME ":%ps] %pV",
> > +				   __builtin_return_address(0), &vaf);
> > +		else
> > +			printk(KERN_DEBUG "[" DRM_NAME ":%ps] %pV",
> > +			       __builtin_return_address(0), &vaf);
> > +		va_end(args);
> > +	}
> > +
> > +	if (!drm_trace_enabled(category))
> > +		return;
> > +
> > +	for_each_set_bit(bit, (unsigned long*)&category, sizeof(category) * 8) {
> > +		va_start(args, format);
> > +		drm_do_trace(dev, BIT(bit), &vaf);
> > +		va_end(args);
> > +	}
> >  
> > -	va_end(args);
> >  }
> >  EXPORT_SYMBOL(drm_dev_dbg);
> >  
> >  void drm_dbg(unsigned int category, const char *format, ...)
> >  {
> >  	struct va_format vaf;
> > +	unsigned int bit;
> >  	va_list args;
> >  
> > -	if (!drm_debug_enabled(category))
> > -		return;
> > -
> > -	va_start(args, format);
> >  	vaf.fmt = format;
> >  	vaf.va = &args;
> >  
> > -	printk(KERN_DEBUG "[" DRM_NAME ":%ps] %pV",
> > -	       __builtin_return_address(0), &vaf);
> > +	if (drm_debug_enabled(category)) {
> > +		va_start(args, format);
> > +		printk(KERN_DEBUG "[" DRM_NAME ":%ps] %pV",
> > +		       __builtin_return_address(0), &vaf);
> > +		va_end(args);
> > +	}
> >  
> > -	va_end(args);
> > +	if (!drm_trace_enabled(category))
> > +		return;
> > +
> > +	for_each_set_bit(bit, (unsigned long*)&category, sizeof(category) * 8) {
> > +		va_start(args, format);
> > +		drm_do_trace(NULL, BIT(bit), &vaf);
> > +		va_end(args);
> > +	}
> >  }
> >  EXPORT_SYMBOL(drm_dbg);
> >  
> > @@ -304,13 +404,16 @@ void drm_err(const char *format, ...)
> >  	struct va_format vaf;
> >  	va_list args;
> >  
> > -	va_start(args, format);
> >  	vaf.fmt = format;
> >  	vaf.va = &args;
> >  
> > +	va_start(args, format);
> >  	printk(KERN_ERR "[" DRM_NAME ":%ps] *ERROR* %pV",
> >  	       __builtin_return_address(0), &vaf);
> > +	va_end(args);
> >  
> > +	va_start(args, format);
> > +	trace_drm_err(NULL, &vaf);
> >  	va_end(args);
> >  }
> >  EXPORT_SYMBOL(drm_err);
> > diff --git a/include/trace/events/drm_print.h b/include/trace/events/drm_print.h
> > new file mode 100644
> > index 000000000000..cc93c69f3fe4
> > --- /dev/null
> > +++ b/include/trace/events/drm_print.h
> > @@ -0,0 +1,116 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Authors:
> > + * Sean Paul <seanpaul at chromium.org>
> > + */
> > +#undef TRACE_SYSTEM
> > +#define TRACE_SYSTEM drm_print
> > +
> > +#if !defined(_TRACE_DRM_PRINT_H) || defined(TRACE_HEADER_MULTI_READ)
> > +
> > +#include <linux/device.h>
> > +#include <linux/tracepoint.h>
> > +
> > +#define DRM_PRINT_MAX 256
> > +
> > +#define _TRACE_DRM_PRINT_H
> > +
> > +TRACE_EVENT(drm_log,
> > +	TP_PROTO(const char * level, const struct device *dev,
> > +		 struct va_format *vaf),
> > +	TP_ARGS(level, dev, vaf),
> > +	TP_STRUCT__entry(
> > +		__field(const char *,		level			)
> > +		__field(const struct device *,	dev			)
> > +		__dynamic_array(char,		msg,	DRM_PRINT_MAX	)
> > +		__field(int,			len			)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->level = level;
> > +		__entry->dev = dev;
> > +		__entry->len = vsnprintf(__get_str(msg), DRM_PRINT_MAX,
> > +					 vaf->fmt, *vaf->va);
> > +		if (__entry->len > 0 &&
> > +		    __get_str(msg)[__entry->len - 1] == '\n')
> > +			__get_str(msg)[--__entry->len] = '\0';
> > +	),
> > +	TP_printk("%s %s %s: %s", __entry->level,
> > +		  __entry->dev ? dev_driver_string(__entry->dev) : "(NULL):",
> > +		  __entry->dev ? dev_name(__entry->dev) : "", __get_str(msg))
> > +);
> > +
> > +TRACE_EVENT(class_drm_category_log,
> > +	TP_PROTO(const struct device *dev, struct va_format *vaf),
> > +	TP_ARGS(dev, vaf),
> > +	TP_STRUCT__entry(
> > +		__field(const struct device *,	dev			)
> > +		__dynamic_array(char,		msg,	DRM_PRINT_MAX	)
> > +		__field(int,			len			)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->dev = dev;
> > +		__entry->len = vsnprintf(__get_str(msg), DRM_PRINT_MAX,
> > +					 vaf->fmt, *vaf->va);
> > +		if (__entry->len > 0 &&
> > +		    __get_str(msg)[__entry->len - 1] == '\n')
> > +			__get_str(msg)[--__entry->len] = '\0';
> > +	),
> > +	TP_printk("%s %s%s%s",
> > +		  __entry->dev ? dev_driver_string(__entry->dev) : "",
> > +		  __entry->dev ? dev_name(__entry->dev) : "",
> > +		  __entry->dev ? ": " : "", __get_str(msg))
> > +);
> > +DEFINE_EVENT(class_drm_category_log, drm_err,
> > +	TP_PROTO(const struct device *dev, struct va_format *vaf),
> > +	TP_ARGS(dev, vaf)
> > +);
> > +DEFINE_EVENT(class_drm_category_log, drm_dbg_none,
> > +	TP_PROTO(const struct device *dev, struct va_format *vaf),
> > +	TP_ARGS(dev, vaf)
> > +);
> > +DEFINE_EVENT(class_drm_category_log, drm_dbg_core,
> > +	TP_PROTO(const struct device *dev, struct va_format *vaf),
> > +	TP_ARGS(dev, vaf)
> > +);
> > +DEFINE_EVENT(class_drm_category_log, drm_dbg_driver,
> > +	TP_PROTO(const struct device *dev, struct va_format *vaf),
> > +	TP_ARGS(dev, vaf)
> > +);
> > +DEFINE_EVENT(class_drm_category_log, drm_dbg_kms,
> > +	TP_PROTO(const struct device *dev, struct va_format *vaf),
> > +	TP_ARGS(dev, vaf)
> > +);
> > +DEFINE_EVENT(class_drm_category_log, drm_dbg_prime,
> > +	TP_PROTO(const struct device *dev, struct va_format *vaf),
> > +	TP_ARGS(dev, vaf)
> > +);
> > +DEFINE_EVENT(class_drm_category_log, drm_dbg_atomic,
> > +	TP_PROTO(const struct device *dev, struct va_format *vaf),
> > +	TP_ARGS(dev, vaf)
> > +);
> > +DEFINE_EVENT(class_drm_category_log, drm_dbg_vbl,
> > +	TP_PROTO(const struct device *dev, struct va_format *vaf),
> > +	TP_ARGS(dev, vaf)
> > +);
> > +DEFINE_EVENT(class_drm_category_log, drm_dbg_state,
> > +	TP_PROTO(const struct device *dev, struct va_format *vaf),
> > +	TP_ARGS(dev, vaf)
> > +);
> > +DEFINE_EVENT(class_drm_category_log, drm_dbg_lease,
> > +	TP_PROTO(const struct device *dev, struct va_format *vaf),
> > +	TP_ARGS(dev, vaf)
> > +);
> > +DEFINE_EVENT(class_drm_category_log, drm_dbg_dp,
> > +	TP_PROTO(const struct device *dev, struct va_format *vaf),
> > +	TP_ARGS(dev, vaf)
> > +);
> > +DEFINE_EVENT(class_drm_category_log, drm_dbg_unknown,
> > +	TP_PROTO(const struct device *dev, struct va_format *vaf),
> > +	TP_ARGS(dev, vaf)
> > +);
> > +#endif
> > +
> > +/* This part must be outside protection */
> > +#include <trace/define_trace.h>
> > 
> 
> -- 
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
> 




-- 
Sean Paul, Software Engineer, Google / Chromium OS


More information about the dri-devel mailing list