[PATCH] drm: Funnel drm logs to tracepoints

Thomas Zimmermann tzimmermann at suse.de
Wed Oct 16 13:23:45 UTC 2019


Am 16.10.19 um 15:05 schrieb Pekka Paalanen:
> On Wed, 16 Oct 2019 00:35:39 +0200
> Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
>> Yeah I don't think tuning the spam level will ever work. What we need
>> is some external input (most likely from the user clicking the "my
>> external screen doesn't work" button, or maybe the compositor
>> realizing something that should work didn't, or some other thing that
>> indicates trouble), and then retroactively capture all
>> debug/informational message leading up to doom.
>> But without that external "houston we have a problem" input all the
>> debug spam is really just spam and unwanted. btw even if we don't spam
>> dmesg if we enable too much we might have simply trouble with all the
>> printk formatting work we do for nothing. So maybe we need something
>> like trace_printk which iirc delays the formatting until the stuff
>> actually gets read from the log buffer. Plus trace_printk might make
>> it clear enough that it's not stable uapi ... so maybe we do want
>> trace_printk in the end?
>> Just not really looking forward to reimplementing half the tracing
>> infrastructure just for this ...
> Hi,
> a thought about the UAPI:
> Debugfs is not good because it's not supposed to be touched or even
> present in production, right?

I'm running Tumbleweed, where debugfs is mounted by default for root. I
could live with requiring the user to mount debugfs to get the file's
content.

> specifically be available in production. So a new file in some fs
> somewhere it should be, and userspace in production can read it at will
> to attach to a bug report.
> Those semantics, "only use this content for attaching into a bug
> report" should be made very clear in the UAPI.

Has this ever worked? As soon as a userspace program starts depending on
the content of this file, it becomes kABI. From the incidents I know of,
Linus has always been quite strict about this, even for broken interfaces.

> I believe it has to be a ring buffer that is being continuously written
> also during normal operations, so that we don't have to ask end users
> to reproduce the issue again just to get some logs. Maybe the issue
> happens once in a fortnight. The information must be extractable after
> the fact, without before-hand preparations.

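For illustration only, the two ideas quoted above (a ring buffer that is
written continuously, and formatting that is deferred until the buffer is
read) can be combined in a small userspace sketch. This is not the DRM or
kernel tracing API; every name in it (log_ring, ring_log, ring_dump,
LOG_SLOTS) is hypothetical, and it assumes each record carries a static
format string plus at most two long arguments.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOG_SLOTS 4 /* hypothetical capacity; a real buffer would be far larger */

/* One record: the format string and its arguments are stored unformatted. */
struct log_entry {
    const char *fmt; /* assumed to point at static storage, e.g. a literal */
    long a, b;
};

struct log_ring {
    struct log_entry slot[LOG_SLOTS];
    unsigned long head; /* total number of records ever written */
};

/* Write path: no formatting; the oldest record is overwritten when full. */
static void ring_log(struct log_ring *r, const char *fmt, long a, long b)
{
    assert(r != NULL && fmt != NULL);
    struct log_entry *e = &r->slot[r->head % LOG_SLOTS];
    e->fmt = fmt;
    e->a = a;
    e->b = b;
    r->head++;
}

/*
 * Read path: format the surviving records, oldest first, into out.
 * Returns the number of records that fit completely.
 */
static size_t ring_dump(const struct log_ring *r, char *out, size_t len)
{
    unsigned long first = r->head > LOG_SLOTS ? r->head - LOG_SLOTS : 0;
    size_t used = 0, count = 0;

    if (len > 0)
        out[0] = '\0';
    for (unsigned long i = first; i < r->head; i++) {
        const struct log_entry *e = &r->slot[i % LOG_SLOTS];
        int n = snprintf(out + used, len - used, e->fmt, e->a, e->b);
        if (n < 0 || (size_t)n >= len - used) {
            if (used < len)
                out[used] = '\0'; /* drop the partially written record */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

The write path only stores a pointer and two integers, so leaving it
enabled during normal operation costs little. The formatting cost is paid
only when someone actually reads the buffer, for example to attach its
content to a bug report.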

Best regards

> Thanks,
> pq

