[PATCH v3 1/1] drm/xe/guc: Fix GuC log/ct output via debugfs

Wed Jan 8 22:11:45 UTC 2025

+Jose
+Rodrigo

On Wed, Jan 08, 2025 at 12:14:49PM -0800, John Harrison wrote:
>On 1/7/2025 13:10, Lucas De Marchi wrote:
>>On Tue, Jan 07, 2025 at 12:22:52PM -0800, Julia Filipchuk wrote:
>>>Change to disable ascii85 GuC logging only when output to devcoredump
>>>(was temporarily disabled for all code paths).
>>>
>>>v2: Ignore only for devcoredump case (not dmesg output).
>>>v3: Rebase to resolve parent tag mismatch.
>>>
>>>Signed-off-by: Julia Filipchuk <julia.filipchuk at intel.com>
>>>---
>>>drivers/gpu/drm/xe/xe_devcoredump.c | 8 +++++---
>>>include/drm/drm_print.h             | 2 ++
>>>2 files changed, 7 insertions(+), 3 deletions(-)
>>>
>>>diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
>>>index 6980304c8903..8e5d1f9866a7 100644
>>>--- a/drivers/gpu/drm/xe/xe_devcoredump.c
>>>+++ b/drivers/gpu/drm/xe/xe_devcoredump.c
>>>@@ -424,10 +424,12 @@ void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
>>>     * Splitting blobs across multiple lines is not compatible with the mesa
>>>     * debug decoder tool. Note that even dropping the explicit '\n' below
>>>     * doesn't help because the GuC log is so big some underlying implementation
>>>-     * still splits the lines at 512K characters. So just bail completely for
>>>-     * the moment.
>>>+     * still splits the lines at 512K characters.
>>
>>did we investigate where this is done and how we can overcome it? I
>Yes. And the comment could be updated as part of this patch to make it 
>clearer...
>
>
>     * Splitting blobs across multiple lines is not compatible with the mesa
>     * debug decoder tool. Note that even dropping the explicit line wrapping
>     * below doesn't help because the GuC log can be so big it needs to be split
>     * into multiple 2MB chunks, each of which must be printed individually and
>     * therefore will be a separate line.

so there's no "underlying implementation that splits lines at 512K chars"?
Now it's something else: a 2MB chunk because of what?

I don't really want to keep updating this instead of fixing it.

>
>The dump helper could be updated to never add any line feeds, not even 

when you say "dump helper", what exactly are you talking about?
xe_print_blob_ascii85(), __xe_devcoredump_read(), __drm_puts_coredump()
or what?

If it's about xe_print_blob_ascii85(), I think this would be the first step:
drop the \n you are adding. AFAICS calling the print function in chunks
of 800B would be fine, just don't add any \n. If that still adds \n
somewhere, we can check how to drop it or you then may have a pretty
solid argument to adapt the mesa tool with a newline
continuation char and send them the MR.
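
To make the continuation-char idea concrete, here is a rough userland sketch
of what I mean. This is not the mesa tool's code and not what the KMD does
today: the '~' marker, the wrap()/unwrap() names and the 800 char width are
all just assumptions for illustration. '~' is picked only because it is
outside the ascii85 alphabet ('!'..'u' plus 'z'), so it can't be confused
with payload.

```python
# Hypothetical continuation marker; neither the kernel nor mesa defines this.
CONT = "~"


def wrap(payload: str, width: int = 800) -> list[str]:
    """Split an encoded blob into lines; all but the last end with CONT."""
    chunks = [payload[i:i + width] for i in range(0, len(payload), width)]
    return [c + CONT for c in chunks[:-1]] + chunks[-1:]


def unwrap(lines: list[str]) -> str:
    """Rejoin lines produced by wrap(), dropping the continuation markers."""
    parts = []
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith(CONT):
            line = line[:-len(CONT)]
        parts.append(line)
    return "".join(parts)
```

With something like this on the reader side, the writer can keep emitting
~800 char lines for dmesg and the decoder still sees one contiguous blob.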

>at the end. And then the burden would be on the callers to add line 
>feeds as appropriate. That seems extremely messy, though. And it will 
>break the dmesg output facility. That does need the line wrapping at 
>~800 characters which can't be done from outside the helper.
>
>>
>>understand having to split it into multiple calls, but not something
>>adding a \n. particularly for the functions dealing with seq_file and
>>devcoredump.
>>
>>>+     *
>>>+     * Only disable from devcoredump output.
>>>     */
>>>-    return;
>>>+    if (p->coredump)
>>
>>but we do want the guc log to be inside the devcoredump, so rather than
>>adding more workarounds, can we fix it ?
>The simplest fix is your suggestion of adding an extra white space 
>character at the end of the line in the helper function and updating 
>the mesa tool to support multi-line output. That seems like it should 
>be a trivial update to the mesa tool and it keeps the KMD side simple 
>and consistent across all potential output methods. But Jose seemed 
>absolutely dead set against any updates to the mesa tool at all :(.
>
>The more complex fix is Joonas' idea of using an official structured 
>output format of some sort - TOML, YAML, JSON or whatever. Something 

it actually doesn't fix this particular issue. It fixes others, but not
this one.

	"Guc Log": "<encoded data .........."

if you need to split lines, you still need something to do it in
whatever format you choose. One json I remember working on in the past
that embeds the binary is the .apj format... example:
https://firmware.ardupilot.org/Copter/2024-12/2024-12-02-13:12/CubeBlack/arducopter.apj

	{
	    "board_id": 9,
	    "magic": "APJFWv1",
	    "description": "Firmware for a STM32F427xx board",
	    "image": "eNqUvAlcU1faMH5u7k1yCQGCICCgBoIaRW0EtVSxhg ...

here "image" is base64-encoded  afair. But in your case, just printing
the json to dmesg wouldn't work because of needing to split the line.
It wouldn't fix the kind of bugs we had in the past either: in json it
would be the equivalent of adding a field to the wrong object and then
moving it elsewhere. An application trying to read that object in the
previous place would be broken regardless.

toml is more for configuration and yaml isn't a good one for this case
as it further relies on indentation. I wouldn't oppose using json, but
don't expect dmesg to magically work because of that.
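
A quick sketch of why json alone doesn't solve the line length problem. The
"guc_log" key, the function names and the 800 char width are made up for the
example, and base64 is used only because the Python stdlib has it: a blob
stored as one string is still one very long line, so you end up defining your
own splitting convention anyway, e.g. an array of chunks that the reader has
to join.

```python
import base64
import json


def to_json_single(blob: bytes) -> str:
    """Embed the blob as one string: valid json, but one huge line."""
    return json.dumps({"guc_log": base64.b64encode(blob).decode("ascii")})


def to_json_chunked(blob: bytes, width: int = 800) -> str:
    """Embed the blob as an array of short strings, one per output line."""
    enc = base64.b64encode(blob).decode("ascii")
    chunks = [enc[i:i + width] for i in range(0, len(enc), width)]
    return json.dumps({"guc_log": chunks}, indent=1)


def from_json(text: str) -> bytes:
    """Accept either layout; the reader must know about the chunk array."""
    value = json.loads(text)["guc_log"]
    if isinstance(value, list):
        value = "".join(value)
    return base64.b64decode(value)
```

The chunked layout keeps every line short, but it is a schema decision the
reader has to follow, not something json gives you for free.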

So let's fix the issue at hand and leave the talk about xe's coredump
format for a later time.

>that ideally supports binary data and compression natively and has a 
>helper library that can be included in the kernel source tree. That 

oh no, we'd need to define the schema - there are plenty of json libraries
in multiple languages for users to choose from and that would basically be
the main benefit of using it.

Lucas De Marchi

>will then remove any and all confusion over interpretation of the file 
>forever more. But it would require updating all userland tools to use 
>the appropriate helper library.
>
>John.
>
>>
>>Lucas De Marchi
>