[PATCH v9 04/11] drm/xe/devcoredump: Add ASCII85 dump helper function

Thu Dec 12 20:52:45 UTC 2024

On Thu, Dec 12, 2024 at 11:14:23AM -0800, John Harrison wrote:
>On 12/12/2024 10:45, Lucas De Marchi wrote:
>>On Thu, Dec 12, 2024 at 05:41:41PM +0000, Jose Souza wrote:
>>>On Wed, 2024-10-02 at 17:46 -0700, John.C.Harrison at Intel.com wrote:
>>>>From: John Harrison <John.C.Harrison at Intel.com>
>>>>
>>>>There is a need to include the GuC log and other large binary objects
>>>>in core dumps and via dmesg. So add a helper for dumping to a printer
>>>>function via conversion to ASCII85 encoding.
>>>>
>>>>Another issue with dumping such a large buffer is that it can be slow,
>>>>especially if dumping to dmesg over a serial port. So add a yield to
>>>>prevent the 'task has been stuck for 120s' kernel hang check feature
>>>>from firing.
>>>>
>>>>v2: Add a prefix to the output string. Fix memory allocation bug.
>>>>v3: Correct a string size calculation and clean up a define (review
>>>>feedback from Julia F).
>>>>
>>>>Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>>>Reviewed-by: Julia Filipchuk <julia.filipchuk at intel.com>
>>>>---
>>>> drivers/gpu/drm/xe/xe_devcoredump.c | 87 +++++++++++++++++++++++++++++
>>>> drivers/gpu/drm/xe/xe_devcoredump.h |  6 ++
>>>> 2 files changed, 93 insertions(+)
>>>>
>>>>diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c 
>>>>b/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>index 2690f1d1cde4..0884c49942fe 100644
>>>>--- a/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>+++ b/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>@@ -6,6 +6,7 @@
>>>> #include "xe_devcoredump.h"
>>>> #include "xe_devcoredump_types.h"
>>>>
>>>>+#include <linux/ascii85.h>
>>>> #include <linux/devcoredump.h>
>>>> #include <generated/utsrelease.h>
>>>>
>>>>@@ -315,3 +316,89 @@ int xe_devcoredump_init(struct xe_device *xe)
>>>> }
>>>>
>>>> #endif
>>>>+
>>>>+/**
>>>>+ * xe_print_blob_ascii85 - print a BLOB to some useful location 
>>>>in ASCII85
>>>>+ *
>>>>+ * The output is split to multiple lines because some print 
>>>>targets, e.g. dmesg
>>>>+ * cannot handle arbitrarily long lines. Note also that 
>>>>printing to dmesg in
>>>>+ * piece-meal fashion is not possible, each separate call to 
>>>>drm_puts() has a
>>>>+ * line-feed automatically added! Therefore, the entire output 
>>>>line must be
>>>>+ * constructed in a local buffer first, then printed in one 
>>>>atomic output call.
>>>>+ *
>>>>+ * There is also a scheduler yield call to prevent the 'task 
>>>>has been stuck for
>>>>+ * 120s' kernel hang check feature from firing when printing to 
>>>>a slow target
>>>>+ * such as dmesg over a serial port.
>>>>+ *
>>>>+ * TODO: Add compression prior to the ASCII85 encoding to 
>>>>shrink huge buffers down.
>>>>+ *
>>>>+ * @p: the printer object to output to
>>>>+ * @prefix: optional prefix to add to output string
>>>>+ * @blob: the Binary Large OBject to dump out
>>>>+ * @offset: offset in bytes to skip from the front of the BLOB, 
>>>>must be a multiple of sizeof(u32)
>>>>+ * @size: the size in bytes of the BLOB, must be a multiple of 
>>>>sizeof(u32)
>>>>+ */
>>>>+void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
>>>>+               const void *blob, size_t offset, size_t size)
>>>>+{
>>>>+    const u32 *blob32 = (const u32 *)blob;
>>>>+    char buff[ASCII85_BUFSZ], *line_buff;
>>>>+    size_t line_pos = 0;
>>>>+
>>>>+#define DMESG_MAX_LINE_LEN    800
>>>>+#define MIN_SPACE        (ASCII85_BUFSZ + 2)        /* 85 + "\n\0" */
>>>>+
>>>>+    if (size & 3)
>>>>+        drm_printf(p, "Size not word aligned: %zu", size);
>>>>+    if (offset & 3)
>>>>+        drm_printf(p, "Offset not word aligned: %zu", size);
>>>>+
>>>>+    line_buff = kzalloc(DMESG_MAX_LINE_LEN, GFP_KERNEL);
>>>>+    if (IS_ERR_OR_NULL(line_buff)) {
>>>>+        drm_printf(p, "Failed to allocate line buffer: %pe", 
>>>>line_buff);
>>>>+        return;
>>>>+    }
>>>>+
>>>>+    blob32 += offset / sizeof(*blob32);
>>>>+    size /= sizeof(*blob32);
>>>>+
>>>>+    if (prefix) {
>>>>+        strscpy(line_buff, prefix, DMESG_MAX_LINE_LEN - 
>>>>MIN_SPACE - 2);
>>>>+        line_pos = strlen(line_buff);
>>>>+
>>>>+        line_buff[line_pos++] = ':';
>>>>+        line_buff[line_pos++] = ' ';
>>>>+    }
>>>>+
>>>>+    while (size--) {
>>>>+        u32 val = *(blob32++);
>>>>+
>>>>+        strscpy(line_buff + line_pos, ascii85_encode(val, buff),
>>>>+            DMESG_MAX_LINE_LEN - line_pos);
>>>>+        line_pos += strlen(line_buff + line_pos);
>>>>+
>>>>+        if ((line_pos + MIN_SPACE) >= DMESG_MAX_LINE_LEN) {
>>>>+            line_buff[line_pos++] = '\n';
>>>>+            line_buff[line_pos++] = 0;
>>>
>>>This breaks ascii85 parser that we had up to now.
>It should not break the decoding of existing sections. This helper is 
>only being used for the new GuC objects at present. As per the TODO, 
>the intent is to use it for all blobs in the future. But that was 
>deliberately left to a future update (which hasn't been posted yet) to 
>avoid breaking the mesa tool.

it broke nonetheless because then it fails to decode this line the way
it was doing:  each line starts a new key.

>
>
>>>And I think there is not safe way to parse it now, how would the 
>>>parser know that the blob reach to end?
>The GuC decoder tool simply ignores leading/trailing whitespace and 
>keeps decoding until it finds a line with a new field - i.e. anything 
>with a non-ASCII85 character such as ':' or '*'. It is totally 

character set for ascii85 is [33, 117] and 122, which includes both ':'
and '*'. Hopefully it's hard to hit a collision, but we shouldn't design
it in a way that is possible.

>reliable for me.
>
>>
>>Just looked at this code to check if we also had a "Size" as I remember
>>seeing it in previous related patches, but we don't. If
>>we did then you could use that to calculate how much you still had to
>>read, ignoring the \n. With the current scheme I think one
>>way would be to continue the previous key when the line doesn't start
>>with a new key. Awful, yes, and not future proof.
>Any new field is guaranteed to have a colon in it and any new section 
>header is guaranteed to have a star in it. Sounds pretty reliable to 
>me.

that's what I said by "continue the previous key when the line doesn't
start with a new one".

>
>The other option would be to have an explicit prefix at the start of 
>each continuation line. E.g. "A85:".

let's not make it worse than it already is. And not future proof by
given the encoding algorithm the meaning of "continuation line".

Lucas De Marchi

>
>The problem with a size field is that ASCII85 encoding is data 
>dependent - it has extremely basic run length encoding of strings of 
>zeros. So you only know the size after the encoding is complete. Which 
>means either the size field is in the dump after the encoded data 
>(which is therefore useless) or you can't stream the encode and must 
>encode to an in-memory buffer first, then print the size, then print 
>your pre-encoded data.
>
>
>>
>>John, I explicitly said *we can't break the existent users*,
>>pointed to the one in the mesa repo and confirmed with José it would be
>>indeed a breakage. Please check again the past email thread at
>>https://lore.kernel.org/all/3jexgpnh3br3gqi4ol4c3hx3tyhwevq5nqo5xssyie3xglqohz@e7mnj4dewupu/
>>
>That discussion was a question for Jose not a statement. As there was 
>no response for several weeks, I assumed that it wasn't a problem 
>after all.
>
>>
>>Did you at least prepare the mesa parser for that additional \n?
>>
>>We shouldn't have merged the kernel patch as is with the excuse it reads
>>better in dmesg, when we also said we should not print that garbage to
>>dmesg :(
>Not following. Should not print what garbage in dmesg? As I keep 
>saying, none of this goes to dmesg by default except in the case of 
>catastrophic CT failure. And in that case, it is extremely useful and 
>the only way to debug issues.
>
>John.
>
>>
>>Lucas De Marchi
>