[PATCH v9 04/11] drm/xe/devcoredump: Add ASCII85 dump helper function
Lucas De Marchi
lucas.demarchi at intel.com
Thu Dec 12 20:52:45 UTC 2024
On Thu, Dec 12, 2024 at 11:14:23AM -0800, John Harrison wrote:
>On 12/12/2024 10:45, Lucas De Marchi wrote:
>>On Thu, Dec 12, 2024 at 05:41:41PM +0000, Jose Souza wrote:
>>>On Wed, 2024-10-02 at 17:46 -0700, John.C.Harrison at Intel.com wrote:
>>>>From: John Harrison <John.C.Harrison at Intel.com>
>>>>
>>>>There is a need to include the GuC log and other large binary objects
>>>>in core dumps and via dmesg. So add a helper for dumping to a printer
>>>>function via conversion to ASCII85 encoding.
>>>>
>>>>Another issue with dumping such a large buffer is that it can be slow,
>>>>especially if dumping to dmesg over a serial port. So add a yield to
>>>>prevent the 'task has been stuck for 120s' kernel hang check feature
>>>>from firing.
>>>>
>>>>v2: Add a prefix to the output string. Fix memory allocation bug.
>>>>v3: Correct a string size calculation and clean up a define (review
>>>>feedback from Julia F).
>>>>
>>>>Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>>>Reviewed-by: Julia Filipchuk <julia.filipchuk at intel.com>
>>>>---
>>>> drivers/gpu/drm/xe/xe_devcoredump.c | 87 +++++++++++++++++++++++++++++
>>>> drivers/gpu/drm/xe/xe_devcoredump.h | 6 ++
>>>> 2 files changed, 93 insertions(+)
>>>>
>>>>diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>b/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>index 2690f1d1cde4..0884c49942fe 100644
>>>>--- a/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>+++ b/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>@@ -6,6 +6,7 @@
>>>> #include "xe_devcoredump.h"
>>>> #include "xe_devcoredump_types.h"
>>>>
>>>>+#include <linux/ascii85.h>
>>>> #include <linux/devcoredump.h>
>>>> #include <generated/utsrelease.h>
>>>>
>>>>@@ -315,3 +316,89 @@ int xe_devcoredump_init(struct xe_device *xe)
>>>> }
>>>>
>>>> #endif
>>>>+
>>>>+/**
>>>>+ * xe_print_blob_ascii85 - print a BLOB to some useful location
>>>>in ASCII85
>>>>+ *
>>>>+ * The output is split to multiple lines because some print
>>>>targets, e.g. dmesg
>>>>+ * cannot handle arbitrarily long lines. Note also that
>>>>printing to dmesg in
>>>>+ * piece-meal fashion is not possible, each separate call to
>>>>drm_puts() has a
>>>>+ * line-feed automatically added! Therefore, the entire output
>>>>line must be
>>>>+ * constructed in a local buffer first, then printed in one
>>>>atomic output call.
>>>>+ *
>>>>+ * There is also a scheduler yield call to prevent the 'task
>>>>has been stuck for
>>>>+ * 120s' kernel hang check feature from firing when printing to
>>>>a slow target
>>>>+ * such as dmesg over a serial port.
>>>>+ *
>>>>+ * TODO: Add compression prior to the ASCII85 encoding to
>>>>shrink huge buffers down.
>>>>+ *
>>>>+ * @p: the printer object to output to
>>>>+ * @prefix: optional prefix to add to output string
>>>>+ * @blob: the Binary Large OBject to dump out
>>>>+ * @offset: offset in bytes to skip from the front of the BLOB,
>>>>must be a multiple of sizeof(u32)
>>>>+ * @size: the size in bytes of the BLOB, must be a multiple of
>>>>sizeof(u32)
>>>>+ */
>>>>+void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
>>>>+ const void *blob, size_t offset, size_t size)
>>>>+{
>>>>+ const u32 *blob32 = (const u32 *)blob;
>>>>+ char buff[ASCII85_BUFSZ], *line_buff;
>>>>+ size_t line_pos = 0;
>>>>+
>>>>+#define DMESG_MAX_LINE_LEN 800
>>>>+#define MIN_SPACE (ASCII85_BUFSZ + 2) /* 85 + "\n\0" */
>>>>+
>>>>+ if (size & 3)
>>>>+ drm_printf(p, "Size not word aligned: %zu", size);
>>>>+ if (offset & 3)
>>>>+ drm_printf(p, "Offset not word aligned: %zu", size);
>>>>+
>>>>+ line_buff = kzalloc(DMESG_MAX_LINE_LEN, GFP_KERNEL);
>>>>+ if (IS_ERR_OR_NULL(line_buff)) {
>>>>+ drm_printf(p, "Failed to allocate line buffer: %pe",
>>>>line_buff);
>>>>+ return;
>>>>+ }
>>>>+
>>>>+ blob32 += offset / sizeof(*blob32);
>>>>+ size /= sizeof(*blob32);
>>>>+
>>>>+ if (prefix) {
>>>>+ strscpy(line_buff, prefix, DMESG_MAX_LINE_LEN -
>>>>MIN_SPACE - 2);
>>>>+ line_pos = strlen(line_buff);
>>>>+
>>>>+ line_buff[line_pos++] = ':';
>>>>+ line_buff[line_pos++] = ' ';
>>>>+ }
>>>>+
>>>>+ while (size--) {
>>>>+ u32 val = *(blob32++);
>>>>+
>>>>+ strscpy(line_buff + line_pos, ascii85_encode(val, buff),
>>>>+ DMESG_MAX_LINE_LEN - line_pos);
>>>>+ line_pos += strlen(line_buff + line_pos);
>>>>+
>>>>+ if ((line_pos + MIN_SPACE) >= DMESG_MAX_LINE_LEN) {
>>>>+ line_buff[line_pos++] = '\n';
>>>>+ line_buff[line_pos++] = 0;
>>>
>>>This breaks ascii85 parser that we had up to now.
>It should not break the decoding of existing sections. This helper is
>only being used for the new GuC objects at present. As per the TODO,
>the intent is to use it for all blobs in the future. But that was
>deliberately left to a future update (which hasn't been posted yet) to
>avoid breaking the mesa tool.
it broke nonetheless because then it fails to decode this line the way
it was doing: each line starts a new key.
>
>
>>>And I think there is not safe way to parse it now, how would the
>>>parser know that the blob reach to end?
>The GuC decoder tool simply ignores leading/trailing whitespace and
>keeps decoding until it finds a line with a new field - i.e. anything
>with a non-ASCII85 character such as ':' or '*'. It is totally
character set for ascii85 is [33, 117] and 122, which includes both ':'
and '*'. Hopefully it's hard to hit a collision, but we shouldn't design
it in a way that is possible.
>reliable for me.
>
>>
>>Just looked at this code to check if we also had a "Size" as I remember
>>seeing it in previous related patches, but we don't. If
>>we did then you could use that to calculate how much you still had to
>>read, ignoring the \n. With the current scheme I think one
>>way would be to continue the previous key when the line doesn't start
>>with a new key. Awful, yes, and not future proof.
>Any new field is guaranteed to have a colon in it and any new section
>header is guaranteed to have a star in it. Sounds pretty reliable to
>me.
that's what I said by "continue the previous key when the line doesn't
start with a new one".
>
>The other option would be to have an explicit prefix at the start of
>each continuation line. E.g. "A85:".
let's not make it worse than it already is. And not future proof by
given the encoding algorithm the meaning of "continuation line".
Lucas De Marchi
>
>The problem with a size field is that ASCII85 encoding is data
>dependent - it has extremely basic run length encoding of strings of
>zeros. So you only know the size after the encoding is complete. Which
>means either the size field is in the dump after the encoded data
>(which is therefore useless) or you can't stream the encode and must
>encode to an in-memory buffer first, then print the size, then print
>your pre-encoded data.
>
>
>>
>>John, I explicitly said *we can't break the existent users*,
>>pointed to the one in the mesa repo and confirmed with José it would be
>>indeed a breakage. Please check again the past email thread at
>>https://lore.kernel.org/all/3jexgpnh3br3gqi4ol4c3hx3tyhwevq5nqo5xssyie3xglqohz@e7mnj4dewupu/
>>
>That discussion was a question for Jose not a statement. As there was
>no response for several weeks, I assumed that it wasn't a problem
>after all.
>
>>
>>Did you at least prepare the mesa parser for that additional \n?
>>
>>We shouldn't have merged the kernel patch as is with the excuse it reads
>>better in dmesg, when we also said we should not print that garbage to
>>dmesg :(
>Not following. Should not print what garbage in dmesg? As I keep
>saying, none of this goes to dmesg by default except in the case of
>catastrophic CT failure. And in that case, it is extremely useful and
>the only way to debug issues.
>
>John.
>
>>
>>Lucas De Marchi
>
More information about the Intel-xe
mailing list