[PATCH v9 04/11] drm/xe/devcoredump: Add ASCII85 dump helper function
John Harrison
john.c.harrison at intel.com
Thu Dec 12 21:04:23 UTC 2024
On 12/12/2024 12:52, Lucas De Marchi wrote:
> On Thu, Dec 12, 2024 at 11:14:23AM -0800, John Harrison wrote:
>> On 12/12/2024 10:45, Lucas De Marchi wrote:
>>> On Thu, Dec 12, 2024 at 05:41:41PM +0000, Jose Souza wrote:
>>>> On Wed, 2024-10-02 at 17:46 -0700, John.C.Harrison at Intel.com wrote:
>>>>> From: John Harrison <John.C.Harrison at Intel.com>
>>>>>
>>>>> There is a need to include the GuC log and other large binary objects
>>>>> in core dumps and via dmesg. So add a helper for dumping to a printer
>>>>> function via conversion to ASCII85 encoding.
>>>>>
>>>>> Another issue with dumping such a large buffer is that it can be
>>>>> slow,
>>>>> especially if dumping to dmesg over a serial port. So add a yield to
>>>>> prevent the 'task has been stuck for 120s' kernel hang check feature
>>>>> from firing.
>>>>>
>>>>> v2: Add a prefix to the output string. Fix memory allocation bug.
>>>>> v3: Correct a string size calculation and clean up a define (review
>>>>> feedback from Julia F).
>>>>>
>>>>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>>>> Reviewed-by: Julia Filipchuk <julia.filipchuk at intel.com>
>>>>> ---
>>>>> drivers/gpu/drm/xe/xe_devcoredump.c | 87
>>>>> +++++++++++++++++++++++++++++
>>>>> drivers/gpu/drm/xe/xe_devcoredump.h | 6 ++
>>>>> 2 files changed, 93 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>> b/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>> index 2690f1d1cde4..0884c49942fe 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>> @@ -6,6 +6,7 @@
>>>>> #include "xe_devcoredump.h"
>>>>> #include "xe_devcoredump_types.h"
>>>>>
>>>>> +#include <linux/ascii85.h>
>>>>> #include <linux/devcoredump.h>
>>>>> #include <generated/utsrelease.h>
>>>>>
>>>>> @@ -315,3 +316,89 @@ int xe_devcoredump_init(struct xe_device *xe)
>>>>> }
>>>>>
>>>>> #endif
>>>>> +
>>>>> +/**
>>>>> + * xe_print_blob_ascii85 - print a BLOB to some useful location
>>>>> in ASCII85
>>>>> + *
>>>>> + * The output is split to multiple lines because some print
>>>>> targets, e.g. dmesg
>>>>> + * cannot handle arbitrarily long lines. Note also that printing
>>>>> to dmesg in
>>>>> + * piece-meal fashion is not possible, each separate call to
>>>>> drm_puts() has a
>>>>> + * line-feed automatically added! Therefore, the entire output
>>>>> line must be
>>>>> + * constructed in a local buffer first, then printed in one
>>>>> atomic output call.
>>>>> + *
>>>>> + * There is also a scheduler yield call to prevent the 'task has
>>>>> been stuck for
>>>>> + * 120s' kernel hang check feature from firing when printing to a
>>>>> slow target
>>>>> + * such as dmesg over a serial port.
>>>>> + *
>>>>> + * TODO: Add compression prior to the ASCII85 encoding to shrink
>>>>> huge buffers down.
>>>>> + *
>>>>> + * @p: the printer object to output to
>>>>> + * @prefix: optional prefix to add to output string
>>>>> + * @blob: the Binary Large OBject to dump out
>>>>> + * @offset: offset in bytes to skip from the front of the BLOB,
>>>>> must be a multiple of sizeof(u32)
>>>>> + * @size: the size in bytes of the BLOB, must be a multiple of
>>>>> sizeof(u32)
>>>>> + */
>>>>> +void xe_print_blob_ascii85(struct drm_printer *p, const char
>>>>> *prefix,
>>>>> + const void *blob, size_t offset, size_t size)
>>>>> +{
>>>>> + const u32 *blob32 = (const u32 *)blob;
>>>>> + char buff[ASCII85_BUFSZ], *line_buff;
>>>>> + size_t line_pos = 0;
>>>>> +
>>>>> +#define DMESG_MAX_LINE_LEN 800
>>>>> +#define MIN_SPACE (ASCII85_BUFSZ + 2) /* 85 +
>>>>> "\n\0" */
>>>>> +
>>>>> + if (size & 3)
>>>>> + drm_printf(p, "Size not word aligned: %zu", size);
>>>>> + if (offset & 3)
>>>>> + drm_printf(p, "Offset not word aligned: %zu", size);
>>>>> +
>>>>> + line_buff = kzalloc(DMESG_MAX_LINE_LEN, GFP_KERNEL);
>>>>> + if (IS_ERR_OR_NULL(line_buff)) {
>>>>> + drm_printf(p, "Failed to allocate line buffer: %pe",
>>>>> line_buff);
>>>>> + return;
>>>>> + }
>>>>> +
>>>>> + blob32 += offset / sizeof(*blob32);
>>>>> + size /= sizeof(*blob32);
>>>>> +
>>>>> + if (prefix) {
>>>>> + strscpy(line_buff, prefix, DMESG_MAX_LINE_LEN - MIN_SPACE
>>>>> - 2);
>>>>> + line_pos = strlen(line_buff);
>>>>> +
>>>>> + line_buff[line_pos++] = ':';
>>>>> + line_buff[line_pos++] = ' ';
>>>>> + }
>>>>> +
>>>>> + while (size--) {
>>>>> + u32 val = *(blob32++);
>>>>> +
>>>>> + strscpy(line_buff + line_pos, ascii85_encode(val, buff),
>>>>> + DMESG_MAX_LINE_LEN - line_pos);
>>>>> + line_pos += strlen(line_buff + line_pos);
>>>>> +
>>>>> + if ((line_pos + MIN_SPACE) >= DMESG_MAX_LINE_LEN) {
>>>>> + line_buff[line_pos++] = '\n';
>>>>> + line_buff[line_pos++] = 0;
>>>>
>>>> This breaks ascii85 parser that we had up to now.
>> It should not break the decoding of existing sections. This helper is
>> only being used for the new GuC objects at present. As per the TODO,
>> the intent is to use it for all blobs in the future. But that was
>> deliberately left to a future update (which hasn't been posted yet)
>> to avoid breaking the mesa tool.
>
> it broke nonetheless because then it fails to decode this line the way
> it was doing: each line starts a new key.
It barfs on any line it doesn't understand? Seems like it could be made
to be more robust in general.
>
>>
>>
>>>> And I think there is not safe way to parse it now, how would the
>>>> parser know that the blob reach to end?
>> The GuC decoder tool simply ignores leading/trailing whitespace and
>> keeps decoding until it finds a line with a new field - i.e. anything
>> with a non-ASCII85 character such as ':' or '*'. It is totally
>
> character set for ascii85 is [33, 117] and 122, which includes both ':'
> and '*'. Hopefully it's hard to hit a collision, but we shouldn't design
> it in a way that is possible.
Sorry, getting myself confused - it was a while ago when I was writing
this. Yes, punctuation marks are part of the ASCII85 character set.
The GuC decoder looks for white space - a blank line or a line with a
space in it. Any new field is guaranteed to be "name: data" so will have
a space. And sections are delimited with blank lines, plus section
headers are "**** name ****" so also guaranteed to have a space.
John.
>
>
>> reliable for me.
>>
>>>
>>> Just looked at this code to check if we also had a "Size" as I remember
>>> seeing it in previous related patches, but we don't. If
>>> we did then you could use that to calculate how much you still had to
>>> read, ignoring the \n. With the current scheme I think one
>>> way would be to continue the previous key when the line doesn't start
>>> with a new key. Awful, yes, and not future proof.
>> Any new field is guaranteed to have a colon in it and any new section
>> header is guaranteed to have a star in it. Sounds pretty reliable to me.
>
> that's what I said by "continue the previous key when the line doesn't
> start with a new one".
>
>
>>
>> The other option would be to have an explicit prefix at the start of
>> each continuation line. E.g. "A85:".
>
> let's not make it worse than it already is. And not future proof by
> given the encoding algorithm the meaning of "continuation line".
>
> Lucas De Marchi
>
>>
>> The problem with a size field is that ASCII85 encoding is data
>> dependent - it has extremely basic run length encoding of strings of
>> zeros. So you only know the size after the encoding is complete.
>> Which means either the size field is in the dump after the encoded
>> data (which is therefore useless) or you can't stream the encode and
>> must encode to an in-memory buffer first, then print the size, then
>> print your pre-encoded data.
>>
>>
>>>
>>> John, I explicitly said *we can't break the existent users*,
>>> pointed to the one in the mesa repo and confirmed with José it would be
>>> indeed a breakage. Please check again the past email thread at
>>> https://lore.kernel.org/all/3jexgpnh3br3gqi4ol4c3hx3tyhwevq5nqo5xssyie3xglqohz@e7mnj4dewupu/
>>>
>>>
>> That discussion was a question for Jose not a statement. As there was
>> no response for several weeks, I assumed that it wasn't a problem
>> after all.
>>
>>>
>>> Did you at least prepare the mesa parser for that additional \n?
>>>
>>> We shouldn't have merged the kernel patch as is with the excuse it
>>> reads
>>> better in dmesg, when we also said we should not print that garbage to
>>> dmesg :(
>> Not following. Should not print what garbage in dmesg? As I keep
>> saying, none of this goes to dmesg by default except in the case of
>> catastrophic CT failure. And in that case, it is extremely useful and
>> the only way to debug issues.
>>
>> John.
>>
>>>
>>> Lucas De Marchi
>>
More information about the Intel-xe
mailing list