[PATCH v9 04/11] drm/xe/devcoredump: Add ASCII85 dump helper function

Thu Dec 12 21:04:23 UTC 2024

On 12/12/2024 12:52, Lucas De Marchi wrote:
> On Thu, Dec 12, 2024 at 11:14:23AM -0800, John Harrison wrote:
>> On 12/12/2024 10:45, Lucas De Marchi wrote:
>>> On Thu, Dec 12, 2024 at 05:41:41PM +0000, Jose Souza wrote:
>>>> On Wed, 2024-10-02 at 17:46 -0700, John.C.Harrison at Intel.com wrote:
>>>>> From: John Harrison <John.C.Harrison at Intel.com>
>>>>>
>>>>> There is a need to include the GuC log and other large binary objects
>>>>> in core dumps and via dmesg. So add a helper for dumping to a printer
>>>>> function via conversion to ASCII85 encoding.
>>>>>
>>>>> Another issue with dumping such a large buffer is that it can be 
>>>>> slow,
>>>>> especially if dumping to dmesg over a serial port. So add a yield to
>>>>> prevent the 'task has been stuck for 120s' kernel hang check feature
>>>>> from firing.
>>>>>
>>>>> v2: Add a prefix to the output string. Fix memory allocation bug.
>>>>> v3: Correct a string size calculation and clean up a define (review
>>>>> feedback from Julia F).
>>>>>
>>>>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>>>> Reviewed-by: Julia Filipchuk <julia.filipchuk at intel.com>
>>>>> ---
>>>>>  drivers/gpu/drm/xe/xe_devcoredump.c | 87 
>>>>> +++++++++++++++++++++++++++++
>>>>>  drivers/gpu/drm/xe/xe_devcoredump.h |  6 ++
>>>>>  2 files changed, 93 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c 
>>>>> b/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>> index 2690f1d1cde4..0884c49942fe 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>> @@ -6,6 +6,7 @@
>>>>>  #include "xe_devcoredump.h"
>>>>>  #include "xe_devcoredump_types.h"
>>>>>
>>>>> +#include <linux/ascii85.h>
>>>>>  #include <linux/devcoredump.h>
>>>>>  #include <generated/utsrelease.h>
>>>>>
>>>>> @@ -315,3 +316,89 @@ int xe_devcoredump_init(struct xe_device *xe)
>>>>>  }
>>>>>
>>>>>  #endif
>>>>> +
>>>>> +/**
>>>>> + * xe_print_blob_ascii85 - print a BLOB to some useful location 
>>>>> in ASCII85
>>>>> + *
>>>>> + * The output is split to multiple lines because some print 
>>>>> targets, e.g. dmesg
>>>>> + * cannot handle arbitrarily long lines. Note also that printing 
>>>>> to dmesg in
>>>>> + * piece-meal fashion is not possible, each separate call to 
>>>>> drm_puts() has a
>>>>> + * line-feed automatically added! Therefore, the entire output 
>>>>> line must be
>>>>> + * constructed in a local buffer first, then printed in one 
>>>>> atomic output call.
>>>>> + *
>>>>> + * There is also a scheduler yield call to prevent the 'task has 
>>>>> been stuck for
>>>>> + * 120s' kernel hang check feature from firing when printing to a 
>>>>> slow target
>>>>> + * such as dmesg over a serial port.
>>>>> + *
>>>>> + * TODO: Add compression prior to the ASCII85 encoding to shrink 
>>>>> huge buffers down.
>>>>> + *
>>>>> + * @p: the printer object to output to
>>>>> + * @prefix: optional prefix to add to output string
>>>>> + * @blob: the Binary Large OBject to dump out
>>>>> + * @offset: offset in bytes to skip from the front of the BLOB, 
>>>>> must be a multiple of sizeof(u32)
>>>>> + * @size: the size in bytes of the BLOB, must be a multiple of 
>>>>> sizeof(u32)
>>>>> + */
>>>>> +void xe_print_blob_ascii85(struct drm_printer *p, const char 
>>>>> *prefix,
>>>>> +               const void *blob, size_t offset, size_t size)
>>>>> +{
>>>>> +    const u32 *blob32 = (const u32 *)blob;
>>>>> +    char buff[ASCII85_BUFSZ], *line_buff;
>>>>> +    size_t line_pos = 0;
>>>>> +
>>>>> +#define DMESG_MAX_LINE_LEN    800
>>>>> +#define MIN_SPACE        (ASCII85_BUFSZ + 2)        /* 85 + 
>>>>> "\n\0" */
>>>>> +
>>>>> +    if (size & 3)
>>>>> +        drm_printf(p, "Size not word aligned: %zu", size);
>>>>> +    if (offset & 3)
>>>>> +        drm_printf(p, "Offset not word aligned: %zu", size);
>>>>> +
>>>>> +    line_buff = kzalloc(DMESG_MAX_LINE_LEN, GFP_KERNEL);
>>>>> +    if (IS_ERR_OR_NULL(line_buff)) {
>>>>> +        drm_printf(p, "Failed to allocate line buffer: %pe", 
>>>>> line_buff);
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    blob32 += offset / sizeof(*blob32);
>>>>> +    size /= sizeof(*blob32);
>>>>> +
>>>>> +    if (prefix) {
>>>>> +        strscpy(line_buff, prefix, DMESG_MAX_LINE_LEN - MIN_SPACE 
>>>>> - 2);
>>>>> +        line_pos = strlen(line_buff);
>>>>> +
>>>>> +        line_buff[line_pos++] = ':';
>>>>> +        line_buff[line_pos++] = ' ';
>>>>> +    }
>>>>> +
>>>>> +    while (size--) {
>>>>> +        u32 val = *(blob32++);
>>>>> +
>>>>> +        strscpy(line_buff + line_pos, ascii85_encode(val, buff),
>>>>> +            DMESG_MAX_LINE_LEN - line_pos);
>>>>> +        line_pos += strlen(line_buff + line_pos);
>>>>> +
>>>>> +        if ((line_pos + MIN_SPACE) >= DMESG_MAX_LINE_LEN) {
>>>>> +            line_buff[line_pos++] = '\n';
>>>>> +            line_buff[line_pos++] = 0;
>>>>
>>>> This breaks ascii85 parser that we had up to now.
>> It should not break the decoding of existing sections. This helper is 
>> only being used for the new GuC objects at present. As per the TODO, 
>> the intent is to use it for all blobs in the future. But that was 
>> deliberately left to a future update (which hasn't been posted yet) 
>> to avoid breaking the mesa tool.
>
> it broke nonetheless because then it fails to decode this line the way
> it was doing:  each line starts a new key.
It barfs on any line it doesn't understand? Seems like it could be made 
to be more robust in general.

>
>>
>>
>>>> And I think there is not safe way to parse it now, how would the 
>>>> parser know that the blob reach to end?
>> The GuC decoder tool simply ignores leading/trailing whitespace and 
>> keeps decoding until it finds a line with a new field - i.e. anything 
>> with a non-ASCII85 character such as ':' or '*'. It is totally 
>
> character set for ascii85 is [33, 117] and 122, which includes both ':'
> and '*'. Hopefully it's hard to hit a collision, but we shouldn't design
> it in a way that is possible.
Sorry, getting myself confused - it was a while ago when I was writing 
this. Yes, punctuation marks are part of the ASCII85 character set.

The GuC decoder looks for white space - a blank line or a line with a 
space in it. Any new field is guaranteed to be "name: data" so will have 
a space. And sections are delimited with blank lines, plus section 
headers are "**** name ****" so also guaranteed to have a space.

John.

>
>
>> reliable for me.
>>
>>>
>>> Just looked at this code to check if we also had a "Size" as I remember
>>> seeing it in previous related patches, but we don't. If
>>> we did then you could use that to calculate how much you still had to
>>> read, ignoring the \n. With the current scheme I think one
>>> way would be to continue the previous key when the line doesn't start
>>> with a new key. Awful, yes, and not future proof.
>> Any new field is guaranteed to have a colon in it and any new section 
>> header is guaranteed to have a star in it. Sounds pretty reliable to me.
>
> that's what I said by "continue the previous key when the line doesn't
> start with a new one".
>
>
>>
>> The other option would be to have an explicit prefix at the start of 
>> each continuation line. E.g. "A85:".
>
> let's not make it worse than it already is. And not future proof by
> given the encoding algorithm the meaning of "continuation line".
>
> Lucas De Marchi
>
>>
>> The problem with a size field is that ASCII85 encoding is data 
>> dependent - it has extremely basic run length encoding of strings of 
>> zeros. So you only know the size after the encoding is complete. 
>> Which means either the size field is in the dump after the encoded 
>> data (which is therefore useless) or you can't stream the encode and 
>> must encode to an in-memory buffer first, then print the size, then 
>> print your pre-encoded data.
>>
>>
>>>
>>> John, I explicitly said *we can't break the existent users*,
>>> pointed to the one in the mesa repo and confirmed with José it would be
>>> indeed a breakage. Please check again the past email thread at
>>> https://lore.kernel.org/all/3jexgpnh3br3gqi4ol4c3hx3tyhwevq5nqo5xssyie3xglqohz@e7mnj4dewupu/ 
>>>
>>>
>> That discussion was a question for Jose not a statement. As there was 
>> no response for several weeks, I assumed that it wasn't a problem 
>> after all.
>>
>>>
>>> Did you at least prepare the mesa parser for that additional \n?
>>>
>>> We shouldn't have merged the kernel patch as is with the excuse it 
>>> reads
>>> better in dmesg, when we also said we should not print that garbage to
>>> dmesg :(
>> Not following. Should not print what garbage in dmesg? As I keep 
>> saying, none of this goes to dmesg by default except in the case of 
>> catastrophic CT failure. And in that case, it is extremely useful and 
>> the only way to debug issues.
>>
>> John.
>>
>>>
>>> Lucas De Marchi
>>