[PATCH v7 03/10] drm/xe/devcoredump: Add ASCII85 dump helper function

Lucas De Marchi lucas.demarchi at intel.com
Tue Sep 10 19:43:44 UTC 2024


On Mon, Sep 09, 2024 at 06:31:41PM GMT, John Harrison wrote:
>On 9/6/2024 19:06, John Harrison wrote:
>>On 9/5/2024 20:04, Lucas De Marchi wrote:
>>>On Thu, Sep 05, 2024 at 07:01:33PM GMT, John Harrison wrote:
>>>>On 9/5/2024 18:54, Lucas De Marchi wrote:
>>>>>On Thu, Sep 05, 2024 at 01:50:58PM GMT, 
>>>>>John.C.Harrison at Intel.com wrote:
>>>>>>From: John Harrison <John.C.Harrison at Intel.com>
>>>>>>
>>>>>>There is a need to include the GuC log and other large binary objects
>>>>>>in core dumps and via dmesg. So add a helper for dumping to a printer
>>>>>>function via conversion to ASCII85 encoding.
>>>>>
>>>>>why are we not dumping the binary data directly to devcoredump?
>>>>As per earlier comments, there is a WiFi driver or some such 
>>>>that does exactly that. But all they are dumping is a binary 
>>>>blob.
>>>
>>>In your v5 I see you mentioned
>>>drivers/net/wireless/ath/ath10k/coredump.c, but that is a precedent for
>>>including it as-is from the device rather than converting it to ASCII85
>>>or something else. It seems odd to do that type of conversion in kernel
>>>space when it could perfectly well be done in userspace.
>>It really can't. An end user could maybe be expected to zip or tar a
>>coredump file before attaching it to a bug report, but they are
>>certainly not going to try to ASCII85 encode random bits of it.
>>Whereas putting that in the kernel means it is just there. It is
>>done. And it is pretty trivial - just call a helper function and it 
>>does everything for you. Also, I very much doubt you can spew raw 
>>binary data via dmesg. Even if the kernel would print it for you 
>>(which I doubt), the user tools like syslogd and dmesg itself are 
>>going to filter it to make it ASCII safe.
>>
>>The i915 error dumps have been ASCII85 encoded using the kernel's 
>>ASCII85 encoding helper function since forever. This patch is just a 
>>wrapper around the kernel's existing implementation in order to make 
>>it more compatible with printing to dmesg. This is not creating a 
>>new precedent. It already exists.
>>
>>>
>>>$ git grep ascii85.h
>>>drivers/gpu/drm/i915/i915_gpu_error.c:#include <linux/ascii85.h>
>>>drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c:#include <linux/ascii85.h>
>>>drivers/gpu/drm/msm/adreno/adreno_gpu.c:#include <linux/ascii85.h>
>>>drivers/gpu/drm/xe/xe_lrc.c:#include <linux/ascii85.h>
>>>drivers/gpu/drm/xe/xe_vm.c:#include <linux/ascii85.h>
>>And the list of drivers which dump raw binary data in a coredump 
>>file is... ath10k. ASCII85 wins 3 to 1.
>>
>>
>>>
>>>>
>>>>We want the devcoredump file to still be human readable. That 
>>>>won't be the case if you stuff binary data in the middle of it. 
>>>>Most obvious problem - the zeros in the data will terminate your 
>>>>text file at that point. Potentially bigger problem for end 
>>>>users - random fake ANSI codes will destroy your terminal window 
>>>>if you try to cat the file to read it.
>>>
>>>Users don't get a coredump and cat it to the terminal.
>>>=(lk%A8`T7AKYH#FD,6++EqOABHUhsG%5H2ARoq#E$/V$Bl7Q+@<5pmBe<q;Bk;0mCj@.3DIal2FD5Q-+E_RBART+X@VfTuGA2/4Dfp.E@3BN0DfB9.+E1b0F(KAV+:8
>>>
>>>
>>>Lucas De Marchi
>>They might. Either intentionally or accidentally. I've certainly 
>>done it myself. And people will certainly want to look at it in any 
>>random choice of text editor, pager, etc. 'cos you know, it is meant 
>>to be read by humans. If it is full of binary data then that becomes 
>>even more difficult than simply being full of ASCII gibberish. No
>>matter what you are doing, the ASCII version is safer and makes it
>>easier to look at the rest of the file around it.
>>
>>I don't understand why you are so desperate to have raw binary data 
>>in the middle of a text file. The disadvantages are multiple but the 
>>only advantage is a slightly smaller file. And the true route to 
>>smaller files is to add compression like we have in i915.
>>
>>John.
>>
>PS: Also meant to add that one of the important use cases for dumping
>logs to dmesg is for the really hard-to-repro bugs that show up in CI
>extremely rarely. We get the driver to dump an error capture to dmesg 
>and pull that out from the CI logs. Even if you could get binary data 
>through dmesg, pretty sure the CI tools would also not be happy with 
>it. Anything non-printable will get munged for sure when turning it 
>into a web page.

I think that's the main source of confusion on what we are discussing.
I was not talking about dmesg at all. I'm only complaining about feeding
ascii85-encoded data into a *devcoredump* when apparently there isn't a
good reason to do so. I'd rather copy the binary data to the
devcoredump.
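
Just to be concrete about what I mean: the ath10k route is basically
handing the buffer to the devcoredump core untouched. A rough sketch,
not the actual patch - the xe_* name here and the assumption that the
GuC log is one contiguous buffer are mine, for illustration only:

#include <linux/devcoredump.h>
#include <linux/string.h>
#include <linux/vmalloc.h>

/*
 * Sketch only: dump a binary blob (e.g. the GuC log) as-is,
 * ath10k-style. The vmalloc'ed copy is handed over to the devcoredump
 * core, which vfree()s it once userspace has read it or the timeout
 * expires.
 */
static void xe_devcoredump_raw(struct device *dev, const void *log,
			       size_t len)
{
	void *buf;

	buf = vmalloc(len);
	if (!buf)
		return;

	memcpy(buf, log, len);

	/* ownership of buf passes to the devcoredump framework here */
	dev_coredumpv(dev, buf, len, GFP_KERNEL);
}

Userspace then just reads /sys/class/devcoredump/devcd*/data and gets
the exact bytes back, no decode step needed.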

For dmesg, there's a reason to encode it, as you pointed out... but
users shouldn't actually see it - we should be catching all of those
cases in CI. For the scenarios that do escape, yeah... better to have it
ascii85-encoded.
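
And to be clear, the dmesg side itself seems fine to me: wrapping the
existing linux/ascii85.h helper and chunking the output into short lines
is roughly the right shape. Something along these lines - the
drm_printer use and the chunk length here are just my assumptions for
illustration, not necessarily what the patch does:

#include <linux/ascii85.h>
#include <linux/kernel.h>
#include <drm/drm_print.h>

#define ASCII85_CHARS_PER_LINE	128	/* illustrative chunk size */

/*
 * Sketch: encode a blob with the existing ascii85_encode() helper and
 * emit it in lines short enough for dmesg. Assumes size is a multiple
 * of 4 for brevity.
 */
static void print_blob_ascii85(struct drm_printer *p, const u32 *blob,
			       size_t size)
{
	char line[ASCII85_CHARS_PER_LINE + 1];
	char out[6];	/* 5 chars + NUL, see ascii85_encode() */
	size_t i, dwords = size / sizeof(u32);
	int pos = 0;

	for (i = 0; i < dwords; i++) {
		/* a zero dword encodes as a single 'z', others as 5 chars */
		const char *enc = ascii85_encode(blob[i], out);

		pos += snprintf(line + pos, sizeof(line) - pos, "%s", enc);
		if (pos >= ASCII85_CHARS_PER_LINE - 5) {
			drm_printf(p, "%s\n", line);
			pos = 0;
		}
	}

	if (pos)
		drm_printf(p, "%s\n", line);
}

A devcoredump, on the other hand, has no line-length constraint to begin
with, which is why the extra framing there bothers me.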

What you are adding to devcoredump also doesn't even seem to be an
ascii85 representation, but multiple lines that have to be concatenated
to form the ascii85 representation. For dmesg that makes sense. Not for
devcoredump. We probably also need a length field (correctly accounting
for the additional characters of each line) so we don't have an implicit
dependency on knowing what the next field is to figure out how much to
parse.
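
By a length field I mean something like a first pass that computes the
exact encoded length - the 'z' shortcut for zero dwords makes it
data-dependent - so the header printed before the blob can say how many
characters, newlines included, belong to it. A rough sketch, reusing the
illustrative chunking from the snippet above:

/*
 * Sketch: exact character count of the ASCII85 dump above, including
 * the '\n' ending each line, so a "size=" / "encoded_len=" header can
 * be emitted before the blob and the parser doesn't need to know what
 * the next field is.
 */
static size_t blob_ascii85_len(const u32 *blob, size_t size)
{
	size_t i, dwords = size / sizeof(u32);
	size_t chars = 0, pos = 0, lines = 0;

	for (i = 0; i < dwords; i++) {
		size_t n = blob[i] ? 5 : 1;	/* 'z' shortcut for zero */

		chars += n;
		pos += n;
		if (pos >= ASCII85_CHARS_PER_LINE - 5) {
			lines++;
			pos = 0;
		}
	}
	if (pos)
		lines++;

	return chars + lines;	/* payload plus one '\n' per line */
}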

Lucas De Marchi

>
>John.
>

