[PATCH] Revert "drm/xe/devcoredump: Add ASCII85 dump helper function"

Sat Dec 14 01:08:58 UTC 2024

On 12/13/2024 13:39, Lucas De Marchi wrote:
> On Fri, Dec 13, 2024 at 12:26:23PM -0800, John Harrison wrote:
>> On 12/13/2024 11:43, Souza, Jose wrote:
>>> On Fri, 2024-12-13 at 09:38 -0800, John Harrison wrote:
>>>> On 12/13/2024 09:25, Souza, Jose wrote:
>>>>> On Fri, 2024-12-13 at 08:56 -0800, John Harrison wrote:
>>>>>> On 12/13/2024 08:48, Lucas De Marchi wrote:
>>>>>>> On Fri, Dec 13, 2024 at 04:28:58PM +0000, Jose Souza wrote:
>>>>>>>> On Fri, 2024-12-13 at 09:50 -0600, Lucas De Marchi wrote:
>>>>>>>>> On Fri, Dec 13, 2024 at 03:24:59PM +0000, Jose Souza wrote:
>>>>>>>>>> On Fri, 2024-12-13 at 07:10 -0800, José Roberto de Souza wrote:
>>>>>>>>>>> On Fri, 2024-12-13 at 08:38 -0600, Lucas De Marchi wrote:
>>>>>>>>>>>> On Fri, Dec 13, 2024 at 09:12:52AM -0500, Rodrigo Vivi wrote:
>>>>>>>>>>>>> We do not break userspace.
>>>>>>>>>>> There is another patch that also breaks the Mesa parser:
>>>>>>>>>>>
>>>>>>>>>>> drm/xe/devcoredump: Improve section headings and add tile info
>>>>>>>>>>>
>>>>>>>>>>>>> This reverts commit ec1455ce7e35a31289d2dbc1070b980538698921.
>>>>>>>>>>>> But we have users calling this function.... the revert is not
>>>>>>>>>>>> so simple.
>>>>>>>>>>>> I think we need to revert the functionality rather than
>>>>>>>>>>>> reverting all the patches, otherwise it will cause a lot of
>>>>>>>>>>>> headaches.
>>>>>>>>>>>>
>>>>>>>>>>>> I propose we go with:
>>>>>>>>>>>>
>>>>>>>>>>>> a) drop the \n that broke mesa and merge that with cc stable.
>>>>>>>>>>>>
>>>>>>>>>>>> b) move back the entry to the previous section that broke mesa
>>>>>>>>>>>>       and cc stable.
>>>>>>>>>>>>
>>>>>>>>>>>>       José, would it be ok to merge a patch in mesa and port that
>>>>>>>>>>>>       to mesa stable that simply looks at 2 possible sections?
>>>>>>>>>>>>       Or even drop the section checks... ?
>>>>>>>>>> But if Xe KMD is reverting the patch that changed the hwctx
>>>>>>>>>> section why would Mesa need to also parse the new (future to be
>>>>>>>>>> reverted) section?
>>>>>>>>>
>>>>>>>>> first is to undo the damage, with 0 changes in mesa. We do that
>>>>>>>>> first and *then* we agree on what's possible to do to accommodate
>>>>>>>>> the 2 parsers we have.
>>>>>>>>>
>>>>>>>>> If we can get something in mesa to work that is backward compatible
>>>>>>>>> (i.e. the changed parser is able to parse both before and after the
>>>>>>>>> kernel change), then it could be considered for mesa stable and the
>>>>>>>>> kernel side changed.
>>>>>>>> Okay, reasonable plan. But the ascii85 encoder with \n will not be
>>>>>>>> brought back right?
>>>>>>> maybe let's agree on how to possibly bring it back? I suggested using a
>>>>>>> space as continuation line char. This way you can just check the last
>>>>>>> char returned by getline() you are calling and see if you can go ahead
>>>>>>> and proceed or if you still need to get more data. Neither space nor
>>>>>>> newline are part of the ascii85 character set, so it's safe and you can
>>>>>>> handle continuation in one place in your loop.
>>>>>>>
>>>>>>> if you are just ignoring any ascii85, then I believe it's even simpler:
>>>>>>> you check sections and keys with a space since both keys and section
>>>>>>> titles contain space, which is not part of the ascii85 char set.
>>>>>>>
>>>>>>> Lucas De Marchi
>>>>>> Yes, I would strongly prefer to use line wrapped ASCII85 data for all
>>>>>> blobs in the devcoredump. Including things like batch buffers and other
>>>>>> VM entries that the mesa tool is presumably wanting to decode.
>>>>>>
>>>>>> If adding a <space> character to the end of each line is an acceptable
>>>>>> fix then I have no problems with that. But not line wrapping at all
>>>>>> means having to carry that change as a non-upstream patch in either the
>>>>>> internal tree or in individual developer's local trees. Either that or
>>>>>> we just cannot debug a lot of hard to repro problems.
>>>>> Can't Xe KMD use the line wrapped version of ASCII85 when printing 
>>>>> to dmesg and keep the regular encoder when devcoredump file is read?
>>>> Apparently not. Even when not deliberately line wrapping the output, I
>>>> am still seeing it being wrapped when dumping very large buffers such as
>>>> the GuC log. It looks like something in a lower layer is also forcing
>>>> line wrapping of super long lines. So either we can't add full size GuC
>>>> logs to the devcoredump or we need to support line wrapped data in the
>>>> mesa tool.
>>> It is probably some implementation detail of
>>> __drm_printfn_coredump(). That could be replaced with something like
>>> i915 error dump write functions to not
>
> it doesn't look like it as there's nothing there looping through chunks.
> And particularly nothing that would really add a '\n', corrupting
> the output.
>
>>> have any limit at least when the target is a file descriptor, for 
>>> dmesg you are free to add any line wrapping.
>> Rather than re-writing the entire drm printer infrastructure, could 
>> we not just update the mesa decoder tool?
>
> I think it would be great to add line continuation there and document
> so it's more future proof and we don't break it again because of this.
>
>>
>> Note that the existing VM dumps are using the same drm_puts printer 
>
> would be good to know where that is coming from. I don't see anything in
> drm_puts() that would add newlines.... it looks more like how we are
> calling xe_print_blob_ascii85() in a loop, but that's strange since we
Doh! Yes. The GuC log splits the log data into 2MB chunks to avoid the 
kmalloc size limit (of 8MB IIRC). Which means it would either have to 
dump each chunk as a separately tagged devcoredump field/data pair or 
the final linefeed would have to be added manually by the caller and not 
automatically by the helper. Neither option is particularly nice.
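
As a rough illustration of the second option (the helper emits no
linefeed and the caller adds the single final one), here is a minimal
userspace sketch. Everything in it is an assumption made for
illustration: a85_word() is a stand-in modelled on the kernel's
ascii85_encode(), and print_chunk_ascii85() and dump_blob() are
hypothetical names, not the actual xe helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for the kernel's ascii85_encode():
 * one 32-bit word becomes 5 characters in '!'..'u', or a single 'z'
 * when the word is zero. Returns the number of characters written.
 */
static size_t a85_word(uint32_t in, char out[5])
{
	if (in == 0) {
		out[0] = 'z';
		return 1;
	}
	for (int i = 4; i >= 0; i--) {
		out[i] = (char)('!' + in % 85);
		in /= 85;
	}
	return 5;
}

/* Encode one chunk. Deliberately emits NO trailing linefeed. */
static void print_chunk_ascii85(FILE *f, const uint32_t *words, size_t n)
{
	char buf[5];

	assert(f != NULL);
	for (size_t i = 0; i < n; i++) {
		size_t len = a85_word(words[i], buf);

		fwrite(buf, 1, len, f);
	}
}

/*
 * Caller side: walk the blob in fixed-size chunks (the GuC log case
 * would use 2MB worth of words) and add the one final linefeed here,
 * so the whole blob stays a single "key: data" line.
 */
static void dump_blob(FILE *f, const char *key, const uint32_t *words,
		      size_t n, size_t chunk_words)
{
	assert(chunk_words > 0);
	fprintf(f, "%s: ", key);
	for (size_t off = 0; off < n; off += chunk_words) {
		size_t len = n - off < chunk_words ? n - off : chunk_words;

		print_chunk_ascii85(f, words + off, len);
	}
	fputc('\n', f);
}
```

With this split, how the caller chunks the data no longer changes the
output. The cost is that every caller has to remember the final
linefeed.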

> are actually passing 2M chunks. What did you use to test?
I've been mostly using igt@xe_exec_reset@cat-error with a modified IGT 
to save out each coredump rather than splatting it. We don't actually 
have any devcoredump tests at the moment. Zhanjun has been writing one. 
Apparently Maarten Lankhorst had also started writing one but it never 
got anywhere?

It would be useful to also add the mesa decoder tool itself as a test in 
CI too. Otherwise it may still be making assumptions that we do not 
realise and are not testing for in an IGT. E.g. the thing about section 
headers.

John.

>
> Lucas De Marchi
>
>> function. So if a very large user buffer was included in the dump, it 
>> seems like that would also break the mesa parser. And that is not 
>> something under the control of the KMD.
>>
>> John.
>>
>>>
>>>> John.
>>>>
>>>>>> John.
>>>>>>
>>
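
For reference, the space-as-continuation idea discussed in this thread
could be handled on the parser side roughly as sketched below. This is
only an illustration under stated assumptions: read_logical_line() is a
hypothetical helper, not code from the mesa decoder, and it assumes that
no ordinary key or section line ends in a space. It reads with fgetc()
rather than getline() purely to stay within ISO C.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read one logical line from f into *buf (grown with realloc as needed).
 * A physical line whose last character before the linefeed is a space
 * is treated as continued: the space is dropped and the next physical
 * line is appended. Returns the length of the logical line (without
 * the terminating NUL), or -1 at end of file with nothing read or on
 * allocation failure.
 */
static long read_logical_line(FILE *f, char **buf, size_t *cap)
{
	size_t len = 0;
	int c;

	assert(f != NULL && buf != NULL && cap != NULL);

	if (*buf == NULL || *cap == 0) {
		*buf = malloc(128);
		if (*buf == NULL)
			return -1;
		*cap = 128;
	}

	while ((c = fgetc(f)) != EOF) {
		if (c == '\n') {
			if (len > 0 && (*buf)[len - 1] == ' ') {
				len--;		/* drop the continuation marker */
				continue;	/* keep reading the next line */
			}
			break;			/* end of the logical line */
		}
		if (len + 2 > *cap) {		/* room for c plus the NUL */
			size_t ncap = *cap * 2;
			char *nbuf = realloc(*buf, ncap);

			if (nbuf == NULL)
				return -1;
			*buf = nbuf;
			*cap = ncap;
		}
		(*buf)[len++] = (char)c;
	}

	if (c == EOF && len == 0)
		return -1;

	(*buf)[len] = '\0';
	return (long)len;
}
```

Because neither space nor linefeed is in the ASCII85 character set, the
joined line can be handed to the existing key/value parsing unchanged,
and the continuation handling lives in one place in the read loop.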