[PATCH 1/2] drm/xe/devcoredump: Move exec queue snapshot to Contexts section

Lucas De Marchi lucas.demarchi at intel.com
Thu Jan 23 16:45:16 UTC 2025


On Thu, Jan 23, 2025 at 08:52:13AM -0600, Jose Souza wrote:
>On Thu, 2025-01-23 at 08:30 -0600, Lucas De Marchi wrote:
>> On Thu, Jan 23, 2025 at 08:14:11AM -0600, Jose Souza wrote:
>> > On Wed, 2025-01-22 at 21:11 -0800, Lucas De Marchi wrote:
>> > > Having the exec queue snapshot inside a "GuC CT" section was always
>> > > wrong.  Commit c28fd6c358db ("drm/xe/devcoredump: Improve section
>> > > headings and add tile info") tried to fix that bug, but with that also
>> > > broke the mesa tool that parses the devcoredump, hence it was reverted
>> > > in commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa
>> > > debug tool").
>> > >
>> > > With the mesa tool also fixed, this can propagate as a fix on both
>> > > kernel and userspace side to avoid unnecessary headache for a debug
>> > > feature.
>> >
>> > This will break older versions of the Mesa parser. Is this really necessary?
>>
>> See cover letter with the mesa MR that would fix the tool to follow the
>> kernel fix and work with both newer and older format. Linking it here
>> anyway: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33177
>
>Still someone running the older version of the parser with a new Xe KMD would not be able to parse it.
>I understand that we can break it but is this really worthy? not in my opinion.

because for the debug nature of this file, it's hard if we always keep
the cruft around. In 5 years developers implementing new decoders will
have to get data from random places because we will notice things are
wrongly placed.

isn't it easier to do do it early so we don't increase the exposure
and just say "kernel screwed that up and fixed it", then propagate the
change in mesa in a stable release, just like we are doing in the
kernel?

>
>>
>> It's a fix so simple that IMO it's better than carrying the cruft ad
>> infinitum on all the tools that may possibly parse the devcoredump.
>>
>>
>> > Is it worth breaking the tool? In my opinion, it is not.
>> >
>> > Also, do we need to discuss this now? Wouldn't it be better to focus on bringing the GuC log in first?
>>
>> That's what the second patch does. We need to discuss both now and
>> decide, otherwise we can't re-enable it and have either the guc log
>> parser or mesa's aubinator_error_decode_xe broken.
>
>I can't understand why it needs both, could you explain further?

we are already discussing it, why not?  Also as I said there's the guc
log parser, another tool, that is already expecting it in the other
place. So if we are going to re-enable the guc log, it's the best
opportunity to fix this, otherwise we will probably never do it and keep
accumulating.

Lucas De Marchi

>
>>
>> Lucas De Marchi
>>
>> >
>> > >
>> > > Cc: John Harrison <John.C.Harrison at Intel.com>
>> > > Cc: Julia Filipchuk <julia.filipchuk at intel.com>
>> > > Cc: José Roberto de Souza <jose.souza at intel.com>
>> > > Cc: stable at vger.kernel.org
>> > > Fixes: 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa debug tool")
>> > > Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
>> > > ---
>> > >  drivers/gpu/drm/xe/xe_devcoredump.c | 6 +-----
>> > >  1 file changed, 1 insertion(+), 5 deletions(-)
>> > >
>> > > diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
>> > > index 81dc7795c0651..a7946a76777e7 100644
>> > > --- a/drivers/gpu/drm/xe/xe_devcoredump.c
>> > > +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
>> > > @@ -119,11 +119,7 @@ static ssize_t __xe_devcoredump_read(char *buffer, size_t count,
>> > >  	drm_puts(&p, "\n**** GuC CT ****\n");
>> > >  	xe_guc_ct_snapshot_print(ss->guc.ct, &p);
>> > >
>> > > -	/*
>> > > -	 * Don't add a new section header here because the mesa debug decoder
>> > > -	 * tool expects the context information to be in the 'GuC CT' section.
>> > > -	 */
>> > > -	/* drm_puts(&p, "\n**** Contexts ****\n"); */
>> > > +	drm_puts(&p, "\n**** Contexts ****\n");
>> > >  	xe_guc_exec_queue_snapshot_print(ss->ge, &p);
>> > >
>> > >  	drm_puts(&p, "\n**** Job ****\n");
>> >
>


More information about the Intel-xe mailing list