[PATCH v9 03/11] drm/xe/devcoredump: Improve section headings and add tile info
Souza, Jose
jose.souza at intel.com
Thu Dec 12 20:30:59 UTC 2024
On Thu, 2024-12-12 at 12:06 -0800, John Harrison wrote:
> On 12/12/2024 11:31, Souza, Jose wrote:
> > On Thu, 2024-12-12 at 10:59 -0800, John Harrison wrote:
> > > On 12/12/2024 10:17, Souza, Jose wrote:
> > > > On Wed, 2024-10-02 at 17:46 -0700, John.C.Harrison at Intel.com wrote:
> > > > > From: John Harrison <John.C.Harrison at Intel.com>
> > > > >
> > > > > The xe_guc_exec_queue_snapshot is not really a GuC internal thing and
> > > > > is definitely not a GuC CT thing. So give it its own section heading.
> > > > > The snapshot itself is really a capture of the submission backend's
> > > > > internal state. Although all it currently prints out is the submission
> > > > > contexts. So label it as 'Contexts'. If more general state is added
> > > > > later then it could be changed to 'Submission backend' or some such.
> > > > >
> > > > > Further, everything from the GuC CT section onwards is GT specific but
> > > > > there was no indication of which GT it was related to (and that is
> > > > > impossible to work out from the other fields that are given). So add a
> > > > > GT section heading. Also include the tile id of the GT, because that is
> > > > > also significant information.
> > > > >
> > > > > Lastly, drop a couple of unnecessary line feeds within sections.
> > > > >
> > > > > v2: Add GT section heading, add tile id to device section.
> > > > >
> > > > > Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> > > > > Reviewed-by: Julia Filipchuk <julia.filipchuk at intel.com>
> > > > > ---
> > > > > drivers/gpu/drm/xe/xe_devcoredump.c | 5 +++++
> > > > > drivers/gpu/drm/xe/xe_devcoredump_types.h | 3 ++-
> > > > > drivers/gpu/drm/xe/xe_device.c | 1 +
> > > > > drivers/gpu/drm/xe/xe_guc_submit.c | 2 +-
> > > > > drivers/gpu/drm/xe/xe_hw_engine.c | 1 -
> > > > > 5 files changed, 9 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
> > > > > index d23719d5c2a3..2690f1d1cde4 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_devcoredump.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
> > > > > @@ -96,8 +96,13 @@ static ssize_t __xe_devcoredump_read(char *buffer, size_t count,
> > > > > drm_printf(&p, "Process: %s\n", ss->process_name);
> > > > > xe_device_snapshot_print(xe, &p);
> > > > >
> > > > > + drm_printf(&p, "\n**** GT #%d ****\n", ss->gt->info.id);
> > > > > + drm_printf(&p, "\tTile: %d\n", ss->gt->tile->id);
> > > > > +
> > > > > drm_puts(&p, "\n**** GuC CT ****\n");
> > > > > xe_guc_ct_snapshot_print(ss->ct, &p);
> > > > > +
> > > > > + drm_puts(&p, "\n**** Contexts ****\n");
> > > > > xe_guc_exec_queue_snapshot_print(ss->ge, &p);
> > > > This broke the Mesa parser!
> > > > It can no longer parse the exec_queue contexts because it expects them to be in the '**** GuC CT ****' section.
> > > Then the Mesa parser needs to be updated. That was clearly a bug: exec
> > > queue contexts are absolutely not GuC CT data and should not be in the
> > > GuC CT section.
> > It doesn't matter whether it is a bug or not; it broke the parser.
> > If this is not reverted, we will have older kernel versions that don't work with newer Mesa and newer kernel versions that don't work with older Mesa.
> Debug tools cannot count as UAPI that must never change.
That is not my understanding from previous threads.
Imagine that a big customer files a bug with us and attaches a devcoredump from an older kernel version.
The devcoredump parser will not work. If the developer is aware of this "contract" break, they can check out an older UMD version, build it, parse
the devcoredump, then check out the main/master branch again and work on the fix... Not viable at all.
At the very least, UMD teams should be notified. At the moment, Mesa debugging is blocked because of these patches.
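For illustration, a minimal sketch (not Mesa's actual parser) of a reader that tolerates both layouts. It counts exec queue context entries, taken here to be lines starting with "GuC ID:", whether they appear under the pre-patch '**** GuC CT ****' heading or the post-patch '**** Contexts ****' heading. The function names are hypothetical, and the assumption that only context entries start with "GuC ID:" is not verified against the full CT dump.

```c
#include <string.h>

/* Hypothetical helper: does this line (length excludes '\n') equal heading? */
static int line_is(const char *line, size_t len, const char *heading)
{
	return len == strlen(heading) && strncmp(line, heading, len) == 0;
}

/*
 * Hypothetical parser: count "GuC ID:" entries found under either the
 * legacy "**** GuC CT ****" heading or the newer "**** Contexts ****"
 * heading. Any other "**** ... ****" heading ends the context section.
 */
static int count_contexts(const char *dump)
{
	int in_ctx = 0, count = 0;
	const char *line = dump;

	while (*line) {
		const char *nl = strchr(line, '\n');
		size_t len = nl ? (size_t)(nl - line) : strlen(line);

		if (len >= 5 && strncmp(line, "**** ", 5) == 0) {
			/* A new section heading: decide whether it holds contexts. */
			in_ctx = line_is(line, len, "**** GuC CT ****") ||
				 line_is(line, len, "**** Contexts ****");
		} else if (in_ctx && strncmp(line, "GuC ID:", 7) == 0) {
			count++;
		}

		if (!nl)
			break;
		line = nl + 1;
	}
	return count;
}
```

Accepting both headings lets one parser handle dumps from kernels on either side of this change, at the cost of keeping the legacy heading in the match list.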
>
> The devcoredump contains much information that is essentially the
> internals of the kernel. It is going to change. That is about the only
> guarantee that we can make about it. And saying that we must
> intentionally break the output of a developer-only debug feature in
> order to support older Mesa is plain wrong. End users do not care about
> debug tools. All user applications will still work just perfectly.
>
> We can start adding version numbers to the devcoredump format if we
> really need to. But that was already shot down as a bad idea. It is
> debug information and not UAPI. So version incompatibilities are
> expected from time to time.
>
> John.
>
>
> >
> > > John.
> > >
> > > > >
> > > > > drm_puts(&p, "\n**** Job ****\n");
> > > > > diff --git a/drivers/gpu/drm/xe/xe_devcoredump_types.h b/drivers/gpu/drm/xe/xe_devcoredump_types.h
> > > > > index 440d05d77a5a..3cc2f095fdfb 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_devcoredump_types.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_devcoredump_types.h
> > > > > @@ -37,7 +37,8 @@ struct xe_devcoredump_snapshot {
> > > > > /* GuC snapshots */
> > > > > /** @ct: GuC CT snapshot */
> > > > > struct xe_guc_ct_snapshot *ct;
> > > > > - /** @ge: Guc Engine snapshot */
> > > > > +
> > > > > + /** @ge: GuC Submission Engine snapshot */
> > > > > struct xe_guc_submit_exec_queue_snapshot *ge;
> > > > >
> > > > > /** @hwe: HW Engine snapshot array */
> > > > > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > > > > index 09a7ad830e69..030cf703e970 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_device.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_device.c
> > > > > @@ -961,6 +961,7 @@ void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p)
> > > > >
> > > > > for_each_gt(gt, xe, id) {
> > > > > drm_printf(p, "GT id: %u\n", id);
> > > > > + drm_printf(p, "\tTile: %u\n", gt->tile->id);
> > > > > drm_printf(p, "\tType: %s\n",
> > > > > gt->info.type == XE_GT_TYPE_MAIN ? "main" : "media");
> > > > > drm_printf(p, "\tIP ver: %u.%u.%u\n",
> > > > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > > index 0ac4a19ec9cc..8690df699170 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > > @@ -2240,7 +2240,7 @@ xe_guc_exec_queue_snapshot_print(struct xe_guc_submit_exec_queue_snapshot *snaps
> > > > > if (!snapshot)
> > > > > return;
> > > > >
> > > > > - drm_printf(p, "\nGuC ID: %d\n", snapshot->guc.id);
> > > > > + drm_printf(p, "GuC ID: %d\n", snapshot->guc.id);
> > > > > drm_printf(p, "\tName: %s\n", snapshot->name);
> > > > > drm_printf(p, "\tClass: %d\n", snapshot->class);
> > > > > drm_printf(p, "\tLogical mask: 0x%x\n", snapshot->logical_mask);
> > > > > diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
> > > > > index ea6d9ef7fab6..6c9c27304cdc 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_hw_engine.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_hw_engine.c
> > > > > @@ -1084,7 +1084,6 @@ void xe_hw_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot,
> > > > > if (snapshot->hwe->class == XE_ENGINE_CLASS_COMPUTE)
> > > > > drm_printf(p, "\tRCU_MODE: 0x%08x\n",
> > > > > snapshot->reg.rcu_mode);
> > > > > - drm_puts(p, "\n");
> > > > > }
> > > > >
> > > > > /**
>
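On the format-version option John mentions: a minimal sketch of how a parser might use a version line, assuming a hypothetical "Dump format: <N>" header that the xe devcoredump does not emit today. Both function names and the version-to-heading mapping are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * HYPOTHETICAL: assumes a "Dump format: <N>" line in the dump.
 * Returns 0 when no such line exists (treated as the legacy layout).
 */
static int dump_format_version(const char *dump)
{
	const char *tag = strstr(dump, "Dump format: ");
	int version;

	if (!tag || sscanf(tag + strlen("Dump format: "), "%d", &version) != 1)
		return 0;
	return version;
}

/* HYPOTHETICAL mapping: from version 1 on, contexts have their own section. */
static const char *context_section_heading(int version)
{
	return version >= 1 ? "**** Contexts ****" : "**** GuC CT ****";
}
```

Note that the parser still has to carry a mapping for every historical version, so a version number documents incompatibilities rather than removing them.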