[Intel-xe] [PATCH v3 1/3] drm/xe: Store xe_he_engine in xe_hw_engine_snapshot
Rodrigo Vivi
rodrigo.vivi at intel.com
Fri Oct 13 20:40:49 UTC 2023
On Thu, Oct 12, 2023 at 10:30:36AM -0700, Matt Roper wrote:
> On Thu, Oct 12, 2023 at 10:20:33AM -0700, Souza, Jose wrote:
> > On Thu, 2023-10-12 at 10:16 -0700, Matt Roper wrote:
> > > On Tue, Oct 10, 2023 at 11:41:49AM -0700, José Roberto de Souza wrote:
> > > > A future patch will require gt and xe device structs, so here
> > > > replacing class by hwe.
> > > >
> > > > Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > > Cc: Matt Roper <matthew.d.roper at intel.com>
> > > > Signed-off-by: José Roberto de Souza <jose.souza at intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_hw_engine.c | 6 +++---
> > > > drivers/gpu/drm/xe/xe_hw_engine_types.h | 4 ++--
> > > > 2 files changed, 5 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
> > > > index b5b0845908887..37b3ee1268090 100644
> > > > --- a/drivers/gpu/drm/xe/xe_hw_engine.c
> > > > +++ b/drivers/gpu/drm/xe/xe_hw_engine.c
> > > > @@ -684,7 +684,7 @@ xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe)
> > > > if (snapshot->name)
> > > > strscpy(snapshot->name, hwe->name, len);
> > > >
> > > > - snapshot->class = hwe->class;
> > > > + snapshot->hwe = hwe;
> > >
> > > I haven't really looked into the devcoredump stuff much, but it's not
> > > clear to me from a really quick skim whether it's safe to save away the
> > > hwe like this. Is there ever a case where the snapshot can persist
> > > after the engine itself has been destroyed (e.g., if we capture an error
> > > encountered during a device probe that results in the device being torn
> > > back down)? If so, then you'd probably need to make sure that the hwe
> > > field here only gets used during the capture, and the 'print' routine
> > > would still need a cached snapshot->class to avoid use-after-free
> > > errors.
This is very real! We need to cache everything we can and consider the
device itself is dead at print. But here we are at the capture no?!
> >
> > Xe devcoredump dump is also removed with Xe module, so no problems with this.
>
> With the module? Or the device? Even if probe fails on a single
> device, the module may remain loaded (and might even still be driving
> other GPUs). If the dumps for failed device(s) are still accessible,
> there might be a problem?
Well, this is a bad devcoredump limitation at this moment. It doesn't go
away with the device or with the module. It blocks for 5 seconds before
it is gone. I have couple patches [1] to suggest to devcoredump itself so the
driver can also decide when the dump device is deleted. But I'm trying to
get us first in upstream for a proper handling of that.
[1] https://gitlab.freedesktop.org/rodrigovivi/drm-xe/-/commits/devcoredump-removal/
>
>
> Matt
>
> >
> > >
> > > Rodrigo probably has better insight as to whether that's something that
> > > we need to worry about.
> > >
> > >
> > > Matt
> > >
> > > > snapshot->logical_instance = hwe->logical_instance;
> > > > snapshot->forcewake.domain = hwe->domain;
> > > > snapshot->forcewake.ref = xe_force_wake_ref(gt_to_fw(hwe->gt),
> > > > @@ -731,7 +731,7 @@ xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe)
> > > > hw_engine_mmio_read32(hwe, RING_DMA_FADD(0));
> > > > snapshot->reg.ipehr = hw_engine_mmio_read32(hwe, RING_IPEHR(0));
> > > >
> > > > - if (snapshot->class == XE_ENGINE_CLASS_COMPUTE)
> > > > + if (snapshot->hwe->class == XE_ENGINE_CLASS_COMPUTE)
> > > > snapshot->reg.rcu_mode = xe_mmio_read32(hwe->gt, RCU_MODE);
> > > >
> > > > return snapshot;
> > > > @@ -786,7 +786,7 @@ void xe_hw_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot,
> > > > snapshot->reg.ring_dma_fadd_udw,
> > > > snapshot->reg.ring_dma_fadd);
> > > > drm_printf(p, "\tIPEHR: 0x%08x\n\n", snapshot->reg.ipehr);
> > > > - if (snapshot->class == XE_ENGINE_CLASS_COMPUTE)
> > > > + if (snapshot->hwe->class == XE_ENGINE_CLASS_COMPUTE)
> > > > drm_printf(p, "\tRCU_MODE: 0x%08x\n",
> > > > snapshot->reg.rcu_mode);
> > > > }
> > > > diff --git a/drivers/gpu/drm/xe/xe_hw_engine_types.h b/drivers/gpu/drm/xe/xe_hw_engine_types.h
> > > > index 5d4ee29042407..71c0a0169e62a 100644
> > > > --- a/drivers/gpu/drm/xe/xe_hw_engine_types.h
> > > > +++ b/drivers/gpu/drm/xe/xe_hw_engine_types.h
> > > > @@ -156,8 +156,8 @@ struct xe_hw_engine {
> > > > struct xe_hw_engine_snapshot {
> > > > /** @name: name of the hw engine */
> > > > char *name;
> > > > - /** @class: class of this hw engine */
> > > > - enum xe_engine_class class;
> > > > + /** @hwe: hw engine */
> > > > + struct xe_hw_engine *hwe;
> > > > /** @logical_instance: logical instance of this hw engine */
> > > > u16 logical_instance;
> > > > /** @forcewake: Force Wake information snapshot */
> > > > --
> > > > 2.42.0
> > > >
> > >
> >
>
> --
> Matt Roper
> Graphics Software Engineer
> Linux GPU Platform Enablement
> Intel Corporation
More information about the Intel-xe
mailing list