[Intel-xe] [PATCH v3 1/3] drm/xe: Store xe_he_engine in xe_hw_engine_snapshot

Matt Roper matthew.d.roper at intel.com
Fri Oct 13 21:07:21 UTC 2023


On Fri, Oct 13, 2023 at 04:40:49PM -0400, Rodrigo Vivi wrote:
> On Thu, Oct 12, 2023 at 10:30:36AM -0700, Matt Roper wrote:
> > On Thu, Oct 12, 2023 at 10:20:33AM -0700, Souza, Jose wrote:
> > > On Thu, 2023-10-12 at 10:16 -0700, Matt Roper wrote:
> > > > On Tue, Oct 10, 2023 at 11:41:49AM -0700, José Roberto de Souza wrote:
> > > > > A future patch will require gt and xe device structs, so here
> > > > > replacing class by hwe.
> > > > > 
> > > > > Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > > > Cc: Matt Roper <matthew.d.roper at intel.com>
> > > > > Signed-off-by: José Roberto de Souza <jose.souza at intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/xe/xe_hw_engine.c       | 6 +++---
> > > > >  drivers/gpu/drm/xe/xe_hw_engine_types.h | 4 ++--
> > > > >  2 files changed, 5 insertions(+), 5 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
> > > > > index b5b0845908887..37b3ee1268090 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_hw_engine.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_hw_engine.c
> > > > > @@ -684,7 +684,7 @@ xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe)
> > > > >  	if (snapshot->name)
> > > > >  		strscpy(snapshot->name, hwe->name, len);
> > > > >  
> > > > > -	snapshot->class = hwe->class;
> > > > > +	snapshot->hwe = hwe;
> > > > 
> > > > I haven't really looked into the devcoredump stuff much, but it's not
> > > > clear to me from a really quick skim whether it's safe to save away the
> > > > hwe like this.  Is there ever a case where the snapshot can persist
> > > > after the engine itself has been destroyed (e.g., if we capture an error
> > > > encountered during a device probe that results in the device being torn
> > > > back down)?  If so, then you'd probably need to make sure that the hwe
> > > > field here only gets used during the capture, and the 'print' routine
> > > > would still need a cached snapshot->class to avoid use-after-free
> > > > errors.
> 
> This is very real! We need to cache everything we can and consider the
> device itself is dead at print. But here we are at the capture no?!

This specific line is in capture.  But then the captured 'hwe' pointer
gets used farther down in this patch during the print step since we're
not caching 'class' anymore.  And a later patch in this series also
tries to do a version check on gt->xe during the print function, so it's
potentially chasing two invalid pointers.


Matt

> 
> > > 
> > > Xe devcoredump dump is also removed with Xe module, so no problems with this.
> > 
> > With the module?  Or the device?  Even if probe fails on a single
> > device, the module may remain loaded (and might even still be driving
> > other GPUs).  If the dumps for failed device(s) are still accessible,
> > there might be a problem?
> 
> Well, this is a bad devcoredump limitation at this moment. It doesn't go
> away with the device or with the module. It blocks for 5 seconds before
> it is gone. I have couple patches [1] to suggest to devcoredump itself so the
> driver can also decide when the dump device is deleted. But I'm trying to
> get us first in upstream for a proper handling of that.
> 
> [1] https://gitlab.freedesktop.org/rodrigovivi/drm-xe/-/commits/devcoredump-removal/
> 
> > 
> > 
> > Matt
> > 
> > > 
> > > > 
> > > > Rodrigo probably has better insight as to whether that's something that
> > > > we need to worry about.
> > > > 
> > > > 
> > > > Matt
> > > > 
> > > > >  	snapshot->logical_instance = hwe->logical_instance;
> > > > >  	snapshot->forcewake.domain = hwe->domain;
> > > > >  	snapshot->forcewake.ref = xe_force_wake_ref(gt_to_fw(hwe->gt),
> > > > > @@ -731,7 +731,7 @@ xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe)
> > > > >  		hw_engine_mmio_read32(hwe, RING_DMA_FADD(0));
> > > > >  	snapshot->reg.ipehr = hw_engine_mmio_read32(hwe, RING_IPEHR(0));
> > > > >  
> > > > > -	if (snapshot->class == XE_ENGINE_CLASS_COMPUTE)
> > > > > +	if (snapshot->hwe->class == XE_ENGINE_CLASS_COMPUTE)
> > > > >  		snapshot->reg.rcu_mode = xe_mmio_read32(hwe->gt, RCU_MODE);
> > > > >  
> > > > >  	return snapshot;
> > > > > @@ -786,7 +786,7 @@ void xe_hw_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot,
> > > > >  		   snapshot->reg.ring_dma_fadd_udw,
> > > > >  		   snapshot->reg.ring_dma_fadd);
> > > > >  	drm_printf(p, "\tIPEHR: 0x%08x\n\n", snapshot->reg.ipehr);
> > > > > -	if (snapshot->class == XE_ENGINE_CLASS_COMPUTE)
> > > > > +	if (snapshot->hwe->class == XE_ENGINE_CLASS_COMPUTE)
> > > > >  		drm_printf(p, "\tRCU_MODE: 0x%08x\n",
> > > > >  			   snapshot->reg.rcu_mode);
> > > > >  }
> > > > > diff --git a/drivers/gpu/drm/xe/xe_hw_engine_types.h b/drivers/gpu/drm/xe/xe_hw_engine_types.h
> > > > > index 5d4ee29042407..71c0a0169e62a 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_hw_engine_types.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_hw_engine_types.h
> > > > > @@ -156,8 +156,8 @@ struct xe_hw_engine {
> > > > >  struct xe_hw_engine_snapshot {
> > > > >  	/** @name: name of the hw engine */
> > > > >  	char *name;
> > > > > -	/** @class: class of this hw engine */
> > > > > -	enum xe_engine_class class;
> > > > > +	/** @hwe: hw engine */
> > > > > +	struct xe_hw_engine *hwe;
> > > > >  	/** @logical_instance: logical instance of this hw engine */
> > > > >  	u16 logical_instance;
> > > > >  	/** @forcewake: Force Wake information snapshot */
> > > > > -- 
> > > > > 2.42.0
> > > > > 
> > > > 
> > > 
> > 
> > -- 
> > Matt Roper
> > Graphics Software Engineer
> > Linux GPU Platform Enablement
> > Intel Corporation

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


More information about the Intel-xe mailing list