[RFC 06/20] drm/xe: Convert mem_access assertion towards the runtime_pm state

Rodrigo Vivi rodrigo.vivi at intel.com
Tue Jan 9 17:50:43 UTC 2024


On Tue, Jan 09, 2024 at 11:06:19AM +0000, Matthew Auld wrote:
> On 28/12/2023 02:12, Rodrigo Vivi wrote:
> > The mem_access helpers are going away and getting replaced by
> > direct calls of the xe_pm_runtime_{get,put} functions. However, an
> > assertion with a warning splat is desired when we hit the worst
> > case of a memory access with the device really in the 'suspended'
> > state.
> > 
> > Also, this needs to be the first step. Otherwise, the upcoming
> > conversion would be really noisy, with warn splats about missing
> > mem_access gets.
> > 
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_device.c | 13 ++++++++++++-
> >   drivers/gpu/drm/xe/xe_pm.c     | 16 ++++++++++++++++
> >   drivers/gpu/drm/xe/xe_pm.h     |  1 +
> >   3 files changed, 29 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > index 86867d42d5329..dc3721bb37b1e 100644
> > --- a/drivers/gpu/drm/xe/xe_device.c
> > +++ b/drivers/gpu/drm/xe/xe_device.c
> > @@ -631,9 +631,20 @@ bool xe_device_mem_access_ongoing(struct xe_device *xe)
> >   	return atomic_read(&xe->mem_access.ref);
> >   }
> > +/**
> > + * xe_device_assert_mem_access - Inspect the current runtime_pm state.
> > + * @xe: xe device instance
> > + *
> > + * To be used before any kind of memory access. It will splat a debug warning
> > + * if the device is currently sleeping. But it doesn't guarantee in any way
> > + * that the device is going to continue awake. Xe PM runtime get and put
> > + * functions might be added to the outer bound of the memory access, while
> > + * this check is intended for inner usage to splat some warning if the worst
> > + * case has just happened.
> > + */
> >   void xe_device_assert_mem_access(struct xe_device *xe)
> >   {
> > -	XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
> > +	XE_WARN_ON(xe_pm_runtime_suspended(xe));
> >   }
> >   bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe)
> > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > index cabed94a21873..45114e4e76a5a 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.c
> > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > @@ -246,6 +246,22 @@ struct task_struct *xe_pm_read_callback_task(struct xe_device *xe)
> >   	return READ_ONCE(xe->pm_callback_task);
> >   }
> > +/**
> > + * xe_pm_runtime_suspended - Inspect the current runtime_pm state.
> > + * @xe: xe device instance
> > + *
> > + * This does not provide any guarantee that the device is going to continue
> > + * suspended as it might be racing with the runtime state transitions.
> > + * It can be used only as a non-reliable assertion, to ensure that we are not in
> > + * the sleep state while trying to access some memory for instance.
> > + *
> > + * Returns true if PCI device is suspended, false otherwise.
> > + */
> > +bool xe_pm_runtime_suspended(struct xe_device *xe)
> > +{
> > +	return pm_runtime_suspended(xe->drm.dev);
> 
> Would it not be better to check for active instead? That way we can check
> for !active above and create a bigger net, with the SUSPENDING and RESUMING
> states also being invalid, i.e. another task is about to suspend or hasn't
> fully resumed yet. We might also need to check the callback task, though.

In both transition cases, the device is awake anyway. And this check will
be called from places where we know we are in the transition, like
eviction/restore. So we could convert to active, but then we would also
need to check that the callback task is not the current one, and we would
still risk false positives from workqueues that don't run in that same
task.

Since this runtime_pm state cannot be trusted anyway (we cannot hold the
dev power lock while checking it), the assertion is unreliable no matter
how we write it. So I decided for the quick single-shot check, only to
ensure that we get some clue if we hit the worst case: the device is for
sure in the sleep state while some memory is being accessed.

> 
> > +}
> > +
> >   /**
> >    * xe_pm_runtime_suspend - Prepare our device for D3hot/D3Cold
> >    * @xe: xe device instance
> > diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
> > index 069f41c61505b..67a9bf3dd379b 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.h
> > +++ b/drivers/gpu/drm/xe/xe_pm.h
> > @@ -22,6 +22,7 @@ int xe_pm_resume(struct xe_device *xe);
> >   void xe_pm_init(struct xe_device *xe);
> >   void xe_pm_runtime_fini(struct xe_device *xe);
> > +bool xe_pm_runtime_suspended(struct xe_device *xe);
> >   int xe_pm_runtime_suspend(struct xe_device *xe);
> >   int xe_pm_runtime_resume(struct xe_device *xe);
> >   void xe_pm_runtime_get(struct xe_device *xe);

