[PATCH] drm/xe: Convert xe_pm_runtime_{get,put} to void and protect from recursion

Rodrigo Vivi rodrigo.vivi at intel.com
Fri Mar 1 18:06:57 UTC 2024


On Fri, Mar 01, 2024 at 05:52:02PM +0000, Matthew Auld wrote:
> On 27/02/2024 18:35, Rodrigo Vivi wrote:
> > With mem_access going away and pm_runtime getting called instead,
> > we need to protect these against recursions.
> > 
> > For D3cold, the TTM migration helpers end up calling into job execution.
> > Job execution will be protected by direct runtime_pm calls, but those
> > references cannot be taken again if we are already inside a runtime
> > suspend/resume transition while evicting/restoring memory for D3cold.
> > So, we check xe_pm_read_callback_task() against the current task.
> > 
> > The put is asynchronous, so there's no need to block it. However, for a
> > proper balance, we need to ensure that the references are taken and
> > released regardless of the flow. So, let's convert them all to void and
> > use the underlying linux/pm_runtime functions directly.
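
To make the guard concrete, here is a minimal sketch (not verbatim driver
code; the evict/restore body is elided, and it assumes the pm_callback_task
bookkeeping in struct xe_device) of how the suspend callback records the task
running it, so that nested xe_pm_runtime_get()/put() calls from that same
task only adjust the usage counter and never re-enter the resume path:

	static void xe_pm_write_callback_task(struct xe_device *xe,
					      struct task_struct *task)
	{
		WRITE_ONCE(xe->pm_callback_task, task);

		/* Pairs with the READ_ONCE() in xe_pm_read_callback_task() */
		smp_mb();
	}

	int xe_pm_runtime_suspend(struct xe_device *xe)
	{
		int err = 0;

		/*
		 * Record this task so nested get/put calls (e.g. from the
		 * TTM eviction helpers on the D3cold path) skip the
		 * resume/idle handling and only touch the usage counter.
		 */
		xe_pm_write_callback_task(xe, current);

		/* ... evict VRAM / save state; may recurse into get/put ... */

		xe_pm_write_callback_task(xe, NULL);
		return err;
	}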
> > 
> > Cc: Matthew Auld <matthew.auld at intel.com>
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_pm.c | 25 ++++++++++++++-----------
> >   drivers/gpu/drm/xe/xe_pm.h |  4 ++--
> >   2 files changed, 16 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > index b5511e3c3153..3664480b21ba 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.c
> > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > @@ -408,26 +408,29 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> >   /**
> >    * xe_pm_runtime_get - Get a runtime_pm reference and resume synchronously
> >    * @xe: xe device instance
> > - *
> > - * Returns: Any number greater than or equal to 0 for success, negative error
> > - * code otherwise.
> >    */
> > -int xe_pm_runtime_get(struct xe_device *xe)
> > +void xe_pm_runtime_get(struct xe_device *xe)
> 
> Actually there is still a caller in the intel_runtime_pm_get() compat
> layer. What is the correct patch order here? It's kind of hard to follow.

Sorry for the conflicting parallel shot.
Put them together now: https://patchwork.freedesktop.org/series/130625/

I hope this makes sense now.

> 
> >   {
> > -	return pm_runtime_get_sync(xe->drm.dev);
> > +	pm_runtime_get_noresume(xe->drm.dev);
> > +
> > +	if (xe_pm_read_callback_task(xe) == current)
> > +		return;
> > +
> > +	pm_runtime_resume(xe->drm.dev);
> >   }
> >   /**
> >    * xe_pm_runtime_put - Put the runtime_pm reference back and mark as idle
> >    * @xe: xe device instance
> > - *
> > - * Returns: Any number greater than or equal to 0 for success, negative error
> > - * code otherwise.
> >    */
> > -int xe_pm_runtime_put(struct xe_device *xe)
> > +void xe_pm_runtime_put(struct xe_device *xe)
> >   {
> > -	pm_runtime_mark_last_busy(xe->drm.dev);
> > -	return pm_runtime_put(xe->drm.dev);
> > +	if (xe_pm_read_callback_task(xe) == current) {
> > +		pm_runtime_put_noidle(xe->drm.dev);
> > +	} else {
> > +		pm_runtime_mark_last_busy(xe->drm.dev);
> > +		pm_runtime_put(xe->drm.dev);
> > +	}
> >   }
> >   /**
> > diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
> > index 7f5884babb29..fdc2a49c1a1f 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.h
> > +++ b/drivers/gpu/drm/xe/xe_pm.h
> > @@ -26,9 +26,9 @@ void xe_pm_runtime_fini(struct xe_device *xe);
> >   bool xe_pm_runtime_suspended(struct xe_device *xe);
> >   int xe_pm_runtime_suspend(struct xe_device *xe);
> >   int xe_pm_runtime_resume(struct xe_device *xe);
> > -int xe_pm_runtime_get(struct xe_device *xe);
> > +void xe_pm_runtime_get(struct xe_device *xe);
> >   int xe_pm_runtime_get_ioctl(struct xe_device *xe);
> > -int xe_pm_runtime_put(struct xe_device *xe);
> > +void xe_pm_runtime_put(struct xe_device *xe);
> >   int xe_pm_runtime_get_if_active(struct xe_device *xe);
> >   void xe_pm_assert_unbounded_bridge(struct xe_device *xe);
> >   int xe_pm_set_vram_threshold(struct xe_device *xe, u32 threshold);

