[RFC 06/20] drm/xe: Convert mem_access assertion towards the runtime_pm state
Matthew Auld
matthew.auld at intel.com
Tue Jan 9 11:06:19 UTC 2024
On 28/12/2023 02:12, Rodrigo Vivi wrote:
> The mem_access helpers are going away and getting replaced by
> direct calls of the xe_pm_runtime_{get,put} functions. However, an
> assertion with a warning splat is desired when we hit the worst
> case of a memory access with the device really in the 'suspended'
> state.
>
> Also, this needs to be the first step. Otherwise, the upcoming
> conversion would be really noisy, with warn splats complaining about
> missing mem_access gets.
>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> ---
> drivers/gpu/drm/xe/xe_device.c | 13 ++++++++++++-
> drivers/gpu/drm/xe/xe_pm.c | 16 ++++++++++++++++
> drivers/gpu/drm/xe/xe_pm.h | 1 +
> 3 files changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 86867d42d5329..dc3721bb37b1e 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -631,9 +631,20 @@ bool xe_device_mem_access_ongoing(struct xe_device *xe)
> return atomic_read(&xe->mem_access.ref);
> }
>
> +/**
> + * xe_device_assert_mem_access - Inspect the current runtime_pm state.
> + * @xe: xe device instance
> + *
> + * To be used before any kind of memory access. It will splat a debug warning
> + * if the device is currently sleeping. However, it doesn't guarantee in any
> + * way that the device will stay awake. Xe PM runtime get and put functions
> + * might be added at the outer bound of the memory access, while this check
> + * is intended for inner usage, to splat a warning if the worst case has
> + * just happened.
> + */
> void xe_device_assert_mem_access(struct xe_device *xe)
> {
> - XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
> + XE_WARN_ON(xe_pm_runtime_suspended(xe));
> }
>
> bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe)
> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> index cabed94a21873..45114e4e76a5a 100644
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -246,6 +246,22 @@ struct task_struct *xe_pm_read_callback_task(struct xe_device *xe)
> return READ_ONCE(xe->pm_callback_task);
> }
>
> +/**
> + * xe_pm_runtime_suspended - Inspect the current runtime_pm state.
> + * @xe: xe device instance
> + *
> + * This does not provide any guarantee that the device is going to continue
> + * suspended as it might be racing with the runtime state transitions.
> + * It can only be used as an unreliable assertion, to ensure, for instance,
> + * that we are not in the sleep state while trying to access some memory.
> + *
> + * Returns true if the PCI device is suspended, false otherwise.
> + */
> +bool xe_pm_runtime_suspended(struct xe_device *xe)
> +{
> + return pm_runtime_suspended(xe->drm.dev);
Would it not be better to check for active instead? That way we can
check for !active above and cast a bigger net, with the SUSPENDING and
RESUMING states also being invalid, i.e. another task is about to
suspend or hasn't fully resumed yet. We might also need to check the
callback task, though.
> +}
> +
> /**
> * xe_pm_runtime_suspend - Prepare our device for D3hot/D3Cold
> * @xe: xe device instance
> diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
> index 069f41c61505b..67a9bf3dd379b 100644
> --- a/drivers/gpu/drm/xe/xe_pm.h
> +++ b/drivers/gpu/drm/xe/xe_pm.h
> @@ -22,6 +22,7 @@ int xe_pm_resume(struct xe_device *xe);
>
> void xe_pm_init(struct xe_device *xe);
> void xe_pm_runtime_fini(struct xe_device *xe);
> +bool xe_pm_runtime_suspended(struct xe_device *xe);
> int xe_pm_runtime_suspend(struct xe_device *xe);
> int xe_pm_runtime_resume(struct xe_device *xe);
> void xe_pm_runtime_get(struct xe_device *xe);