[Intel-xe] [PATCH] drm/xe: Fix locking in CT fast path

Tue Mar 21 12:25:51 UTC 2023

Hey,

I'm afraid this is not allowed, you can't take a mutex in an irq handler, not even a trylock.

 From Documentation/locking/mutex-design.rst:

The mutex subsystem checks and enforces the following rules:
...
     - Mutexes may not be used in hardware or software interrupt
       contexts such as tasklets and timers.

Lockdep will likely still splat too as a result.

Cheers,
~Maarten

On 2023-03-17 01:22, Matthew Brost wrote:
> We can't sleep in the CT fast but need to ensure we can access VRAM. Use
> a trylock + reference counter check to ensure safe access to VRAM, if
> either check fails, fall back to slow path.
>
> VLK-45296
>
> Signed-off-by: Matthew Brost<matthew.brost at intel.com>
> ---
>   drivers/gpu/drm/xe/xe_device.h |  9 ++++++++-
>   drivers/gpu/drm/xe/xe_guc_ct.c | 11 ++++++++++-
>   2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index 25c5087f5aad..0cc4f52098a1 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -95,12 +95,19 @@ static inline void xe_device_assert_mem_access(struct xe_device *xe)
>   	XE_WARN_ON(!xe->mem_access.ref);
>   }
>   
> +static inline bool __xe_device_mem_access_ongoing(struct xe_device *xe)
> +{
> +	lockdep_assert_held(&xe->mem_access.lock);
> +
> +	return xe->mem_access.ref;
> +}
> +
>   static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
>   {
>   	bool ret;
>   
>   	mutex_lock(&xe->mem_access.lock);
> -	ret = xe->mem_access.ref;
> +	ret = __xe_device_mem_access_ongoing(xe);
>   	mutex_unlock(&xe->mem_access.lock);
>   
>   	return ret;
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index e5ed9022a0a2..bba0ef21c9e5 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -1030,9 +1030,15 @@ void xe_guc_ct_fast_path(struct xe_guc_ct *ct)
>   	struct xe_device *xe = ct_to_xe(ct);
>   	int len;
>   
> -	if (!xe_device_in_fault_mode(xe) || !xe_device_mem_access_ongoing(xe))
> +	if (!xe_device_in_fault_mode(xe))
>   		return;
>   
> +	if (!mutex_trylock(&xe->mem_access.lock))
> +		return;
> +
> +	if (!__xe_device_mem_access_ongoing(xe))
> +		goto unlock;
> +
>   	spin_lock(&ct->fast_lock);
>   	do {
>   		len = g2h_read(ct, ct->fast_msg, true);
> @@ -1040,6 +1046,9 @@ void xe_guc_ct_fast_path(struct xe_guc_ct *ct)
>   			g2h_fast_path(ct, ct->fast_msg, len);
>   	} while (len > 0);
>   	spin_unlock(&ct->fast_lock);
> +
> +unlock:
> +	mutex_unlock(&xe->mem_access.lock);
>   }
>   
>   /* Returns less than zero on error, 0 on done, 1 on more available */
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/intel-xe/attachments/20230321/b787c08e/attachment-0001.htm>