[Intel-xe] [PATCH] drm/xe: Fix locking in CT fast path

Matthew Brost matthew.brost at intel.com
Tue Mar 21 14:34:29 UTC 2023


On Tue, Mar 21, 2023 at 01:25:51PM +0100, Maarten Lankhorst wrote:
> Hey,
> 
> I'm afraid this is not allowed, you can't take a mutex in an irq handler, not even a trylock.
> 
> From Documentation/locking/mutex-design.rst:
> 
> The mutex subsystem checks and enforces the following rules:
> ...
>     - Mutexes may not be used in hardware or software interrupt
>       contexts such as tasklets and timers.
> 

I wasn't aware of this, but the doc makes it clear this isn't allowed.
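
For reference, what would be fine in an irq handler is a spin_trylock()
or plain atomics -- a hypothetical sketch, not from this patch:

#include <linux/atomic.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(example_lock);
static atomic_t example_ref = ATOMIC_INIT(0);

static void example_irq_path(void)
{
	/* spin_trylock() never sleeps, so it is valid in irq context */
	if (spin_trylock(&example_lock)) {
		/* ... critical section ... */
		spin_unlock(&example_lock);
	}

	/* plain atomic ops are likewise irq-safe */
	atomic_inc(&example_ref);
}

which is roughly where the atomic approach below is heading.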

> Lockdep will likely still splat too as a result.
>

Lockdep is happy, which is very odd since this clearly isn't allowed per
the doc. I'd guess that's because a trylock never blocks, so it doesn't
create the kind of dependency lockdep tracks for deadlocks.

Anyway, I'm thinking your atomic fix is needed, but we likely also need
a follow-on to this patch as well, something like:

xe_device_mem_access_get_if_active();
do CT fast path... 
xe_device_mem_access_put_async();

The key point is that the fast path can't sleep, but VRAM access also
can't be powered down while the CT fast path is executing.
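
Rough sketch of those helpers (assuming mem_access.ref becomes an
atomic_t per your atomic fix; the put_work member and its worker are
made-up names here):

#include <linux/atomic.h>
#include <linux/workqueue.h>

static inline bool xe_device_mem_access_get_if_active(struct xe_device *xe)
{
	/* Only take a reference if one is already held, so the fast
	 * path never triggers a (sleeping) resume from irq context. */
	return atomic_inc_not_zero(&xe->mem_access.ref);
}

static inline void xe_device_mem_access_put_async(struct xe_device *xe)
{
	/* Dropping the last reference can power down VRAM, which may
	 * sleep, so defer that to a worker in process context. */
	if (atomic_dec_and_test(&xe->mem_access.ref))
		queue_work(system_unbound_wq, &xe->mem_access.put_work);
}

That way the fast path bails out cheaply when the device isn't already
awake and never takes a lock or sleeps in irq context.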

Matt

> Cheers,
> ~Maarten
> 
> On 2023-03-17 01:22, Matthew Brost wrote:
> > We can't sleep in the CT fast path but need to ensure we can access VRAM. Use
> > a trylock + reference counter check to ensure safe access to VRAM, if
> > either check fails, fall back to slow path.
> > 
> > VLK-45296
> > 
> > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_device.h |  9 ++++++++-
> >   drivers/gpu/drm/xe/xe_guc_ct.c | 11 ++++++++++-
> >   2 files changed, 18 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> > index 25c5087f5aad..0cc4f52098a1 100644
> > --- a/drivers/gpu/drm/xe/xe_device.h
> > +++ b/drivers/gpu/drm/xe/xe_device.h
> > @@ -95,12 +95,19 @@ static inline void xe_device_assert_mem_access(struct xe_device *xe)
> >   	XE_WARN_ON(!xe->mem_access.ref);
> >   }
> > +static inline bool __xe_device_mem_access_ongoing(struct xe_device *xe)
> > +{
> > +	lockdep_assert_held(&xe->mem_access.lock);
> > +
> > +	return xe->mem_access.ref;
> > +}
> > +
> >   static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
> >   {
> >   	bool ret;
> >   	mutex_lock(&xe->mem_access.lock);
> > -	ret = xe->mem_access.ref;
> > +	ret = __xe_device_mem_access_ongoing(xe);
> >   	mutex_unlock(&xe->mem_access.lock);
> >   	return ret;
> > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> > index e5ed9022a0a2..bba0ef21c9e5 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > @@ -1030,9 +1030,15 @@ void xe_guc_ct_fast_path(struct xe_guc_ct *ct)
> >   	struct xe_device *xe = ct_to_xe(ct);
> >   	int len;
> > -	if (!xe_device_in_fault_mode(xe) || !xe_device_mem_access_ongoing(xe))
> > +	if (!xe_device_in_fault_mode(xe))
> >   		return;
> > +	if (!mutex_trylock(&xe->mem_access.lock))
> > +		return;
> > +
> > +	if (!__xe_device_mem_access_ongoing(xe))
> > +		goto unlock;
> > +
> >   	spin_lock(&ct->fast_lock);
> >   	do {
> >   		len = g2h_read(ct, ct->fast_msg, true);
> > @@ -1040,6 +1046,9 @@ void xe_guc_ct_fast_path(struct xe_guc_ct *ct)
> >   			g2h_fast_path(ct, ct->fast_msg, len);
> >   	} while (len > 0);
> >   	spin_unlock(&ct->fast_lock);
> > +
> > +unlock:
> > +	mutex_unlock(&xe->mem_access.lock);
> >   }
> >   /* Returns less than zero on error, 0 on done, 1 on more available */

