[Intel-xe] [PATCH] drm/xe: Fix locking in CT fast path
Rodrigo Vivi
rodrigo.vivi at intel.com
Tue Mar 21 21:14:06 UTC 2023
On Tue, Mar 21, 2023 at 02:34:29PM +0000, Matthew Brost wrote:
> On Tue, Mar 21, 2023 at 01:25:51PM +0100, Maarten Lankhorst wrote:
> > Hey,
> >
> > I'm afraid this is not allowed: you can't take a mutex in an irq handler, not even a trylock.
> >
> > From Documentation/locking/mutex-design.rst:
> >
> > The mutex subsystem checks and enforces the following rules:
> > ...
> > - Mutexes may not be used in hardware or software interrupt
> > contexts such as tasklets and timers.
> >
>
> I wasn't aware of this but the DOC makes it clear this isn't allowed.
>
> > Lockdep will likely still splat too as a result.
> >
>
> Lockdep is happy, which is very odd since clearly this isn't allowed
> per the DOC.
This is strange... in general lockdep is loud when you take a mutex inside an irq.
Is some .config option missing? Or maybe the trylock itself is misleading lockdep?!
>
> Anyways, I'm thinking your atomic fix is needed
Yes, this is becoming unavoidable. I would prefer some lock over an atomic.
Maybe a spinlock?
Or we need to be really sure that there won't be any race where we end
up with an access before the wakeup.
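Something like this, as a minimal sketch (illustrative only, assuming
mem_access.lock is converted from a struct mutex to a spinlock_t):

/*
 * Minimal sketch, untested: assumes xe->mem_access.lock becomes a
 * spinlock_t. Since the CT fast path runs in IRQ context, every
 * holder of the lock has to disable interrupts, hence the irqsave
 * variant here.
 */
static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
{
        unsigned long flags;
        bool ret;

        spin_lock_irqsave(&xe->mem_access.lock, flags);
        ret = xe->mem_access.ref;
        spin_unlock_irqrestore(&xe->mem_access.lock, flags);

        return ret;
}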
The runtime_pm doc even suggests that all the memory accesses should be
serialized, instead of what we are currently trying to do with
mem_access. Any thoughts on whether it is possible to serialize them in our case?
Check the 'foo_' examples in Documentation/power/runtime_pm.txt.
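From memory the pattern there is roughly the following (a paraphrase of
the doc's foo_ examples, not a verbatim copy): a spinlock plus an
explicit 'suspended' flag serialize the I/O paths against the runtime
PM callbacks.

struct foo_priv {
        spinlock_t lock;        /* serializes I/O vs. runtime PM */
        bool suspended;
        struct device *dev;
};

/* Runtime suspend callback: mark the device unusable under the lock. */
static int foo_runtime_suspend(struct device *dev)
{
        struct foo_priv *foo = dev_get_drvdata(dev);

        spin_lock_irq(&foo->lock);
        foo->suspended = true;
        spin_unlock_irq(&foo->lock);

        return 0;
}

/*
 * I/O path, possibly in atomic context: if the device is suspended,
 * bail out and request an asynchronous resume instead of touching
 * powered-down hardware.
 */
static void foo_io_handler(struct foo_priv *foo)
{
        spin_lock(&foo->lock);
        if (foo->suspended) {
                spin_unlock(&foo->lock);
                pm_request_resume(foo->dev);
                return;
        }

        /* ... access the hardware under the lock ... */

        spin_unlock(&foo->lock);
}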
> but likely also need a
> follow on to this patch as well something like:
>
> xe_device_mem_access_get_if_active();
hmmm... I didn't want to grow mem_access into an rpm wrapper for all
cases like we ended up doing in i915... but this might be unavoidable
for this case...
> do CT fast path...
> xe_device_mem_access_put_async();
>
> The key being we can't sleep but also can't have VRAM access powered
> down while the CT fast path is executing.
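If we go the atomic route, maybe something like the sketch below (none
of these helpers exist yet; the names just follow your pseudocode
above, and put_work is a made-up worker item for the deferred power
down):

/*
 * Sketch only: assumes mem_access.ref becomes an atomic_t and that a
 * mem_access.put_work worker exists to do the sleeping rpm put.
 */
static inline bool xe_device_mem_access_get_if_active(struct xe_device *xe)
{
        /* Only take a reference if one is already held; never sleeps. */
        return atomic_inc_not_zero(&xe->mem_access.ref);
}

static inline void xe_device_mem_access_put_async(struct xe_device *xe)
{
        /*
         * We may be in IRQ context, so the actual power down (which can
         * sleep) is deferred to a worker instead of being done here.
         */
        if (atomic_dec_and_test(&xe->mem_access.ref))
                queue_work(system_unbound_wq, &xe->mem_access.put_work);
}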
>
> Matt
>
> > Cheers,
> > ~Maarten
> >
> > On 2023-03-17 01:22, Matthew Brost wrote:
> > > We can't sleep in the CT fast path but need to ensure we can access
> > > VRAM. Use a trylock + reference counter check to ensure safe access to
> > > VRAM; if either check fails, fall back to the slow path.
> > >
> > > VLK-45296
> > >
> > > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_device.h | 9 ++++++++-
> > > drivers/gpu/drm/xe/xe_guc_ct.c | 11 ++++++++++-
> > > 2 files changed, 18 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> > > index 25c5087f5aad..0cc4f52098a1 100644
> > > --- a/drivers/gpu/drm/xe/xe_device.h
> > > +++ b/drivers/gpu/drm/xe/xe_device.h
> > > @@ -95,12 +95,19 @@ static inline void xe_device_assert_mem_access(struct xe_device *xe)
> > >  	XE_WARN_ON(!xe->mem_access.ref);
> > >  }
> > >  
> > > +static inline bool __xe_device_mem_access_ongoing(struct xe_device *xe)
> > > +{
> > > +	lockdep_assert_held(&xe->mem_access.lock);
> > > +
> > > +	return xe->mem_access.ref;
> > > +}
> > > +
> > >  static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
> > >  {
> > >  	bool ret;
> > >  
> > >  	mutex_lock(&xe->mem_access.lock);
> > > -	ret = xe->mem_access.ref;
> > > +	ret = __xe_device_mem_access_ongoing(xe);
> > >  	mutex_unlock(&xe->mem_access.lock);
> > >  
> > >  	return ret;
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > index e5ed9022a0a2..bba0ef21c9e5 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > @@ -1030,9 +1030,15 @@ void xe_guc_ct_fast_path(struct xe_guc_ct *ct)
> > >  	struct xe_device *xe = ct_to_xe(ct);
> > >  	int len;
> > >  
> > > -	if (!xe_device_in_fault_mode(xe) || !xe_device_mem_access_ongoing(xe))
> > > +	if (!xe_device_in_fault_mode(xe))
> > >  		return;
> > >  
> > > +	if (!mutex_trylock(&xe->mem_access.lock))
> > > +		return;
> > > +
> > > +	if (!__xe_device_mem_access_ongoing(xe))
> > > +		goto unlock;
> > > +
> > >  	spin_lock(&ct->fast_lock);
> > >  	do {
> > >  		len = g2h_read(ct, ct->fast_msg, true);
> > > @@ -1040,6 +1046,9 @@ void xe_guc_ct_fast_path(struct xe_guc_ct *ct)
> > >  			g2h_fast_path(ct, ct->fast_msg, len);
> > >  	} while (len > 0);
> > >  	spin_unlock(&ct->fast_lock);
> > > +
> > > +unlock:
> > > +	mutex_unlock(&xe->mem_access.lock);
> > >  }
> > >  
> > >  /* Returns less than zero on error, 0 on done, 1 on more available */