[Intel-xe] [PATCH v13 01/10] drm/xe: fix xe_device_mem_access_get() races
Rodrigo Vivi
rodrigo.vivi at intel.com
Fri Jul 14 15:34:20 UTC 2023
On Fri, Jul 14, 2023 at 01:04:51PM +0000, Gupta, Anshuman wrote:
>
>
> > -----Original Message-----
> > From: Auld, Matthew <matthew.auld at intel.com>
> > Sent: Thursday, July 13, 2023 6:53 PM
> > To: intel-xe at lists.freedesktop.org
> > Cc: Vivi, Rodrigo <rodrigo.vivi at intel.com>; Thomas Hellström
> > <thomas.hellstrom at linux.intel.com>; Brost, Matthew
> > <matthew.brost at intel.com>; Gupta, Anshuman
> > <anshuman.gupta at intel.com>
> > Subject: [PATCH v13 01/10] drm/xe: fix xe_device_mem_access_get() races
> >
> > It looks like there is at least one race here, given that the
> > pm_runtime_suspended() check looks to return false if we are in the process
> > of suspending the device (RPM_SUSPENDING vs RPM_SUSPENDED). We
> > later also do xe_pm_runtime_get_if_active(), but since the device is
> > suspending or has now suspended, this doesn't do anything either.
> > Following from this we can potentially return from
> > xe_device_mem_access_get() with the device suspended or about to be,
> > leading to broken behaviour.
> >
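> > Roughly, the broken flow looks like this (an illustrative sketch of the
> > old logic, not the exact code):
> >
> > 	if (pm_runtime_suspended(dev))		/* false while RPM_SUSPENDING */
> > 		pm_runtime_resume_and_get(dev);	/* so this never runs */
> > 	xe_pm_runtime_get_if_active(xe);	/* no-op: device is not RPM_ACTIVE */
> > 	/* caller continues while the device suspends underneath it */
> >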
> > Attempt to fix this by always grabbing the runtime ref when our internal ref
> > transitions from 0 -> 1. The hard part is then dealing with the runtime_pm
> > callbacks also calling xe_device_mem_access_get() and deadlocking, which
> > the pm_runtime_suspended() check prevented.
> >
> > v2:
> > - ct->lock looks to be primed with fs_reclaim, so holding that and then
> > allocating memory will cause lockdep to complain. Now that we
> > unconditionally grab the mem_access.lock around mem_access_{get,put}, we
> > need to change the ordering wrt grabbing the ct->lock, since some of
> > the runtime_pm routines can allocate memory (or at least that's what
> > lockdep seems to suggest). Hopefully not a big deal. It might be that
> > there were already issues with this, just that the atomics were
> > "hiding" the potential issues.
> > v3:
> > - Use Thomas Hellström's idea of tracking the active task that is
> > executing in the resume or suspend callback, in order to avoid
> > recursive resume/suspend calls deadlocking on itself.
> > - Split the ct->lock change.
> > v4:
> > - Add smp_mb() around accessing the pm_callback_task for extra safety.
> > (Thomas Hellström)
> > v5:
> > - Clarify the kernel-doc for the mem_access.lock, given that it is quite
> > strange in what it protects (data vs code). The real motivation is to
> > aid lockdep. (Rodrigo Vivi)
> > v6:
> > - Split out the lock change. We still want this as a lockdep aid but
> > only for the xe_device_mem_access_get() path. Sticking a lock on the
> > put() looks to be a no-go; also the runtime_put() there is always async.
> > - Now that the lock is gone, move to atomics and rely on the pm code
> > serialising multiple callers on the 0 -> 1 transition.
> > - g2h_worker_func() looks to be the next issue, given that
> > suspend-resume callbacks are using CT, so try to handle that.
> > v7:
> > - Add xe_device_mem_access_get_if_ongoing(), and use it in
> > g2h_worker_func().
> > v8 (Anshuman):
> > - Just always grab the rpm, instead of just on the 0 -> 1 transition,
> > which is a lot clearer and simplifies the code quite a bit.
> >
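> > For normal (non-callback) users the resulting scheme is then just the
> > usual bracketed pattern (illustrative):
> >
> > 	xe_device_mem_access_get(xe);	/* always grabs an rpm ref */
> > 	/* ... safely access HW, VRAM, GuC ... */
> > 	xe_device_mem_access_put(xe);	/* always drops it again */
> >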
> > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/258
> > Signed-off-by: Matthew Auld <matthew.auld at intel.com>
> > Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > Cc: Matthew Brost <matthew.brost at intel.com>
> > Cc: Anshuman Gupta <anshuman.gupta at intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_device.c | 61 +++++++++++++++++++------
> > drivers/gpu/drm/xe/xe_device.h | 11 +----
> > drivers/gpu/drm/xe/xe_device_types.h | 8 +++-
> > drivers/gpu/drm/xe/xe_guc_ct.c | 34 +++++++++++++-
> > drivers/gpu/drm/xe/xe_pm.c | 68 ++++++++++++++++++----------
> > drivers/gpu/drm/xe/xe_pm.h | 2 +-
> > 6 files changed, 131 insertions(+), 53 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > index 42fedb267454..70f89869b8a6 100644
> > --- a/drivers/gpu/drm/xe/xe_device.c
> > +++ b/drivers/gpu/drm/xe/xe_device.c
> > @@ -412,33 +412,66 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
> > 		DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0;
> > }
> >
> > +bool xe_device_mem_access_ongoing(struct xe_device *xe)
> > +{
> > +	if (xe_pm_read_callback_task(xe) != NULL)
> > +		return true;
> > +
> > +	return atomic_read(&xe->mem_access.ref);
> > +}
> > +
> > +void xe_device_assert_mem_access(struct xe_device *xe)
> > +{
> > +	XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
> > +}
> > +
> > bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe)
> > {
> > -	return atomic_inc_not_zero(&xe->mem_access.ref);
> > +	bool active;
> > +
> > +	if (xe_pm_read_callback_task(xe) == current)
> > +		return true;
> > +
> > +	active = xe_pm_runtime_get_if_active(xe);
> > +	if (active) {
> > +		int ref = atomic_inc_return(&xe->mem_access.ref);
> > +
> > +		XE_WARN_ON(ref == S32_MAX);
> > +	}
> > +
> > +	return active;
> > }
> >
> > void xe_device_mem_access_get(struct xe_device *xe)
> > {
> > -	bool resumed = xe_pm_runtime_resume_if_suspended(xe);
> > -	int ref = atomic_inc_return(&xe->mem_access.ref);
> > +	int ref;
> >
> > -	if (ref == 1)
> > -		xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);
> > +	/*
> > +	 * This looks racy, but should be fine since the pm_callback_task only
> > +	 * transitions from NULL -> current (and back to NULL again), during the
> > +	 * runtime_resume() or runtime_suspend() callbacks, for which there can
> > +	 * only be a single one running for our device. We only need to prevent
> > +	 * recursively calling the runtime_get or runtime_put from those
> > +	 * callbacks, as well as preventing triggering any access_ongoing
> > +	 * asserts.
> > +	 */
> > +	if (xe_pm_read_callback_task(xe) == current)
> > +		return;
> >
> > -	/* The usage counter increased if device was immediately resumed */
> > -	if (resumed)
> > -		xe_pm_runtime_put(xe);
> > -
> > -	XE_WARN_ON(ref == S32_MAX);
> > +	xe_pm_runtime_get(xe);
> > +	ref = atomic_inc_return(&xe->mem_access.ref);
> > +	XE_WARN_ON(atomic_read(&xe->mem_access.ref) == S32_MAX);
> > }
> >
> > void xe_device_mem_access_put(struct xe_device *xe)
> > {
> > -	bool hold = xe->mem_access.hold_rpm;
> > -	int ref = atomic_dec_return(&xe->mem_access.ref);
> > +	int ref;
> >
> > -	if (!ref && hold)
> > -		xe_pm_runtime_put(xe);
> > +	if (xe_pm_read_callback_task(xe) == current)
> > +		return;
> > +
> > +	ref = atomic_dec_return(&xe->mem_access.ref);
> > +	xe_pm_runtime_put(xe);
> >
> > 	XE_WARN_ON(ref < 0);
> > }
> > diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> > index 8e01bbadb149..0ce0351d7d23 100644
> > --- a/drivers/gpu/drm/xe/xe_device.h
> > +++ b/drivers/gpu/drm/xe/xe_device.h
> > @@ -141,15 +141,8 @@ void xe_device_mem_access_get(struct xe_device *xe);
> > bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe);
> > void xe_device_mem_access_put(struct xe_device *xe);
> >
> > -static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
> > -{
> > -	return atomic_read(&xe->mem_access.ref);
> > -}
> > -
> > -static inline void xe_device_assert_mem_access(struct xe_device *xe)
> > -{
> > -	XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
> > -}
> > +void xe_device_assert_mem_access(struct xe_device *xe);
> > +bool xe_device_mem_access_ongoing(struct xe_device *xe);
> >
> > static inline bool xe_device_in_fault_mode(struct xe_device *xe)
> > {
> > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > index 26a8de77138a..dcba8bb8f989 100644
> > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > @@ -343,10 +343,14 @@ struct xe_device {
> > struct {
> > /** @ref: ref count of memory accesses */
> > atomic_t ref;
> > - /** @hold_rpm: need to put rpm ref back at the end */
> > - bool hold_rpm;
> > } mem_access;
> >
> > + /**
> > + * @pm_callback_task: Track the active task that is running in either
> > + * the runtime_suspend or runtime_resume callbacks.
> > + */
> > + struct task_struct *pm_callback_task;
> > +
> > /** @d3cold_allowed: Indicates if d3cold is a valid device state */
> > bool d3cold_allowed;
> >
> > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> > index 9fb5fd4391d2..e7f7f9b1ce1c 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > @@ -19,6 +19,7 @@
> > #include "xe_guc.h"
> > #include "xe_guc_submit.h"
> > #include "xe_map.h"
> > +#include "xe_pm.h"
> > #include "xe_trace.h"
> >
> > /* Used when a CT send wants to block and / or receive data */
> > @@ -1091,9 +1092,36 @@ static int dequeue_one_g2h(struct xe_guc_ct *ct)
> > static void g2h_worker_func(struct work_struct *w)
> > {
> > 	struct xe_guc_ct *ct = container_of(w, struct xe_guc_ct, g2h_worker);
> > +	bool ongoing;
> > 	int ret;
> >
> > -	xe_device_mem_access_get(ct_to_xe(ct));
> > +	/*
> > +	 * Normal users must always hold mem_access.ref around CT calls. However
> > +	 * during the runtime pm callbacks we rely on CT to talk to the GuC, but
> > +	 * at this stage we can't rely on mem_access.ref and even the
> > +	 * callback_task will be different than current. For such cases we just
> > +	 * need to ensure we always process the responses from any blocking
> > +	 * ct_send requests or where we otherwise expect some response when
> > +	 * initiated from those callbacks (which will need to wait for the below
> > +	 * dequeue_one_g2h()). The dequeue_one_g2h() will gracefully fail if
> > +	 * the device has suspended to the point that the CT communication has
> > +	 * been disabled.
> > +	 *
> > +	 * If we are inside the runtime pm callback, we can be the only task
> > +	 * still issuing CT requests (since that requires having the
> > +	 * mem_access.ref). It seems like it might in theory be possible to
> > +	 * receive unsolicited events from the GuC just as we are
> > +	 * suspending-resuming, but those will currently anyway be lost when
> > +	 * eventually exiting from suspend, hence no need to wake up the device
> > +	 * here. If we ever need something stronger than get_if_ongoing() then
> > +	 * we need to be careful with blocking the pm callbacks from getting CT
> > +	 * responses, if the worker here is blocked on those callbacks
> > +	 * completing, creating a deadlock.
> > +	 */
> > +	ongoing = xe_device_mem_access_get_if_ongoing(ct_to_xe(ct));
> > +	if (!ongoing && xe_pm_read_callback_task(ct_to_xe(ct)) == NULL)
> > +		return;
> Patch looks good to me from a runtime PM PoV, but I am not familiar with the GuC CT flow,
> so I cannot provide an RB, especially for the above code.
> From a runtime PM perspective:
> Acked-by: Anshuman Gupta <anshuman.gupta at intel.com>
My only take on this GuC CT flow is that we should add a debug (warning?) msg
in case we are skipping because of !ongoing...
But this can be in a follow-up.
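Something like the below, perhaps (untested sketch; exact helper, level and
wording to be decided in that follow-up):

	ongoing = xe_device_mem_access_get_if_ongoing(ct_to_xe(ct));
	if (!ongoing && xe_pm_read_callback_task(ct_to_xe(ct)) == NULL) {
		drm_dbg(&ct_to_xe(ct)->drm,
			"G2H worker skipped: device suspended, no PM callback running\n");
		return;
	}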
Reviewed-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> Regards,
> Anshuman Gupta.
> > +
> > 	do {
> > 		mutex_lock(&ct->lock);
> > 		ret = dequeue_one_g2h(ct);
> > @@ -1107,7 +1135,9 @@ static void g2h_worker_func(struct work_struct *w)
> > 			kick_reset(ct);
> > 		}
> > 	} while (ret == 1);
> > -	xe_device_mem_access_put(ct_to_xe(ct));
> > +
> > +	if (ongoing)
> > +		xe_device_mem_access_put(ct_to_xe(ct));
> > }
> >
> > static void guc_ctb_snapshot_capture(struct xe_device *xe, struct guc_ctb *ctb,
> > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > index c7901f379aee..fe99820db2df 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.c
> > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > @@ -137,43 +137,71 @@ void xe_pm_runtime_fini(struct xe_device *xe)
> > pm_runtime_forbid(dev);
> > }
> >
> > +static void xe_pm_write_callback_task(struct xe_device *xe,
> > +				       struct task_struct *task)
> > +{
> > +	WRITE_ONCE(xe->pm_callback_task, task);
> > +
> > +	/*
> > +	 * Just in case it's somehow possible for our writes to be reordered to
> > +	 * the extent that something else re-uses the task written in
> > +	 * pm_callback_task. For example after returning from the callback, but
> > +	 * before the reordered write that resets pm_callback_task back to NULL.
> > +	 */
> > +	smp_mb(); /* pairs with xe_pm_read_callback_task */
> > +}
> > +
> > +struct task_struct *xe_pm_read_callback_task(struct xe_device *xe)
> > +{
> > +	smp_mb(); /* pairs with xe_pm_write_callback_task */
> > +
> > +	return READ_ONCE(xe->pm_callback_task);
> > +}
> > +
> > int xe_pm_runtime_suspend(struct xe_device *xe)
> > {
> > struct xe_gt *gt;
> > u8 id;
> > - int err;
> > + int err = 0;
> > +
> > + if (xe->d3cold_allowed && xe_device_mem_access_ongoing(xe))
> > + return -EBUSY;
> > +
> > + /* Disable access_ongoing asserts and prevent recursive pm calls */
> > + xe_pm_write_callback_task(xe, current);
> >
> > if (xe->d3cold_allowed) {
> > - if (xe_device_mem_access_ongoing(xe))
> > - return -EBUSY;
> > -
> > err = xe_bo_evict_all(xe);
> > if (err)
> > - return err;
> > + goto out;
> > }
> >
> > for_each_gt(gt, xe, id) {
> > err = xe_gt_suspend(gt);
> > if (err)
> > - return err;
> > + goto out;
> > }
> >
> > xe_irq_suspend(xe);
> > -
> > - return 0;
> > +out:
> > + xe_pm_write_callback_task(xe, NULL);
> > + return err;
> > }
> >
> > int xe_pm_runtime_resume(struct xe_device *xe)
> > {
> > struct xe_gt *gt;
> > u8 id;
> > - int err;
> > + int err = 0;
> > +
> > + /* Disable access_ongoing asserts and prevent recursive pm calls */
> > + xe_pm_write_callback_task(xe, current);
> >
> > if (xe->d3cold_allowed) {
> > for_each_gt(gt, xe, id) {
> > err = xe_pcode_init(gt);
> > if (err)
> > - return err;
> > + goto out;
> > }
> >
> > /*
> > @@ -182,7 +210,7 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> > */
> > err = xe_bo_restore_kernel(xe);
> > if (err)
> > - return err;
> > + goto out;
> > }
> >
> > xe_irq_resume(xe);
> > @@ -193,10 +221,11 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> > if (xe->d3cold_allowed) {
> > err = xe_bo_restore_user(xe);
> > if (err)
> > - return err;
> > + goto out;
> > }
> > -
> > - return 0;
> > +out:
> > + xe_pm_write_callback_task(xe, NULL);
> > + return err;
> > }
> >
> > int xe_pm_runtime_get(struct xe_device *xe)
> > @@ -210,19 +239,8 @@ int xe_pm_runtime_put(struct xe_device *xe)
> > 	return pm_runtime_put_autosuspend(xe->drm.dev);
> > }
> >
> > -/* Return true if resume operation happened and usage count was increased */
> > -bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe)
> > -{
> > -	/* In case we are suspended we need to immediately wake up */
> > -	if (pm_runtime_suspended(xe->drm.dev))
> > -		return !pm_runtime_resume_and_get(xe->drm.dev);
> > -
> > -	return false;
> > -}
> > -
> > int xe_pm_runtime_get_if_active(struct xe_device *xe)
> > {
> > -	WARN_ON(pm_runtime_suspended(xe->drm.dev));
> > 	return pm_runtime_get_if_active(xe->drm.dev, true);
> > }
> >
> > diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
> > index 8418ee6faac5..da05556d9a6e 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.h
> > +++ b/drivers/gpu/drm/xe/xe_pm.h
> > @@ -19,8 +19,8 @@ int xe_pm_runtime_suspend(struct xe_device *xe);
> > int xe_pm_runtime_resume(struct xe_device *xe);
> > int xe_pm_runtime_get(struct xe_device *xe);
> > int xe_pm_runtime_put(struct xe_device *xe);
> > -bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe);
> > int xe_pm_runtime_get_if_active(struct xe_device *xe);
> > void xe_pm_assert_unbounded_bridge(struct xe_device *xe);
> > +struct task_struct *xe_pm_read_callback_task(struct xe_device *xe);
> >
> > #endif
> > --
> > 2.41.0
>