[Intel-xe] [PATCH v13 01/10] drm/xe: fix xe_device_mem_access_get() races
Gupta, Anshuman
anshuman.gupta at intel.com
Fri Jul 14 13:04:51 UTC 2023
> -----Original Message-----
> From: Auld, Matthew <matthew.auld at intel.com>
> Sent: Thursday, July 13, 2023 6:53 PM
> To: intel-xe at lists.freedesktop.org
> Cc: Vivi, Rodrigo <rodrigo.vivi at intel.com>; Thomas Hellström
> <thomas.hellstrom at linux.intel.com>; Brost, Matthew
> <matthew.brost at intel.com>; Gupta, Anshuman
> <anshuman.gupta at intel.com>
> Subject: [PATCH v13 01/10] drm/xe: fix xe_device_mem_access_get() races
>
> It looks like there is at least one race here, given that the
> pm_runtime_suspended() check looks to return false if we are in the process
> of suspending the device (RPM_SUSPENDING vs RPM_SUSPENDED). We
> later also do xe_pm_runtime_get_if_active(), but since the device is
> suspending or has now suspended, this doesn't do anything either.
> Following from this we can potentially return from
> xe_device_mem_access_get() with the device suspended or about to be,
> leading to broken behaviour.
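[A minimal sketch of the race window described above, based on the
xe_pm_runtime_resume_if_suspended() helper that this patch removes (see
the xe_pm.c hunk below); the annotations are mine, not part of the patch:

    bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe)
    {
            /*
             * pm_runtime_suspended() only reports RPM_SUSPENDED, so a
             * device that is mid-suspend (RPM_SUSPENDING) is treated as
             * "not suspended" and no resume is attempted here...
             */
            if (pm_runtime_suspended(xe->drm.dev))
                    return !pm_runtime_resume_and_get(xe->drm.dev);

            return false;
    }

...and the follow-up xe_pm_runtime_get_if_active() also does nothing,
since the runtime status is no longer RPM_ACTIVE, so the caller proceeds
without any rpm reference while the device completes its suspend.]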
>
> Attempt to fix this by always grabbing the runtime ref when our internal ref
> transitions from 0 -> 1. The hard part is then dealing with the runtime_pm
> callbacks also calling xe_device_mem_access_get() and deadlocking, which
> the pm_runtime_suspended() check prevented.
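[To spell out the recursion hazard mentioned above: the runtime_pm
callbacks themselves can end up in code paths that call
xe_device_mem_access_get(). A hypothetical call chain, assuming
xe_pm_runtime_get() maps onto pm_runtime_get_sync():

    xe_pm_runtime_resume(xe)            /* rpm core: status = RPM_RESUMING */
      -> CT/GuC helpers doing HW access
        -> xe_device_mem_access_get(xe)
          -> xe_pm_runtime_get(xe)      /* pm_runtime_get_sync() waits for
                                           the very resume we are already
                                           executing: deadlock */

The pm_callback_task tracking below breaks this cycle: when
xe_pm_read_callback_task(xe) == current, the inner get/put becomes a
no-op.]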
>
> v2:
>   - ct->lock looks to be primed with fs_reclaim, so holding that and then
>     allocating memory will cause lockdep to complain. Now that we
>     unconditionally grab the mem_access.lock around mem_access_{get,put},
>     we need to change the ordering wrt grabbing the ct->lock, since some
>     of the runtime_pm routines can allocate memory (or at least that's
>     what lockdep seems to suggest). Hopefully not a big deal. It might be
>     that there were already issues with this, just that the atomics were
>     "hiding" the potential issues. (See the lockdep sketch below this
>     changelog.)
> v3:
>   - Use Thomas Hellström's idea of tracking the active task that is
>     executing in the resume or suspend callback, in order to avoid
>     recursive resume/suspend calls deadlocking on themselves.
>   - Split out the ct->lock change.
> v4:
>   - Add smp_mb() around accessing the pm_callback_task for extra safety.
>     (Thomas Hellström)
> v5:
>   - Clarify the kernel-doc for the mem_access.lock, given that it is quite
>     strange in what it protects (data vs code). The real motivation is to
>     aid lockdep. (Rodrigo Vivi)
> v6:
>   - Split out the lock change. We still want this as a lockdep aid, but
>     only for the xe_device_mem_access_get() path. Sticking a lock on the
>     put() looks to be a no-go; also the runtime_put() there is always
>     async.
>   - Now that the lock is gone, move to atomics and rely on the pm code
>     serialising multiple callers on the 0 -> 1 transition.
>   - g2h_worker_func() looks to be the next issue, given that
>     suspend-resume callbacks are using CT, so try to handle that.
> v7:
>   - Add xe_device_mem_access_get_if_ongoing(), and use it in
>     g2h_worker_func().
> v8 (Anshuman):
>   - Just always grab the rpm, instead of just on the 0 -> 1 transition,
>     which is a lot clearer and simplifies the code quite a bit.
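[Editor's illustration of the v2 lockdep point above, not from the patch;
the snippet is hypothetical but uses the real fs_reclaim_acquire() /
might_lock() lockdep API:

    #include <linux/sched/mm.h>  /* fs_reclaim_acquire/fs_reclaim_release */
    #include <linux/lockdep.h>   /* might_lock */

    /* "Priming": teach lockdep that ct->lock can be taken under reclaim. */
    fs_reclaim_acquire(GFP_KERNEL);
    might_lock(&ct->lock);
    fs_reclaim_release(GFP_KERNEL);

    /* Later, allocating while holding ct->lock closes the cycle: */
    mutex_lock(&ct->lock);
    buf = kmalloc(size, GFP_KERNEL);  /* GFP_KERNEL may enter fs_reclaim */
    mutex_unlock(&ct->lock);

lockdep then reports a possible ct->lock -> fs_reclaim -> ct->lock
inversion even if no allocation ever actually recurses into reclaim, which
is why the ordering wrt ct->lock had to change.]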
>
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/258
> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: Anshuman Gupta <anshuman.gupta at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_device.c       | 61 +++++++++++++++++++------
>  drivers/gpu/drm/xe/xe_device.h       | 11 +----
>  drivers/gpu/drm/xe/xe_device_types.h |  8 +++-
>  drivers/gpu/drm/xe/xe_guc_ct.c       | 34 +++++++++++++-
>  drivers/gpu/drm/xe/xe_pm.c           | 68 ++++++++++++++++++----------
>  drivers/gpu/drm/xe/xe_pm.h           |  2 +-
>  6 files changed, 131 insertions(+), 53 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 42fedb267454..70f89869b8a6 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -412,33 +412,66 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
>          DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0;
>  }
>
> +bool xe_device_mem_access_ongoing(struct xe_device *xe)
> +{
> +        if (xe_pm_read_callback_task(xe) != NULL)
> +                return true;
> +
> +        return atomic_read(&xe->mem_access.ref);
> +}
> +
> +void xe_device_assert_mem_access(struct xe_device *xe)
> +{
> +        XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
> +}
> +
>  bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe)
>  {
> -        return atomic_inc_not_zero(&xe->mem_access.ref);
> +        bool active;
> +
> +        if (xe_pm_read_callback_task(xe) == current)
> +                return true;
> +
> +        active = xe_pm_runtime_get_if_active(xe);
> +        if (active) {
> +                int ref = atomic_inc_return(&xe->mem_access.ref);
> +
> +                XE_WARN_ON(ref == S32_MAX);
> +        }
> +
> +        return active;
>  }
>
>  void xe_device_mem_access_get(struct xe_device *xe)
>  {
> -        bool resumed = xe_pm_runtime_resume_if_suspended(xe);
> -        int ref = atomic_inc_return(&xe->mem_access.ref);
> +        int ref;
>
> -        if (ref == 1)
> -                xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);
> +        /*
> +         * This looks racy, but should be fine since the pm_callback_task only
> +         * transitions from NULL -> current (and back to NULL again), during the
> +         * runtime_resume() or runtime_suspend() callbacks, for which there can
> +         * only be a single one running for our device. We only need to prevent
> +         * recursively calling the runtime_get or runtime_put from those
> +         * callbacks, as well as preventing triggering any access_ongoing
> +         * asserts.
> +         */
> +        if (xe_pm_read_callback_task(xe) == current)
> +                return;
>
> -        /* The usage counter increased if device was immediately resumed */
> -        if (resumed)
> -                xe_pm_runtime_put(xe);
> -
> -        XE_WARN_ON(ref == S32_MAX);
> +        xe_pm_runtime_get(xe);
> +        ref = atomic_inc_return(&xe->mem_access.ref);
> +        XE_WARN_ON(ref == S32_MAX);
>  }
>
>  void xe_device_mem_access_put(struct xe_device *xe)
>  {
> -        bool hold = xe->mem_access.hold_rpm;
> -        int ref = atomic_dec_return(&xe->mem_access.ref);
> +        int ref;
>
> -        if (!ref && hold)
> -                xe_pm_runtime_put(xe);
> +        if (xe_pm_read_callback_task(xe) == current)
> +                return;
> +
> +        ref = atomic_dec_return(&xe->mem_access.ref);
> +        xe_pm_runtime_put(xe);
>
>          XE_WARN_ON(ref < 0);
>  }
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index 8e01bbadb149..0ce0351d7d23 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -141,15 +141,8 @@ void xe_device_mem_access_get(struct xe_device *xe);
>  bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe);
>  void xe_device_mem_access_put(struct xe_device *xe);
>
> -static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
> -{
> -        return atomic_read(&xe->mem_access.ref);
> -}
> -
> -static inline void xe_device_assert_mem_access(struct xe_device *xe)
> -{
> -        XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
> -}
> +void xe_device_assert_mem_access(struct xe_device *xe);
> +bool xe_device_mem_access_ongoing(struct xe_device *xe);
>
>  static inline bool xe_device_in_fault_mode(struct xe_device *xe)
>  {
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 26a8de77138a..dcba8bb8f989 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -343,10 +343,14 @@ struct xe_device {
>          struct {
>                  /** @ref: ref count of memory accesses */
>                  atomic_t ref;
> -                /** @hold_rpm: need to put rpm ref back at the end */
> -                bool hold_rpm;
>          } mem_access;
>
> +        /**
> +         * @pm_callback_task: Track the active task that is running in either
> +         * the runtime_suspend or runtime_resume callbacks.
> +         */
> +        struct task_struct *pm_callback_task;
> +
>          /** @d3cold_allowed: Indicates if d3cold is a valid device state */
>          bool d3cold_allowed;
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 9fb5fd4391d2..e7f7f9b1ce1c 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -19,6 +19,7 @@
>  #include "xe_guc.h"
>  #include "xe_guc_submit.h"
>  #include "xe_map.h"
> +#include "xe_pm.h"
>  #include "xe_trace.h"
>
>  /* Used when a CT send wants to block and / or receive data */
> @@ -1091,9 +1092,36 @@ static int dequeue_one_g2h(struct xe_guc_ct *ct)
>  static void g2h_worker_func(struct work_struct *w)
>  {
>          struct xe_guc_ct *ct = container_of(w, struct xe_guc_ct, g2h_worker);
> +        bool ongoing;
>          int ret;
>
> -        xe_device_mem_access_get(ct_to_xe(ct));
> +        /*
> +         * Normal users must always hold mem_access.ref around CT calls. However
> +         * during the runtime pm callbacks we rely on CT to talk to the GuC, but
> +         * at this stage we can't rely on mem_access.ref and even the
> +         * callback_task will be different than current. For such cases we just
> +         * need to ensure we always process the responses from any blocking
> +         * ct_send requests or where we otherwise expect some response when
> +         * initiated from those callbacks (which will need to wait for the below
> +         * dequeue_one_g2h()). The dequeue_one_g2h() will gracefully fail if
> +         * the device has suspended to the point that the CT communication has
> +         * been disabled.
> +         *
> +         * If we are inside the runtime pm callback, we can be the only task
> +         * still issuing CT requests (since that requires having the
> +         * mem_access.ref). It seems like it might in theory be possible to
> +         * receive unsolicited events from the GuC just as we are
> +         * suspending-resuming, but those will currently anyway be lost when
> +         * eventually exiting from suspend, hence no need to wake up the device
> +         * here. If we ever need something stronger than get_if_ongoing() then
> +         * we need to be careful with blocking the pm callbacks from getting CT
> +         * responses, if the worker here is blocked on those callbacks
> +         * completing, creating a deadlock.
> +         */
> +        ongoing = xe_device_mem_access_get_if_ongoing(ct_to_xe(ct));
> +        if (!ongoing && xe_pm_read_callback_task(ct_to_xe(ct)) == NULL)
> +                return;
Patch looks good to me from the runtime PM point of view, but I am not
familiar with the GuC CT flow, so I cannot provide an RB specifically for
the code above.

From the runtime PM perspective:
Acked-by: Anshuman Gupta <anshuman.gupta at intel.com>

Regards,
Anshuman Gupta
> +
>          do {
>                  mutex_lock(&ct->lock);
>                  ret = dequeue_one_g2h(ct);
> @@ -1107,7 +1135,9 @@ static void g2h_worker_func(struct work_struct *w)
>                          kick_reset(ct);
>                  }
>          } while (ret == 1);
> -        xe_device_mem_access_put(ct_to_xe(ct));
> +
> +        if (ongoing)
> +                xe_device_mem_access_put(ct_to_xe(ct));
>  }
>
>  static void guc_ctb_snapshot_capture(struct xe_device *xe, struct guc_ctb *ctb,
> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> index c7901f379aee..fe99820db2df 100644
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -137,43 +137,71 @@ void xe_pm_runtime_fini(struct xe_device *xe)
>          pm_runtime_forbid(dev);
>  }
>
> +static void xe_pm_write_callback_task(struct xe_device *xe,
> +                                      struct task_struct *task)
> +{
> +        WRITE_ONCE(xe->pm_callback_task, task);
> +
> +        /*
> +         * Just in case it's somehow possible for our writes to be reordered to
> +         * the extent that something else re-uses the task written in
> +         * pm_callback_task. For example after returning from the callback, but
> +         * before the reordered write that resets pm_callback_task back to NULL.
> +         */
> +        smp_mb(); /* pairs with xe_pm_read_callback_task */
> +}
> +
> +struct task_struct *xe_pm_read_callback_task(struct xe_device *xe)
> +{
> +        smp_mb(); /* pairs with xe_pm_write_callback_task */
> +
> +        return READ_ONCE(xe->pm_callback_task);
> +}
> +
>  int xe_pm_runtime_suspend(struct xe_device *xe)
>  {
>          struct xe_gt *gt;
>          u8 id;
> -        int err;
> +        int err = 0;
> +
> +        if (xe->d3cold_allowed && xe_device_mem_access_ongoing(xe))
> +                return -EBUSY;
> +
> +        /* Disable access_ongoing asserts and prevent recursive pm calls */
> +        xe_pm_write_callback_task(xe, current);
>
>          if (xe->d3cold_allowed) {
> -                if (xe_device_mem_access_ongoing(xe))
> -                        return -EBUSY;
> -
>                  err = xe_bo_evict_all(xe);
>                  if (err)
> -                        return err;
> +                        goto out;
>          }
>
>          for_each_gt(gt, xe, id) {
>                  err = xe_gt_suspend(gt);
>                  if (err)
> -                        return err;
> +                        goto out;
>          }
>
>          xe_irq_suspend(xe);
> -
> -        return 0;
> +out:
> +        xe_pm_write_callback_task(xe, NULL);
> +        return err;
>  }
>
>  int xe_pm_runtime_resume(struct xe_device *xe)
>  {
>          struct xe_gt *gt;
>          u8 id;
> -        int err;
> +        int err = 0;
> +
> +        /* Disable access_ongoing asserts and prevent recursive pm calls */
> +        xe_pm_write_callback_task(xe, current);
>
>          if (xe->d3cold_allowed) {
>                  for_each_gt(gt, xe, id) {
>                          err = xe_pcode_init(gt);
>                          if (err)
> -                                return err;
> +                                goto out;
>                  }
>
>                  /*
> @@ -182,7 +210,7 @@ int xe_pm_runtime_resume(struct xe_device *xe)
>                   */
>                  err = xe_bo_restore_kernel(xe);
>                  if (err)
> -                        return err;
> +                        goto out;
>          }
>
>          xe_irq_resume(xe);
> @@ -193,10 +221,11 @@ int xe_pm_runtime_resume(struct xe_device *xe)
>          if (xe->d3cold_allowed) {
>                  err = xe_bo_restore_user(xe);
>                  if (err)
> -                        return err;
> +                        goto out;
>          }
> -
> -        return 0;
> +out:
> +        xe_pm_write_callback_task(xe, NULL);
> +        return err;
>  }
>
>  int xe_pm_runtime_get(struct xe_device *xe)
> @@ -210,19 +239,8 @@ int xe_pm_runtime_put(struct xe_device *xe)
>          return pm_runtime_put_autosuspend(xe->drm.dev);
>  }
>
> -/* Return true if resume operation happened and usage count was increased */
> -bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe)
> -{
> -        /* In case we are suspended we need to immediately wake up */
> -        if (pm_runtime_suspended(xe->drm.dev))
> -                return !pm_runtime_resume_and_get(xe->drm.dev);
> -
> -        return false;
> -}
> -
>  int xe_pm_runtime_get_if_active(struct xe_device *xe)
>  {
> -        WARN_ON(pm_runtime_suspended(xe->drm.dev));
>          return pm_runtime_get_if_active(xe->drm.dev, true);
>  }
>
> diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
> index 8418ee6faac5..da05556d9a6e 100644
> --- a/drivers/gpu/drm/xe/xe_pm.h
> +++ b/drivers/gpu/drm/xe/xe_pm.h
> @@ -19,8 +19,8 @@ int xe_pm_runtime_suspend(struct xe_device *xe);
>  int xe_pm_runtime_resume(struct xe_device *xe);
>  int xe_pm_runtime_get(struct xe_device *xe);
>  int xe_pm_runtime_put(struct xe_device *xe);
> -bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe);
>  int xe_pm_runtime_get_if_active(struct xe_device *xe);
>  void xe_pm_assert_unbounded_bridge(struct xe_device *xe);
> +struct task_struct *xe_pm_read_callback_task(struct xe_device *xe);
>
>  #endif
> --
> 2.41.0