[Intel-xe] [PATCH v13 01/10] drm/xe: fix xe_device_mem_access_get() races

Gupta, Anshuman anshuman.gupta at intel.com
Fri Jul 14 13:04:51 UTC 2023



> -----Original Message-----
> From: Auld, Matthew <matthew.auld at intel.com>
> Sent: Thursday, July 13, 2023 6:53 PM
> To: intel-xe at lists.freedesktop.org
> Cc: Vivi, Rodrigo <rodrigo.vivi at intel.com>; Thomas Hellström
> <thomas.hellstrom at linux.intel.com>; Brost, Matthew
> <matthew.brost at intel.com>; Gupta, Anshuman
> <anshuman.gupta at intel.com>
> Subject: [PATCH v13 01/10] drm/xe: fix xe_device_mem_access_get() races
> 
> It looks like there is at least one race here, given that the
> pm_runtime_suspended() check looks to return false if we are in the process
> of suspending the device (RPM_SUSPENDING vs RPM_SUSPENDED).  We
> later also do xe_pm_runtime_get_if_active(), but since the device is
> suspending or has now suspended, this doesn't do anything either.
> Following from this we can potentially return from
> xe_device_mem_access_get() with the device suspended or about to be,
> leading to broken behaviour.
> 
> Attempt to fix this by always grabbing the runtime ref when our internal ref
> transitions from 0 -> 1. The hard part is then dealing with the runtime_pm
> callbacks also calling xe_device_mem_access_get() and deadlocking, which
> the pm_runtime_suspended() check prevented.
> 
> v2:
>  - ct->lock looks to be primed with fs_reclaim, so holding that and then
>    allocating memory will cause lockdep to complain. Now that we
>    unconditionally grab the mem_access.lock around mem_access_{get,put}, we
>    need to change the ordering wrt grabbing the ct->lock, since some of
>    the runtime_pm routines can allocate memory (or at least that's what
>    lockdep seems to suggest). Hopefully not a big deal.  It might be that
>    there were already issues with this, just that the atomics were
>    "hiding" the potential issues.
> v3:
>  - Use Thomas Hellström's idea with tracking the active task that is
>    executing in the resume or suspend callback, in order to avoid
>    recursive resume/suspend calls deadlocking on itself.
>  - Split the ct->lock change.
> v4:
>  - Add smp_mb() around accessing the pm_callback_task for extra safety.
>    (Thomas Hellström)
> v5:
>  - Clarify the kernel-doc for the mem_access.lock, given that it is quite
>    strange in what it protects (data vs code). The real motivation is to
>    aid lockdep. (Rodrigo Vivi)
> v6:
>  - Split out the lock change. We still want this as a lockdep aid but
>    only for the xe_device_mem_access_get() path. Sticking a lock on the
>    put() looks to be a no-go; also the runtime_put() there is always async.
>  - Now that the lock is gone move to atomics and rely on the pm code
>    serialising multiple callers on the 0 -> 1 transition.
>  - g2h_worker_func() looks to be the next issue, given that
>    suspend-resume callbacks are using CT, so try to handle that.
> v7:
>  - Add xe_device_mem_access_get_if_ongoing(), and use it in
>    g2h_worker_func().
> v8 (Anshuman):
>  - Just always grab the rpm, instead of just on the 0 -> 1 transition,
>    which is a lot clearer and simplifies the code quite a bit.
> 
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/258
> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: Anshuman Gupta <anshuman.gupta at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_device.c       | 61 +++++++++++++++++++------
>  drivers/gpu/drm/xe/xe_device.h       | 11 +----
>  drivers/gpu/drm/xe/xe_device_types.h |  8 +++-
>  drivers/gpu/drm/xe/xe_guc_ct.c       | 34 +++++++++++++-
>  drivers/gpu/drm/xe/xe_pm.c           | 68 ++++++++++++++++++----------
>  drivers/gpu/drm/xe/xe_pm.h           |  2 +-
>  6 files changed, 131 insertions(+), 53 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 42fedb267454..70f89869b8a6 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -412,33 +412,66 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
>  		DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0;
>  }
> 
> +bool xe_device_mem_access_ongoing(struct xe_device *xe)
> +{
> +	if (xe_pm_read_callback_task(xe) != NULL)
> +		return true;
> +
> +	return atomic_read(&xe->mem_access.ref);
> +}
> +
> +void xe_device_assert_mem_access(struct xe_device *xe)
> +{
> +	XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
> +}
> +
>  bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe)
>  {
> -	return atomic_inc_not_zero(&xe->mem_access.ref);
> +	bool active;
> +
> +	if (xe_pm_read_callback_task(xe) == current)
> +		return true;
> +
> +	active = xe_pm_runtime_get_if_active(xe);
> +	if (active) {
> +		int ref = atomic_inc_return(&xe->mem_access.ref);
> +
> +		XE_WARN_ON(ref == S32_MAX);
> +	}
> +
> +	return active;
>  }
> 
>  void xe_device_mem_access_get(struct xe_device *xe)
>  {
> -	bool resumed = xe_pm_runtime_resume_if_suspended(xe);
> -	int ref = atomic_inc_return(&xe->mem_access.ref);
> +	int ref;
> 
> -	if (ref == 1)
> -		xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);
> +	/*
> +	 * This looks racy, but should be fine since the pm_callback_task
> +	 * only transitions from NULL -> current (and back to NULL again),
> +	 * during the runtime_resume() or runtime_suspend() callbacks, for
> +	 * which there can only be a single one running for our device. We
> +	 * only need to prevent recursively calling the runtime_get or
> +	 * runtime_put from those callbacks, as well as preventing
> +	 * triggering any access_ongoing asserts.
> +	 */
> +	if (xe_pm_read_callback_task(xe) == current)
> +		return;
> 
> -	/* The usage counter increased if device was immediately resumed */
> -	if (resumed)
> -		xe_pm_runtime_put(xe);
> -
> -	XE_WARN_ON(ref == S32_MAX);
> +	xe_pm_runtime_get(xe);
> +	ref = atomic_inc_return(&xe->mem_access.ref);
> +	XE_WARN_ON(atomic_read(&xe->mem_access.ref) == S32_MAX);
>  }
> 
>  void xe_device_mem_access_put(struct xe_device *xe)
>  {
> -	bool hold = xe->mem_access.hold_rpm;
> -	int ref = atomic_dec_return(&xe->mem_access.ref);
> +	int ref;
> 
> -	if (!ref && hold)
> -		xe_pm_runtime_put(xe);
> +	if (xe_pm_read_callback_task(xe) == current)
> +		return;
> +
> +	ref = atomic_dec_return(&xe->mem_access.ref);
> +	xe_pm_runtime_put(xe);
> 
>  	XE_WARN_ON(ref < 0);
>  }
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index 8e01bbadb149..0ce0351d7d23 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -141,15 +141,8 @@ void xe_device_mem_access_get(struct xe_device *xe);
>  bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe);
>  void xe_device_mem_access_put(struct xe_device *xe);
> 
> -static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
> -{
> -	return atomic_read(&xe->mem_access.ref);
> -}
> -
> -static inline void xe_device_assert_mem_access(struct xe_device *xe)
> -{
> -	XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
> -}
> +void xe_device_assert_mem_access(struct xe_device *xe);
> +bool xe_device_mem_access_ongoing(struct xe_device *xe);
> 
>  static inline bool xe_device_in_fault_mode(struct xe_device *xe)
>  {
> 
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 26a8de77138a..dcba8bb8f989 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -343,10 +343,14 @@ struct xe_device {
>  	struct {
>  		/** @ref: ref count of memory accesses */
>  		atomic_t ref;
> -		/** @hold_rpm: need to put rpm ref back at the end */
> -		bool hold_rpm;
>  	} mem_access;
> 
> +	/**
> +	 * @pm_callback_task: Track the active task that is running in either
> +	 * the runtime_suspend or runtime_resume callbacks.
> +	 */
> +	struct task_struct *pm_callback_task;
> +
>  	/** @d3cold_allowed: Indicates if d3cold is a valid device state */
>  	bool d3cold_allowed;
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 9fb5fd4391d2..e7f7f9b1ce1c 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -19,6 +19,7 @@
>  #include "xe_guc.h"
>  #include "xe_guc_submit.h"
>  #include "xe_map.h"
> +#include "xe_pm.h"
>  #include "xe_trace.h"
> 
>  /* Used when a CT send wants to block and / or receive data */
> @@ -1091,9 +1092,36 @@ static int dequeue_one_g2h(struct xe_guc_ct *ct)
>  static void g2h_worker_func(struct work_struct *w)
>  {
>  	struct xe_guc_ct *ct = container_of(w, struct xe_guc_ct, g2h_worker);
> +	bool ongoing;
>  	int ret;
> 
> -	xe_device_mem_access_get(ct_to_xe(ct));
> +	/*
> +	 * Normal users must always hold mem_access.ref around CT calls.
> +	 * However during the runtime pm callbacks we rely on CT to talk to
> +	 * the GuC, but at this stage we can't rely on mem_access.ref and
> +	 * even the callback_task will be different than current.  For such
> +	 * cases we just need to ensure we always process the responses from
> +	 * any blocking ct_send requests or where we otherwise expect some
> +	 * response when initiated from those callbacks (which will need to
> +	 * wait for the below dequeue_one_g2h()).  The dequeue_one_g2h() will
> +	 * gracefully fail if the device has suspended to the point that the
> +	 * CT communication has been disabled.
> +	 *
> +	 * If we are inside the runtime pm callback, we can be the only task
> +	 * still issuing CT requests (since that requires having the
> +	 * mem_access.ref).  It seems like it might in theory be possible to
> +	 * receive unsolicited events from the GuC just as we are
> +	 * suspending-resuming, but those will currently anyway be lost when
> +	 * eventually exiting from suspend, hence no need to wake up the
> +	 * device here. If we ever need something stronger than
> +	 * get_if_ongoing() then we need to be careful with blocking the pm
> +	 * callbacks from getting CT responses, if the worker here is blocked
> +	 * on those callbacks completing, creating a deadlock.
> +	 */
> +	ongoing = xe_device_mem_access_get_if_ongoing(ct_to_xe(ct));
> +	if (!ongoing && xe_pm_read_callback_task(ct_to_xe(ct)) == NULL)
> +		return;
The patch looks good to me from a runtime PM point of view, but I am not
familiar with the GuC CT flow, so I cannot provide an R-b, especially for
the code above.
From a runtime PM perspective:
Acked-by: Anshuman Gupta <anshuman.gupta at intel.com>
Regards,
Anshuman Gupta
> +
>  	do {
>  		mutex_lock(&ct->lock);
>  		ret = dequeue_one_g2h(ct);
> @@ -1107,7 +1135,9 @@ static void g2h_worker_func(struct work_struct *w)
>  			kick_reset(ct);
>  		}
>  	} while (ret == 1);
> -	xe_device_mem_access_put(ct_to_xe(ct));
> +
> +	if (ongoing)
> +		xe_device_mem_access_put(ct_to_xe(ct));
>  }
> 
>  static void guc_ctb_snapshot_capture(struct xe_device *xe, struct guc_ctb *ctb,
> 
> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> index c7901f379aee..fe99820db2df 100644
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -137,43 +137,71 @@ void xe_pm_runtime_fini(struct xe_device *xe)
>  	pm_runtime_forbid(dev);
>  }
> 
> +static void xe_pm_write_callback_task(struct xe_device *xe,
> +				      struct task_struct *task)
> +{
> +	WRITE_ONCE(xe->pm_callback_task, task);
> +
> +	/*
> +	 * Just in case it's somehow possible for our writes to be reordered
> +	 * to the extent that something else re-uses the task written in
> +	 * pm_callback_task. For example after returning from the callback,
> +	 * but before the reordered write that resets pm_callback_task back
> +	 * to NULL.
> +	 */
> +	smp_mb(); /* pairs with xe_pm_read_callback_task */
> +}
> +
> +struct task_struct *xe_pm_read_callback_task(struct xe_device *xe)
> +{
> +	smp_mb(); /* pairs with xe_pm_write_callback_task */
> +
> +	return READ_ONCE(xe->pm_callback_task);
> +}
> +
>  int xe_pm_runtime_suspend(struct xe_device *xe)
>  {
>  	struct xe_gt *gt;
>  	u8 id;
> -	int err;
> +	int err = 0;
> +
> +	if (xe->d3cold_allowed && xe_device_mem_access_ongoing(xe))
> +		return -EBUSY;
> +
> +	/* Disable access_ongoing asserts and prevent recursive pm calls */
> +	xe_pm_write_callback_task(xe, current);
> 
>  	if (xe->d3cold_allowed) {
> -		if (xe_device_mem_access_ongoing(xe))
> -			return -EBUSY;
> -
>  		err = xe_bo_evict_all(xe);
>  		if (err)
> -			return err;
> +			goto out;
>  	}
> 
>  	for_each_gt(gt, xe, id) {
>  		err = xe_gt_suspend(gt);
>  		if (err)
> -			return err;
> +			goto out;
>  	}
> 
>  	xe_irq_suspend(xe);
> -
> -	return 0;
> +out:
> +	xe_pm_write_callback_task(xe, NULL);
> +	return err;
>  }
> 
>  int xe_pm_runtime_resume(struct xe_device *xe)
>  {
>  	struct xe_gt *gt;
>  	u8 id;
> -	int err;
> +	int err = 0;
> +
> +	/* Disable access_ongoing asserts and prevent recursive pm calls */
> +	xe_pm_write_callback_task(xe, current);
> 
>  	if (xe->d3cold_allowed) {
>  		for_each_gt(gt, xe, id) {
>  			err = xe_pcode_init(gt);
>  			if (err)
> -				return err;
> +				goto out;
>  		}
> 
>  		/*
> @@ -182,7 +210,7 @@ int xe_pm_runtime_resume(struct xe_device *xe)
>  		 */
>  		err = xe_bo_restore_kernel(xe);
>  		if (err)
> -			return err;
> +			goto out;
>  	}
> 
>  	xe_irq_resume(xe);
> @@ -193,10 +221,11 @@ int xe_pm_runtime_resume(struct xe_device *xe)
>  	if (xe->d3cold_allowed) {
>  		err = xe_bo_restore_user(xe);
>  		if (err)
> -			return err;
> +			goto out;
>  	}
> -
> -	return 0;
> +out:
> +	xe_pm_write_callback_task(xe, NULL);
> +	return err;
>  }
> 
>  int xe_pm_runtime_get(struct xe_device *xe)
> @@ -210,19 +239,8 @@ int xe_pm_runtime_put(struct xe_device *xe)
>  	return pm_runtime_put_autosuspend(xe->drm.dev);
>  }
> 
> -/* Return true if resume operation happened and usage count was increased */
> -bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe)
> -{
> -	/* In case we are suspended we need to immediately wake up */
> -	if (pm_runtime_suspended(xe->drm.dev))
> -		return !pm_runtime_resume_and_get(xe->drm.dev);
> -
> -	return false;
> -}
> -
>  int xe_pm_runtime_get_if_active(struct xe_device *xe)
>  {
> -	WARN_ON(pm_runtime_suspended(xe->drm.dev));
>  	return pm_runtime_get_if_active(xe->drm.dev, true);
>  }
> 
> diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
> index 8418ee6faac5..da05556d9a6e 100644
> --- a/drivers/gpu/drm/xe/xe_pm.h
> +++ b/drivers/gpu/drm/xe/xe_pm.h
> @@ -19,8 +19,8 @@ int xe_pm_runtime_suspend(struct xe_device *xe);
>  int xe_pm_runtime_resume(struct xe_device *xe);
>  int xe_pm_runtime_get(struct xe_device *xe);
>  int xe_pm_runtime_put(struct xe_device *xe);
> -bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe);
>  int xe_pm_runtime_get_if_active(struct xe_device *xe);
>  void xe_pm_assert_unbounded_bridge(struct xe_device *xe);
> +struct task_struct *xe_pm_read_callback_task(struct xe_device *xe);
> 
>  #endif
> --
> 2.41.0


