[Intel-xe] [PATCH v3 5/6] drm/xe: fix xe_device_mem_access_get() races
Matthew Auld
matthew.auld at intel.com
Tue May 16 10:02:33 UTC 2023
On 16/05/2023 08:06, Thomas Hellström wrote:
>
> On 5/15/23 16:40, Matthew Auld wrote:
>> It looks like there is at least one race here, given that the
>> pm_runtime_suspended() check looks to return false if we are in the
>> process of suspending the device (RPM_SUSPENDING vs RPM_SUSPENDED). We
>> later also do xe_pm_runtime_get_if_active(), but since the device is
>> suspending or has now suspended, this doesn't do anything either.
>> Following from this we can potentially return from
>> xe_device_mem_access_get() with the device suspended or about to be,
>> leading to broken behaviour.
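>>
>> Roughly, the bad interleaving looks something like this:
>>
>>   CPU0 (rpm core)               CPU1 (xe_device_mem_access_get)
>>   rpm_status = RPM_SUSPENDING
>>                                 pm_runtime_suspended() -> false,
>>                                 so no wake up is attempted
>>   rpm_status = RPM_SUSPENDED
>>                                 xe_pm_runtime_get_if_active() -> 0,
>>                                 so no rpm ref is taken either
>>   -> we return to the caller with the device suspended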
>>
>> Attempt to fix this by always grabbing the runtime ref when our
>> internal ref transitions from 0 -> 1, and then wrap the whole thing with
>> a lock to ensure callers are serialized.
>>
>> v2:
>>   - ct->lock looks to be primed with fs_reclaim, so holding that and then
>>     allocating memory will cause lockdep to complain. Now that we
>>     unconditionally grab the mem_access.lock around mem_access_{get,put},
>>     we need to change the ordering wrt grabbing the ct->lock, since some
>>     of the runtime_pm routines can allocate memory (or at least that's
>>     what lockdep seems to suggest). Hopefully not a big deal. It might be
>>     that there were already issues with this, just that the atomics were
>>     "hiding" the potential issues.
>> v3:
>>   - Use Thomas Hellström's idea of tracking the active task that is
>>     executing in the resume or suspend callback, in order to avoid
>>     recursive resume/suspend calls deadlocking on itself.
>>   - Split the ct->lock change.
>>
>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/258
>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>> Cc: Matthew Brost <matthew.brost at intel.com>
>> ---
>>  drivers/gpu/drm/xe/xe_device.c       | 30 ++++++++++++++-----
>>  drivers/gpu/drm/xe/xe_device.h       |  6 ++++
>>  drivers/gpu/drm/xe/xe_device_types.h | 11 ++++++-
>>  drivers/gpu/drm/xe/xe_pm.c           | 45 +++++++++++++++-------------
>>  drivers/gpu/drm/xe/xe_pm.h           |  2 +-
>>  5 files changed, 64 insertions(+), 30 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>> index d0297ee432b2..123343bc33b0 100644
>> --- a/drivers/gpu/drm/xe/xe_device.c
>> +++ b/drivers/gpu/drm/xe/xe_device.c
>> @@ -405,22 +405,38 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
>>  void xe_device_mem_access_get(struct xe_device *xe)
>>  {
>> -	bool resumed = xe_pm_runtime_resume_if_suspended(xe);
>> +	/*
>> +	 * This looks racy, but should be fine since the pm_callback_task only
>> +	 * transitions from NULL -> current (and back to NULL again), during the
>> +	 * runtime_resume() or runtime_suspend() callbacks, for which there can
>> +	 * only be a single one running for our device. We only need to prevent
>> +	 * recursively calling the runtime_get or runtime_put from those
>> +	 * callbacks, as well as preventing triggering any access_ongoing
>> +	 * asserts.
>> +	 */
>> +	if (READ_ONCE(xe->pm_callback_task) == current)
>> +		return;
>
> I figure, in theory, with excessive write reordering: if the task in
> the pm callback writes NULL to pm_callback_task and then dies, and a
> new task on another processor gets assigned that exact same
> task_struct pointer, the new task might incorrectly read the old value
> pointing to itself before the reordered NULL write is actually
> executed. So we might actually need an smp_mb() before the
> pm_callback_task read, pairing with an smp_mb() after the
> pm_callback_task write, with a "pairs with" comment. This is of course
> not needed if a task exit and a task init execute a memory barrier,
> but I don't know if they do. What do you think?
Good point, yeah I figure we just add the barrier and play it safe here.
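Roughly something like the following for v4, as a sketch (the
xe_pm_write_callback_task() helper name below is just something I made
up here to keep the write side in one place):

  static void xe_pm_write_callback_task(struct xe_device *xe,
                                        struct task_struct *task)
  {
          WRITE_ONCE(xe->pm_callback_task, task);

          /* Pairs with the smp_mb() before the READ_ONCE() below */
          smp_mb();
  }

  void xe_device_mem_access_get(struct xe_device *xe)
  {
          /*
           * Pairs with the smp_mb() in xe_pm_write_callback_task(), so
           * a recycled task_struct can't observe a stale
           * pm_callback_task still pointing to itself.
           */
          smp_mb();

          if (READ_ONCE(xe->pm_callback_task) == current)
                  return;

          /* ... rest of the function as in this patch ... */
  }

The suspend/resume callbacks would then use
xe_pm_write_callback_task(xe, current) and
xe_pm_write_callback_task(xe, NULL) instead of the raw WRITE_ONCE().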
>
> Otherwise LGTM,
Thanks.
>
> /Thomas
>
>
>>
>> +	/*
>> +	 * It's possible to get rid of the mem_access.lock here, however lockdep
>> +	 * has an easier time digesting the mem_access.lock -> runtime_pm chain
>> +	 * (and any locks the caller is holding), on all transitions, and not
>> +	 * for example just on the 0 -> 1.
>> +	 */
>>  	mutex_lock(&xe->mem_access.lock);
>> -	if (xe->mem_access.ref++ == 0)
>> -		xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);
>> +	if (xe->mem_access.ref == 0)
>> +		xe->mem_access.hold_rpm = xe_pm_runtime_resume_and_get(xe);
>> +	xe->mem_access.ref++;
>>  	mutex_unlock(&xe->mem_access.lock);
>>
>> -	/* The usage counter increased if device was immediately resumed */
>> -	if (resumed)
>> -		xe_pm_runtime_put(xe);
>> -
>>  	XE_WARN_ON(xe->mem_access.ref == S32_MAX);
>>  }
>>
>>  void xe_device_mem_access_put(struct xe_device *xe)
>>  {
>> +	if (READ_ONCE(xe->pm_callback_task) == current)
>> +		return;
>> +
>>  	mutex_lock(&xe->mem_access.lock);
>>  	if (--xe->mem_access.ref == 0 && xe->mem_access.hold_rpm)
>>  		xe_pm_runtime_put(xe);
>> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>> index 9ab7e6134f89..7a4ba5212d9c 100644
>> --- a/drivers/gpu/drm/xe/xe_device.h
>> +++ b/drivers/gpu/drm/xe/xe_device.h
>> @@ -92,6 +92,9 @@ void xe_device_mem_access_put(struct xe_device *xe);
>>
>>  static inline void xe_device_assert_mem_access(struct xe_device *xe)
>>  {
>> +	if (READ_ONCE(xe->pm_callback_task) == current)
>> +		return;
>> +
>>  	XE_WARN_ON(!xe->mem_access.ref);
>>  }
>>
>> @@ -99,6 +102,9 @@ static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
>>  {
>>  	bool ret;
>>
>> +	if (READ_ONCE(xe->pm_callback_task) == current)
>> +		return true;
>> +
>>  	mutex_lock(&xe->mem_access.lock);
>>  	ret = xe->mem_access.ref;
>>  	mutex_unlock(&xe->mem_access.lock);
>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> index d91193c41d2e..0f4f00fa4d18 100644
>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> @@ -256,7 +256,10 @@ struct xe_device {
>>  	 * triggering additional actions when they occur.
>>  	 */
>>  	struct {
>> -		/** @lock: protect the ref count */
>> +		/**
>> +		 * @lock: Serialize xe_device_mem_access users,
>> +		 * and protect the below internal state, like @ref.
>> +		 */
>>  		struct mutex lock;
>>  		/** @ref: ref count of memory accesses */
>>  		s32 ref;
>> @@ -264,6 +267,12 @@ struct xe_device {
>>  		bool hold_rpm;
>>  	} mem_access;
>>
>> +	/**
>> +	 * @pm_callback_task: Track the active task that is running in either
>> +	 * the runtime_suspend or runtime_resume callbacks.
>> +	 */
>> +	struct task_struct *pm_callback_task;
>> +
>>  	/** @d3cold_allowed: Indicates if d3cold is a valid device state */
>>  	bool d3cold_allowed;
>> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
>> index b7b57f10ba25..6f25268ca492 100644
>> --- a/drivers/gpu/drm/xe/xe_pm.c
>> +++ b/drivers/gpu/drm/xe/xe_pm.c
>> @@ -141,39 +141,46 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
>>  {
>>  	struct xe_gt *gt;
>>  	u8 id;
>> -	int err;
>> +	int err = 0;
>> +
>> +	if (xe->d3cold_allowed && xe_device_mem_access_ongoing(xe))
>> +		return -EBUSY;
>> +
>> +	/* Disable access_ongoing asserts and prevent recursive pm calls */
>> +	WRITE_ONCE(xe->pm_callback_task, current);
>>
>>  	if (xe->d3cold_allowed) {
>> -		if (xe_device_mem_access_ongoing(xe))
>> -			return -EBUSY;
>> -
>>  		err = xe_bo_evict_all(xe);
>>  		if (err)
>> -			return err;
>> +			goto out;
>>  	}
>>
>>  	for_each_gt(gt, xe, id) {
>>  		err = xe_gt_suspend(gt);
>>  		if (err)
>> -			return err;
>> +			goto out;
>>  	}
>>
>>  	xe_irq_suspend(xe);
>> -
>> -	return 0;
>> +out:
>> +	WRITE_ONCE(xe->pm_callback_task, NULL);
>> +	return err;
>>  }
>>
>>  int xe_pm_runtime_resume(struct xe_device *xe)
>>  {
>>  	struct xe_gt *gt;
>>  	u8 id;
>> -	int err;
>> +	int err = 0;
>> +
>> +	/* Disable access_ongoing asserts and prevent recursive pm calls */
>> +	WRITE_ONCE(xe->pm_callback_task, current);
>>
>>  	if (xe->d3cold_allowed) {
>>  		for_each_gt(gt, xe, id) {
>>  			err = xe_pcode_init(gt);
>>  			if (err)
>> -				return err;
>> +				goto out;
>>  		}
>>
>>  		/*
>> @@ -182,7 +189,7 @@ int xe_pm_runtime_resume(struct xe_device *xe)
>>  		 */
>>  		err = xe_bo_restore_kernel(xe);
>>  		if (err)
>> -			return err;
>> +			goto out;
>>  	}
>>
>>  	xe_irq_resume(xe);
>> @@ -193,10 +200,11 @@ int xe_pm_runtime_resume(struct xe_device *xe)
>>  	if (xe->d3cold_allowed) {
>>  		err = xe_bo_restore_user(xe);
>>  		if (err)
>> -			return err;
>> +			goto out;
>>  	}
>> -
>> -	return 0;
>> +out:
>> +	WRITE_ONCE(xe->pm_callback_task, NULL);
>> +	return err;
>>  }
>>
>>  int xe_pm_runtime_get(struct xe_device *xe)
>> @@ -210,14 +218,9 @@ int xe_pm_runtime_put(struct xe_device *xe)
>>  	return pm_runtime_put_autosuspend(xe->drm.dev);
>>  }
>>
>> -/* Return true if resume operation happened and usage count was increased */
>> -bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe)
>> +bool xe_pm_runtime_resume_and_get(struct xe_device *xe)
>>  {
>> -	/* In case we are suspended we need to immediately wake up */
>> -	if (pm_runtime_suspended(xe->drm.dev))
>> -		return !pm_runtime_resume_and_get(xe->drm.dev);
>> -
>> -	return false;
>> +	return !pm_runtime_resume_and_get(xe->drm.dev);
>>  }
>>
>>  int xe_pm_runtime_get_if_active(struct xe_device *xe)
>> diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
>> index 6a885585f653..1b4c15b5e71a 100644
>> --- a/drivers/gpu/drm/xe/xe_pm.h
>> +++ b/drivers/gpu/drm/xe/xe_pm.h
>> @@ -19,7 +19,7 @@ int xe_pm_runtime_suspend(struct xe_device *xe);
>>  int xe_pm_runtime_resume(struct xe_device *xe);
>>  int xe_pm_runtime_get(struct xe_device *xe);
>>  int xe_pm_runtime_put(struct xe_device *xe);
>> -bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe);
>> +bool xe_pm_runtime_resume_and_get(struct xe_device *xe);
>>  int xe_pm_runtime_get_if_active(struct xe_device *xe);
>>
>>  #endif