[Intel-xe] [PATCH v12 01/13] drm/xe: fix xe_device_mem_access_get() races
Gupta, Anshuman
anshuman.gupta at intel.com
Tue Jul 4 15:29:52 UTC 2023
On 7/4/2023 4:55 PM, Matthew Auld wrote:
> On 30/06/2023 16:22, Gupta, Anshuman wrote:
>>
>>
>> On 6/26/2023 4:20 PM, Matthew Auld wrote:
>>> It looks like there is at least one race here, given that the
>>> pm_runtime_suspended() check looks to return false if we are in the
>>> process of suspending the device (RPM_SUSPENDING vs RPM_SUSPENDED). We
>>> later also do xe_pm_runtime_get_if_active(), but since the device is
>>> suspending or has now suspended, this doesn't do anything either.
>>> Following from this we can potentially return from
>>> xe_device_mem_access_get() with the device suspended or about to be,
>>> leading to broken behaviour.
>>>
>>> Attempt to fix this by always grabbing the runtime ref when our internal
>>> ref transitions from 0 -> 1. The hard part is then dealing with the
>>> runtime_pm callbacks also calling xe_device_mem_access_get() and
>>> deadlocking, which the pm_runtime_suspended() check prevented.
>>>
>>> v2:
>>>  - ct->lock looks to be primed with fs_reclaim, so holding that and then
>>>    allocating memory will cause lockdep to complain. Now that we
>>>    unconditionally grab the mem_access.lock around mem_access_{get,put},
>>>    we need to change the ordering wrt grabbing the ct->lock, since some
>>>    of the runtime_pm routines can allocate memory (or at least that's
>>>    what lockdep seems to suggest). Hopefully not a big deal. It might be
>>>    that there were already issues with this, just that the atomics were
>>>    "hiding" the potential issues.
>>> v3:
>>>  - Use Thomas Hellström's idea of tracking the active task that is
>>>    executing in the resume or suspend callback, in order to avoid
>>>    recursive resume/suspend calls deadlocking on itself.
>>>  - Split the ct->lock change.
>>> v4:
>>>  - Add smp_mb() around accessing the pm_callback_task for extra safety.
>>>    (Thomas Hellström)
>>> v5:
>>>  - Clarify the kernel-doc for the mem_access.lock, given that it is
>>>    quite strange in what it protects (data vs code). The real motivation
>>>    is to aid lockdep. (Rodrigo Vivi)
>>> v6:
>>>  - Split out the lock change. We still want this as a lockdep aid but
>>>    only for the xe_device_mem_access_get() path. Sticking a lock on the
>>>    put() looks to be a no-go; also the runtime_put() there is always
>>>    async.
>>>  - Now that the lock is gone, move to atomics and rely on the pm code
>>>    serialising multiple callers on the 0 -> 1 transition.
>>>  - g2h_worker_func() looks to be the next issue, given that
>>>    suspend-resume callbacks are using CT, so try to handle that.
>>> v7:
>>> - Add xe_device_mem_access_get_if_ongoing(), and use it in
>>> g2h_worker_func().
>>>
>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/258
>>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>> Cc: Anshuman Gupta <anshuman.gupta at intel.com>
>>> ---
>>> drivers/gpu/drm/xe/xe_device.c | 58 +++++++++++++++++++-----
>>> drivers/gpu/drm/xe/xe_device.h | 12 ++---
>>> drivers/gpu/drm/xe/xe_device_types.h | 6 +++
>>> drivers/gpu/drm/xe/xe_guc_ct.c | 34 +++++++++++++-
>>> drivers/gpu/drm/xe/xe_pm.c | 66 +++++++++++++++++++---------
>>> drivers/gpu/drm/xe/xe_pm.h | 3 +-
>>> 6 files changed, 134 insertions(+), 45 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>>> index c7985af85a53..1dc552da434f 100644
>>> --- a/drivers/gpu/drm/xe/xe_device.c
>>> +++ b/drivers/gpu/drm/xe/xe_device.c
>>> @@ -411,27 +411,61 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
>>> DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0;
>>> }
>>> +bool xe_device_mem_access_ongoing(struct xe_device *xe)
>>> +{
>>> + if (xe_pm_read_callback_task(xe) != NULL)
>>> + return true;
>>> +
>>> + return atomic_read(&xe->mem_access.ref);
>>> +}
>>> +
>>> +void xe_device_assert_mem_access(struct xe_device *xe)
>>> +{
>>> + XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
>>> +}
>>> +
>>> +bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe)
>>> +{
>>> + return atomic_inc_not_zero(&xe->mem_access.ref);
>>> +}
>>> +
>>> void xe_device_mem_access_get(struct xe_device *xe)
>>> {
>>> - bool resumed = xe_pm_runtime_resume_if_suspended(xe);
>>> - int ref = atomic_inc_return(&xe->mem_access.ref);
>>> + /*
>>> +  * This looks racy, but should be fine since the pm_callback_task only
>>> +  * transitions from NULL -> current (and back to NULL again), during the
>>> +  * runtime_resume() or runtime_suspend() callbacks, for which there can
>>> +  * only be a single one running for our device. We only need to prevent
>>> +  * recursively calling the runtime_get or runtime_put from those
>>> +  * callbacks, as well as preventing triggering any access_ongoing
>>> +  * asserts.
>>> +  */
>> Two runtime_suspend() calls can run in parallel for two different PCI
>> devices, with the worker threads pooled by the pm_wq workqueue. Is it
>> guaranteed that the task_struct will be different for two workers
>> spawned by the same pm_wq?
>
> IIUC we only use pm_wq for the async put(). If somehow the task_struct
> can be the same for different workers when using pm_wq (I'm not sure
> tbh), then I think it's fine, since every put() must be balanced with a
> get(), and all get() calls are handled directly in the caller (not
> pm_wq), since the get is non-async and must first wait for any running
> callback.
Agree with that.
>
>>
>>> + if (xe_pm_read_callback_task(xe) == current)
>>> + return;
>>> - if (ref == 1)
>>> - xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);
>>> + if (!atomic_inc_not_zero(&xe->mem_access.ref)) {
>>> + bool hold_rpm = xe_pm_runtime_resume_and_get(xe);
>>> + int ref;
>>> - /* The usage counter increased if device was immediately resumed */
>>> - if (resumed)
>>> - xe_pm_runtime_put(xe);
>>> -
>>> - XE_WARN_ON(ref == S32_MAX);
>>> + ref = atomic_inc_return(&xe->mem_access.ref);
>>> + if (ref == 1)
>>> + xe->mem_access.hold_rpm = hold_rpm;
>>> + else
>>> + xe_pm_runtime_put(xe);
AFAIU the above is not possible?
>>> + } else {
>>> + XE_WARN_ON(atomic_read(&xe->mem_access.ref) == S32_MAX);
>>> + }
>>> }
>>> void xe_device_mem_access_put(struct xe_device *xe)
>>> {
>>> - bool hold = xe->mem_access.hold_rpm;
>>> - int ref = atomic_dec_return(&xe->mem_access.ref);
>>> + int ref;
>>> - if (!ref && hold)
>>> + if (xe_pm_read_callback_task(xe) == current)
>>> + return;
>>> +
>>> + ref = atomic_dec_return(&xe->mem_access.ref);
>>> + if (ref == 0 && xe->mem_access.hold_rpm)
>>> xe_pm_runtime_put(xe);
>>> XE_WARN_ON(ref < 0);
>> /snip
>>> +
>>> int xe_pm_runtime_suspend(struct xe_device *xe)
>>> {
>>> struct xe_gt *gt;
>>> u8 id;
>>> - int err;
>>> + int err = 0;
>>> +
>>> + if (xe->d3cold_allowed && xe_device_mem_access_ongoing(xe))
>>> + return -EBUSY;
>> Not related to this patch, but we should return -EBUSY for d3hot as
>> well.
>
> Looking at this again, access_ongoing is always going to be false here,
> right? On the 0 -> 1 transition we always do a full sync pm get before
> incrementing mem_access.ref, so I'm not sure this check actually does
> anything.
I believe this is a paranoid check against any unbalanced put.
Br,
Anshuman Gupta.
>
>> Br,
>> Anshuman Gupta
>>> +
>>> + /* Disable access_ongoing asserts and prevent recursive pm calls */
>>> + xe_pm_write_callback_task(xe, current);
>>> if (xe->d3cold_allowed) {
>>> - if (xe_device_mem_access_ongoing(xe))
>>> - return -EBUSY;
>>> -
>>> err = xe_bo_evict_all(xe);
>>> if (err)
>>> - return err;
>>> + goto out;
>>> }
>>> for_each_gt(gt, xe, id) {
>>> err = xe_gt_suspend(gt);
>>> if (err)
>>> - return err;
>>> + goto out;
>>> }
>>> xe_irq_suspend(xe);
>>> -
>>> - return 0;
>>> +out:
>>> + xe_pm_write_callback_task(xe, NULL);
>>> + return err;
>>> }
>>> int xe_pm_runtime_resume(struct xe_device *xe)
>>> {
>>> struct xe_gt *gt;
>>> u8 id;
>>> - int err;
>>> + int err = 0;
>>> +
>>> + /* Disable access_ongoing asserts and prevent recursive pm calls */
>>> + xe_pm_write_callback_task(xe, current);
>>> if (xe->d3cold_allowed) {
>>> for_each_gt(gt, xe, id) {
>>> err = xe_pcode_init(gt);
>>> if (err)
>>> - return err;
>>> + goto out;
>>> }
>>> /*
>>> @@ -182,7 +210,7 @@ int xe_pm_runtime_resume(struct xe_device *xe)
>>> */
>>> err = xe_bo_restore_kernel(xe);
>>> if (err)
>>> - return err;
>>> + goto out;
>>> }
>>> xe_irq_resume(xe);
>>> @@ -193,10 +221,11 @@ int xe_pm_runtime_resume(struct xe_device *xe)
>>> if (xe->d3cold_allowed) {
>>> err = xe_bo_restore_user(xe);
>>> if (err)
>>> - return err;
>>> + goto out;
>>> }
>>> -
>>> - return 0;
>>> +out:
>>> + xe_pm_write_callback_task(xe, NULL);
>>> + return err;
>>> }
>>> int xe_pm_runtime_get(struct xe_device *xe)
>>> @@ -210,14 +239,9 @@ int xe_pm_runtime_put(struct xe_device *xe)
>>> return pm_runtime_put_autosuspend(xe->drm.dev);
>>> }
>>> -/* Return true if resume operation happened and usage count was increased */
>>> -bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe)
>>> +bool xe_pm_runtime_resume_and_get(struct xe_device *xe)
>>> {
>>> - /* In case we are suspended we need to immediately wake up */
>>> - if (pm_runtime_suspended(xe->drm.dev))
>>> - return !pm_runtime_resume_and_get(xe->drm.dev);
>>> -
>>> - return false;
>>> + return !pm_runtime_resume_and_get(xe->drm.dev);
>>> }
>>> int xe_pm_runtime_get_if_active(struct xe_device *xe)
>>> diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
>>> index 6a885585f653..e92c508d44b9 100644
>>> --- a/drivers/gpu/drm/xe/xe_pm.h
>>> +++ b/drivers/gpu/drm/xe/xe_pm.h
>>> @@ -19,7 +19,8 @@ int xe_pm_runtime_suspend(struct xe_device *xe);
>>> int xe_pm_runtime_resume(struct xe_device *xe);
>>> int xe_pm_runtime_get(struct xe_device *xe);
>>> int xe_pm_runtime_put(struct xe_device *xe);
>>> -bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe);
>>> +bool xe_pm_runtime_resume_and_get(struct xe_device *xe);
>>> int xe_pm_runtime_get_if_active(struct xe_device *xe);
>>> +struct task_struct *xe_pm_read_callback_task(struct xe_device *xe);
>>> #endif