[Intel-xe] [PATCH v12 01/13] drm/xe: fix xe_device_mem_access_get() races
Gupta, Anshuman
anshuman.gupta at intel.com
Tue Jul 11 09:00:55 UTC 2023
> -----Original Message-----
> From: Auld, Matthew <matthew.auld at intel.com>
> Sent: Tuesday, July 4, 2023 9:30 PM
> To: Gupta, Anshuman <anshuman.gupta at intel.com>; intel-
> xe at lists.freedesktop.org
> Cc: Vivi, Rodrigo <rodrigo.vivi at intel.com>; Thomas Hellström
> <thomas.hellstrom at linux.intel.com>; Brost, Matthew
> <matthew.brost at intel.com>
> Subject: Re: [PATCH v12 01/13] drm/xe: fix xe_device_mem_access_get() races
>
> On 04/07/2023 16:29, Gupta, Anshuman wrote:
> >
> >
> > On 7/4/2023 4:55 PM, Matthew Auld wrote:
> >> On 30/06/2023 16:22, Gupta, Anshuman wrote:
> >>>
> >>>
> >>> On 6/26/2023 4:20 PM, Matthew Auld wrote:
> >>>> It looks like there is at least one race here, given that the
> >>>> pm_runtime_suspended() check looks to return false if we are in the
> >>>> process of suspending the device (RPM_SUSPENDING vs
> RPM_SUSPENDED).
> >>>> We later also do xe_pm_runtime_get_if_active(), but since the
> >>>> device is suspending or has now suspended, this doesn't do anything either.
> >>>> Following from this we can potentially return from
> >>>> xe_device_mem_access_get() with the device suspended or about to
> >>>> be, leading to broken behaviour.
> >>>>
> >>>> Attempt to fix this by always grabbing the runtime ref when our
> >>>> internal ref transitions from 0 -> 1. The hard part is then dealing
> >>>> with the runtime_pm callbacks also calling
> >>>> xe_device_mem_access_get() and deadlocking, which the
> >>>> pm_runtime_suspended() check prevented.
> >>>>
> >>>> v2:
> >>>> - ct->lock looks to be primed with fs_reclaim, so holding that
> >>>> and then
> >>>> allocating memory will cause lockdep to complain. Now that we
> >>>> unconditionally grab the mem_access.lock around
> >>>> mem_access_{get,put}, we
> >>>> need to change the ordering wrt to grabbing the ct->lock, since
> >>>> some of
> >>>> the runtime_pm routines can allocate memory (or at least that's
> >>>> what
> >>>> lockdep seems to suggest). Hopefully not a big deal. It might
> >>>> be that
> >>>> there were already issues with this, just that the atomics
> >>>> where
> >>>> "hiding" the potential issues.
> >>>> v3:
> >>>> - Use Thomas Hellström' idea with tracking the active task that
> >>>> is
> >>>> executing in the resume or suspend callback, in order to avoid
> >>>> recursive resume/suspend calls deadlocking on itself.
> >>>> - Split the ct->lock change.
> >>>> v4:
> >>>> - Add smb_mb() around accessing the pm_callback_task for extra
> >>>> safety.
> >>>> (Thomas Hellström)
> >>>> v5:
> >>>> - Clarify the kernel-doc for the mem_access.lock, given that it
> >>>> is quite
> >>>> strange in what it protects (data vs code). The real motivation
> >>>> is to
> >>>> aid lockdep. (Rodrigo Vivi)
> >>>> v6:
> >>>> - Split out the lock change. We still want this as a lockdep aid
> >>>> but
> >>>> only for the xe_device_mem_access_get() path. Sticking a lock
> >>>> on the
> >>>> put() looks be a no-go, also the runtime_put() there is always
> >>>> async.
> >>>> - Now that the lock is gone move to atomics and rely on the pm
> >>>> code
> >>>> serialising multiple callers on the 0 -> 1 transition.
> >>>> - g2h_worker_func() looks to be the next issue, given that
> >>>> suspend-resume callbacks are using CT, so try to handle that.
> >>>> v7:
> >>>> - Add xe_device_mem_access_get_if_ongoing(), and use it in
> >>>> g2h_worker_func().
> >>>>
> >>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/258
> >>>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
> >>>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> >>>> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> >>>> Cc: Matthew Brost <matthew.brost at intel.com>
> >>>> Cc: Anshuman Gupta <anshuman.gupta at intel.com>
> >>>> ---
> >>>> drivers/gpu/drm/xe/xe_device.c | 58
> >>>> +++++++++++++++++++-----
> >>>> drivers/gpu/drm/xe/xe_device.h | 12 ++---
> >>>> drivers/gpu/drm/xe/xe_device_types.h | 6 +++
> >>>> drivers/gpu/drm/xe/xe_guc_ct.c | 34 +++++++++++++-
> >>>> drivers/gpu/drm/xe/xe_pm.c | 66
> >>>> +++++++++++++++++++---------
> >>>> drivers/gpu/drm/xe/xe_pm.h | 3 +-
> >>>> 6 files changed, 134 insertions(+), 45 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/xe/xe_device.c
> >>>> b/drivers/gpu/drm/xe/xe_device.c index c7985af85a53..1dc552da434f
> >>>> 100644
> >>>> --- a/drivers/gpu/drm/xe/xe_device.c
> >>>> +++ b/drivers/gpu/drm/xe/xe_device.c
> >>>> @@ -411,27 +411,61 @@ u32 xe_device_ccs_bytes(struct xe_device *xe,
> >>>> u64 size)
> >>>> DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0;
> >>>> }
> >>>> +bool xe_device_mem_access_ongoing(struct xe_device *xe) {
> >>>> + if (xe_pm_read_callback_task(xe) != NULL)
> >>>> + return true;
> >>>> +
> >>>> + return atomic_read(&xe->mem_access.ref); }
> >>>> +
> >>>> +void xe_device_assert_mem_access(struct xe_device *xe) {
> >>>> + XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
> >>>> +}
> >>>> +
> >>>> +bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe) {
> >>>> + return atomic_inc_not_zero(&xe->mem_access.ref);
> >>>> +}
> >>>> +
> >>>> void xe_device_mem_access_get(struct xe_device *xe)
> >>>> {
> >>>> - bool resumed = xe_pm_runtime_resume_if_suspended(xe);
> >>>> - int ref = atomic_inc_return(&xe->mem_access.ref);
> >>>> + /*
> >>>> + * This looks racy, but should be fine since the
> >>>> pm_callback_task only
> >>>> + * transitions from NULL -> current (and back to NULL again),
> >>>> during the
> >>>> + * runtime_resume() or runtime_suspend() callbacks, for which
> >>>> there can
> >>>> + * only be a single one running for our device. We only need
> >>>> +to
> >>>> prevent
> >>>> + * recursively calling the runtime_get or runtime_put from
> >>>> +those
> >>>> + * callbacks, as well as preventing triggering any
> >>>> +access_ongoing
> >>>> + * asserts.
> >>>> + */
> >>> two runtime_suspend() can run in parallel for two different pci
> >>> device those worker thread pooled by pm_wq workqueue, it is not
> >>> guaranteed that tast_struct will be different for two worker spawned
> >>> by same pm_wq ?
> >>
> >> IIUC we only use pm_wq for the async put(). If somehow tast_struct
> >> can be the same for different workers when using pm_wq (I'm not sure
> >> tbh), then I think it's fine since all put() must be balanced with a
> >> get(), and all get() are handled directly in the caller (not pm_wq)
> >> since it's non-async, and must first wait for any running callback.
> > Agree with that.
> >>
> >>>
> >>>> + if (xe_pm_read_callback_task(xe) == current)
> >>>> + return;
> >>>> - if (ref == 1)
> >>>> - xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);
> >>>> + if (!atomic_inc_not_zero(&xe->mem_access.ref)) {
> >>>> + bool hold_rpm = xe_pm_runtime_resume_and_get(xe);
> >>>> + int ref;
> >>>> - /* The usage counter increased if device was immediately
> >>>> resumed */
> >>>> - if (resumed)
> >>>> - xe_pm_runtime_put(xe);
> >>>> -
> >>>> - XE_WARN_ON(ref == S32_MAX);
> >>>> + ref = atomic_inc_return(&xe->mem_access.ref);
> >>>> + if (ref == 1)
> >>>> + xe->mem_access.hold_rpm = hold_rpm;
> >>>> + else
> >>>> + xe_pm_runtime_put(xe);
> > AFAIU above is not possible ?
>
> Do you mean the xe_pm_runtime_put() part? On the 0 -> 1 transition we just
> always call xe_pm_runtime_resume_and_get(). That serialises multiple callers
> such that exactly one runs our rpm resume callback and the rest all block waiting
> for it to transition into RPM_ACTIVE. However once we return from
> xe_pm_runtime_resume_and_get() there are potentially several different callers
> holding an rpm ref. We only need the one rpm ref for our needs, so we simply
> drop the rest here keeping the one that first incremented the ref.
Thanks for explanation.
I am still not clear with how are we handling case of error return by xe_pm_runtime_resume_and_get() ?
If one of two simultaneous call of xe_device_mem_access_get() fails at xe_pm_runtime_resume_and_get()
then it drops the rpm ref of other passed call as well ?
That eventually provide access to PCI bar without RPM protection.
We should also check returned value from xe_pm_runtime_resume_and_get() before calling
xe_pm_runtime_put() as pm_runtime_resume_and_get() also drops the device usage count on
error case.
Br,
Anshuman Gupta.
>
> >>>> + } else {
> >>>> + XE_WARN_ON(atomic_read(&xe->mem_access.ref) == S32_MAX);
> >>>> + }
> >>>> }
> >>>> void xe_device_mem_access_put(struct xe_device *xe)
> >>>> {
> >>>> - bool hold = xe->mem_access.hold_rpm;
> >>>> - int ref = atomic_dec_return(&xe->mem_access.ref);
> >>>> + int ref;
> >>>> - if (!ref && hold)
> >>>> + if (xe_pm_read_callback_task(xe) == current)
> >>>> + return;
> >>>> +
> >>>> + ref = atomic_dec_return(&xe->mem_access.ref);
> >>>> + if (ref == 0 && xe->mem_access.hold_rpm)
> >>>> xe_pm_runtime_put(xe);
> >>>> XE_WARN_ON(ref < 0);
> >>> /snip
> >>>> +
> >>>> int xe_pm_runtime_suspend(struct xe_device *xe)
> >>>> {
> >>>> struct xe_gt *gt;
> >>>> u8 id;
> >>>> - int err;
> >>>> + int err = 0;
> >>>> +
> >>>> + if (xe->d3cold_allowed && xe_device_mem_access_ongoing(xe))
> >>>> + return -EBUSY;
> >>> Not related to this patch but We should return -EBUSY even for d3hot
> >>> as well.
> >>
> >> Looking at this again, access_ongoing is always going to be false
> >> here, right? On the 0 -> 1 transition we always do full sync pm get
> >> before increment mem_access.ref, so not sure if this check actually
> >> does anything.
> > I belive this is paranoid check against any unbalanced put.
> > Br,
> > Anshuman Gupta.
> >>
> >>> Br,
> >>> Anshuman Gupta
> >>>> +
> >>>> + /* Disable access_ongoing asserts and prevent recursive pm
> >>>> calls */
> >>>> + xe_pm_write_callback_task(xe, current);
> >>>> if (xe->d3cold_allowed) {
> >>>> - if (xe_device_mem_access_ongoing(xe))
> >>>> - return -EBUSY;
> >>>> -
> >>>> err = xe_bo_evict_all(xe);
> >>>> if (err)
> >>>> - return err;
> >>>> + goto out;
> >>>> }
> >>>> for_each_gt(gt, xe, id) {
> >>>> err = xe_gt_suspend(gt);
> >>>> if (err)
> >>>> - return err;
> >>>> + goto out;
> >>>> }
> >>>> xe_irq_suspend(xe);
> >>>> -
> >>>> - return 0;
> >>>> +out:
> >>>> + xe_pm_write_callback_task(xe, NULL);
> >>>> + return err;
> >>>> }
> >>>> int xe_pm_runtime_resume(struct xe_device *xe)
> >>>> {
> >>>> struct xe_gt *gt;
> >>>> u8 id;
> >>>> - int err;
> >>>> + int err = 0;
> >>>> +
> >>>> + /* Disable access_ongoing asserts and prevent recursive pm
> >>>> calls */
> >>>> + xe_pm_write_callback_task(xe, current);
> >>>> if (xe->d3cold_allowed) {
> >>>> for_each_gt(gt, xe, id) {
> >>>> err = xe_pcode_init(gt);
> >>>> if (err)
> >>>> - return err;
> >>>> + goto out;
> >>>> }
> >>>> /*
> >>>> @@ -182,7 +210,7 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> >>>> */
> >>>> err = xe_bo_restore_kernel(xe);
> >>>> if (err)
> >>>> - return err;
> >>>> + goto out;
> >>>> }
> >>>> xe_irq_resume(xe);
> >>>> @@ -193,10 +221,11 @@ int xe_pm_runtime_resume(struct xe_device
> >>>> *xe)
> >>>> if (xe->d3cold_allowed) {
> >>>> err = xe_bo_restore_user(xe);
> >>>> if (err)
> >>>> - return err;
> >>>> + goto out;
> >>>> }
> >>>> -
> >>>> - return 0;
> >>>> +out:
> >>>> + xe_pm_write_callback_task(xe, NULL);
> >>>> + return err;
> >>>> }
> >>>> int xe_pm_runtime_get(struct xe_device *xe) @@ -210,14 +239,9 @@
> >>>> int xe_pm_runtime_put(struct xe_device *xe)
> >>>> return pm_runtime_put_autosuspend(xe->drm.dev);
> >>>> }
> >>>> -/* Return true if resume operation happened and usage count was
> >>>> increased */ -bool xe_pm_runtime_resume_if_suspended(struct
> >>>> xe_device *xe)
> >>>> +bool xe_pm_runtime_resume_and_get(struct xe_device *xe)
> >>>> {
> >>>> - /* In case we are suspended we need to immediately wake up */
> >>>> - if (pm_runtime_suspended(xe->drm.dev))
> >>>> - return !pm_runtime_resume_and_get(xe->drm.dev);
> >>>> -
> >>>> - return false;
> >>>> + return !pm_runtime_resume_and_get(xe->drm.dev);
> >>>> }
> >>>> int xe_pm_runtime_get_if_active(struct xe_device *xe) diff --git
> >>>> a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h index
> >>>> 6a885585f653..e92c508d44b9 100644
> >>>> --- a/drivers/gpu/drm/xe/xe_pm.h
> >>>> +++ b/drivers/gpu/drm/xe/xe_pm.h
> >>>> @@ -19,7 +19,8 @@ int xe_pm_runtime_suspend(struct xe_device *xe);
> >>>> int xe_pm_runtime_resume(struct xe_device *xe);
> >>>> int xe_pm_runtime_get(struct xe_device *xe);
> >>>> int xe_pm_runtime_put(struct xe_device *xe); -bool
> >>>> xe_pm_runtime_resume_if_suspended(struct xe_device *xe);
> >>>> +bool xe_pm_runtime_resume_and_get(struct xe_device *xe);
> >>>> int xe_pm_runtime_get_if_active(struct xe_device *xe);
> >>>> +struct task_struct *xe_pm_read_callback_task(struct xe_device
> >>>> +*xe);
> >>>> #endif
More information about the Intel-xe
mailing list