[Intel-xe] [PATCH 2/3] drm/xe: fix xe_device_mem_access_get() race
Matthew Auld
matthew.auld at intel.com
Mon May 15 09:04:10 UTC 2023
On 15/05/2023 09:44, Thomas Hellström wrote:
> Hi, Matthew,
>
> On 5/12/23 17:56, Matthew Auld wrote:
>> On 12/05/2023 15:32, Rodrigo Vivi wrote:
>>> On Thu, May 11, 2023 at 12:43:36PM +0100, Matthew Auld wrote:
>>>> On 05/05/2023 17:44, Rodrigo Vivi wrote:
>>>>> On Fri, May 05, 2023 at 04:38:53PM +0100, Matthew Auld wrote:
>>>>>> It looks like there is at least one race here, given that the
>>>>>> pm_runtime_suspended() check looks to return false if we are in the
>>>>>> process of suspending the device (RPM_SUSPENDING vs
>>>>>> RPM_SUSPENDED). We
>>>>>> later also do xe_pm_runtime_get_if_active(), but since the device is
>>>>>> suspending or has now suspended, this doesn't do anything either.
>>>>>> Following from this we can potentially return from
>>>>>> xe_device_mem_access_get() with the device suspended or about to be,
>>>>>> leading to broken behaviour.
>>>>>>
>>>>>> Attempt to fix this by always grabbing the runtime ref when our
>>>>>> internal
>>>>>> ref transitions from 0 -> 1, and then wrap the whole thing with a
>>>>>> lock
>>>>>> to ensure callers are serialized.
>>>>>>
>>>>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/258
>>>>>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>>>>>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>>>>> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>>>>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/xe/xe_device.c | 11 +++--------
>>>>>> drivers/gpu/drm/xe/xe_device_types.h | 5 ++++-
>>>>>> drivers/gpu/drm/xe/xe_pm.c | 9 ++-------
>>>>>> drivers/gpu/drm/xe/xe_pm.h | 2 +-
>>>>>> 4 files changed, 10 insertions(+), 17 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_device.c
>>>>>> b/drivers/gpu/drm/xe/xe_device.c
>>>>>> index 01c497bcf9a5..0a18b41a0e1a 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_device.c
>>>>>> +++ b/drivers/gpu/drm/xe/xe_device.c
>>>>>> @@ -406,17 +406,12 @@ u32 xe_device_ccs_bytes(struct xe_device
>>>>>> *xe, u64 size)
>>>>>> void xe_device_mem_access_get(struct xe_device *xe)
>>>>>> {
>>>>>> - bool resumed = xe_pm_runtime_resume_if_suspended(xe);
>>>>>> -
>>>>>> mutex_lock(&xe->mem_access.lock);
>>>>>> - if (xe->mem_access.ref++ == 0)
>>>>>> - xe->mem_access.hold_rpm = xe_pm_runtime_get_if_active(xe);
>>>>>> + if (xe->mem_access.ref == 0)
>>>>>> + xe->mem_access.hold_rpm = xe_pm_runtime_resume_and_get(xe);
>>>>>> + xe->mem_access.ref++;
>>>>>
>>>>> my memory is good to the point that I have tried this before...
>>>>> but not good enough to remember the issues that I got with this
>>>>> approach :(
>>>>
>>>> Ok, it seems one big issue is around xe_pm_runtime_resume() et al.
>>>> The lock
>>>> fixes all the races, but xe_pm_runtime_resume() seems to call
>>>> xe_device_mem_access_{get,put}() in loads of places AFAICT, but that
>>>> is ofc
>>>> going to deadlock if we introduce a lock, since we are inside the
>>>> callback.
>>>> But I think even without that lock it will still deadlock, since the
>>>> runtime_pm code will see that we are PM_RESUMING and wait for
>>>> itself. I'm
>>>> guessing that explains why we had the conditional
>>>> pm_runtime_suspended() and
>>>> if_active(), since that prevents triggering the runtime_pm from our
>>>> callbacks (we will either be PM_SUSPENDING or PM_RESUMING),
>>>
>>> Yes, that was the goal.
>>>
>>>> but then we are
>>>> ofc left with all the nasty races.
>>>
>>> :(
>>>
>>>> Any ideas? It seems like the
>>>> resume/suspend callbacks should fundamentally never be calling
>>>> xe_device_mem_access_{get,put}()?
>>>
>>> We probably need something like the pseudo code in the end of
>>> Documentation/power/runtime_pm.rst
>>
>> IIUC that looks like a template for where you have lots of incoming IO
>> requests, and you want to process them in an async manner. But to
>> process the request you need to also keep the device awake and ensure
>> it has resumed, so here it makes sense to use the RPM_ASYNC
>> runtime_get. And then in the ->resume() callback it then has a natural
>> place to at some later point process the outstanding IO requests.
>>
>> But for xe_device_mem_access_get() I don't think it can be async,
>> since we need to ensure the ->resume() callback has already completed
>> before returning to the caller. From the pov of
>> xe_device_mem_access_get() it doesn't know if it's being called from
>> ->resume() or some other normal path and yet it seems like it somehow
>> needs to differentiate them. I feel like either the ->resume() should
>> never have been allowed to call it in the first place, or there needs
>> to be some token that always gets passed around.
>>
>> But maybe I'm misunderstanding something here, or at least I'm not
>> currently seeing the connection with the pseudo code? Can you share more?
>
> Some fly-by ideas here:
>
> With the variant we briefly discussed before, to only lock during 0-1
> and 1-0 transitions, using atomic_inc_not_zero() and atomic_add_unless()
> then the get() problem goes out of the way, but any recursive get() and
> put() during put()->suspend will then cause similar problems?
Yeah, I was thinking something like that, but AFAICT the runtime_pm code
will see that it is already in a PM_RESUMING state and wait for itself
to exit from that state, which also deadlocks.
>
> While a bit hacky, that could probably be solved having an
>
> xe_device::suspending_task set and cleared by the suspending task and
> then use
>
> if (xe->suspending_task == current)
I didn't think of that tbh. Does seem hacky, but I think should work,
and perhaps fine for now?
>
> to skip pm calls during suspend. Assuming of course that all suspend()
> task for a single device are never run concurrently.
Yes, AFAICT only one resume or suspend callback can be running for a
device, until we exit from PM_SUSPENDING or PM_RESUMING.
>
> /Thomas
>
>>
>>>
>>>>
>>>>>
>>>>>> mutex_unlock(&xe->mem_access.lock);
>>>>>> - /* The usage counter increased if device was immediately
>>>>>> resumed */
>>>>>> - if (resumed)
>>>>>> - xe_pm_runtime_put(xe);
>>>>>> -
>>>>>> XE_WARN_ON(xe->mem_access.ref == S32_MAX);
>>>>>> }
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h
>>>>>> b/drivers/gpu/drm/xe/xe_device_types.h
>>>>>> index 59462933f67a..9e37189d5745 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>>>>>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>>>>>> @@ -256,7 +256,10 @@ struct xe_device {
>>>>>> * triggering additional actions when they occur.
>>>>>> */
>>>>>> struct {
>>>>>> - /** @lock: protect the ref count */
>>>>>> + /**
>>>>>> + * @lock: Serialize xe_device_mem_access users,
>>>>>> + * and protect the below internal state, like @ref.
>>>>>> + */
>>>>>> struct mutex lock;
>>>>>> /** @ref: ref count of memory accesses */
>>>>>> s32 ref;
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
>>>>>> index b7b57f10ba25..b2ffa001e6f7 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_pm.c
>>>>>> +++ b/drivers/gpu/drm/xe/xe_pm.c
>>>>>> @@ -210,14 +210,9 @@ int xe_pm_runtime_put(struct xe_device *xe)
>>>>>> return pm_runtime_put_autosuspend(xe->drm.dev);
>>>>>> }
>>>>>> -/* Return true if resume operation happened and usage count was
>>>>>> increased */
>>>>>> -bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe)
>>>>>> +bool xe_pm_runtime_resume_and_get(struct xe_device *xe)
>>>>>> {
>>>>>> - /* In case we are suspended we need to immediately wake up */
>>>>>> - if (pm_runtime_suspended(xe->drm.dev))
>>>>>> - return !pm_runtime_resume_and_get(xe->drm.dev);
>>>>>> -
>>>>>> - return false;
>>>>>> + return !pm_runtime_resume_and_get(xe->drm.dev);
>>>>>
>>>>> now with similar name I feel strange that we are not aligned with
>>>>> their
>>>>> return. Although I prefer our one...
>>>>>
>>>>> Anyway, the code is right... if you are testing and it is working well
>>>>> let's move with this.
>>>>>
>>>>> Reviewed-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>>>> (for the series)
>>>>>
>>>>> but let's get an Ack from Maarten since he was kept as author on
>>>>> patch 3
>>>>> and it is modified from his merged one.
>>>>>
>>>>>> }
>>>>>> int xe_pm_runtime_get_if_active(struct xe_device *xe)
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
>>>>>> index 6a885585f653..1b4c15b5e71a 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_pm.h
>>>>>> +++ b/drivers/gpu/drm/xe/xe_pm.h
>>>>>> @@ -19,7 +19,7 @@ int xe_pm_runtime_suspend(struct xe_device *xe);
>>>>>> int xe_pm_runtime_resume(struct xe_device *xe);
>>>>>> int xe_pm_runtime_get(struct xe_device *xe);
>>>>>> int xe_pm_runtime_put(struct xe_device *xe);
>>>>>> -bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe);
>>>>>> +bool xe_pm_runtime_resume_and_get(struct xe_device *xe);
>>>>>> int xe_pm_runtime_get_if_active(struct xe_device *xe);
>>>>>> #endif
>>>>>> --
>>>>>> 2.40.0
>>>>>>
More information about the Intel-xe
mailing list