[PATCH v2] drm/xe/client: Check return value of xe_force_wake_get
Nirmoy Das
nirmoy.das at linux.intel.com
Mon Jun 3 19:56:41 UTC 2024
Hi Himal,
On 6/3/2024 1:34 PM, Ghimiray, Himal Prasad wrote:
>
>
> On 03-06-2024 16:28, Nirmoy Das wrote:
>>
>> On 6/3/2024 11:07 AM, Ghimiray, Himal Prasad wrote:
>>>
>>> On 03-06-2024 14:01, Nirmoy Das wrote:
>>>> xe_force_wake_get() can return error so check it's return value
>>>> before reading gpu_timestamp value.
>>>>
>>>> v2: set HWE to NULL instead of setting timestamp to 0(Lucas)
>>>> Add a warn on for xe_force_wake_put(Himal)
>>>>
>>>> Fixes: 188ced1e0ff8 ("drm/xe/client: Print runtime to fdinfo")
>>>> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray at intel.com>
>>>> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>>>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>>> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
>>>> ---
>>>> drivers/gpu/drm/xe/xe_drm_client.c | 8 ++++++--
>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_drm_client.c
>>>> b/drivers/gpu/drm/xe/xe_drm_client.c
>>>> index 4a19b771e3a0..4bdb909eabfc 100644
>>>> --- a/drivers/gpu/drm/xe/xe_drm_client.c
>>>> +++ b/drivers/gpu/drm/xe/xe_drm_client.c
>>>> @@ -264,9 +264,13 @@ static void show_run_ticks(struct drm_printer
>>>> *p, struct drm_file *file)
>>>> if (!hwe)
>>>> continue;
>>>> - xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
>>>> + if (xe_force_wake_get(gt_to_fw(gt), XE_FW_GT)) {
>>>
>>> Since xe_force_wake_get will increase the ref count irrespective of
>>> acknowledgment from forcewake ack register fails or succeeds.
>>
>> General pattern is to skip xe_force_wake_put() though at some places
>> we do call xe_force_wake_put() when the caller is unabortable.
>>
>> I don't know the exact reason but I guess failure to wake a domain
>> is serious enough so we want to keep already woken domains in wake
>> state for system stability.
>
> If you break without xe_force_Wake_put, subsequent force_wake_get will
> assume domain is awake(which might not be true, because there was no
> acknowledgment)
>
> Please consider below scenario:
>
> function a:
> xe_force_wake_get ---> increased domain ref from 0 to 1 -----> wake
> domain ---> timedout the acknowledgment -> failed and returned without
> decreasing the ref count
>
> read/write operations -> not called
> xe_force_wake_put -> not called
>
> function b : called after function a
> xe_force_wake_get -----> domain ref increased to 2 ----> exit success
> read/write operations -> executed but domain was not awake
> xe_force_wake_put ----> domain ref decreased to 1 again.
I assumed this was known to everyone as a known side effect, as we
mostly skip calling xe_force_wake_put(), but I guess it is not.
Had a chat with Rodrigo about this, and he acknowledged the current
inconsistency and the need for cleanup. Going to create a series
to add the missing xe_force_wake_put() calls where we haven't been doing it.
Thanks,
Nirmoy
>
> BR
> Himal
>
>
>>
>>
>> Regards,
>>
>> Nirmoy
>>
>>
>>>
>>> It is required to decrement the ref count taken by xe_force_wake_get
>>> by calling xe_force_wake_put before break.
>>>
>>> something like :
>>>
>>> if (xe_force_wake_get(gt_to_fw(gt), XE_FW_GT))
>>>
>>> hwe = NULL;
>>>
>>> else
>>>
>>> gpu_timestamp = xe_hw_engine_read_timestamp(hwe);
>>>
>>> XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FW_GT));
>>>
>>>
>>>> + hwe = NULL;
>>>> + break;
>>>> + }
>>>> +
>>>> gpu_timestamp = xe_hw_engine_read_timestamp(hwe);
>>>> - xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
>>>> + XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FW_GT));
>>>> break;
>>>> }
More information about the Intel-xe
mailing list