[PATCH v2] drm/xe/client: Check return value of xe_force_wake_get

Nirmoy Das nirmoy.das at linux.intel.com
Mon Jun 3 19:56:41 UTC 2024


Hi Himal,

On 6/3/2024 1:34 PM, Ghimiray, Himal Prasad wrote:
>
>
> On 03-06-2024 16:28, Nirmoy Das wrote:
>>
>> On 6/3/2024 11:07 AM, Ghimiray, Himal Prasad wrote:
>>>
>>> On 03-06-2024 14:01, Nirmoy Das wrote:
>>>> xe_force_wake_get() can return error so check it's return value
>>>> before reading gpu_timestamp value.
>>>>
>>>> v2: set HWE to NULL instead of setting timestamp to 0(Lucas)
>>>>      Add a warn on for xe_force_wake_put(Himal)
>>>>
>>>> Fixes: 188ced1e0ff8 ("drm/xe/client: Print runtime to fdinfo")
>>>> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray at intel.com>
>>>> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>>>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>>> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
>>>> ---
>>>>   drivers/gpu/drm/xe/xe_drm_client.c | 8 ++++++--
>>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_drm_client.c 
>>>> b/drivers/gpu/drm/xe/xe_drm_client.c
>>>> index 4a19b771e3a0..4bdb909eabfc 100644
>>>> --- a/drivers/gpu/drm/xe/xe_drm_client.c
>>>> +++ b/drivers/gpu/drm/xe/xe_drm_client.c
>>>> @@ -264,9 +264,13 @@ static void show_run_ticks(struct drm_printer 
>>>> *p, struct drm_file *file)
>>>>           if (!hwe)
>>>>               continue;
>>>>   -        xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
>>>> +        if (xe_force_wake_get(gt_to_fw(gt), XE_FW_GT)) {
>>>
>>> Since xe_force_wake_get will increase the ref count irrespective of 
>>> acknowledgment from forcewake ack register fails or succeeds.
>>
>> General pattern is to skip xe_force_wake_put() though at some places 
>> we do call xe_force_wake_put() when the caller is unabortable.
>>
>> I don't know the exact reason but I  guess failure to wake a domain 
>> is serious enough so we want to  keep already woken domains  in wake 
>> state for system stability.
>
> If you break without xe_force_Wake_put, subsequent force_wake_get will 
> assume domain is awake(which might not be true, because there was no 
> acknowledgment)
>
> Please consider below scenario:
>
> function a:
> xe_force_wake_get ---> increased domain ref from 0 to 1 -----> wake 
> domain ---> timedout the acknowledgment -> failed and returned without 
> decreasing the ref count
>
> read/write operations  -> not called
> xe_force_wake_put -> not called
>
> function b : called after function a
> xe_force_wake_get -----> domain ref increased to 2 ----> exit success
> read/write operations -> executed but domain was not awake
> xe_force_wake_put ----> domain ref decreased to 1 again.


I assumed this was known to everyone as a known side effect, as we 
mostly skip calling xe_force_wake_put(), but I guess it is not.

Had a chat with Rodrigo about this, and he acknowledged the current 
inconsistency and the need for cleanup. Going to create a series

to add the missing xe_force_wake_put() calls where we haven't been doing it.


Thanks,

Nirmoy

>
> BR
> Himal
>
>
>>
>>
>> Regards,
>>
>> Nirmoy
>>
>>
>>>
>>> It is required to decrement the ref count taken by xe_force_wake_get 
>>> by calling xe_force_wake_put before break.
>>>
>>> something like :
>>>
>>> if (xe_force_wake_get(gt_to_fw(gt), XE_FW_GT))
>>>
>>> hwe = NULL;
>>>
>>> else
>>>
>>> gpu_timestamp = xe_hw_engine_read_timestamp(hwe);
>>>
>>> XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FW_GT));
>>>
>>>
>>>> +            hwe = NULL;
>>>> +            break;
>>>> +        }
>>>> +
>>>>           gpu_timestamp = xe_hw_engine_read_timestamp(hwe);
>>>> -        xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
>>>> +        XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FW_GT));
>>>>           break;
>>>>       }


More information about the Intel-xe mailing list