[PATCH v2 01/23] drm/xe: Error handling in xe_force_wake_get()
Nilawar, Badal
badal.nilawar at intel.com
Tue Sep 17 05:48:47 UTC 2024
On 13-09-2024 18:47, Ghimiray, Himal Prasad wrote:
>
>
> On 13-09-2024 16:56, Michal Wajdeczko wrote:
>>
>>
>> On 13.09.2024 05:59, Ghimiray, Himal Prasad wrote:
>>>
>>>
>>> On 13-09-2024 03:01, Michal Wajdeczko wrote:
>>>>
>>>>
>>>> On 12.09.2024 21:15, Himal Prasad Ghimiray wrote:
>>>>> If an acknowledgment timeout occurs for a domain awake request, do not
>>>>> increment the reference count for the domain. This ensures that
>>>>> subsequent _get calls do not incorrectly assume the domain is awake.
>>>>> The return value is a mask of domains whose reference counts were
>>>>> incremented, and these domains need to be released using
>>>>> xe_force_wake_put.
>>>>>
>>>>> The caller needs to compare the return value with the input domains to
>>>>> determine the success or failure of the operation and decide whether to
>>>>> continue or return accordingly.
>>>>>
>>>>> While at it, add simple kernel-doc for xe_force_wake_get()
>>>>>
>>>>> Cc: Badal Nilawar <badal.nilawar at intel.com>
>>>>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>>>> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>>>>> Cc: Nirmoy Das <nirmoy.das at intel.com>
>>>>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray at intel.com>
>>>>> ---
>>>>> drivers/gpu/drm/xe/xe_force_wake.c | 35 +++++++++++++++++++++++++-----
>>>>> 1 file changed, 29 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/xe_force_wake.c b/drivers/gpu/drm/xe/xe_force_wake.c
>>>>> index a64c14757c84..fa42d652d23f 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_force_wake.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_force_wake.c
>>>>> @@ -150,26 +150,49 @@ static int domain_sleep_wait(struct xe_gt *gt,
>>>>> (ffs(tmp__) - 1))) && \
>>>>> domain__->reg_ctl.addr)
>>>>> +/**
>>>>> + * xe_force_wake_get : Increase the domain refcount; if it was 0 initially, wake the domain
>>>>
>>>> while this is likely still recognized by the kernel-doc tool, it is not
>>>> the correct notation for function() documentation
>>>
>>>
>>> I assume you are suggesting %s/xe_force_wake_get/xe_force_wake_get()
>>> will fix it.
>>>
>>>
>>>>
>>>> [1]
>>>> https://docs.kernel.org/doc-guide/kernel-doc.html#function-documentation
>>>>
>>>>> + * @fw: struct xe_force_wake
>>>>> + * @domains: forcewake domains to get refcount on
>>>>> + *
>>>>> + * Increment refcount for the force-wake domain. If the domain is
>>>>> + * asleep, awaken it and wait for acknowledgment within the specified
>>>>> + * timeout. If a timeout occurs, decrement the refcount.
>>>>
>>>> not sure the doc should mirror the low-level implementation details 1:1
>>>
>>> Does this sound okay?
>>> This function takes references for the input @domains and wakes them if
>>> they are asleep.
>>>
>>>>
>>>>> + * The caller should compare the return value with the @domains to
>>>>> + * determine the success or failure of the operation.
>>>>> + *
>>>>> + * Return: mask of refcount increased domains.
>>>>
>>>> if we return a 'mask' then maybe it should be of 'unsigned int' type?
>>>
>>> Agreed. Will fix in next version.
>>>
>>>>
>>>>> If the return value is
>>>>> + * equal to the input parameter @domains, the operation is considered
>>>>> + * successful. Otherwise, the operation is considered a failure, and
>>>>> + * the caller should handle the failure case, potentially returning
>>>>> + * -ETIMEDOUT.
>>>>
>>>> it looks like all the problems with the nice API are due to
>>>> XE_FORCEWAKE_ALL, which is not a single domain ID and requires extra care
>>>>
>>>> maybe there should be a different pair of functions:
>>>
>>> I am not convinced about having a different pair of functions:
>>>
>>> In current implementation:
>>>
>>> int mask = xe_force_wake_get(fw, domains);
>>>
>>> if (mask != domains) {
>>> 	/*
>>> 	 * Non-critical path: just warn and fall through instead.
>>> 	 * Critical path: drop what was taken and bail out.
>>> 	 */
>>> 	xe_force_wake_put(fw, mask);
>>> 	return -ETIMEDOUT;
>>> }
>>>
>>> err = do_ops();
>>> xe_force_wake_put(fw, mask);
>>> return err;
>>>
>>> The above flow stays the same whether individual domains or
>>> XE_FORCEWAKE_ALL are requested.
>>>
>>> For an individual domain, if (mask != domains) can be replaced with
>>> (!mask), and the caller can skip xe_force_wake_put(fw, mask) in the
>>> failure path since mask is 0.
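A minimal user-space sketch of the two caller paths described above. It assumes a hypothetical stub_get()/stub_put() pair (not the xe API) that hands out and releases a mask of domains the way the patched xe_force_wake_get()/xe_force_wake_put() are meant to; fail_mask is a made-up knob that simulates an ack timeout, and domains are plain bit indices 0..3.

```c
#include <assert.h>
#include <errno.h>

#define BIT(n) (1u << (n))

/* Hypothetical stand-ins: refs[] models per-domain refcounts, fail_mask models ack timeouts. */
static unsigned int refs[4];
static unsigned int fail_mask;

/* Returns the mask of requested domains whose refcount was actually taken. */
static unsigned int stub_get(unsigned int domains)
{
	unsigned int got = 0;

	for (unsigned int id = 0; id < 4; id++) {
		if ((domains & BIT(id)) && !(fail_mask & BIT(id))) {
			refs[id]++;
			got |= BIT(id);
		}
	}
	return got;
}

/* Releases exactly the domains in @mask. */
static void stub_put(unsigned int mask)
{
	for (unsigned int id = 0; id < 4; id++) {
		if (mask & BIT(id)) {
			assert(refs[id] > 0);
			refs[id]--;
		}
	}
}

/* Critical path: bail out unless every requested domain is awake. */
static int critical_op(unsigned int domains)
{
	unsigned int mask = stub_get(domains);

	if (mask != domains) {
		stub_put(mask);          /* release only what was actually taken */
		return -ETIMEDOUT;
	}
	/* ... work that needs all of @domains ... */
	stub_put(mask);
	return 0;
}

/* Non-critical path: record what is missing, do what we can, release what we took. */
static int best_effort_op(unsigned int domains, unsigned int *missing)
{
	unsigned int mask = stub_get(domains);

	*missing = domains & ~mask;  /* a real caller might warn here */
	/* ... work that tolerates some domains being asleep ... */
	stub_put(mask);
	return 0;
}
```

In both paths the put is given the returned mask rather than the requested domains, so a domain whose wake timed out is never released.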
>>
>> so maybe we should have (by reinventing i915?):
>>
>> // opaque, but zero means failure/no domains are awake
>> typedef unsigned long xe_wakeref_t;
>>
>>
>> // caller should test for ref != 0
>> // but shall call put if ref != 0
>> xe_wakeref_t xe_force_wake_get(struct xe_force_wake *fw,
>>                                enum xe_force_wake_domains d);
>>
>> // safe to call with ref == 0
>> void xe_force_wake_put(struct xe_force_wake *fw, xe_wakeref_t ref);
>>
>>
>> // helpers for critical work that must be sure about domain
>>
>> // compares opaque ref with explicit domain != ALL
>> // can be used by the code that obtained the ref
>> bool xe_wakeref_has_domain(xe_wakeref_t ref, enum xe_force_wake_domains d);
>>
>> // compares fw with explicit domain != ALL
>> // can be used by the code that does not have direct access to the ref
>> bool xe_force_wake_is_awake(struct xe_force_wake *fw,
>>                             enum xe_force_wake_domains d);
>>
>>
>> // helpers for checking correctness
>> void xe_force_wake_assert_held(struct xe_force_wake *fw,
>>                                enum xe_force_wake_domains d);
>>
>>
>> then usage would be:
>>
>> xe_wakeref_t ref;
>>
>> ref = xe_force_wake_get(fw, d);
>> if (ref) {
>> // ...
>> xe_force_wake_put(fw, ref);
>> }
>>
>> or:
>>
>> xe_wakeref_t ref;
>>
>> ref = xe_force_wake_get(fw, ALL);
>> if (xe_wakeref_has_domain(ref, d1)) {
>> 	// ... critical work1
>> }
>> if (xe_wakeref_has_domain(ref, d2)) {
>> 	// ... critical work2
>> }
>> xe_force_wake_put(fw, ref);
>>
>>
>> so the above is very similar to what you have, but IMO explicit types
>> will help connect all the functions into a proper use-case flow
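To make the proposed opaque-reference flow concrete, here is a small user-space model. Everything in it (model_wakeref_t, struct model_fw, model_get() and the other helpers, and the "broken" mask that simulates an ack timeout) is hypothetical and simplified; for example, a broken domain is treated as failing even if it is already held. It only illustrates how a zero ref, a partial ref obtained with "ALL", and the has_domain/is_awake helpers relate to each other.

```c
#include <assert.h>
#include <stdbool.h>

#define BIT(n) (1u << (n))
#define MODEL_NUM_DOMAINS 3
#define MODEL_ALL (BIT(MODEL_NUM_DOMAINS) - 1)

/* Hypothetical model of the proposed opaque type: a mask of domains actually held. */
typedef unsigned int model_wakeref_t;

struct model_fw {
	unsigned int ref[MODEL_NUM_DOMAINS];
	unsigned int broken;   /* simulated: domains whose wake ack times out */
};

/* Takes a reference on every requested domain that can be woken; 0 means nothing is held. */
static model_wakeref_t model_get(struct model_fw *fw, unsigned int domains)
{
	model_wakeref_t ref = 0;

	for (unsigned int id = 0; id < MODEL_NUM_DOMAINS; id++) {
		if (!(domains & BIT(id)) || (fw->broken & BIT(id)))
			continue;
		fw->ref[id]++;
		ref |= BIT(id);
	}
	return ref;
}

/* Safe to call with ref == 0: it then drops nothing. */
static void model_put(struct model_fw *fw, model_wakeref_t ref)
{
	for (unsigned int id = 0; id < MODEL_NUM_DOMAINS; id++) {
		if (ref & BIT(id)) {
			assert(fw->ref[id] > 0);
			fw->ref[id]--;
		}
	}
}

/* For the code that obtained the ref: is domain @id covered by it? */
static bool model_wakeref_has_domain(model_wakeref_t ref, unsigned int id)
{
	return ref & BIT(id);
}

/* For code without the ref: does anyone currently hold domain @id? */
static bool model_is_awake(const struct model_fw *fw, unsigned int id)
{
	return fw->ref[id] > 0;
}
```

Because model_put() accepts a zero ref, the caller never has to special-case the failure path before releasing.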
>
>
> Agreed, the implementation/usage will be the same; I will use an
> explicit type for clarity.
> IMO typedef unsigned int xe_wakeref_t is sufficient instead of
> typedef unsigned long xe_wakeref_t;
I agree with this.
Regards,
Badal
>
>
>>
>>>
>>>
>>>>
>>>> // for single domain where ret=0 is success, ret<0 is error
>>>
>>> This means the caller may call xe_force_wake_put only if the get
>>> succeeded, so a caller that continues after a failure has to make sure
>>> the put is not called.
>>>
>>> for example:
>>> int ret;
>>>
>>> ret = xe_force_wake_get(fw, DOMAIN_GT);
>>> XE_WARN_ON(ret);
>>> /* ... work ... */
>>> if (!ret)
>>> 	xe_force_wake_put(fw, DOMAIN_GT);
>>>
>>>> int xe_force_wake_get(fw, enum xe_force_wake_domain_id id);
>>>> void xe_force_wake_put(fw, enum xe_force_wake_domain_id id);
>>>>
>>>> and
>>>>
>>>> // for all domains where ret=0 is success, ret<0 is error
>>>> int xe_force_wake_get_all(fw);
>>>> void xe_force_wake_put_all(fw);
>>>
>>> If xe_force_wake_get_all(fw) fails, how will the caller know which
>>> domains were woken and which failed?
>>>
>>> ret = xe_force_wake_get_all(fw);
>>> if (ret) {
>>> 	/* no way to put the awake domains back to sleep */
>>> }
>>
>> in case of failure, it would be the responsibility of
>> xe_force_wake_get_all() to put all partially woken domains immediately,
>> since it failed to wake all requested domains (same as in the
>> single-domain case)
>>
>> but let's drop this idea
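Although the idea is dropped here, the all-or-nothing semantics described for xe_force_wake_get_all() can be sketched as a classic rollback loop. The names below (struct fw_sim, sim_get_one(), sim_get_all() and so on) are hypothetical, and domains are taken one at a time for simplicity, whereas the patch first requests a wake on every domain and only then waits for the acks.

```c
#include <assert.h>
#include <errno.h>

#define BIT(n) (1u << (n))
#define NDOM 3

/* Hypothetical model: per-domain refcounts and a mask of domains whose ack times out. */
struct fw_sim {
	unsigned int ref[NDOM];
	unsigned int timeout;
};

static int sim_get_one(struct fw_sim *fw, unsigned int id)
{
	if (fw->timeout & BIT(id))
		return -ETIMEDOUT;
	fw->ref[id]++;
	return 0;
}

static void sim_put_one(struct fw_sim *fw, unsigned int id)
{
	assert(fw->ref[id] > 0);
	fw->ref[id]--;
}

/* All-or-nothing: on any failure, release the domains already taken. */
static int sim_get_all(struct fw_sim *fw)
{
	unsigned int id;
	int err;

	for (id = 0; id < NDOM; id++) {
		err = sim_get_one(fw, id);
		if (err)
			goto rollback;
	}
	return 0;

rollback:
	while (id--)             /* puts id-1 down to 0, i.e. only what was taken */
		sim_put_one(fw, id);
	return err;
}

static void sim_put_all(struct fw_sim *fw)
{
	for (unsigned int id = 0; id < NDOM; id++)
		sim_put_one(fw, id);
}
```

The trade-off Himal raises still applies: with this shape the caller learns only that something failed, not which domain.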
>>
>>>
>>>>
>>>> and
>>>>
>>>> // input: mask of domains, return: mask of domains
>>>> unsigned int xe_force_wake_get_mask(fw, mask);
>>>> void xe_force_wake_put_mask(fw, mask);
>>>>
>>>> this last one can be just the main implementation (static, or public if
>>>> we really want to continue with a random set of enabled domains)
>>>>
>>>>> + */
>>>>> int xe_force_wake_get(struct xe_force_wake *fw,
>>>>> enum xe_force_wake_domains domains)
>>>>> {
>>>>> struct xe_gt *gt = fw->gt;
>>>>> struct xe_force_wake_domain *domain;
>>>>> - enum xe_force_wake_domains tmp, woken = 0;
>>>>> + enum xe_force_wake_domains tmp, awake_rqst = 0, awake_ack = 0;
>>>>
>>>> it looks like you're abusing the enum variables even more by treating
>>>> them as plain integers
>>>
>>> My miss. Will address them in the next version.
>>>
>>>>
>>>>> unsigned long flags;
>>>>> - int ret = 0;
>>>>> + int ret = domains;
>>>>> spin_lock_irqsave(&fw->lock, flags);
>>>>> for_each_fw_domain_masked(domain, domains, fw, tmp) {
>>>>> if (!domain->ref++) {
>>>>> - woken |= BIT(domain->id);
>>>>> + awake_rqst |= BIT(domain->id);
>>>>> domain_wake(gt, domain);
>>>>> }
>>>>> }
>>>>> - for_each_fw_domain_masked(domain, woken, fw, tmp) {
>>>>> - ret |= domain_wake_wait(gt, domain);
>>>>> + for_each_fw_domain_masked(domain, awake_rqst, fw, tmp) {
>>>>> + if (domain_wake_wait(gt, domain) == 0) {
>>>>> + awake_ack |= BIT(domain->id);
>>>>> + } else {
>>>>> + ret &= ~BIT(domain->id);
>>>>> + --domain->ref;
>>>>> + }
>>>>> }
>>>>> - fw->awake_domains |= woken;
>>>>> +
>>>>> + fw->awake_domains |= awake_ack;
>>>>> spin_unlock_irqrestore(&fw->lock, flags);
>>>>> return ret;
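For readers who want to poke at the patched logic outside the kernel, here is a user-space model of the hunk above. It is a sketch under stated assumptions: struct fw_model, model_wake_wait(), model_get(), model_put() and the ack_timeout mask are hypothetical stand-ins, there is no locking or MMIO, and domain_wake() is reduced to recording the request. It keeps the key behaviour of the patch: a domain whose ack times out has its refcount dropped again and is cleared from the returned mask, so a later get tries to wake it again.

```c
#include <assert.h>

#define NUM_DOMAINS 4
#define BIT(n) (1u << (n))

/* Hypothetical user-space model of struct xe_force_wake; not the driver's layout. */
struct fw_model {
	unsigned int ref[NUM_DOMAINS];
	unsigned int awake_domains;  /* domains whose wake was acknowledged */
	unsigned int ack_timeout;    /* simulated: domains whose ack will time out */
};

/* Stand-in for domain_wake_wait(): 0 on ack, -1 on simulated timeout. */
static int model_wake_wait(const struct fw_model *fw, unsigned int id)
{
	return (fw->ack_timeout & BIT(id)) ? -1 : 0;
}

/* Mirrors the patched flow: returns the mask of domains whose refcount stays incremented. */
static unsigned int model_get(struct fw_model *fw, unsigned int domains)
{
	unsigned int ret = domains, awake_rqst = 0, awake_ack = 0;

	for (unsigned int id = 0; id < NUM_DOMAINS; id++) {
		if ((domains & BIT(id)) && fw->ref[id]++ == 0)
			awake_rqst |= BIT(id);  /* first reference: request a wake */
	}
	for (unsigned int id = 0; id < NUM_DOMAINS; id++) {
		if (!(awake_rqst & BIT(id)))
			continue;
		if (model_wake_wait(fw, id) == 0) {
			awake_ack |= BIT(id);
		} else {
			ret &= ~BIT(id);        /* timed out: report it and drop the ref */
			--fw->ref[id];
		}
	}
	fw->awake_domains |= awake_ack;
	return ret;
}

/* Drops one reference per domain in @mask; a domain is marked asleep at zero. */
static void model_put(struct fw_model *fw, unsigned int mask)
{
	for (unsigned int id = 0; id < NUM_DOMAINS; id++) {
		if (!(mask & BIT(id)))
			continue;
		assert(fw->ref[id] > 0);
		if (--fw->ref[id] == 0)
			fw->awake_domains &= ~BIT(id);
	}
}
```

Passing the returned mask straight to model_put() releases exactly the domains that were taken, which is the contract the kernel-doc in this patch asks callers to follow.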