[PATCH v2 01/23] drm/xe: Error handling in xe_force_wake_get()

Ghimiray, Himal Prasad himal.prasad.ghimiray at intel.com
Fri Sep 13 03:59:17 UTC 2024



On 13-09-2024 03:01, Michal Wajdeczko wrote:
> 
> 
> On 12.09.2024 21:15, Himal Prasad Ghimiray wrote:
>> If an acknowledgment timeout occurs for a domain awake request, do not
>> increment the reference count for the domain. This ensures that
>> subsequent _get calls do not incorrectly assume the domain is awake. The
>> return value is a mask of domains whose reference counts were
>> incremented, and these domains need to be released using
>> xe_force_wake_put.
>>
>> The caller needs to compare the return value with the input domains to
>> determine the success or failure of the operation and decide whether to
>> continue or return accordingly.
>>
>> While at it, add simple kernel-doc for xe_force_wake_get()
>>
>> Cc: Badal Nilawar <badal.nilawar at intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>> Cc: Nirmoy Das <nirmoy.das at intel.com>
>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray at intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_force_wake.c | 35 +++++++++++++++++++++++++-----
>>   1 file changed, 29 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_force_wake.c b/drivers/gpu/drm/xe/xe_force_wake.c
>> index a64c14757c84..fa42d652d23f 100644
>> --- a/drivers/gpu/drm/xe/xe_force_wake.c
>> +++ b/drivers/gpu/drm/xe/xe_force_wake.c
>> @@ -150,26 +150,49 @@ static int domain_sleep_wait(struct xe_gt *gt,
>>   					 (ffs(tmp__) - 1))) && \
>>   					 domain__->reg_ctl.addr)
>>   
>> +/**
>> + * xe_force_wake_get : Increase the domain refcount; if it was 0 initially, wake the domain
> 
> while likely this is still recognized by the kernel-doc tool, this is
> not correct notation for the function() documentation


I assume you are suggesting %s/xe_force_wake_get/xe_force_wake_get()
will fix it.


> 
> [1] https://docs.kernel.org/doc-guide/kernel-doc.html#function-documentation
> 
>> + * @fw: struct xe_force_wake
>> + * @domains: forcewake domains to get refcount on
>> + *
>> + * Increment refcount for the force-wake domain. If the domain is
>> + * asleep, awaken it and wait for acknowledgment within the specified
>> + * timeout. If a timeout occurs, decrement the refcount.
> 
> not sure if doc shall be 1:1 of low level implementation details

Does this sound okay ?
This function takes references for the input @domains and wakes them if 
they are asleep.

> 
>> + * The caller should compare the return value with the @domains to
>> + * determine the success or failure of the operation.
>> + *
>> + * Return: mask of refcount increased domains.
> 
> if we return a 'mask' then maybe it should be of 'unsigned int' type?

Agreed. Will fix in next version.

> 
>> If the return value is
>> + * equal to the input parameter @domains, the operation is considered
>> + * successful. Otherwise, the operation is considered a failure, and
>> + * the caller should handle the failure case, potentially returning
>> + * -ETIMEDOUT.
> 
> it looks that all problems with the nice API is due to the
> XE_FORCEWAKE_ALL that is not a single domain ID and requires extra care
> 
> maybe there should be different pair of functions:

I am not convinced with different pair of functions:

In current implementation:

int mask = xe_force_wake_get(fw, domains)
if (mask != domains) {
	Non critical path continue with warning;
	 or
	critical path:
		xe_force_wake_put(fw, mask);
		return -ETIMEDOUT;
}

do_ops;
xe_force_wake_put(fw, mask);
return err;

Above flow remains intact irrespective of individual domains or 
FORCEWAKE_ALL.

In case of individual domains if (mask != domains) can be replaced with
(!mask) and user can avoid xe_force_wake_put(fw, mask) in failure path 
since mask is 0;


> 
> // for single domain where ret=0 is success, ret<0 is error

This leads to caller only calling xe_force_wake_put incase of get 
success. so in case of caller continuing with failure, he will need to 
ensure the put is not called.

for example:
int ret;

ret = xe_force_wake_get(fw, DOMAIN_GT);
XE_WARN_ON(ret)
if(!ret)
	xe_force_wake_put(fw, DOMAIN_GT);

> int xe_force_wake_get(fw, enum xe_force_wake_domain_id id);
> void xe_force_wake_put(fw, enum xe_force_wake_domain_id id);
> 
> and
> 
> // for all domain where ret=0 is success, ret<0 is error
> int int xe_force_wake_get_all(fw);
> void xe_force_wake_put_all(fw);

In case of xe_force_wake_get_all(fw) failure, how the caller will know 
which domains got awake and which failed ?

ret = xe_force_wake_get_all(fw);
if(!ret)
    No way to put awake domains to sleep

> 
> and
> 
> // input: mask of domains, return: mask of domain
> unsigned int xe_force_wake_get_mask(fw, mask);
> void xe_force_wake_put_mask(fw, mask);
> 
> this last one can be just main implementation (static or public if we
> really want to continue with random set of enabled domains)
> 
>> + */
>>   int xe_force_wake_get(struct xe_force_wake *fw,
>>   		      enum xe_force_wake_domains domains)
>>   {
>>   	struct xe_gt *gt = fw->gt;
>>   	struct xe_force_wake_domain *domain;
>> -	enum xe_force_wake_domains tmp, woken = 0;
>> +	enum xe_force_wake_domains tmp, awake_rqst = 0, awake_ack = 0;
> 
> it looks that you're abusing even more all enum variables by treating
> them as plain integers

Miss at my end. Will address them in next version.

> 
>>   	unsigned long flags;
>> -	int ret = 0;
>> +	int ret = domains;
>>   
>>   	spin_lock_irqsave(&fw->lock, flags);
>>   	for_each_fw_domain_masked(domain, domains, fw, tmp) {
>>   		if (!domain->ref++) {
>> -			woken |= BIT(domain->id);
>> +			awake_rqst |= BIT(domain->id);
>>   			domain_wake(gt, domain);
>>   		}
>>   	}
>> -	for_each_fw_domain_masked(domain, woken, fw, tmp) {
>> -		ret |= domain_wake_wait(gt, domain);
>> +	for_each_fw_domain_masked(domain, awake_rqst, fw, tmp) {
>> +		if (domain_wake_wait(gt, domain) == 0) {
>> +			awake_ack |= BIT(domain->id);
>> +		} else {
>> +			ret &= ~BIT(domain->id);
>> +			--domain->ref;
>> +		}
>>   	}
>> -	fw->awake_domains |= woken;
>> +
>> +	fw->awake_domains |= awake_ack;
>>   	spin_unlock_irqrestore(&fw->lock, flags);
>>   
>>   	return ret;


More information about the Intel-xe mailing list