[PATCH v5 01/23] drm/xe: Error handling in xe_force_wake_get()

Michal Wajdeczko michal.wajdeczko at intel.com
Wed Sep 25 12:01:28 UTC 2024



On 24.09.2024 14:16, Himal Prasad Ghimiray wrote:
> If an acknowledgment timeout occurs for a domain awake request, do not
> increment the reference count for the domain. This ensures that
> subsequent _get calls do not incorrectly assume the domain is awake. The
> return value is a mask of domains whose reference counts were
> incremented, and these domains need to be released using
> xe_force_wake_put.
> 
> The caller needs to compare the return value with the input domains to
> determine the success or failure of the operation and decide whether to
> continue or return accordingly.
> 
> While at it, add simple kernel-doc for xe_force_wake_get()
> 
> v3
> - Use explicit type for mask (Michal/Badal)
> - Improve kernel-doc (Michal)
> - Use unsigned int instead of abusing enum (Michal)
> 
> v5
> - Use unsigned int for return (MattB/Badal/Rodrigo)
> - use xe_gt_WARN for domain awake ack failure (Badal/Rodrigo)
> 
> Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> Cc: Badal Nilawar <badal.nilawar at intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
> Cc: Nirmoy Das <nirmoy.das at intel.com>
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_force_wake.c | 37 +++++++++++++++++++++++-------
>  drivers/gpu/drm/xe/xe_force_wake.h |  4 ++--
>  2 files changed, 31 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_force_wake.c b/drivers/gpu/drm/xe/xe_force_wake.c
> index a64c14757c84..d190aa93be90 100644
> --- a/drivers/gpu/drm/xe/xe_force_wake.c
> +++ b/drivers/gpu/drm/xe/xe_force_wake.c
> @@ -150,28 +150,49 @@ static int domain_sleep_wait(struct xe_gt *gt,
>  					 (ffs(tmp__) - 1))) && \
>  					 domain__->reg_ctl.addr)
>  
> -int xe_force_wake_get(struct xe_force_wake *fw,
> -		      enum xe_force_wake_domains domains)
> +/**
> + * xe_force_wake_get() : Increase the domain refcount
> + * @fw: struct xe_force_wake
> + * @domains: forcewake domains to get refcount on
> + *
> + * This function takes references for the input @domains and wakes them if
> + * they are asleep.

nit: likely we should also add the note that one shall call the
xe_force_wake_put() function to decrease refcounts

> + *
> + * Return: mask of refcount increased domains. 

do we really need to expose implementation detail to the caller?

can't we just treat the returned value as opaque data that just needs to
be passed to the matching xe_force_wake_put(fw, ref) call ?

> If the return value is
> + * equal to the input parameter @domains, the operation is considered
> + * successful. 

sorry, but I'm still little uncomfortable with such approach, due to a
mismatch between input parameter type that is enum xe_force_wake_domains
and return type which is unsigned int, as IMO forcing the user to
compare two different types seems wrong

can't we just say that if returned value is zero then no domains were
waken (which would provide definite answer if single domain was requested) ?

and for the xe_force_wake_get(fw, ALL_DOMAINS) case, we can provide
helper function that will check if specified domain is really awake:

	ref = xe_force_wake_put(fw, ALL_DOMAIN)

	if (ref) {
		xe_force_wake_is_awake(fw, SINGLE_DOMAIN)

		xe_force_wake_put(fw, ref)
	}

> Otherwise, the operation is considered a failure, and
> + * the caller should handle the failure case, potentially returning
> + * -ETIMEDOUT.
> + */
> +unsigned int xe_force_wake_get(struct xe_force_wake *fw,
> +			       enum xe_force_wake_domains domains)
>  {
>  	struct xe_gt *gt = fw->gt;
>  	struct xe_force_wake_domain *domain;
> -	enum xe_force_wake_domains tmp, woken = 0;
> +	unsigned int tmp, ret, awake_rqst = 0, awake_failed = 0;
>  	unsigned long flags;
> -	int ret = 0;
>  
>  	spin_lock_irqsave(&fw->lock, flags);
>  	for_each_fw_domain_masked(domain, domains, fw, tmp) {
>  		if (!domain->ref++) {
> -			woken |= BIT(domain->id);
> +			awake_rqst |= BIT(domain->id);
>  			domain_wake(gt, domain);
>  		}
>  	}
> -	for_each_fw_domain_masked(domain, woken, fw, tmp) {
> -		ret |= domain_wake_wait(gt, domain);
> +	for_each_fw_domain_masked(domain, awake_rqst, fw, tmp) {
> +		if (domain_wake_wait(gt, domain) == 0) {
> +			fw->awake_domains |= BIT(domain->id);
> +		} else {
> +			awake_failed |= BIT(domain->id);
> +			--domain->ref;
> +		}
>  	}
> -	fw->awake_domains |= woken;
> +	ret = (domains & ~awake_failed);
>  	spin_unlock_irqrestore(&fw->lock, flags);
>  
> +	xe_gt_WARN(gt, awake_failed, "domain%s %#x failed to acknowledgment awake\n",
> +		   str_plural(hweight_long(awake_failed)), awake_failed);
> +
>  	return ret;
>  }
>  
> diff --git a/drivers/gpu/drm/xe/xe_force_wake.h b/drivers/gpu/drm/xe/xe_force_wake.h
> index a2577672f4e3..6c1ade39139b 100644
> --- a/drivers/gpu/drm/xe/xe_force_wake.h
> +++ b/drivers/gpu/drm/xe/xe_force_wake.h
> @@ -15,8 +15,8 @@ void xe_force_wake_init_gt(struct xe_gt *gt,
>  			   struct xe_force_wake *fw);
>  void xe_force_wake_init_engines(struct xe_gt *gt,
>  				struct xe_force_wake *fw);
> -int xe_force_wake_get(struct xe_force_wake *fw,
> -		      enum xe_force_wake_domains domains);
> +unsigned int xe_force_wake_get(struct xe_force_wake *fw,
> +			       enum xe_force_wake_domains domains);
>  int xe_force_wake_put(struct xe_force_wake *fw,
>  		      enum xe_force_wake_domains domains);
>  



More information about the Intel-xe mailing list