[PATCH v5 01/23] drm/xe: Error handling in xe_force_wake_get()
Ghimiray, Himal Prasad
himal.prasad.ghimiray at intel.com
Wed Sep 25 16:36:11 UTC 2024
On 25-09-2024 17:31, Michal Wajdeczko wrote:
>
>
> On 24.09.2024 14:16, Himal Prasad Ghimiray wrote:
>> If an acknowledgment timeout occurs for a domain awake request, do not
>> increment the reference count for the domain. This ensures that
>> subsequent _get calls do not incorrectly assume the domain is awake. The
>> return value is a mask of domains whose reference counts were
>> incremented, and these domains need to be released using
>> xe_force_wake_put.
>>
>> The caller needs to compare the return value with the input domains to
>> determine the success or failure of the operation and decide whether to
>> continue or return accordingly.
>>
>> While at it, add simple kernel-doc for xe_force_wake_get()
>>
>> v3
>> - Use explicit type for mask (Michal/Badal)
>> - Improve kernel-doc (Michal)
>> - Use unsigned int instead of abusing enum (Michal)
>>
>> v5
>> - Use unsigned int for return (MattB/Badal/Rodrigo)
>> - use xe_gt_WARN for domain awake ack failure (Badal/Rodrigo)
>>
>> Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
>> Cc: Badal Nilawar <badal.nilawar at intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>> Cc: Nirmoy Das <nirmoy.das at intel.com>
>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray at intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_force_wake.c | 37 +++++++++++++++++++++++-------
>> drivers/gpu/drm/xe/xe_force_wake.h | 4 ++--
>> 2 files changed, 31 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_force_wake.c b/drivers/gpu/drm/xe/xe_force_wake.c
>> index a64c14757c84..d190aa93be90 100644
>> --- a/drivers/gpu/drm/xe/xe_force_wake.c
>> +++ b/drivers/gpu/drm/xe/xe_force_wake.c
>> @@ -150,28 +150,49 @@ static int domain_sleep_wait(struct xe_gt *gt,
>> (ffs(tmp__) - 1))) && \
>> domain__->reg_ctl.addr)
>>
>> -int xe_force_wake_get(struct xe_force_wake *fw,
>> - enum xe_force_wake_domains domains)
>> +/**
>> + * xe_force_wake_get() : Increase the domain refcount
>> + * @fw: struct xe_force_wake
>> + * @domains: forcewake domains to get refcount on
>> + *
>> + * This function takes references for the input @domains and wakes them if
>> + * they are asleep.
>
> nit: likely we should also add the note that one shall call the
> xe_force_wake_put() function to decrease refcounts
Sure.
>
>> + *
>> + * Return: mask of refcount increased domains.
>
> do we really need to expose implementation detail to the caller?
>
> can't we just treat the returned value as opaque data that just needs to
> be passed to the matching xe_force_wake_put(fw, ref) call ?
>
>> If the return value is
>> + * equal to the input parameter @domains, the operation is considered
>> + * successful.
>
> sorry, but I'm still little uncomfortable with such approach, due to a
> mismatch between input parameter type that is enum xe_force_wake_domains
> and return type which is unsigned int, as IMO forcing the user to
> compare two different types seems wrong
Sure, I understand the thought process behind it. Will a helper to do it
is OK. Justifying below for the helper.
>
> can't we just say that if returned value is zero then no domains were
> waken (which would provide definite answer if single domain was requested) ?
I am concerned about the scenario, where user ends up considering non
zero return of FORCEWAKE_ALL as success.
Having multiple success or failure condition for same API, based on
input parameter looks bad design to me.
ref = xe_force_wake_get(fw, GT)
if (ref) -> means success condn
whereas
ref = xe_force_wake_put(fw, ALL)
if (ref) -> doesn't necessarily means successfully awakened all domain.
User can very easily interpret it wrongly:
ref = xe_force_wake_get(fw, ALL)
if(!ref)
error_handling;
/* Irrespective of failure or success of xe_force_wake_get code reaches
here */
do_multiple_operations(); /* Needed all fw domains supported on GT to be
awake */
xe_force_wake_put(fw, ref);
We have use-cases throughout the driver where user relies on
success/failure for FORCEWAKE_ALL domains.
>
> and for the xe_force_wake_get(fw, ALL_DOMAINS) case, we can provide
> helper function that will check if specified domain is really awake:
>
> ref = xe_force_wake_put(fw, ALL_DOMAIN)
>
> if (ref) {
> xe_force_wake_is_awake(fw, SINGLE_DOMAIN)
I assume this API is to check whether ref has domain in it or not. Will
be good helper to have for such scenarios, Which are currently not there
in driver.
>
> xe_force_wake_put(fw, ref)
> }
>
I believe a helper to determine success or failure of force_wake_get,
instead of caller doing it will streamline all the domains and usecases.
int xe_force_wake_get_status(enum domain, unsigned int fw_ref)
{
return (fw_ref == domain) ? 0 : -ETIMEDOUT;
}
Usecases:
A) No error check, just continue in case of failure too.
ref = xe_force_wake_get(fw, domain /* can be ALL_DOMAIN */);
do_operations();
xe_force_wake_put(fw, ref);
B) error check, abort in case of failure.
ref = xe_force_wake_get(fw, domain /* can be ALL_DOMAIN */);
int err = xe_force_wake_get_status(domain, ref);
if(err) {
xe_force_wake_put(fw, ref);
return err;
}
do_operations();
xe_force_wake_put(fw, ref);
c) get all domain, but check specific domain:
ref = xe_force_wake_get(fw, ALL_domain);
if (xe_force_wake_get_status(domain, ref))
dmesg_warn( "unable to awake all requested domain \n");
if (xe_fwref_has_domain(fw, SINGLE_DOMAIN))
do_operations()
xe_force_wake_put(fw, ref);
>> Otherwise, the operation is considered a failure, and
>> + * the caller should handle the failure case, potentially returning
>> + * -ETIMEDOUT.
>> + */
>> +unsigned int xe_force_wake_get(struct xe_force_wake *fw,
>> + enum xe_force_wake_domains domains)
>> {
>> struct xe_gt *gt = fw->gt;
>> struct xe_force_wake_domain *domain;
>> - enum xe_force_wake_domains tmp, woken = 0;
>> + unsigned int tmp, ret, awake_rqst = 0, awake_failed = 0;
>> unsigned long flags;
>> - int ret = 0;
>>
>> spin_lock_irqsave(&fw->lock, flags);
>> for_each_fw_domain_masked(domain, domains, fw, tmp) {
>> if (!domain->ref++) {
>> - woken |= BIT(domain->id);
>> + awake_rqst |= BIT(domain->id);
>> domain_wake(gt, domain);
>> }
>> }
>> - for_each_fw_domain_masked(domain, woken, fw, tmp) {
>> - ret |= domain_wake_wait(gt, domain);
>> + for_each_fw_domain_masked(domain, awake_rqst, fw, tmp) {
>> + if (domain_wake_wait(gt, domain) == 0) {
>> + fw->awake_domains |= BIT(domain->id);
>> + } else {
>> + awake_failed |= BIT(domain->id);
>> + --domain->ref;
>> + }
>> }
>> - fw->awake_domains |= woken;
>> + ret = (domains & ~awake_failed);
>> spin_unlock_irqrestore(&fw->lock, flags);
>>
>> + xe_gt_WARN(gt, awake_failed, "domain%s %#x failed to acknowledgment awake\n",
>> + str_plural(hweight_long(awake_failed)), awake_failed);
>> +
>> return ret;
>> }
>>
>> diff --git a/drivers/gpu/drm/xe/xe_force_wake.h b/drivers/gpu/drm/xe/xe_force_wake.h
>> index a2577672f4e3..6c1ade39139b 100644
>> --- a/drivers/gpu/drm/xe/xe_force_wake.h
>> +++ b/drivers/gpu/drm/xe/xe_force_wake.h
>> @@ -15,8 +15,8 @@ void xe_force_wake_init_gt(struct xe_gt *gt,
>> struct xe_force_wake *fw);
>> void xe_force_wake_init_engines(struct xe_gt *gt,
>> struct xe_force_wake *fw);
>> -int xe_force_wake_get(struct xe_force_wake *fw,
>> - enum xe_force_wake_domains domains);
>> +unsigned int xe_force_wake_get(struct xe_force_wake *fw,
>> + enum xe_force_wake_domains domains);
>> int xe_force_wake_put(struct xe_force_wake *fw,
>> enum xe_force_wake_domains domains);
>>
>
More information about the Intel-xe
mailing list