[PATCH v3 2/2] drm/i915: Never return 0 if not all requests retired

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Nov 23 12:57:26 UTC 2022


On 23/11/2022 09:28, Janusz Krzysztofik wrote:
> Hi Tvrtko,
> 
> Thanks for your comments.
> 
> On Tuesday, 22 November 2022 11:50:38 CET Tvrtko Ursulin wrote:
>>
>> On 21/11/2022 14:56, Janusz Krzysztofik wrote:
>>> Users of intel_gt_retire_requests_timeout() expect 0 return value on
>>> success.  However, we have no protection from passing back 0 potentially
>>> returned by a call to dma_fence_wait_timeout() when it succedes right
>>> after its timeout has expired.
>>
>> Is this talking about a potential weakness, or ambiguous kerneldoc, of
>> dma_fence_wait_timeout, dma_fence_default_wait and
>> i915_request_wait_timeout? They appear to say 0 return means timeout,
>> implying unsignaled fence. In other words signaled must return positive
>> remaining timeout. Implementations seems to allow a race which indeed
>> appears that return 0 and signaled fence is possible.
> 
> While my initial analysis was indeed focused on inconsistent semantics of 0
> return values from different dma_fence_default_wait() backends, I should have
> also mentioned in this commit description that users may perfectly
> call intel_gt_retire_requests_timeout() with 0 timeout, in which case the
> false positive 0 value can be returned regardless of dma_fence_wait_timeout()
> potential issues.  Would you like me to reword and resubmit?

Not sure yet.

So the only caller which passes in zero to 
intel_gt_retire_requests_timeout appears to be intel_gt_retire_requests 
and it eats the return value anyway so this patch is immaterial for that 
one.

I guess it can change how intel_gt_wait_for_idle behaves with short-ish 
timeouts. In case this race is hit. But then wouldn't it make sense to 
follow up with a patch which addresses this race by re-checking the "is 
signaled" when timeout expires, either in dma_fence_wait_timeout, or to 
intel_gt_retire_requests_timeout. Or if not that at least better 
document the dma_fence_wait_timeout and/or 
intel_gt_retire_requests_timeout. Makes sense?

Regards,

Tvrtko

> 
>> If dma_fence_wait can indeed return 0 even when a request is signaled,
>> then how is timeout ?: -ETIME below correct? It isn't a chance for false
>> negative in its' callers?
> 
> The goal of intel_gt_retire_requests_timeout() is to retire requests.  When
> that goal has been reached, i.e., all requests have been retired, active count
> is 0, and 0 is correctly returned, regardless of timeout value.
> 
> The value of timeout is used only when there are still pending requests, which
> means that the goal hasn't been reached and the function hasn't succeeded.
> Then, no false negative is possible, unlike the false positive that we now
> have when we return 0  while some requests are still pending.
> 
> Thanks,
> Janusz
> 
>>
>> Regards,
>>
>> Tvrtko
>>
>>> Replace 0 with -ETIME before potentially using the timeout value as return
>>> code, so -ETIME is returned if there are still some requests not retired
>>> after timeout, 0 otherwise.
>>>
>>> v3: Use conditional expression, more compact but also better reflecting
>>>       intention standing behind the change.
>>>
>>> v2: Move the added lines down so flush_submission() is not affected.
>>>
>>> Fixes: f33a8a51602c ("drm/i915: Merge wait_for_timelines with retire_request")
>>> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
>>> Reviewed-by: Andrzej Hajda <andrzej.hajda at intel.com>
>>> Cc: stable at vger.kernel.org # v5.5+
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> index edb881d756309..1dfd01668c79c 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> @@ -199,7 +199,7 @@ out_active:	spin_lock(&timelines->lock);
>>>    	if (remaining_timeout)
>>>    		*remaining_timeout = timeout;
>>>    
>>> -	return active_count ? timeout : 0;
>>> +	return active_count ? timeout ?: -ETIME : 0;
>>>    }
>>>    
>>>    static void retire_work_handler(struct work_struct *work)
>>
> 
> 
> 
> 


More information about the dri-devel mailing list