[Intel-gfx] [PATCH v3 2/2] drm/i915: Never return 0 if not all requests retired
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Nov 23 12:57:26 UTC 2022
On 23/11/2022 09:28, Janusz Krzysztofik wrote:
> Hi Tvrtko,
>
> Thanks for your comments.
>
> On Tuesday, 22 November 2022 11:50:38 CET Tvrtko Ursulin wrote:
>>
>> On 21/11/2022 14:56, Janusz Krzysztofik wrote:
>>> Users of intel_gt_retire_requests_timeout() expect 0 return value on
>>> success. However, we have no protection from passing back 0 potentially
>>> returned by a call to dma_fence_wait_timeout() when it succedes right
>>> after its timeout has expired.
>>
>> Is this talking about a potential weakness, or ambiguous kerneldoc, of
>> dma_fence_wait_timeout, dma_fence_default_wait and
>> i915_request_wait_timeout? They appear to say 0 return means timeout,
>> implying unsignaled fence. In other words signaled must return positive
>> remaining timeout. Implementations seems to allow a race which indeed
>> appears that return 0 and signaled fence is possible.
>
> While my initial analysis was indeed focused on inconsistent semantics of 0
> return values from different dma_fence_default_wait() backends, I should have
> also mentioned in this commit description that users may perfectly
> call intel_gt_retire_requests_timeout() with 0 timeout, in which case the
> false positive 0 value can be returned regardless of dma_fence_wait_timeout()
> potential issues. Would you like me to reword and resubmit?
Not sure yet.
So the only caller which passes in zero to
intel_gt_retire_requests_timeout appears to be intel_gt_retire_requests
and it eats the return value anyway so this patch is immaterial for that
one.
I guess it can change how intel_gt_wait_for_idle behaves with short-ish
timeouts. In case this race is hit. But then wouldn't it make sense to
follow up with a patch which addresses this race by re-checking the "is
signaled" when timeout expires, either in dma_fence_wait_timeout, or to
intel_gt_retire_requests_timeout. Or if not that at least better
document the dma_fence_wait_timeout and/or
intel_gt_retire_requests_timeout. Makes sense?
Regards,
Tvrtko
>
>> If dma_fence_wait can indeed return 0 even when a request is signaled,
>> then how is timeout ?: -ETIME below correct? It isn't a chance for false
>> negative in its' callers?
>
> The goal of intel_gt_retire_requests_timeout() is to retire requests. When
> that goal has been reached, i.e., all requests have been retired, active count
> is 0, and 0 is correctly returned, regardless of timeout value.
>
> The value of timeout is used only when there are still pending requests, which
> means that the goal hasn't been reached and the function hasn't succeeded.
> Then, no false negative is possible, unlike the false positive that we now
> have when we return 0 while some requests are still pending.
>
> Thanks,
> Janusz
>
>>
>> Regards,
>>
>> Tvrtko
>>
>>> Replace 0 with -ETIME before potentially using the timeout value as return
>>> code, so -ETIME is returned if there are still some requests not retired
>>> after timeout, 0 otherwise.
>>>
>>> v3: Use conditional expression, more compact but also better reflecting
>>> intention standing behind the change.
>>>
>>> v2: Move the added lines down so flush_submission() is not affected.
>>>
>>> Fixes: f33a8a51602c ("drm/i915: Merge wait_for_timelines with retire_request")
>>> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
>>> Reviewed-by: Andrzej Hajda <andrzej.hajda at intel.com>
>>> Cc: stable at vger.kernel.org # v5.5+
>>> ---
>>> drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> index edb881d756309..1dfd01668c79c 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> @@ -199,7 +199,7 @@ out_active: spin_lock(&timelines->lock);
>>> if (remaining_timeout)
>>> *remaining_timeout = timeout;
>>>
>>> - return active_count ? timeout : 0;
>>> + return active_count ? timeout ?: -ETIME : 0;
>>> }
>>>
>>> static void retire_work_handler(struct work_struct *work)
>>
>
>
>
>
More information about the Intel-gfx
mailing list