[Intel-gfx] [PATCH v3 2/2] drm/i915: Never return 0 if not all requests retired

Janusz Krzysztofik janusz.krzysztofik at linux.intel.com
Wed Nov 23 09:28:40 UTC 2022


Hi Tvrtko,

Thanks for your comments.

On Tuesday, 22 November 2022 11:50:38 CET Tvrtko Ursulin wrote:
> 
> On 21/11/2022 14:56, Janusz Krzysztofik wrote:
> > Users of intel_gt_retire_requests_timeout() expect 0 return value on
> > success.  However, we have no protection from passing back 0 potentially
> > returned by a call to dma_fence_wait_timeout() when it succedes right
> > after its timeout has expired.
> 
> Is this talking about a potential weakness, or ambiguous kerneldoc, of 
> dma_fence_wait_timeout, dma_fence_default_wait and 
> i915_request_wait_timeout? They appear to say 0 return means timeout, 
> implying unsignaled fence. In other words signaled must return positive 
> remaining timeout. Implementations seems to allow a race which indeed 
> appears that return 0 and signaled fence is possible.

While my initial analysis was indeed focused on inconsistent semantics of 0 
return values from different dma_fence_default_wait() backends, I should have 
also mentioned in this commit description that users may perfectly 
call intel_gt_retire_requests_timeout() with 0 timeout, in which case the 
false positive 0 value can be returned regardless of dma_fence_wait_timeout() 
potential issues.  Would you like me to reword and resubmit?

> If dma_fence_wait can indeed return 0 even when a request is signaled, 
> then how is timeout ?: -ETIME below correct? It isn't a chance for false 
> negative in its' callers?

The goal of intel_gt_retire_requests_timeout() is to retire requests.  When 
that goal has been reached, i.e., all requests have been retired, active count 
is 0, and 0 is correctly returned, regardless of timeout value.

The value of timeout is used only when there are still pending requests, which 
means that the goal hasn't been reached and the function hasn't succeeded.  
Then, no false negative is possible, unlike the false positive that we now 
have when we return 0  while some requests are still pending.

Thanks,
Janusz

> 
> Regards,
> 
> Tvrtko
> 
> > Replace 0 with -ETIME before potentially using the timeout value as return
> > code, so -ETIME is returned if there are still some requests not retired
> > after timeout, 0 otherwise.
> > 
> > v3: Use conditional expression, more compact but also better reflecting
> >      intention standing behind the change.
> > 
> > v2: Move the added lines down so flush_submission() is not affected.
> > 
> > Fixes: f33a8a51602c ("drm/i915: Merge wait_for_timelines with retire_request")
> > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
> > Reviewed-by: Andrzej Hajda <andrzej.hajda at intel.com>
> > Cc: stable at vger.kernel.org # v5.5+
> > ---
> >   drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > index edb881d756309..1dfd01668c79c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > @@ -199,7 +199,7 @@ out_active:	spin_lock(&timelines->lock);
> >   	if (remaining_timeout)
> >   		*remaining_timeout = timeout;
> >   
> > -	return active_count ? timeout : 0;
> > +	return active_count ? timeout ?: -ETIME : 0;
> >   }
> >   
> >   static void retire_work_handler(struct work_struct *work)
> 






More information about the Intel-gfx mailing list