[Intel-gfx] [PATCH 1/3] drm/i915: Fix negative remaining time after retire requests
Das, Nirmoy
nirmoy.das at linux.intel.com
Thu Nov 17 09:58:02 UTC 2022
On 11/16/2022 12:25 PM, Janusz Krzysztofik wrote:
> Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
> with GuC") extended the API of intel_gt_retire_requests_timeout() with an
> extra argument 'remaining_timeout', intended for passing back the unconsumed
> portion of the requested timeout when 0 (success) is returned. However, when
> request retirement happens to succeed despite an error returned by
> dma_fence_wait_timeout(), the error code (a negative value) is passed back
> instead of the remaining time. If a user then passes that negative value
> forward as requested timeout to another wait, an explicit WARN or BUG can
> be triggered.
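The failure mode can be reproduced with a minimal userspace model of the pre-fix bookkeeping. It is a sketch under stated assumptions, not the driver code: `fake_fence_wait()` and `old_retire()` are hypothetical names, the fake wait always returns -EINTR to stand in for an interrupted dma_fence_wait_timeout(), and retirement is assumed to succeed so that 0 is returned.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for dma_fence_wait_timeout(), which returns the
 * remaining time (> 0), 0 on timeout, or a negative error code. This
 * one always reports an interrupted wait.
 */
static long fake_fence_wait(long timeout)
{
	(void)timeout;
	return -EINTR;
}

/*
 * Hypothetical model of the pre-fix bookkeeping: the wait result
 * overwrites timeout, and timeout is copied out once, just before return.
 */
static long old_retire(long timeout, long *remaining_timeout)
{
	unsigned long active_count = 0;	/* retirement assumed to succeed */

	timeout = fake_fence_wait(timeout);	/* timeout is now -EINTR */

	if (remaining_timeout)
		*remaining_timeout = timeout;	/* error leaks out as "time" */

	return active_count ? timeout : 0;	/* 0 means success */
}
```

Here old_retire(100, &remaining) returns 0 (success) while remaining ends up as -EINTR, which is the value a caller would then forward as its next timeout.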
>
> Instead of copying the value of the timeout variable to *remaining_timeout
> before return, update the *remaining_timeout after each DMA fence wait.
Thanks for the detailed comment. Indeed, we were not accounting for a
negative (error) return value of dma_fence_wait_timeout().
Acked-by: Nirmoy Das <nirmoy.das at intel.com>
Thanks,
Nirmoy
> Set it to 0 on -ETIME, -EINTR or -ERESTARTSYS, and assume no time has been
> consumed on other errors returned from the wait.
>
> Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC")
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
> Cc: stable at vger.kernel.org # v5.15+
> ---
> drivers/gpu/drm/i915/gt/intel_gt_requests.c | 23 ++++++++++++++++++---
> 1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index edb881d756309..ccaf2fd80625b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -138,6 +138,9 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>  	unsigned long active_count = 0;
>  	LIST_HEAD(free);
>
> +	if (remaining_timeout)
> +		*remaining_timeout = timeout;
> +
>  	flush_submission(gt, timeout); /* kick the ksoftirqd tasklets */
>  	spin_lock(&timelines->lock);
>  	list_for_each_entry_safe(tl, tn, &timelines->active_list, link) {
> @@ -163,6 +166,23 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>  								 timeout);
>  				dma_fence_put(fence);
>
> +				if (remaining_timeout) {
> +					/*
> +					 * If we get an error here but request
> +					 * retirement succeeds anyway
> +					 * (!active_count) and we return 0, the
> +					 * caller may want to spend remaining
> +					 * time on waiting for other events.
> +					 */
> +					if (timeout == -ETIME ||
> +					    timeout == -EINTR ||
> +					    timeout == -ERESTARTSYS)
> +						*remaining_timeout = 0;
> +					else if (timeout >= 0)
> +						*remaining_timeout = timeout;
> +					/* else assume no time consumed */
> +				}
> +
>  				/* Retirement is best effort */
>  				if (!mutex_trylock(&tl->mutex)) {
>  					active_count++;
> @@ -196,9 +216,6 @@ out_active: spin_lock(&timelines->lock);
>  	if (flush_submission(gt, timeout)) /* Wait, there's more! */
>  		active_count++;
>
> -	if (remaining_timeout)
> -		*remaining_timeout = timeout;
> -
>  	return active_count ? timeout : 0;
> }
>
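For comparison, the patched bookkeeping can be modelled in userspace as well. This is a sketch under stated assumptions, not the i915 code: `update_remaining()` and `fixed_retire()` are hypothetical names, the scripted `wait_results` array stands in for successive dma_fence_wait_timeout() calls, retirement is assumed to always succeed, and ERESTARTSYS (kernel-internal) is defined locally with the value 512 used in the kernel's include/linux/errno.h.

```c
#include <errno.h>
#include <stddef.h>

/*
 * ERESTARTSYS is kernel-internal and not exported to userspace errno.h;
 * it is defined here (value 512, as in the kernel's include/linux/errno.h)
 * only so this sketch compiles.
 */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/*
 * Hypothetical helper: fold the result of one fence wait into
 * *remaining_timeout following the rules in the commit message.
 */
static void update_remaining(long wait_ret, long *remaining_timeout)
{
	if (!remaining_timeout)
		return;

	if (wait_ret == -ETIME || wait_ret == -EINTR ||
	    wait_ret == -ERESTARTSYS)
		*remaining_timeout = 0;		/* budget treated as spent */
	else if (wait_ret >= 0)
		*remaining_timeout = wait_ret;	/* time actually left */
	/* else: other error, assume no time consumed (keep old value) */
}

/*
 * Hypothetical model of the patched timeout handling. wait_results[i]
 * is the scripted return value of the i-th fence wait.
 */
static long fixed_retire(long timeout, long *remaining_timeout,
			 const long *wait_results, size_t n_fences)
{
	unsigned long active_count = 0;	/* retirement assumed to succeed */
	size_t i;

	/* First hunk: report the full budget if no wait consumes any of it. */
	if (remaining_timeout)
		*remaining_timeout = timeout;

	for (i = 0; i < n_fences; i++) {
		/* Like the real loop, only wait while timeout > 0. */
		if (timeout <= 0)
			continue;

		timeout = wait_results[i];
		/* Second hunk: update after each wait, not once at the end. */
		update_remaining(timeout, remaining_timeout);
	}

	/* Third hunk: no unconditional copy of timeout before returning. */
	return active_count ? timeout : 0;
}
```

With a budget of 100 and a single wait returning -EINTR, fixed_retire() still returns 0 but now reports a remaining time of 0 instead of a negative value. With no fences to wait on, the full budget of 100 is reported thanks to the up-front initialisation, and an unexpected error such as -EINVAL leaves the last known remaining time in place.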
More information about the dri-devel mailing list