[Intel-gfx] [PATCH] drm/i915/gt: Drop the timeline->mutex as we wait for retirement

Mika Kuoppala mika.kuoppala at linux.intel.com
Tue Mar 3 13:40:07 UTC 2020


Chris Wilson <chris at chris-wilson.co.uk> writes:

> As we have pinned the timeline (using tl->active_count), we can safely
> drop the tl->mutex as we wait for what we believe to be the final
> request on that timeline. This is useful for ensuring that we do not
> block the engine heartbeat by hogging the kernel_context's timeline on a
> dead GPU.
>
> References: https://gitlab.freedesktop.org/drm/intel/issues/1364
> Fixes: 058179e72e09 ("drm/i915/gt: Replace hangcheck by heartbeats")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index 8a5054f21bf8..436412d07689 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -147,24 +147,31 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
>  
>  			fence = i915_active_fence_get(&tl->last_request);
>  			if (fence) {
> +				mutex_unlock(&tl->mutex);
> +
>  				timeout = dma_fence_wait_timeout(fence,
>  								 interruptible,
>  								 timeout);
>  				dma_fence_put(fence);
> +
> +				if (!mutex_trylock(&tl->mutex)) {

If you can't take it, it must be active and for the retirement
advancement we can bail out early.

Or is there something else with a sampled try?

> +					active_count++;
> +					goto out_active;
> +				}
>  			}
>  		}
>  
>  		if (!retire_requests(tl) || flush_submission(gt))
>  			active_count++;
> +		mutex_unlock(&tl->mutex);
>  
> -		spin_lock(&timelines->lock);
> +out_active:	spin_lock(&timelines->lock);
>  
>  		/* Resume iteration after dropping lock */

You either fixed this comment with this patch.
Or that the comment remains a highly confusing.

>  		list_safe_reset_next(tl, tn, link);
>  		if (atomic_dec_and_test(&tl->active_count))
>  			list_del(&tl->link);

We have the timelines lock and the above seems safe
wtithout the actual mutex.

But the comment is still hauting me.
-Mika

>  
> -		mutex_unlock(&tl->mutex);
>  
>  		/* Defer the final release to after the spinlock */
>  		if (refcount_dec_and_test(&tl->kref.refcount)) {
> -- 
> 2.25.1


More information about the Intel-gfx mailing list