[Intel-gfx] [PATCH] drm/i915: Update to post-reset execlist queue clean-up

Tue Dec 1 03:46:59 PST 2015

On 23/10/15 18:02, Tomas Elf wrote:
> When clearing an execlist queue, instead of traversing it and unreferencing all
> requests while holding the spinlock (which might lead to thread sleeping with
> IRQs are turned off - bad news!), just move all requests to the retire request
> list while holding spinlock and then drop spinlock and invoke the execlists
> request retirement path, which already deals with the intricacies of
> purging/dereferencing execlist queue requests.
>
> This patch can be considered v3 of:
>
> 	commit b96db8b81c54ef30485ddb5992d63305d86ea8d3
> 	Author: Tomas Elf <tomas.elf at intel.com>
> 	drm/i915: Grab execlist spinlock to avoid post-reset concurrency issues
>
> This patch assumes v2 of the above patch is part of the baseline, reverts v2
> and adds changes on top to turn it into v3.
>
> Signed-off-by: Tomas Elf <tomas.elf at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 15 ++++-----------
>   1 file changed, 4 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 2c7a0b7..b492603 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2756,20 +2756,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>
>   	if (i915.enable_execlists) {
>   		spin_lock_irq(&ring->execlist_lock);
> -		while (!list_empty(&ring->execlist_queue)) {
> -			struct drm_i915_gem_request *submit_req;
>
> -			submit_req = list_first_entry(&ring->execlist_queue,
> -					struct drm_i915_gem_request,
> -					execlist_link);
> -			list_del(&submit_req->execlist_link);
> +		/* list_splice_tail_init checks for empty lists */
> +		list_splice_tail_init(&ring->execlist_queue,
> +				      &ring->execlist_retired_req_list);
>
> -			if (submit_req->ctx != ring->default_context)
> -				intel_lr_context_unpin(submit_req);
> -
> -			i915_gem_request_unreference(submit_req);
> -		}
>   		spin_unlock_irq(&ring->execlist_lock);
> +		intel_execlists_retire_requests(ring);
>   	}
>
>   	/*
>

Fallen through the cracks..

This looks to be even more serious, since lockdep notices possible 
deadlock involving vmap_area_lock:

  Possible interrupt unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(vmap_area_lock);
                                local_irq_disable();
                                lock(&(&ring->execlist_lock)->rlock);
                                lock(vmap_area_lock);
   <Interrupt>
     lock(&(&ring->execlist_lock)->rlock);

  *** DEADLOCK ***

Because it unpins LRC context and ringbuffer which ends up in the VM 
code under the execlist_lock.

intel_execlists_retire_requests is slightly different from the code in 
the reset handler because it concerns itself with ctx_obj existence 
which the other one doesn't.

Could people more knowledgeable of this code check if it is OK and R-B?

Regards,

Tvrtko