[Intel-gfx] [PATCH] drm/i915: Update to post-reset execlist queue clean-up
Dave Gordon
david.s.gordon at intel.com
Fri Dec 11 06:14:00 PST 2015
On 01/12/15 11:46, Tvrtko Ursulin wrote:
>
> On 23/10/15 18:02, Tomas Elf wrote:
>> When clearing an execlist queue, instead of traversing it and
>> unreferencing all
>> requests while holding the spinlock (which might lead to thread
>> sleeping with
>> IRQs are turned off - bad news!), just move all requests to the retire
>> request
>> list while holding spinlock and then drop spinlock and invoke the
>> execlists
>> request retirement path, which already deals with the intricacies of
>> purging/dereferencing execlist queue requests.
>>
>> This patch can be considered v3 of:
>>
>> commit b96db8b81c54ef30485ddb5992d63305d86ea8d3
>> Author: Tomas Elf <tomas.elf at intel.com>
>> drm/i915: Grab execlist spinlock to avoid post-reset concurrency
>> issues
>>
>> This patch assumes v2 of the above patch is part of the baseline,
>> reverts v2
>> and adds changes on top to turn it into v3.
>>
>> Signed-off-by: Tomas Elf <tomas.elf at intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> ---
>> drivers/gpu/drm/i915/i915_gem.c | 15 ++++-----------
>> 1 file changed, 4 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index 2c7a0b7..b492603 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2756,20 +2756,13 @@ static void i915_gem_reset_ring_cleanup(struct
>> drm_i915_private *dev_priv,
>>
>> if (i915.enable_execlists) {
>> spin_lock_irq(&ring->execlist_lock);
>> - while (!list_empty(&ring->execlist_queue)) {
>> - struct drm_i915_gem_request *submit_req;
>>
>> - submit_req = list_first_entry(&ring->execlist_queue,
>> - struct drm_i915_gem_request,
>> - execlist_link);
>> - list_del(&submit_req->execlist_link);
>> + /* list_splice_tail_init checks for empty lists */
>> + list_splice_tail_init(&ring->execlist_queue,
>> + &ring->execlist_retired_req_list);
>>
>> - if (submit_req->ctx != ring->default_context)
>> - intel_lr_context_unpin(submit_req);
>> -
>> - i915_gem_request_unreference(submit_req);
>> - }
>> spin_unlock_irq(&ring->execlist_lock);
>> + intel_execlists_retire_requests(ring);
>> }
>>
>> /*
>
> Fallen through the cracks..
>
> This looks to be even more serious, since lockdep notices possible
> deadlock involving vmap_area_lock:
>
> Possible interrupt unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(vmap_area_lock);
> local_irq_disable();
> lock(&(&ring->execlist_lock)->rlock);
> lock(vmap_area_lock);
> <Interrupt>
> lock(&(&ring->execlist_lock)->rlock);
>
> *** DEADLOCK ***
>
> Because it unpins LRC context and ringbuffer which ends up in the VM
> code under the execlist_lock.
>
> intel_execlists_retire_requests is slightly different from the code in
> the reset handler because it concerns itself with ctx_obj existence
> which the other one doesn't.
>
> Could people more knowledgeable of this code check if it is OK and R-B?
>
> Regards,
>
> Tvrtko
Hi Tvrtko,
I didn't understand this message at first, I thought you'd found a
problem with this ("v3") patch, but now I see what you actually meant is
that there is indeed a problem with the (v2) that got merged, not the
original question about unreferencing an object while holding a spinlock
(because it can't be the last reference), but rather because of the
unpin, which can indeed cause a problem with a non-i915-defined kernel lock.
So we should certainly update the current (v2) upstream with this.
Thomas Daniel already R-B'd this code on 23rd October, when it was:
[PATCH v3 7/8] drm/i915: Grab execlist spinlock to avoid post-reset
concurrency issues.
and it hasn't changed in substance since then, so you can carry his R-B
over, plus I said on that same day that this was a better solution. So:
Reviewed-by: Thomas Daniel <thomas.daniel at intel.com>
Reviewed-by: Dave Gordon <dave.gordon at intel.com>
More information about the Intel-gfx
mailing list