[Intel-gfx] [PATCH v2 7/8] drm/i915: Grab execlist spinlock to avoid post-reset concurrency issues.

Dave Gordon david.s.gordon at intel.com
Fri Oct 23 05:49:57 PDT 2015


On 23/10/15 12:02, Tomas Elf wrote:
> On 23/10/2015 09:59, Daniel Vetter wrote:
>> On Fri, Oct 23, 2015 at 09:42:27AM +0100, Tvrtko Ursulin wrote:
>>>
>>> On 19/10/15 16:32, Tomas Elf wrote:
>>>> Grab execlist lock when cleaning up execlist queues after GPU reset
>>>> to avoid concurrency problems between the context event interrupt
>>>> handler and the reset path immediately following a GPU reset.
>>>>
>>>> * v2 (Chris Wilson):
>>>> Do execlist check and use simpler form of spinlock functions.
>>>>
>>>> Signed-off-by: Tomas Elf <tomas.elf at intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/i915_gem.c | 23 ++++++++++++++---------
>>>>   1 file changed, 14 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>> index e57061a..2c7a0b7 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>> @@ -2753,18 +2753,23 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>>>>        * are the ones that keep the context and ringbuffer backing objects
>>>>        * pinned in place.
>>>>        */
>>>> -    while (!list_empty(&ring->execlist_queue)) {
>>>> -        struct drm_i915_gem_request *submit_req;
>>>>
>>>> -        submit_req = list_first_entry(&ring->execlist_queue,
>>>> -                struct drm_i915_gem_request,
>>>> -                execlist_link);
>>>> -        list_del(&submit_req->execlist_link);
>>>> +    if (i915.enable_execlists) {
>>>> +        spin_lock_irq(&ring->execlist_lock);
>>>> +        while (!list_empty(&ring->execlist_queue)) {
>>>> +            struct drm_i915_gem_request *submit_req;
>>>>
>>>> -        if (submit_req->ctx != ring->default_context)
>>>> -            intel_lr_context_unpin(submit_req);
>>>> +            submit_req = list_first_entry(&ring->execlist_queue,
>>>> +                    struct drm_i915_gem_request,
>>>> +                    execlist_link);
>>>> +            list_del(&submit_req->execlist_link);
>>>>
>>>> -        i915_gem_request_unreference(submit_req);
>>>> +            if (submit_req->ctx != ring->default_context)
>>>> +                intel_lr_context_unpin(submit_req);
>>>> +
>>>> +            i915_gem_request_unreference(submit_req);
>>>> +        }
>>>> +        spin_unlock_irq(&ring->execlist_lock);
>>>
>>> Can't this, in theory at least, end up calling
>>> drm_gem_object_unreference() (which can sleep) in a couple of code
>>> paths while holding a spinlock?
>>>
>>> If this cannot happen in practice for some reason, it would probably
>>> be good to add a comment explaining why.
>>>
>>> Or I am missing something?
>>
>> It can indeed, I think. Tvrtko, since I'm off to KS next week and you
>> have push access, can you please push the revert if this checks out?
>
> I've discussed it with Tvrtko, and one solution that seems reasonable
> is to hold the execlist lock while traversing the execlist queue and
> moving the elements over to a temporary list. After that we can drop
> the execlist lock, traverse the temporary list, and unreference the
> elements without holding the lock. That way we can dequeue the elements
> in a synchronized fashion and still do the unreferencing unlocked. I
> believe this pattern is used elsewhere (such as during request
> retirement).
>
> Does that sound like a plan, or does anyone have a problem with that
> solution?
>
> Thanks,
> Tomas

I think the original is safe, because the execlist queue /shouldn't/ 
hold the last reference to a request. But that could change, and in any 
case, in the sort of broken state that may have caused the hang in the 
first place, we probably shouldn't rely on it. So I think the 
staged-dequeue-and-release approach is a better solution :)
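
Something like this, perhaps (untested sketch; list_replace_init() is
from <linux/list.h>, everything else is named as in the diff above):

	struct drm_i915_gem_request *req, *tmp;
	struct list_head cancel_list;

	if (i915.enable_execlists) {
		/* Detach the whole queue while holding the lock... */
		spin_lock_irq(&ring->execlist_lock);
		list_replace_init(&ring->execlist_queue, &cancel_list);
		spin_unlock_irq(&ring->execlist_lock);

		/* ...then drop the references outside it, where
		 * sleeping is allowed.
		 */
		list_for_each_entry_safe(req, tmp, &cancel_list,
					 execlist_link) {
			list_del(&req->execlist_link);

			if (req->ctx != ring->default_context)
				intel_lr_context_unpin(req);

			i915_gem_request_unreference(req);
		}
	}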

.Dave.


