[Intel-gfx] [PATCH] drm/i915: Kill context before taking ctx->mutex

Fri Jul 3 10:35:55 UTC 2020

On 02-07-2020 at 16:51, Tvrtko Ursulin wrote:
> On 02/07/2020 14:26, Maarten Lankhorst wrote:
>> On 30-06-2020 at 16:16, Tvrtko Ursulin wrote:
>>> On 24/06/2020 12:05, Maarten Lankhorst wrote:
>>>> Killing the context before taking ctx->mutex fixes a hang in
>>>> gem_ctx_persistence.close-replace-race, where lut_close
>>>> takes obj->resv.lock, which is already held by execbuf,
>>>> causing it to stall indefinitely.
>>> If this is the consequence of inverting the locking order I think you need to move the fix earlier in the series, to precede the patch which creates the inversion. Otherwise AFAICT the re-order of kill_context vs lut_close seems fine.
>> Yeah, it was just a bugfix I found while looking at the code. If you review it, I can push it now so I don't have to resend. :)
> You are saying it's a bug in drm-tip today?
>
> From the commit:
>
> [ 1904.342847] 2 locks held by gem_ctx_persist/11520:
> [ 1904.342849]  #0: ffff8882188e4968 (&ctx->mutex){+.+.}-{3:3}, at: context_close+0xe6/0x850 [i915]
> [ 1904.342941]  #1: ffff88821c58a5a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: lut_close+0x2c2/0xba0 [i915]
> [ 1904.343033] 3 locks held by gem_ctx_persist/11521:
> [ 1904.343035]  #0: ffffc900008ff938 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_gem_do_execbuffer+0x103d/0x54c0 [i915]
> [ 1904.343157]  #1: ffff88821c58a5a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: eb_validate_vmas+0x602/0x2010 [i915]
> [ 1904.343267]  #2: ffff88820afd9200 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x335/0x2300 [i915]
>
> I don't see two inverted locks in two threads, so what is causing the "stalling"? A deadlock? A livelock?
>
> Regards,
>
> Tvrtko

This patch can probably be removed now that lut_lock is split out as a spinlock.