[Intel-gfx] [PATCH v6 63/64] drm/i915: Move gt_revoke() slightly
Thomas Hellström
thomas.hellstrom at linux.intel.com
Mon Jan 18 13:22:58 UTC 2021
On 1/18/21 1:01 PM, Maarten Lankhorst wrote:
> Op 18-01-2021 om 12:11 schreef Thomas Hellström:
>> On 1/5/21 4:35 PM, Maarten Lankhorst wrote:
>>> We get a lockdep splat when the reset mutex is held, because it can be
>>> taken from fence_wait. This conflicts with the mmu notifier we have,
>>> because we recurse between reset mutex and mmap lock -> mmu notifier.
>>>
>>> Remove this recursion by calling revoke_mmaps before taking the lock.
>> Hmm. Is the mmap se taken from gt_revoke()?
>>
>> If so, isn't the real problem that the mmap_sem is taken in the dma_fence critical path (where the reset code sits)?
> Hey,
>
> The gpu reset code specifically needs to revoke all gtt mappings, and the fault handler uses intel_gt_reset_trylock(),
>
> so this change should be ok since all those mappings are invalidated correctly and completed before this point.
>
> The reset mutex isn't actually taken inside fence code, but used for lockdep validation, so this should be ok.
>
> ~Maarten
Hmm, OK but then we still have the following established locking order.
lock(fence_signaling)
lock(i_mmap_lock)
But in the notifier
lock(i_mmap_lock)
fence_signaling(within notifier)
So gt_revoke() is violating dma-fence rules.
BTW it looks to me like the reset mutex notation is actually doing much
the same as the dma-fence annotations; While we can move gt_revoke() out
of the reset mutex, that only gives us false hopes since it moves it out
of the equivalent dma-fence annotation. I figure the reason this was not
seen before the new code is that the reset mutex lockdep isn't taken
when waiting for active. Only when waiting for dma-fence, but IMO the
root problem is pre-existing.
/Thomas
More information about the Intel-gfx
mailing list