[Intel-gfx] [PATCH v6 63/64] drm/i915: Move gt_revoke() slightly

Thomas Hellström thomas.hellstrom at linux.intel.com
Mon Jan 18 13:28:02 UTC 2021


On 1/18/21 2:22 PM, Thomas Hellström wrote:
>
> On 1/18/21 1:01 PM, Maarten Lankhorst wrote:
>> Op 18-01-2021 om 12:11 schreef Thomas Hellström:
>>> On 1/5/21 4:35 PM, Maarten Lankhorst wrote:
>>>> We get a lockdep splat when the reset mutex is held, because it can be
>>>> taken from fence_wait. This conflicts with the mmu notifier we have,
>>>> because we recurse between reset mutex and mmap lock -> mmu notifier.
>>>>
>>>> Remove this recursion by calling revoke_mmaps before taking the lock.
>>> Hmm. Is the mmap se taken from gt_revoke()?
>>>
>>> If so, isn't the real problem that the mmap_sem is taken in the 
>>> dma_fence critical path (where the reset code sits)?
>> Hey,
>>
>> The gpu reset code specifically needs to revoke all gtt mappings, and 
>> the fault handler uses intel_gt_reset_trylock(),
>>
>> so this change should be ok since all those mappings are invalidated 
>> correctly and completed before this point.
>>
>> The reset mutex isn't actually taken inside fence code, but used for 
>> lockdep validation, so this should be ok.
>>
>> ~Maarten
>
> Hmm, OK but then we still have the following established locking order.
>
> lock(fence_signaling)
> lock(i_mmap_lock)
>
> But in the notifier
>
> lock(i_mmap_lock)
> fence_signaling(within notifier)
>
> So gt_revoke() is violating dma-fence rules.
>
> BTW it looks to me like the reset mutex notation is actually doing 
> much the same as the dma-fence annotations; While we can move 
> gt_revoke() out of the reset mutex, that only gives us false hopes 
> since it moves it out of the equivalent dma-fence annotation. I figure 
> the reason this was not seen before the new code is that the reset 
> mutex lockdep isn't taken when waiting for active. Only when waiting 
> for dma-fence, but IMO the root problem is pre-existing.
>
> /Thomas
>
>
The interesting scenario is

thread 1:
take i_mmap_lock()
enter_mmu_notifier()
wait_fence()

thread 2:
need_to_reset_gpu_for_the_above_fence();
take i_mmap_lock()

Deadlock.

/Thomas




More information about the Intel-gfx mailing list