[Intel-gfx] [PATCH v6 63/64] drm/i915: Move gt_revoke() slightly

Thomas Hellström thomas.hellstrom at linux.intel.com
Mon Jan 18 15:05:08 UTC 2021


On 1/18/21 3:46 PM, Maarten Lankhorst wrote:
> Op 18-01-2021 om 14:28 schreef Thomas Hellström:
>> On 1/18/21 2:22 PM, Thomas Hellström wrote:
>>> On 1/18/21 1:01 PM, Maarten Lankhorst wrote:
>>>> Op 18-01-2021 om 12:11 schreef Thomas Hellström:
>>>>> On 1/5/21 4:35 PM, Maarten Lankhorst wrote:
>>>>>> We get a lockdep splat when the reset mutex is held, because it can be
>>>>>> taken from fence_wait. This conflicts with the mmu notifier we have,
>>>>>> because we recurse between reset mutex and mmap lock -> mmu notifier.
>>>>>>
>>>>>> Remove this recursion by calling revoke_mmaps before taking the lock.
>>>>> Hmm. Is the mmap se taken from gt_revoke()?
>>>>>
>>>>> If so, isn't the real problem that the mmap_sem is taken in the dma_fence critical path (where the reset code sits)?
>>>> Hey,
>>>>
>>>> The gpu reset code specifically needs to revoke all gtt mappings, and the fault handler uses intel_gt_reset_trylock(),
>>>>
>>>> so this change should be ok since all those mappings are invalidated correctly and completed before this point.
>>>>
>>>> The reset mutex isn't actually taken inside fence code, but used for lockdep validation, so this should be ok.
>>>>
>>>> ~Maarten
>>> Hmm, OK but then we still have the following established locking order.
>>>
>>> lock(fence_signaling)
>>> lock(i_mmap_lock)
>>>
>>> But in the notifier
>>>
>>> lock(i_mmap_lock)
>>> fence_signaling(within notifier)
>>>
>>> So gt_revoke() is violating dma-fence rules.
>>>
>>> BTW it looks to me like the reset mutex notation is actually doing much the same as the dma-fence annotations; While we can move gt_revoke() out of the reset mutex, that only gives us false hopes since it moves it out of the equivalent dma-fence annotation. I figure the reason this was not seen before the new code is that the reset mutex lockdep isn't taken when waiting for active. Only when waiting for dma-fence, but IMO the root problem is pre-existing.
>>>
>>> /Thomas
>>>
>>>
>> The interesting scenario is
>>
>> thread 1:
>> take i_mmap_lock()
>> enter_mmu_notifier()
>> wait_fence()
>>
>> thread 2:
>> need_to_reset_gpu_for_the_above_fence();
>> take i_mmap_lock()
>>
>> Deadlock.
>>
>> /Thomas
>>
>>
> Yeah, I think gpu reset isn't completely following lockdep rules yet. Thread 1 isn't doing anything wrong, gpu reset probably should stop revoking gt bindings, and allow some garbage during reset. I don't see another way out. :-/

Me neither.

But to silence lockdep until dma_fence annotation is widely added:

Reviewed-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>




More information about the Intel-gfx mailing list