Possible lock inversion in ttm_bo_vm_access
Koenig, Christian
Christian.Koenig at amd.com
Sun Oct 14 18:16:20 UTC 2018
Hi Thomas,
> that the access() handler took a shortcut when the new locking order
> was established
There is no new locking order, the access handler is just for debugging
and ignoring the correct locking order between mmap_sem and bo_reserve.
That this is throwing a lockdep warning is perfectly possible. We should
probably move that to a trylock instead.
> bo_reserve()
> copy_to_user() / copy_from_user()
> bo_unreserve()
That one is illegal for a completely different reason.
The address accessed by copy_to_user()/copy_from_user() could be a BO
itself, so to resolve this we could end up locking a BO twice.
Adding a might_lock() to the beginning of ttm_bo_vm_fault as you
suggested doesn't work either, because at this point the mmap_sem is
still locked.
So lockdep would complain about the incorrect bo_reserve and mmap_sem order.
Christian.
Am 13.10.2018 um 21:04 schrieb Thomas Hellstrom:
> Hi, Christian,
>
> On 10/13/2018 07:36 PM, Christian König wrote:
>> Hi Thomas,
>>
>>> bo_reserve()
>>> copy_to_user() / copy_from_user()
>>> bo_unreserve()
>>
>> That pattern is illegal for a number of reasons and the mmap_sem is
>> only one of it.
>>
>> So the locking order must always be mmap_sem->bo_reservation. See the
>> userptr implementation in amdgpu as well.
>>
>> Christian.
>
> I'm not arguing against that, and since vmwgfx doesn't use that
> pattern, the locking order doesn't really matter to me since it's even
> possible to make the TTM fault() handler more well-behaved if we were
> to fix the locking order to mmap_sem->bo_reserve.
>
> My concern is, since the _opposite_ locking order is (admittedly
> vaguely) documented in the fault handler, that the access() handler
> took a shortcut when the new locking order was established possibly
> without auditing of the other TTM drivers for locking inversion: For
> example it looks from a quick glance like
> nouveau_gem_pushbuf_reloc_apply() calls copy_from_user() with bo's
> reserved (which IIRC was the typical use-case at the time this was
> last lifted). And lockdep won't trip unless the access() callback is
> actually called.
>
> My point is if AMD wants to enforce this locking order, then IMHO the
> other drivers need to be audited and corrected if they are assuming
> the locking order documented in fault(). A good way to catch such
> drivers would be to add that might_lock().
>
> Thanks,
> Thomas
>
>
>>
>> Am 12.10.2018 um 16:52 schrieb Thomas Hellstrom:
>>> Hi, Felix,
>>>
>>> It looks like there is a locking inversion in ttm_bo_vm_access()
>>> where we take a sleeping bo_reserve() while holding mmap_sem().
>>>
>>> Previously we've been assuming the other way around or at least
>>> undefined allowing for drivers to do
>>>
>>> bo_reserve()
>>> copy_to_user() / copy_from_user()
>>> bo_unreserve()
>>>
>>> I'm not sure the latter pattern is used in any drivers, though, and
>>> I guess there are ways around it. So it might make sense to fix the
>>> locking order at this point. In that case, perhaps one should add a
>>>
>>> might_lock(&bo->resv->lock.base);
>>>
>>> at the start of the TTM fault handler to trip lockdep on locking
>>> order violations in situations where the access() callback isn't
>>> commonly used...
>>>
>>> /Thomas
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
More information about the dri-devel
mailing list