[PATCH v3 0/2] drm/tests: Fix for UAF and a test for drm_exec lock alloc tracking warning
Thomas Hellström
thomas.hellstrom at linux.intel.com
Fri Sep 8 14:31:36 UTC 2023
On 9/8/23 13:13, Thomas Hellström wrote:
>
> On 9/8/23 11:14, Christian König wrote:
>> Am 08.09.23 um 11:04 schrieb Thomas Hellström:
>>>
>>> On 9/8/23 10:52, Christian König wrote:
>>>> Am 08.09.23 um 09:37 schrieb Thomas Hellström:
>>>>> Hi,
>>>>>
>>>>> On 9/7/23 16:49, Christian König wrote:
>>>>>> Am 07.09.23 um 16:47 schrieb Thomas Hellström:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 9/7/23 16:37, Christian König wrote:
>>>>>>>> Am 07.09.23 um 15:53 schrieb Thomas Hellström:
>>>>>>>>> While trying to replicate a weird drm_exec lock alloc tracking
>>>>>>>>> warning using the drm_exec kunit test, the warning was shadowed
>>>>>>>>> by a UAF warning from KASAN due to a bug in the drm kunit helpers.
>>>>>>>>>
>>>>>>>>> Patch 1 fixes that drm kunit UAF.
>>>>>>>>> Patch 2 introduces a drm_exec kunit subtest that fails if the
>>>>>>>>> conditions for the weird warning are met.
>>>>>>>>>
>>>>>>>>> The series previously also had a patch with a drm_exec workaround
>>>>>>>>> for the warning but that patch has already been committed to
>>>>>>>>> drm_misc_next_fixes.
>>>>>>>>
>>>>>>>> Thinking more about this what happens when somebody calls
>>>>>>>> drm_exec_unlock_obj() on the first locked object?
>>>>>>>>
>>>>>>> Essentially the same thing. I've been thinking of the best way
>>>>>>> to handle that, but not sure what's the best one.
>>>>>>
>>>>>> Well what does lockdep store in that object in the first place?
>>>>>> Could we fix that somehow?
>>>>>
>>>>> Lockdep maintains an array of held locks (lock classes) for each
>>>>> task. Upon freeing, that list is traversed to see if the address
>>>>> matches the stored memory address. This also has the interesting
>>>>> side effect that IIRC dma_resv_assert_held() checks if *any*
>>>>> dma_resv is held....
>>>>>
>>>>> Ideally each object would have its own class instance, but I think
>>>>> some applications would then exhaust the array size.
>>>>
>>>> IIRC Daniel once explained to me that he designed lockdep for
>>>> ww_mutexes like this for some reason, but I don't remember the
>>>> details any more.
>>>>
>>>> Maybe lockdep wouldn't otherwise be able to deal with the fact that
>>>> you could lock them in any order or something like that.
>>>
>>> Oh, that's well handled with the mutex_lock_nest_lock() type of
>>> annotation that's used for WW mutexes. IIRC the problem is that
>>> lockdep can't really deal with either that vast number of locks
>>> overall or the vast number of held locks per process.
>>
>> Could we somehow teach lockdep that multiple locks of a lock class
>> can be held at the same time? E.g. like a reference count in the
>> lockclass or something like that?
>>
>>>
>>>>
>>>>>
>>>>> I'll dig a bit deeper into this.
>>>>>
>>>>>
>>>>> Meanwhile for the unlock problem, looking at how the unlocks are
>>>>> used in i915 it's typically locks that are grabbed during eviction
>>>>> and released again once validation of a single object succeeded.
>>>>> The risk of them ending up at the first lock is small, unless they
>>>>> are prelocked as the contended lock. But for these "temporary"
>>>>> objects, the prelocked lock is immediately dropped after locking
>>>>> and is only used to find something suitable to wait for to relax
>>>>> the ww transaction.
>>>>
>>>> Yeah, I don't see this as a use case in reality. It's more of a
>>>> "what if?" thing.
>>>
>>> Oh, it's a real use-case. As soon as you start having sleeping locks
>>> for eviction you hit it, in particular with WW mutex slowpath
>>> debugging. And we will need to work on improving TTM support for
>>> that for xe.
>>
>> Oh, good point! When we have contention on a lock, roll back and
>> then take that lock first, it can happen that this lock needs to be
>> unlocked again. Unlikely, but certainly possible.
>>
>> Sounds like we really need to fix this in lockdep then.
>
> So it seems lockdep *does* reference counting in this case, but stores
> the address of the first locked lockdep map, and then subsequently
> uses it for various things. In short freeing the first lock isn't
> something lockdep thinks you should do. Ever.
>
> The good thing about this is that this refcounting appears to be done
> only on nest locks, that is, when we have a ww context AFAICT. That means
> we can probably store a fake ww_mutex lockdep map with the ww acquire
> context and lock it when we initialize the context and unlock it on
> ww_acquire_fini().
>
> Should take care of the problem I think, although the problem of
> lockdep_assert() and lock freeing granularity will remain. It looks
> like there is a comparison function one can optionally set to make
> different objects look separate to lockdep. Probably something to
> think of for enhanced debugging with a limited set of locked objects.
>
> Need to also check what happens if we do a sequence of successful
> trylocks.
OK, nested trylocks indeed seem to store one instance per lock, so not
prone to the problem.
For locks under a ww_acquire_ctx, the solution outlined above appears to
work, and it's restricted to lockdep code only.
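
To illustrate, here is a toy userspace model of the behaviour described
above. It is not kernel code and not the actual lockdep implementation:
struct task_model, model_acquire(), model_release(), model_free_warns()
and model_first_unlock_warns() are all hypothetical names made up for
this sketch, and the refcounting is simplified. It only shows why an
entry that keeps the address of the first lock gives a false "freeing a
held lock" report once that first lock is unlocked, and why taking a
dummy lock first avoids it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HELD 8

/* One entry in the toy per-task held-lock array. */
struct held_entry {
	int class_id;         /* lock class of this entry */
	const void *instance; /* address of the first lock folded into it */
	unsigned int refs;    /* number of locks folded into this entry */
};

struct task_model {
	struct held_entry held[MAX_HELD];
	size_t depth;
};

/* With nest == true, a lock of the same class as the top entry is folded in. */
static void model_acquire(struct task_model *t, int class_id,
			  const void *lock, bool nest)
{
	if (nest && t->depth && t->held[t->depth - 1].class_id == class_id) {
		t->held[t->depth - 1].refs++; /* stored address is NOT updated */
		return;
	}
	assert(t->depth < MAX_HELD);
	t->held[t->depth++] = (struct held_entry){ class_id, lock, 1 };
}

/* Drop one reference; the entry goes away only when the last one is dropped. */
static void model_release(struct task_model *t, int class_id, const void *lock)
{
	for (size_t i = t->depth; i-- > 0;) {
		struct held_entry *e = &t->held[i];

		if (e->instance != lock &&
		    !(e->refs > 1 && e->class_id == class_id))
			continue;
		if (--e->refs == 0) {
			for (size_t j = i; j + 1 < t->depth; j++)
				t->held[j] = t->held[j + 1];
			t->depth--;
		}
		return;
	}
	assert(0 && "releasing a lock that is not held");
}

/* True if any stored address lies inside the memory range being freed. */
static bool model_free_warns(const struct task_model *t,
			     const void *start, size_t len)
{
	uintptr_t lo = (uintptr_t)start, hi = lo + len;

	for (size_t i = 0; i < t->depth; i++) {
		uintptr_t p = (uintptr_t)t->held[i].instance;

		if (p >= lo && p < hi)
			return true;
	}
	return false;
}

/* Lock a, b, c under one context, unlock a, then "free" a. */
static bool model_first_unlock_warns(bool use_dummy_first_lock)
{
	struct task_model t = { .depth = 0 };
	int dummy, a, b, c; /* stand-ins for lock instances */
	const int ww_class = 1;

	if (use_dummy_first_lock) /* taken when the context is initialized */
		model_acquire(&t, ww_class, &dummy, true);
	model_acquire(&t, ww_class, &a, true);
	model_acquire(&t, ww_class, &b, true);
	model_acquire(&t, ww_class, &c, true);
	model_release(&t, ww_class, &a);
	return model_free_warns(&t, &a, sizeof(a));
}
```

Under these assumptions model_first_unlock_warns(false) reports the
stale address of the already unlocked first lock, while
model_first_unlock_warns(true) does not, since the stored address is
then that of the dummy lock that lives as long as the context.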
/Thomas
>
> /Thomas
>
>>
>> Christian.
>>
>>>
>>>>
>>>>>
>>>>> If we were to implement something similar in drm_exec, we'd need
>>>>> an interface to mark an object as "temporary" when locking, and
>>>>> make sure we drop those objects if they end up as "prelocked".
>>>>> Personally I think this solution works well and would be my
>>>>> preferred choice.
>>>>>
>>>>> Yet another alternative would be to keep a reference even of the
>>>>> unlocked objects...
>>>>>
>>>>> But these workarounds ofc only push the problem out of drm_exec.
>>>>> Users of raw dma-resv or ww mutexes would still wonder what's
>>>>> going on.
>>>>
>>>> Agree, completely. This is really a bug in lockdep, or rather in how
>>>> we designed the ww_mutex implementation in lockdep, and should
>>>> therefore be fixed there I think.
>>>
>>>
>>>>
>>>> Christian.
>>>>
>>>>>
>>>>> /Thomas
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>> /Thomas
>>>>>>>
>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> v2:
>>>>>>>>> - Rewording of commit messages
>>>>>>>>> - Add some commit message tags
>>>>>>>>> v3:
>>>>>>>>> - Remove an already committed patch
>>>>>>>>> - Rework the test to not require dmesg inspection (Maxime Ripard)
>>>>>>>>> - Condition the test on CONFIG_LOCK_ALLOC
>>>>>>>>> - Update code comments and commit messages (Maxime Ripard)
>>>>>>>>>
>>>>>>>>> Cc: Maxime Ripard <mripard at kernel.org>
>>>>>>>>> Cc: Christian König <christian.koenig at amd.com>
>>>>>>>>>
>>>>>>>>> Thomas Hellström (2):
>>>>>>>>>   drm/tests: helpers: Avoid a driver uaf
>>>>>>>>>   drm/tests/drm_exec: Add a test for object freeing within drm_exec_fini()
>>>>>>>>>
>>>>>>>>>  drivers/gpu/drm/tests/drm_exec_test.c | 82 +++++++++++++++++++++++++++
>>>>>>>>>  include/drm/drm_kunit_helpers.h       |  4 +-
>>>>>>>>>  2 files changed, 85 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>