[PATCH v3 0/2] drm/tests: Fix for UAF and a test for drm_exec lock alloc tracking warning

Fri Sep 8 14:31:36 UTC 2023

On 9/8/23 13:13, Thomas Hellström wrote:
>
> On 9/8/23 11:14, Christian König wrote:
>> Am 08.09.23 um 11:04 schrieb Thomas Hellström:
>>>
>>> On 9/8/23 10:52, Christian König wrote:
>>>> Am 08.09.23 um 09:37 schrieb Thomas Hellström:
>>>>> Hi,
>>>>>
>>>>> On 9/7/23 16:49, Christian König wrote:
>>>>>> Am 07.09.23 um 16:47 schrieb Thomas Hellström:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 9/7/23 16:37, Christian König wrote:
>>>>>>>> Am 07.09.23 um 15:53 schrieb Thomas Hellström:
>>>>>>>>> While trying to replicate a weird drm_exec lock alloc tracking warning
>>>>>>>>> using the drm_exec kunit test, the warning was shadowed by a UAF warning
>>>>>>>>> from KASAN due to a bug in the drm kunit helpers.
>>>>>>>>>
>>>>>>>>> Patch 1 fixes that drm kunit UAF.
>>>>>>>>> Patch 2 introduces a drm_exec kunit subtest that fails if the conditions
>>>>>>>>>        for the weird warning are met.
>>>>>>>>>
>>>>>>>>> The series previously also had a patch with a drm_exec workaround for the
>>>>>>>>> warning but that patch has already been committed to
>>>>>>>>> drm_misc_next_fixes.
>>>>>>>>
>>>>>>>> Thinking more about this, what happens when somebody calls 
>>>>>>>> drm_exec_unlock_obj() on the first locked object?
>>>>>>>>
>>>>>>> Essentially the same thing. I've been thinking of the best way 
>>>>>>> to handle that, but not sure what's the best one.
>>>>>>
>>>>>> Well what does lockdep store in that object in the first place? 
>>>>>> Could we fix that somehow?
>>>>>
>>>>> Lockdep maintains an array of held locks (lock classes) for each 
>>>>> task. Upon freeing, that list is traversed to see if the address 
>>>>> matches the stored memory address. This also has the interesting 
>>>>> side effect that IIRC dma_resv_assert_held() checks if *any* 
>>>>> dma_resv is held....
>>>>>
>>>>> Ideally each object would have its own class instance, but I think 
>>>>> some applications would then exhaust the array size.
>>>>
>>>> IIRC Daniel once explained to me that he designed lockdep for 
>>>> ww_mutexes like this for some reason, but I don't remember the 
>>>> details any more.
>>>>
>>>> Maybe lockdep wouldn't otherwise be able to deal with the fact that 
>>>> you could lock them in any order or something like that.
>>>
>>> Oh, that's well handled with the mutex_lock_nest_lock() type of 
>>> annotation that's used for WW mutexes. IIRC the problem is that 
>>> lockdep can't really deal with either that vast number of locks 
>>> overall or the vast number of held locks per process.
>>
>> Could we somehow teach lockdep that multiple locks of a lock class 
>> can be held at the same time? E.g. like a reference count in the 
>> lockclass or something like that?
>>
>>>
>>>>
>>>>>
>>>>> I'll dig a bit deeper into this.
>>>>>
>>>>>
>>>>> Meanwhile for the unlock problem, looking at how the unlocks are 
>>>>> used in i915 it's typically locks that are grabbed during eviction 
>>>>> and released again once validation of a single object succeeded. 
>>>>> The risk of them ending up at the first lock is small, unless they 
>>>>> are prelocked as the contended lock. But for these "temporary" 
>>>>> objects, the prelocked lock is dropped immediately after locking 
>>>>> and is only used to find something suitable to wait for to relax 
>>>>> the ww transaction.
>>>>
>>>> Yeah, I don't see this as a use case in reality. It's more of a 
>>>> "what if?" thing.
>>>
>>> Oh, it's a real use-case. As soon as you start having sleeping locks 
>>> for eviction you hit it, in particular with WW mutex slowpath 
>>> debugging. And we will need to work on improving TTM support for 
>>> that for xe.
>>
>> Oh, good point! When we have contention on a lock, roll back, and 
>> then take that lock first, it can happen that this lock needs to be 
>> unlocked again. Unlikely, but certainly possible.
>>
>> Sounds like we really need to fix this in lockdep then.
>
> So it seems lockdep *does* reference counting in this case, but stores 
> the address of the first locked lockdep map, and then subsequently 
> uses it for various things. In short, freeing the first lock isn't 
> something lockdep thinks you should do. Ever.
>
> The good thing about this is that this refcounting appears to be done 
> only on nest locks, that is, when we have a ww context AFAICT. That means 
> we can probably store a fake ww_mutex lockdep map with the ww acquire 
> context and lock it when we initialize the context and unlock it on 
> ww_acquire_fini().
>
> Should take care of the problem I think, although the problem of 
> lockdep_assert() and lock freeing granularity will remain. It looks 
> like there is a comparison function one can optionally set to make 
> different objects look separate to lockdep. Probably something to 
> think of for enhanced debugging with a limited set of locked objects.
>
> Need to also check what happens if we do a sequence of successful 
> trylocks.

OK, nested trylocks indeed seem to store one instance per lock, so not 
prone to the problem.

For locks under a ww_acquire_ctx, the solution outlined above appears to 
work, and it's restricted to lockdep code only.
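
To make the above a bit more concrete, here is a minimal userspace sketch of
the bookkeeping as I understand it. This is not lockdep code: every name in
it (struct model_held, model_acquire() and so on) is made up for
illustration, and the matching rules are simplified assumptions. It only
shows why an entry that keeps the address of the first lock trips the
free-time check, and why locking a dummy map owned by the acquire context
first avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_HELD 8

/* Hypothetical, simplified stand-in for one held-lock entry of a task. */
struct model_held {
	const void *instance;    /* address of the first lock folded into this entry */
	const void *nest_lock;   /* acquire context, NULL if none */
	int class_id;
	unsigned int references; /* 0: single lock, otherwise number of folded locks */
};

struct model_task {
	struct model_held held[MODEL_MAX_HELD];
	int depth;
};

/* Same class under the same nest lock: bump the refcount, keep the old address. */
static bool model_acquire(struct model_task *t, const void *lock,
			  int class_id, const void *nest_lock)
{
	if (nest_lock && t->depth > 0) {
		struct model_held *top = &t->held[t->depth - 1];

		if (top->class_id == class_id && top->nest_lock == nest_lock) {
			top->references = top->references ? top->references + 1 : 2;
			return true;
		}
	}
	if (t->depth == MODEL_MAX_HELD)
		return false;
	t->held[t->depth++] = (struct model_held){ lock, nest_lock, class_id, 0 };
	return true;
}

/* Drop one reference; the entry only goes away with the last one. */
static void model_release(struct model_task *t, const void *lock, int class_id)
{
	for (int i = t->depth - 1; i >= 0; i--) {
		struct model_held *e = &t->held[i];

		if (e->instance != lock &&
		    !(e->references && e->class_id == class_id))
			continue;
		if (e->references && --e->references > 0)
			return;
		for (int j = i; j < t->depth - 1; j++)
			t->held[j] = t->held[j + 1];
		t->depth--;
		return;
	}
}

/* Free-time check: does any stored address fall inside the freed range? */
static bool model_free_hits_held(const struct model_task *t,
				 const void *start, size_t len)
{
	uintptr_t s = (uintptr_t)start;

	for (int i = 0; i < t->depth; i++) {
		uintptr_t p = (uintptr_t)t->held[i].instance;

		if (p >= s && p < s + len)
			return true;
	}
	return false;
}

/* Lock a and b, unlock a, then "free" a. Returns true if that is flagged. */
static bool model_first_lock_freed_flagged(bool use_dummy_first)
{
	struct model_task t = { .depth = 0 };
	int ctx, dummy, a, b; /* only the addresses matter */
	bool flagged;

	if (use_dummy_first)
		model_acquire(&t, &dummy, 1, &ctx);
	model_acquire(&t, &a, 1, &ctx);
	model_acquire(&t, &b, 1, &ctx);
	model_release(&t, &a, 1);
	flagged = model_free_hits_held(&t, &a, sizeof(a));
	model_release(&t, &b, 1);
	if (use_dummy_first)
		model_release(&t, &dummy, 1);
	assert(t.depth == 0);
	return flagged;
}
```

In this model, model_first_lock_freed_flagged(false) reports the stale
address of the already unlocked first object, while
model_first_lock_freed_flagged(true) does not, since the dummy stays the
"first" lock until the context is torn down.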

/Thomas

>
> /Thomas
>
>>
>> Christian.
>>
>>>
>>>>
>>>>>
>>>>> If we were to implement something similar in drm_exec, we'd need 
>>>>> an interface to mark an object as "temporary" when locking, and 
>>>>> make sure we drop those objects if they end up as "prelocked". 
>>>>> Personally I think this solution works well and would be my 
>>>>> preferred choice.
>>>>>
>>>>> Yet another alternative would be to keep a reference even of the 
>>>>> unlocked objects...
>>>>>
>>>>> But these workarounds ofc only push the problem out of drm_exec. 
>>>>> Users of raw dma-resv or ww mutexes would still wonder what's 
>>>>> going on.
>>>>
>>>> Agree, completely. This is really a bug in lockdep, or rather in how 
>>>> we designed the ww_mutex implementation in lockdep, and should 
>>>> therefore be fixed there I think.
>>>
>>>
>>>>
>>>> Christian.
>>>>
>>>>>
>>>>> /Thomas
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>> /Thomas
>>>>>>>
>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> v2:
>>>>>>>>> - Rewording of commit messages
>>>>>>>>> - Add some commit message tags
>>>>>>>>> v3:
>>>>>>>>> - Remove an already committed patch
>>>>>>>>> - Rework the test to not require dmesg inspection (Maxime Ripard)
>>>>>>>>> - Condition the test on CONFIG_DEBUG_LOCK_ALLOC
>>>>>>>>> - Update code comments and commit messages (Maxime Ripard)
>>>>>>>>>
>>>>>>>>> Cc: Maxime Ripard <mripard at kernel.org>
>>>>>>>>> Cc: Christian König <christian.koenig at amd.com>
>>>>>>>>>
>>>>>>>>> Thomas Hellström (2):
>>>>>>>>>    drm/tests: helpers: Avoid a driver uaf
>>>>>>>>>    drm/tests/drm_exec: Add a test for object freeing within
>>>>>>>>>      drm_exec_fini()
>>>>>>>>>
>>>>>>>>>   drivers/gpu/drm/tests/drm_exec_test.c | 82 +++++++++++++++++++++++++++
>>>>>>>>>   include/drm/drm_kunit_helpers.h       |  4 +-
>>>>>>>>>   2 files changed, 85 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>