lock/unlock mismatch in ttm_bo.c

Wed Jan 24 09:12:48 UTC 2018

Am 24.01.2018 um 10:10 schrieb Christian König:
> Am 24.01.2018 um 10:06 schrieb Chunming Zhou:
>>
>>
>> On 2018年01月24日 16:58, Christian König wrote:
>>> NAK, that doesn't looks correct to me. unlock_resv means that the 
>>> function is allowed to unlock the reservation.
>>>
>>> So the original logic seems to do exactly what it is supposed to do. 
>>> Please keep in mind that reservation_object_trylock returns boolean, 
>>> not negative error code.
>> 'unlock_resv = true' means ttm_bo_cleanup_refs() is locked outside, 
>> then reservation_object_trylock will return false, then original code 
>> 'if (unlock_resv && !reservation_object_trylock(bo->resv))' will 
>> always be true, and then return directly.
>> Is my understanding right? is that expected?
>
> Ah, crap you are right. That must be "!unlock_resv && !reserv...".

No wait a moment. Your initial assumption seems to be incorrect: 
'unlock_resv = true' means ttm_bo_cleanup_refs() is locked outside

When unlock_resv = false then that means ttm_bo_cleanup_refs() is locked 
outside.

And when we assume this, the original code is indeed right.

Christian.

>
> But why do you want to set unlock_resv to true?
>
> Regards,
> Christian.
>
>>
>> Regards,
>> David Zhou
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 24.01.2018 um 06:46 schrieb Chunming Zhou:
>>>> update the fix.
>>>>
>>>>
>>>> On 2018年01月24日 11:09, Chunming Zhou wrote:
>>>>> Hi Tom,
>>>>>
>>>>> Your change looks ok, as Roger suggested, you can send both dri 
>>>>> and amd mail lists.
>>>>>
>>>>> In addition, when I review your patches, I found a bug as the 
>>>>> attached, you can send it together with yours if you think that's 
>>>>> a right fix.
>>>>>
>>>>> Regards,
>>>>>
>>>>> David Zhou
>>>>>
>>>>>
>>>>> On 2018年01月24日 03:25, Tom St Denis wrote:
>>>>>> On 22/01/18 01:42 AM, Chunming Zhou wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2018年01月20日 02:23, Tom St Denis wrote:
>>>>>>>> On 19/01/18 01:14 PM, Tom St Denis wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> In the function ttm_bo_cleanup_refs() it seems possible to get 
>>>>>>>>> to line 551 without entering the block on 516 which means 
>>>>>>>>> you'll be unlocking a mutex that wasn't locked.
>>>>>>>>>
>>>>>>>>> Now it might be that in the course of the API this pattern 
>>>>>>>>> cannot be expressed but it's not clear from the function alone 
>>>>>>>>> that that is the case.
>>>>>>>>
>>>>>>>>
>>>>>>>> Looking further it seems the behaviour depends on locking in 
>>>>>>>> parent callers.  That's kinda a no-no right? Shouldn't the lock 
>>>>>>>> be taken/released in the same function ideally?
>>>>>>> Same feelings
>>>>>>>
>>>>>>> Regards,
>>>>>>> David Zhou
>>>>>>
>>>>>> Attached is a patch that addresses this.
>>>>>>
>>>>>> I can't see any obvious race in functions that call 
>>>>>> ttm_bo_cleanup_refs() between the time they let go of the lock 
>>>>>> and the time it's taken again in the call.
>>>>>>
>>>>>> Running it on my system doesn't produce anything notable though 
>>>>>> the KASAN with DRI_PRIME=1 issue is still there (this patch 
>>>>>> neither causes that nor fixes it).
>>>>>>
>>>>>> Tom
>>>>>
>>>>
>>>
>>
>