[PATCH] drm/ttm: make ttm_bo_unpin more defensive

Christian König ckoenig.leichtzumerken at gmail.com
Mon Mar 15 18:47:00 UTC 2021



Am 15.03.21 um 18:08 schrieb Thomas Hellström (Intel):
>
> On 3/15/21 11:26 AM, Christian König wrote:
>>
>>
>> Am 13.03.21 um 19:29 schrieb Thomas Hellström (Intel):
>>> Hi, Christian
>>>
>>> On 3/12/21 10:38 AM, Christian König wrote:
>>>> We seem to have some more driver bugs than thought.
>>>>
>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>> ---
>>>>   include/drm/ttm/ttm_bo_api.h | 6 ++++--
>>>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/include/drm/ttm/ttm_bo_api.h 
>>>> b/include/drm/ttm/ttm_bo_api.h
>>>> index 4fb523dfab32..df9fe596e7c5 100644
>>>> --- a/include/drm/ttm/ttm_bo_api.h
>>>> +++ b/include/drm/ttm/ttm_bo_api.h
>>>> @@ -603,9 +603,11 @@ static inline void ttm_bo_pin(struct 
>>>> ttm_buffer_object *bo)
>>>>   static inline void ttm_bo_unpin(struct ttm_buffer_object *bo)
>>>>   {
>>>>       dma_resv_assert_held(bo->base.resv);
>>>> -    WARN_ON_ONCE(!bo->pin_count);
>>>>       WARN_ON_ONCE(!kref_read(&bo->kref));
>>>> -    --bo->pin_count;
>>>> +    if (bo->pin_count)
>>>> +        --bo->pin_count;
>>>> +    else
>>>> +        WARN_ON_ONCE(true);
>>>>   }
>>>>     int ttm_mem_evict_first(struct ttm_device *bdev,
>>>
>>> Since I now have been staring for half a year at the code of the 
>>> driver that made pinning an art, I have a couple of suggestions 
>>> here, Could we use an atomic for pin_count, allowing unlocked 
>>> unpinning but require the lock only for pin_count transition 0->1, 
>>> (but unlocked for pin_if_already_pinned). In particular I think 
>>> vmwgfx would benefit from unlocked unpins. Also if the atomic were a 
>>> refcount_t, that would probably give you the above behaviour?
>>
>> Nope, I've considered this as well while moving the pin count into TTM.
>>
>> The problem is that you not only need the BO reserved for 0->1 
>> transitions, but also for 1->0 transitions to move the BO on the LRU 
>> correctly.
>
> Ah, and there is no way to have us know the correct LRU list without 
> reservation?

Not really, there is always the chance that CPU A is reducing the count 
from 1->0 while CPU B is doing 0->1 and you end up with a LRU status 
which doesn't match the pin count.

We could try to do something like a loop updating the LRU status until 
it matches the pin count, but the implications of that are usually 
really nasty.

Christian.

>
> /Thomas
>
>
>>
>> Christian.
>>
>>>
>>> /Thomas
>>>
>>>



More information about the dri-devel mailing list