[PATCH] drm/ttm: make ttm_bo_unpin more defensive

Thomas Hellström (Intel) thomas_os at shipmail.org
Mon Mar 15 19:00:30 UTC 2021


On 3/15/21 7:47 PM, Christian König wrote:
>
>
> Am 15.03.21 um 18:08 schrieb Thomas Hellström (Intel):
>>
>> On 3/15/21 11:26 AM, Christian König wrote:
>>>
>>>
>>> Am 13.03.21 um 19:29 schrieb Thomas Hellström (Intel):
>>>> Hi, Christian
>>>>
>>>> On 3/12/21 10:38 AM, Christian König wrote:
>>>>> We seem to have more driver bugs than we thought.
>>>>>
>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>> ---
>>>>>   include/drm/ttm/ttm_bo_api.h | 6 ++++--
>>>>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
>>>>> index 4fb523dfab32..df9fe596e7c5 100644
>>>>> --- a/include/drm/ttm/ttm_bo_api.h
>>>>> +++ b/include/drm/ttm/ttm_bo_api.h
>>>>> @@ -603,9 +603,11 @@ static inline void ttm_bo_pin(struct ttm_buffer_object *bo)
>>>>>   static inline void ttm_bo_unpin(struct ttm_buffer_object *bo)
>>>>>   {
>>>>>       dma_resv_assert_held(bo->base.resv);
>>>>> -    WARN_ON_ONCE(!bo->pin_count);
>>>>>       WARN_ON_ONCE(!kref_read(&bo->kref));
>>>>> -    --bo->pin_count;
>>>>> +    if (bo->pin_count)
>>>>> +        --bo->pin_count;
>>>>> +    else
>>>>> +        WARN_ON_ONCE(true);
>>>>>   }
>>>>>
>>>>>   int ttm_mem_evict_first(struct ttm_device *bdev,
>>>>
>>>> Since I have now been staring for half a year at the code of the
>>>> driver that made pinning an art, I have a couple of suggestions
>>>> here. Could we use an atomic for pin_count, allowing unlocked
>>>> unpinning and requiring the lock only for the pin_count 0->1
>>>> transition (unlocked for pin_if_already_pinned)? In particular, I
>>>> think vmwgfx would benefit from unlocked unpins. Also, if the
>>>> atomic were a refcount_t, that would probably give you the above
>>>> behaviour?
>>>
>>> Nope, I've considered this as well while moving the pin count into TTM.
>>>
>>> The problem is that you not only need the BO reserved for 0->1 
>>> transitions, but also for 1->0 transitions to move the BO on the LRU 
>>> correctly.
>>
>> Ah, and there is no way for us to know the correct LRU list without
>> reservation?
>
> Not really; there is always the chance that CPU A is reducing the
> count from 1->0 while CPU B is doing 0->1, and you end up with an LRU
> status which doesn't match the pin count.
>
> We could try to do something like a loop updating the LRU status until 
> it matches the pin count, but the implications of that are usually 
> really nasty.
>
OK, yeah, I was more thinking along the lines of protecting the LRU
status with the global lru_lock; unpin would then be:

if (refcount_dec_and_lock(&bo->pin_count, &ttm_glob.lru_lock)) {
	/* Last unpin: put the BO back on the relevant LRU list(s);
	 * add_to_relevant_lrus() is just a placeholder here.
	 */
	add_to_relevant_lrus(bo, bo->lru_status);
	spin_unlock(&ttm_glob.lru_lock);
}
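
(Note that refcount_dec_and_lock() only takes the lru_lock when the
count actually drops to zero, so every unpin other than the last one
would stay entirely lock-free.)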

But looking at ttm_bo_move_to_lru_tail(), I realize that's not really
trivial anymore. I hope it's doable at some point, though.
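
For symmetry, the pin side of such a scheme could look something like
the sketch below: refcount_inc_not_zero() would give the unlocked
pin_if_already_pinned fast path, and only the 0->1 transition would
take the lru_lock. ttm_bo_pin_sketch() and remove_from_lrus() are
made-up names, not existing TTM functions:

static void ttm_bo_pin_sketch(struct ttm_buffer_object *bo)
{
	/* Fast path: already pinned, no lock needed. */
	if (refcount_inc_not_zero(&bo->pin_count))
		return;

	/* Slow path: someone may have raced us to do 0->1 first, so
	 * re-check under the lock before taking the BO off the LRUs.
	 */
	spin_lock(&ttm_glob.lru_lock);
	if (!refcount_inc_not_zero(&bo->pin_count)) {
		refcount_set(&bo->pin_count, 1);
		remove_from_lrus(bo);
	}
	spin_unlock(&ttm_glob.lru_lock);
}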

But meanwhile, perhaps TTM needs to accept, and pave over, the fact
that drivers are destroying pinned buffers?
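
Paving over could be as simple as having the release path drop any
remaining pins with a one-time warning, rather than treating a pinned
BO at destroy time as a hard driver bug. Purely illustrative, not an
actual patch:

/* In the BO release path: tolerate a still-pinned BO by forgetting
 * the stale pins before continuing with the normal cleanup.
 */
if (WARN_ON_ONCE(bo->pin_count))
	bo->pin_count = 0;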

/Thomas

> Christian.
>
>>
>> /Thomas
>>
>>
>>>
>>> Christian.
>>>
>>>>
>>>> /Thomas
>>>>
>>>>

