[PATCH] drm/ttm: make ttm_bo_unpin more defensive
Thomas Hellström (Intel)
thomas_os at shipmail.org
Mon Mar 15 19:00:30 UTC 2021
On 3/15/21 7:47 PM, Christian König wrote:
>
>
> Am 15.03.21 um 18:08 schrieb Thomas Hellström (Intel):
>>
>> On 3/15/21 11:26 AM, Christian König wrote:
>>>
>>>
>>> Am 13.03.21 um 19:29 schrieb Thomas Hellström (Intel):
>>>> Hi, Christian
>>>>
>>>> On 3/12/21 10:38 AM, Christian König wrote:
>>>>> We seem to have some more driver bugs than thought.
>>>>>
>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>> ---
>>>>> include/drm/ttm/ttm_bo_api.h | 6 ++++--
>>>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/include/drm/ttm/ttm_bo_api.h
>>>>> b/include/drm/ttm/ttm_bo_api.h
>>>>> index 4fb523dfab32..df9fe596e7c5 100644
>>>>> --- a/include/drm/ttm/ttm_bo_api.h
>>>>> +++ b/include/drm/ttm/ttm_bo_api.h
>>>>> @@ -603,9 +603,11 @@ static inline void ttm_bo_pin(struct
>>>>> ttm_buffer_object *bo)
>>>>> static inline void ttm_bo_unpin(struct ttm_buffer_object *bo)
>>>>> {
>>>>> dma_resv_assert_held(bo->base.resv);
>>>>> - WARN_ON_ONCE(!bo->pin_count);
>>>>> WARN_ON_ONCE(!kref_read(&bo->kref));
>>>>> - --bo->pin_count;
>>>>> + if (bo->pin_count)
>>>>> + --bo->pin_count;
>>>>> + else
>>>>> + WARN_ON_ONCE(true);
>>>>> }
>>>>> int ttm_mem_evict_first(struct ttm_device *bdev,
>>>>
>>>> Since I have now been staring for half a year at the code of the
>>>> driver that made pinning an art, I have a couple of suggestions
>>>> here. Could we use an atomic for pin_count, allowing unlocked
>>>> unpinning but requiring the lock only for the 0->1 pin_count
>>>> transition (and keeping pin_if_already_pinned unlocked)? In
>>>> particular I think vmwgfx would benefit from unlocked unpins.
>>>> Also, if the atomic were a refcount_t, that would probably give
>>>> you the above behaviour?
>>>
>>> Nope, I've considered this as well while moving the pin count into TTM.
>>>
>>> The problem is that you not only need the BO reserved for 0->1
>>> transitions, but also for 1->0 transitions to move the BO on the LRU
>>> correctly.
>>
>> Ah, and there is no way to have us know the correct LRU list without
>> reservation?
>
> Not really, there is always the chance that CPU A is reducing the
> count from 1->0 while CPU B is doing 0->1, and you end up with an LRU
> status which doesn't match the pin count.
>
> We could try to do something like a loop updating the LRU status until
> it matches the pin count, but the implications of that are usually
> really nasty.
>
OK, yeah, I was thinking more along the lines of protecting the LRU
status with the global LRU lock; unpin would then be

if (refcount_dec_and_lock(&bo->pin_count, &ttm_glob.lru_lock)) {
        add_to_relevant_lrus(bo, bo->lru_status);
        spin_unlock(&ttm_glob.lru_lock);
}
But looking at ttm_bo_move_to_lru_tail(), I realize that's not really
trivial anymore. I hope it's doable at some point, though.
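For illustration, the pattern above can be sketched in plain userspace C
with a mutex standing in for the LRU lock and an atomic standing in for
the refcount_t. This is only a sketch of the idea from the thread, not
TTM code: fake_bo, dec_and_lock, lru_lock and on_lru are all made-up
names, and a real refcount_t additionally saturates on underflow.

```c
/* Userspace sketch of the unlocked-unpin idea: the pin count is an
 * atomic, and the lock is only taken on the final 1->0 transition to
 * fix up the (simulated) LRU state. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct fake_bo {
	atomic_int pin_count;
	pthread_mutex_t lru_lock;
	bool on_lru;	/* protected by lru_lock */
};

/* Mirrors what refcount_dec_and_lock() does: decrement, and only if
 * the count hit zero, return true with the lock held so the caller
 * can update the LRU status. */
static bool dec_and_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
	int old = atomic_load(cnt);

	/* Fast path: the count stays above zero, no lock needed. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(cnt, &old, old - 1))
			return false;
	}
	/* Slow path: take the lock, then do the final decrement. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(cnt, 1) == 1)
		return true;	/* caller unlocks after the LRU update */
	pthread_mutex_unlock(lock);
	return false;
}

static void fake_bo_unpin(struct fake_bo *bo)
{
	if (dec_and_lock(&bo->pin_count, &bo->lru_lock)) {
		bo->on_lru = true;	/* back on the LRU, evictable */
		pthread_mutex_unlock(&bo->lru_lock);
	}
}
```

The point of the shape is that concurrent unpins on a BO pinned more
than once never touch the lock; only the last unpinner serializes
against the LRU bookkeeping.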
But meanwhile, perhaps TTM needs to accept, and pave over, the fact
that drivers do destroy pinned buffers?
/Thomas
> Christian.
>
>>
>> /Thomas
>>
>>
>>>
>>> Christian.
>>>
>>>>
>>>> /Thomas
>>>>
>>>>