[PATCH 1/6] drm/ttm: move the LRU into resource handling v4

Wed Mar 23 12:20:32 UTC 2022

On 23.03.22 at 12:59, Daniel Vetter wrote:
> On Mon, Mar 21, 2022 at 02:25:56PM +0100, Christian König wrote:
>> This way we finally fix the problem that new resources are
>> not immediately evictable after allocation.
>>
>> That has caused numerous problems including OOM on GDS handling
>> and not being able to use TTM as a general resource manager.
>>
>> v2: stop assuming in ttm_resource_fini that res->bo is still valid.
>> v3: cleanup kerneldoc, add more lockdep annotation
>> v4: consistently use res->num_pages
>>
>> Signed-off-by: Christian König <christian.koenig at amd.com>
>> Tested-by: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>> +/**
>> + * struct ttm_lru_bulk_move
>> + *
>> + * @tt: first/last lru entry for resources in the TT domain
>> + * @vram: first/last lru entry for resources in the VRAM domain
>> + *
>> + * Helper structure for bulk moves on the LRU list.
>> + */
>> +struct ttm_lru_bulk_move {
>> +	struct ttm_lru_bulk_move_pos tt[TTM_MAX_BO_PRIORITY];
>> +	struct ttm_lru_bulk_move_pos vram[TTM_MAX_BO_PRIORITY];
> Not really needed, just a thought: Should we track the associated dma_resv
> object here to make sure the locking is all done correctly (and also check
> that the bulk move BOs have the same dma_resv)? It wouldn't really be any
> overhead for the !CONFIG_LOCKDEP case and we could sprinkle a lot more
> dma_resv_held all over the place.

You made a similar comment on the last revision and I already tried to 
play around with that idea a bit.

But I've completely abandoned that idea after realizing that the BOs in 
the bulk move actually don't need to have the same dma_resv object, nor 
do they all need to be locked.

It just happens that amdgpu is currently using it that way, but I can't
see any technical necessity to restrict the bulk move like that.
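
For illustration only, here is a minimal user-space sketch of what a
first/last pair per domain and priority provides: the whole range can be
spliced to the tail of an LRU list in constant time. All names below
(struct node, struct bulk_pos, bulk_pos_add, bulk_move_tail) are
hypothetical stand-ins and not the TTM implementation. The sketch assumes
the tracked entries sit contiguously on the list, and it models only the
list manipulation, not any locking.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modelled on list_head. */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Hypothetical stand-in for one first/last pair of a bulk move. */
struct bulk_pos {
	struct node *first, *last;
};

/* Track n as the newest entry of the range (assumed contiguous). */
static void bulk_pos_add(struct bulk_pos *pos, struct node *n)
{
	if (!pos->first)
		pos->first = n;
	pos->last = n;
}

/* Splice the range [first, last] to the tail of head in O(1). */
static void bulk_move_tail(struct bulk_pos *pos, struct node *head)
{
	struct node *first = pos->first, *last = pos->last;

	if (!first)
		return;	/* nothing tracked, nothing to move */
	assert(last);

	/* Unlink the range from its current position. */
	first->prev->next = last->next;
	last->next->prev = first->prev;

	/* Relink the range just before head, i.e. at the tail. */
	head->prev->next = first;
	first->prev = head->prev;
	last->next = head;
	head->prev = last;
}
```

Nothing in this model refers to a reservation object: the range is
described purely by its two end points on the list.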

Regards,
Christian.

> -Daniel