[PATCH 1/6] drm/ttm: move the LRU into resource handling v4

Christian König christian.koenig at amd.com
Wed Mar 23 13:44:17 UTC 2022


Am 23.03.22 um 14:36 schrieb Daniel Vetter:
> On Wed, 23 Mar 2022 at 13:20, Christian König
> <ckoenig.leichtzumerken at gmail.com> wrote:
>> Am 23.03.22 um 12:59 schrieb Daniel Vetter:
>>> On Mon, Mar 21, 2022 at 02:25:56PM +0100, Christian König wrote:
>>>> This way we finally fix the problem that new resources are
>>>> not immediately evictable after allocation.
>>>>
>>>> That has caused numerous problems including OOM on GDS handling
>>>> and not being able to use TTM as a general resource manager.
>>>>
>>>> v2: stop assuming in ttm_resource_fini that res->bo is still valid.
>>>> v3: cleanup kerneldoc, add more lockdep annotation
>>>> v4: consistently use res->num_pages
>>>>
>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>> Tested-by: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>>>> +/**
>>>> + * struct ttm_lru_bulk_move
>>>> + *
>>>> + * @tt: first/last lru entry for resources in the TT domain
>>>> + * @vram: first/last lru entry for resources in the VRAM domain
>>>> + *
>>>> + * Helper structure for bulk moves on the LRU list.
>>>> + */
>>>> +struct ttm_lru_bulk_move {
>>>> +    struct ttm_lru_bulk_move_pos tt[TTM_MAX_BO_PRIORITY];
>>>> +    struct ttm_lru_bulk_move_pos vram[TTM_MAX_BO_PRIORITY];
>>> Not really needed, just a thought: Should we track the associated dma_resv
>>> object here to make sure the locking is all done correctly (and also check
>>> that the bulk move BOs have the same dma_resv)? It wouldn't really be any
>>> overhead for the !CONFIG_LOCKDEP case and we could sprinkle a lot more
>>> dma_resv_held all over the place.
>> You made a similar comment on the last revision and I already tried to
>> play around with that idea a bit.
>>
>> But I've completely abandoned that idea after realizing that the BOs in
>> the bulk move actually don't need to have the same dma_resv object, nor
>> do they all need to be locked.
> Uh how does that work? If you evict that bo while someone else is
> doing a bulk move, then at least the result might be rather funny, and
> I have no idea how it could work.

The LRU is still protected by the common spinlock.

So that will synchronize any modification to both the bulk move
structure and the individual list_heads making up the LRU.
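
Just to illustrate (a sketch only, assuming the per-device lru_lock and
the ttm_lru_bulk_move_tail() helper from this series; bdev and
vm->lru_bulk_move are placeholder names), the bulk move handling only
ever touches the lists while holding that spinlock, independent of the
dma_resv locks of the individual BOs:

	/* Serialized by the per-device LRU spinlock, not by the
	 * dma_resv locks of the individual BOs.
	 */
	spin_lock(&bdev->lru_lock);
	ttm_lru_bulk_move_tail(&vm->lru_bulk_move);
	spin_unlock(&bdev->lru_lock);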

>
> Like even if you then make the rule that you have to lock all bos for
> the bulk move, the bo could still be moved independently, and that
> would again break the bulk lru properties.
>
> And if you do none of the above, there's no reason for that bo to have
> a distinct dma_resv.
>
> Like maybe the data structure won't fall apart, but semantically it
> just doesn't make any sense to me to allow this. What would you want
> to use this for?

Yeah, that's a good point.

It's not technically necessary as far as I can see, but I'm not sure if 
there is a valid use case either.

>> It just happens that amdgpu is currently using it that way, but I can't
>> see any technical necessity to restrict the bulk move like that.
> Yeah we can do that later on in a follow up patch, or I figure out why
> it's not a good idea :-) Just figured this might be good to lock down
> before other drivers start adopting this.

I'm just wondering whether restricting the handling like that really
makes it any more defensive.

On the other hand we can still lift the restriction when anybody really 
comes along with a valid use case.
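
If we do lock it down later, I'd expect Daniel's suggestion to boil down
to something like the sketch below (only a sketch; the resv member and
the ttm_lru_bulk_move_check() helper are made-up names, not part of this
patch):

	struct ttm_lru_bulk_move {
		struct ttm_lru_bulk_move_pos tt[TTM_MAX_BO_PRIORITY];
		struct ttm_lru_bulk_move_pos vram[TTM_MAX_BO_PRIORITY];
	#ifdef CONFIG_LOCKDEP
		/* common reservation object of all BOs in the bulk move */
		struct dma_resv *resv;
	#endif
	};

	/* Hypothetical check, called wherever a BO is added to the bulk move */
	static void ttm_lru_bulk_move_check(struct ttm_lru_bulk_move *bulk,
					    struct ttm_buffer_object *bo)
	{
	#ifdef CONFIG_LOCKDEP
		dma_resv_assert_held(bo->base.resv);
		WARN_ON(bulk->resv && bulk->resv != bo->base.resv);
		bulk->resv = bo->base.resv;
	#endif
	}

As noted, that would be essentially free for !CONFIG_LOCKDEP builds.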

Christian.

> -Daniel
>
>> Regards,
>> Christian.
>>
>>
>>> -Daniel
>


