[PATCH 1/6] drm/ttm: move the LRU into resource handling v4

Wed Mar 23 13:36:10 UTC 2022

On Wed, 23 Mar 2022 at 13:20, Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
>
> Am 23.03.22 um 12:59 schrieb Daniel Vetter:
> > On Mon, Mar 21, 2022 at 02:25:56PM +0100, Christian König wrote:
> >> This way we finally fix the problem that new resources are
> >> not immediately evictable after allocation.
> >>
> >> That has caused numerous problems including OOM on GDS handling
> >> and not being able to use TTM as a general resource manager.
> >>
> >> v2: stop assuming in ttm_resource_fini that res->bo is still valid.
> >> v3: cleanup kerneldoc, add more lockdep annotation
> >> v4: consistently use res->num_pages
> >>
> >> Signed-off-by: Christian König <christian.koenig at amd.com>
> >> Tested-by: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
> >> +/**
> >> + * struct ttm_lru_bulk_move
> >> + *
> >> + * @tt: first/last lru entry for resources in the TT domain
> >> + * @vram: first/last lru entry for resources in the VRAM domain
> >> + *
> >> + * Helper structure for bulk moves on the LRU list.
> >> + */
> >> +struct ttm_lru_bulk_move {
> >> +    struct ttm_lru_bulk_move_pos tt[TTM_MAX_BO_PRIORITY];
> >> +    struct ttm_lru_bulk_move_pos vram[TTM_MAX_BO_PRIORITY];
> > Not really needed, just a thought: Should we track the associated dma_resv
> > object here to make sure the locking is all done correctly (and also check
> > that the bulk move bos have the same dma_resv)? It wouldn't really be any
> > overhead for the !CONFIG_LOCKDEP case and we could sprinkle a lot more
> > dma_resv_held all over the place.
>
> You made a similar comment on the last revision and I already tried to
> play around with that idea a bit.
>
> But I've completely abandoned that idea after realizing that the BOs in
> the bulk move actually don't need to have the same dma_resv object, nor
> do they all need to be locked.

Uh, how does that work? If you evict that bo while someone else is
doing a bulk move, then at least the result might be rather funny, and
I have no idea how it could work.

Like even if you then make the rule that you have to lock all bos for
the bulk move, the bo could still be moved independently, and that
would again break the bulk lru properties.

And if you do none of the above, there's no reason for that bo to have
a distinct dma_resv.

Like maybe the data structure won't fall apart, but semantically it
just doesn't make any sense to me to allow this. What would you want
to use this for?
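A minimal userspace sketch of the failure mode described above, assuming the bulk move only remembers the first and last LRU entry of its range (as the @tt/@vram "first/last lru entry" kerneldoc suggests). All names here (struct lru_node, struct bulk_pos, lru_move_tail(), scenario()) are hypothetical and are not TTM API. The LRU starts as A B C X Y with {A, B, C} in the bulk move; once C is moved on its own, the range first..last spans A B X Y C, so a later bulk splice of that range would drag the unrelated X and Y along with it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of an LRU list; not the TTM or kernel API. */
struct lru_node {
	struct lru_node *prev, *next;
	bool in_bulk;	/* true if this BO belongs to the bulk move */
};

/* Models the idea of a bulk move position: first/last entry of the range. */
struct bulk_pos {
	struct lru_node *first, *last;
};

static void lru_init(struct lru_node *head)
{
	head->prev = head->next = head;
	head->in_bulk = false;
}

static void lru_add_tail(struct lru_node *n, struct lru_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* An independent move of a single BO, e.g. after eviction and revalidation. */
static void lru_move_tail(struct lru_node *n, struct lru_node *head)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	lru_add_tail(n, head);
}

/* Count entries between first and last that do NOT belong to the bulk move. */
static int foreign_in_range(const struct bulk_pos *pos)
{
	const struct lru_node *n = pos->first;
	int foreign = 0;

	assert(pos->first && pos->last);
	for (;;) {
		if (!n->in_bulk)
			foreign++;
		if (n == pos->last)
			break;
		n = n->next;
	}
	return foreign;
}

/* LRU order A B C X Y with bulk = {A, B, C}; then C is moved on its own. */
static void scenario(int *before, int *after)
{
	struct lru_node head, a = { .in_bulk = true }, b = { .in_bulk = true },
			c = { .in_bulk = true }, x = { 0 }, y = { 0 };
	struct bulk_pos pos = { &a, &c };

	lru_init(&head);
	lru_add_tail(&a, &head);
	lru_add_tail(&b, &head);
	lru_add_tail(&c, &head);
	lru_add_tail(&x, &head);
	lru_add_tail(&y, &head);

	*before = foreign_in_range(&pos);	/* range is A B C */
	lru_move_tail(&c, &head);		/* LRU is now A B X Y C */
	*after = foreign_in_range(&pos);	/* range is A B X Y C */
}
```

Calling scenario() reports 0 foreign entries before the independent move and 2 after it. The list itself stays consistent throughout, which fits the point that the data structure need not fall apart even though the bulk move semantics do.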

> It just happens that amdgpu is currently using it that way, but I can't
> see any technical necessity to restrict the bulk move like that.

Yeah, we can do that later in a follow-up patch, or I figure out why
it's not a good idea :-) Just figured this might be good to lock down
before other drivers start adopting this.
-Daniel
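One way the restriction could be locked down, sketched in userspace under the assumption that the bulk move records the reservation object of the first BO it sees and rejects any later BO whose reservation object differs or is not held. struct fake_resv, struct fake_bo, struct bulk_move, bulk_move_init() and bulk_move_check_bo() are hypothetical stand-ins; in the kernel the "held" check would more likely be a lockdep-backed assertion such as dma_resv_assert_held() than a boolean flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_resv; "held" models the lock state. */
struct fake_resv {
	bool held;
};

/* Hypothetical stand-in for a BO: only its reservation object matters here. */
struct fake_bo {
	struct fake_resv *resv;
};

/* Hypothetical bulk move that remembers which reservation object it uses. */
struct bulk_move {
	struct fake_resv *resv;	/* NULL until the first BO is accepted */
};

static void bulk_move_init(struct bulk_move *bulk)
{
	bulk->resv = NULL;
}

/*
 * Accept a BO only if its reservation object is held and, once the bulk
 * move has recorded one, is the same object. Returns false on a violation;
 * a kernel version would more likely WARN or assert instead.
 */
static bool bulk_move_check_bo(struct bulk_move *bulk, const struct fake_bo *bo)
{
	assert(bulk && bo && bo->resv);

	if (!bo->resv->held)
		return false;
	if (!bulk->resv)
		bulk->resv = bo->resv;
	return bulk->resv == bo->resv;
}
```

This rules out exactly the mixed-dma_resv usage Christian considers technically possible, in exchange for catching unlocked or foreign BOs at the point where they join the bulk move.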

>
> Regards,
> Christian.
>
>
> > -Daniel
>

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch