[PATCH] drm/ttm: replace dma_resv object on deleted BOs v3
Daniel Vetter
daniel at ffwll.ch
Wed Feb 12 18:58:03 UTC 2020
On Wed, Feb 12, 2020 at 12:53 PM Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
>
> > On 12.02.20 at 07:23, Pan, Xinhui wrote:
> >
> >> On Feb 11, 2020, at 23:43, Christian König <ckoenig.leichtzumerken at gmail.com> wrote:
> >>
> >> When non-imported BOs are resurrected for delayed delete we replace
> >> the dma_resv object to allow for easy reclaiming of the resources.
> >>
> >> v2: move that to ttm_bo_individualize_resv
> >> v3: add a comment to explain what's going on
> >>
> >> Signed-off-by: Christian König <christian.koenig at amd.com>
> >> Reviewed-by: xinhui pan <xinhui.pan at amd.com>
> >> ---
> >> drivers/gpu/drm/ttm/ttm_bo.c | 14 +++++++++++++-
> >> 1 file changed, 13 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> >> index bfc42a9e4fb4..8174603d390f 100644
> >> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> >> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> >> @@ -393,6 +393,18 @@ static int ttm_bo_individualize_resv(struct ttm_buffer_object *bo)
> >>
> >> r = dma_resv_copy_fences(&bo->base._resv, bo->base.resv);
> >> dma_resv_unlock(&bo->base._resv);
> >> + if (r)
> >> + return r;
> >> +
> >> + if (bo->type != ttm_bo_type_sg) {
> >> + /* This works because the BO is about to be destroyed and nobody
> >> + * references it anymore. The only tricky case is the trylock on
> >> + * the resv object while holding the lru_lock.
> >> + */
> >> + spin_lock(&ttm_bo_glob.lru_lock);
> >> + bo->base.resv = &bo->base._resv;
> >> + spin_unlock(&ttm_bo_glob.lru_lock);
> >> + }
> >>
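(Aside, since this trips people up: the trylock the comment refers to is
the one the eviction path does while it walks the LRU under that same
spinlock. A simplified sketch of the pattern, not the verbatim kernel
code:

    /* eviction side, simplified: runs with ttm_bo_glob.lru_lock held */
    list_for_each_entry(bo, &man->lru[i], lru) {
        if (!dma_resv_trylock(bo->base.resv))
            continue; /* BO is busy, try the next one */
        /* ... bo can now be evicted ... */
    }

Because bo->base.resv is only ever replaced while lru_lock is held, the
pointer the walker trylocks cannot change under it for as long as it
holds the spinlock.)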
> > How about something like this?
> > The basic idea is to do the BO cleanup work in bo_release first and avoid any race with evict.
> > While the BO is dying, evict also just does BO cleanup work.
> >
> > If the BO is busy, neither bo_release nor evict can do cleanup work on it. For the bo_release case, we just add the BO back to the LRU list.
> > So we can clean it up both in the workqueue and the shrinker, as before.
> >
> > @@ -405,8 +405,9 @@ static int ttm_bo_individualize_resv(struct ttm_buffer_object *bo)
> >
> > if (bo->type != ttm_bo_type_sg) {
> > spin_lock(&ttm_bo_glob.lru_lock);
> > - bo->base.resv = &bo->base._resv;
> > + ttm_bo_del_from_lru(bo);
> > spin_unlock(&ttm_bo_glob.lru_lock);
> > + bo->base.resv = &bo->base._resv;
> > }
> >
> > return r;
> > @@ -606,10 +607,9 @@ static void ttm_bo_release(struct kref *kref)
> > * shrinkers, now that they are queued for
> > * destruction.
> > */
> > - if (bo->mem.placement & TTM_PL_FLAG_NO_EVICT) {
> > + if (bo->mem.placement & TTM_PL_FLAG_NO_EVICT)
> > bo->mem.placement &= ~TTM_PL_FLAG_NO_EVICT;
> > - ttm_bo_move_to_lru_tail(bo, NULL);
> > - }
> > + ttm_bo_add_mem_to_lru(bo, &bo->mem);
> >
> > kref_init(&bo->kref);
> > list_add_tail(&bo->ddestroy, &bdev->ddestroy);
>
> Yeah, thought about that as well. But this has the major drawback that
> the deleted BO moves to the end of the LRU, which is something we don't
> want.
>
> I think the real solution to this problem is to go a completely
> different way and remove the delayed delete feature from TTM altogether.
> Instead this should be part of some DRM domain handler component.
>
> In other words it should not matter if a BO is evicted, moved or freed.
> Whenever a piece of memory becomes available again we keep around a
> fence which marks the end of using this piece of memory.
>
> When somebody then asks for new memory, we work through the LRU and test
> if using a certain piece of memory makes sense or not. If we find that a
> BO needs to be evicted for this we return a reference to the BO in
> question to the upper level handling.
>
> If we find that we can do the allocation but only with recently freed up
> memory we gather the fences and say you can only use the newly allocated
> memory after waiting for those.
>
> HEY! Wait a second! Did I just outline what a potential replacement for
> TTM would look like?
Hm I thought that was (roughly) the idea behind the last_move fence in
the ttm allocation manager? Except it's global, so you'd want to have
it slightly more fine-grained. But if your vram can essentially be
managed as pages, because there's (almost) no need for contiguous
allocations, then tracking all that gets a bit nasty ...
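Roughly what I'd mean by fine-grained, with entirely made-up names (a
sketch of the idea, not existing ttm code): hang the fence off each
freed range instead of the whole manager, and let the allocator hand
back either an eviction candidate or a fence to wait on:

    struct range_node {
        struct list_head lru;
        u64 start, size;
        struct ttm_buffer_object *bo;  /* non-NULL while still in use */
        struct dma_fence *last_use;    /* valid once the bo is gone */
    };

    /* hypothetical allocator: walk the LRU, return -EBUSY plus a BO for
     * the caller to evict, or 0 plus a fence the caller must wait on
     * before touching the memory.
     */
    static int range_alloc(struct list_head *lru_list, u64 size,
                           struct dma_fence **wait_on,
                           struct ttm_buffer_object **evict)
    {
        struct range_node *r;

        list_for_each_entry(r, lru_list, lru) {
            if (r->size < size)
                continue;
            if (r->bo) {
                *evict = r->bo;  /* still backed by a live BO */
                return -EBUSY;
            }
            /* recently freed: usable once the previous user signals */
            *wait_on = dma_fence_get(r->last_use);
            return 0;
        }
        return -ENOSPC;
    }

That would give you both of Christian's cases (return the BO to evict,
or gather the fences) in a single LRU walk.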
Anyway, I think the ghost object trickery is indeed not great, since
it mixes up fences for one thing (objects) with fences for allocated
ranges in the underlying resource. Another mismatch I'm seeing is that
ttm doesn't even track (or bother with) vma ranges. So you have that
additional fun with "the memory is gone, but the ptes are kinda still
there". I guess you could also solve that by tracking such usage at
the allocation manager level, with some fences that track the async
pte update job. I guess that's your plan; I'm not sure that fits with
all the ideas we have with fancy new submission models (stuff like
what amdkfd is doing).
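(Concretely, I'd imagine the unmap side folding the pte-clear job's
fence into the freed range, made-up names again:

    /* hypothetical: the range is freed while an async job is still
     * clearing the ptes that pointed at it; reuse waits on that job.
     */
    static void range_release(struct range_node *r,
                              struct dma_fence *pte_clear_fence)
    {
        r->bo = NULL;
        r->last_use = dma_fence_get(pte_clear_fence);
    }

but whether that still composes with user-mode queues that never go
through the kernel per submission is exactly the open question.)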
-Daniel
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch