[RFC PATCH] drm/ttm: Simplify the delayed destroy locking

Daniel Vetter daniel at ffwll.ch
Mon Apr 12 15:43:43 UTC 2021


On Mon, Apr 12, 2021 at 04:21:37PM +0200, Christian König wrote:
> 
> 
> Am 12.04.21 um 16:16 schrieb Thomas Hellström:
> > Hi, Christian,
> > 
> > On 4/12/21 4:01 PM, Christian König wrote:
> > > Hi Thomas,
> > > 
> > > well, in general a good idea, but I've been working on a different
> > > plan for a while now.
> > > 
> > > My idea here is that instead of the BO, the resource object is kept
> > > on a doubly linked lru list.
> > > 
> > > The resource objects then have a pointer to either the BO or a fence
> > > object.
> > > 
> > > When it is a fence object we can just grab a reference to it and
> > > wait for it to complete.
> > > 
> > > If it is a BO we evict it the same way we currently do.
> > > 
> > > This allows us to remove the delayed delete, the individualization of
> > > BOs, ghost objects etc...
> > 
> > Hmm, ok. So in that case, what would trigger the final release of the
> > buffer object in the absence of a delayed delete list? Would we use a
> > fence callback for that?
> 
> Key point is you don't need any final release of the BO any more. When the
> BO's reference count becomes zero it can be destroyed immediately.
> 
> Only the resource object is kept around and protected by a fence until it
> can be released finally.

I read this as dma_resv here for a second, and wondered how we'd figure out
the refcounting for that. But since you're aiming for a fence, which is
refcounted, then as long as we make sure the lifetime rules for resource
objects in this free-standing mode are very clear (the lru owns it, and
removal happens under the lru lock, which should work?), I think this is a
rather clean approach.

Also, non-freestanding resource objects are fully owned by their BOs.

> How that resource object is then destroyed could be handled in different
> ways: we could use delayed work, a shrinker callback, or just rely on new
> allocations to scan the deleted objects.
> 
> > 
> > Otherwise sounds like a good idea, and if it runs into problems, we
> > could possibly resurrect this.
> 
> Well, the main problem is that I need to change all drivers so that bo->mem
> can now be a pointer. IIRC you suggested that years ago, and I've been
> working on the details of that idea ever since.

Yeah, that's going to be a pile :-) But I think we reached the same
conclusion on i915 while polishing our lmem manager (atm it's ghost objects
all the way down), before we decided to adopt ttm directly.

So makes all sense to me.
-Daniel

> 
> Regards,
> Christian.
> 
> > 
> > /Thomas
> > 
> > 
> > 
> > 
> > > 
> > > Regards,
> > > Christian.
> > > 
> > > Am 12.04.21 um 15:17 schrieb Thomas Hellström:
> > > > This RFC needs some decent testing on a driver with bos that share
> > > > reservation objects, and of course a check for whether I missed
> > > > something obvious.
> > > > 
> > > > The locking around delayed destroy is rather complex due to the fact
> > > > that we want to individualize dma_resv pointers before putting the
> > > > object on the delayed-destroy list. That individualization is
> > > > currently protected by the lru lock, forcing us to do try-reservations
> > > > under the lru lock when reserving an object from the lru- or ddestroy
> > > > lists, which complicates moving over to per-manager lru locks.
> > > > 
> > > > Instead, protect the individualization with the object kref == 0,
> > > > and ensure kref != 0 before trying to reserve an object from the
> > > > list. Remove the lru lock around individualization and protect the
> > > > delayed destroy list with a separate spinlock. Isolate the delayed
> > > > destroy list similarly to a lockless list before traversing it. Also
> > > > don't try to drop resource references from the delayed destroy list
> > > > traversal, but leave that to the final object release. The role of
> > > > the delayed destroy thread will now simply be to try to idle the
> > > > object and, when idle, drop the delayed destroy reference.
> > > > 
> > > > This should pave the way for per-manager lru lists, sleeping eviction
> > > > locks that are kept, optional drm_mm eviction roster usage, and
> > > > moving over to a completely lockless delayed destroy list, even if
> > > > the traversal order will then change because there is no
> > > > llist_add_tail().
> > > > 
> > > > Cc: Christian König <christian.koenig at amd.com>
> > > > Cc: Daniel Vetter <daniel at ffwll.ch>
> > > > Signed-off-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > > > ---
> > > >   drivers/gpu/drm/ttm/ttm_bo.c     | 193 +++++++++++++------------------
> > > >   drivers/gpu/drm/ttm/ttm_device.c |   1 +
> > > >   include/drm/ttm/ttm_device.h     |   1 +
> > > >   3 files changed, 82 insertions(+), 113 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > > > index 6ab7b66ce36d..fd57252bc8fe 100644
> > > > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > > > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > > > @@ -217,6 +217,9 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
> > > >     static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo)
> > > >   {
> > > > +    if (kref_read(&bo->kref))
> > > > +        dma_resv_assert_held(bo->base.resv);
> > > > +
> > > >       if (bo->bdev->funcs->delete_mem_notify)
> > > >           bo->bdev->funcs->delete_mem_notify(bo);
> > > >   @@ -239,13 +242,18 @@ static int ttm_bo_individualize_resv(struct ttm_buffer_object *bo)
> > > >           return r;
> > > >         if (bo->type != ttm_bo_type_sg) {
> > > > -        /* This works because the BO is about to be destroyed and nobody
> > > > -         * reference it any more. The only tricky case is the trylock on
> > > > -         * the resv object while holding the lru_lock.
> > > > +        /*
> > > > +         * This works because nobody besides us can lock the bo or
> > > > +         * assume it is locked while its refcount is zero.
> > > >            */
> > > > -        spin_lock(&bo->bdev->lru_lock);
> > > > +        WARN_ON_ONCE(kref_read(&bo->kref));
> > > >           bo->base.resv = &bo->base._resv;
> > > > -        spin_unlock(&bo->bdev->lru_lock);
> > > > +        /*
> > > > +         * Don't reorder against kref_init().
> > > > +         * Matches the implicit full barrier of a successful
> > > > +         * kref_get_unless_zero() after a lru_list_lookup.
> > > > +         */
> > > > +        smp_mb();
> > > >       }
> > > >         return r;
> > > > @@ -274,122 +282,66 @@ static void ttm_bo_flush_all_fences(struct ttm_buffer_object *bo)
> > > >   }
> > > >     /**
> > > > - * function ttm_bo_cleanup_refs
> > > > - * If bo idle, remove from lru lists, and unref.
> > > > - * If not idle, block if possible.
> > > > - *
> > > > - * Must be called with lru_lock and reservation held, this function
> > > > - * will drop the lru lock and optionally the reservation lock before returning.
> > > > + * ttm_bo_cleanup_refs - idle and remove pages and gpu-bindings and remove
> > > > + * from lru_lists.
> > > >    *
> > > >    * @bo:                    The buffer object to clean-up
> > > >    * @interruptible:         Any sleeps should occur interruptibly.
> > > > - * @no_wait_gpu:           Never wait for gpu. Return -EBUSY instead.
> > > > - * @unlock_resv:           Unlock the reservation lock as well.
> > > > + * @no_wait_gpu:           Never wait for gpu. Return -EBUSY if not idle.
> > > >    */
> > > > -
> > > >   static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,
> > > > -                   bool interruptible, bool no_wait_gpu,
> > > > -                   bool unlock_resv)
> > > > +                   bool interruptible, bool no_wait_gpu)
> > > >   {
> > > > -    struct dma_resv *resv = &bo->base._resv;
> > > >       int ret;
> > > >   -    if (dma_resv_test_signaled_rcu(resv, true))
> > > > -        ret = 0;
> > > > -    else
> > > > -        ret = -EBUSY;
> > > > -
> > > > -    if (ret && !no_wait_gpu) {
> > > > -        long lret;
> > > > -
> > > > -        if (unlock_resv)
> > > > -            dma_resv_unlock(bo->base.resv);
> > > > -        spin_unlock(&bo->bdev->lru_lock);
> > > > -
> > > > -        lret = dma_resv_wait_timeout_rcu(resv, true, interruptible,
> > > > -                         30 * HZ);
> > > > -
> > > > -        if (lret < 0)
> > > > -            return lret;
> > > > -        else if (lret == 0)
> > > > -            return -EBUSY;
> > > > -
> > > > -        spin_lock(&bo->bdev->lru_lock);
> > > > -        if (unlock_resv && !dma_resv_trylock(bo->base.resv)) {
> > > > -            /*
> > > > -             * We raced, and lost, someone else holds the reservation now,
> > > > -             * and is probably busy in ttm_bo_cleanup_memtype_use.
> > > > -             *
> > > > -             * Even if it's not the case, because we finished waiting any
> > > > -             * delayed destruction would succeed, so just return success
> > > > -             * here.
> > > > -             */
> > > > -            spin_unlock(&bo->bdev->lru_lock);
> > > > -            return 0;
> > > > -        }
> > > > -        ret = 0;
> > > > -    }
> > > > -
> > > > -    if (ret || unlikely(list_empty(&bo->ddestroy))) {
> > > > -        if (unlock_resv)
> > > > -            dma_resv_unlock(bo->base.resv);
> > > > -        spin_unlock(&bo->bdev->lru_lock);
> > > > +    dma_resv_assert_held(bo->base.resv);
> > > > +    WARN_ON_ONCE(!bo->deleted);
> > > > +    ret = ttm_bo_wait(bo, interruptible, no_wait_gpu);
> > > > +    if (ret)
> > > >           return ret;
> > > > -    }
> > > >   +    ttm_bo_cleanup_memtype_use(bo);
> > > > +    spin_lock(&bo->bdev->lru_lock);
> > > >       ttm_bo_del_from_lru(bo);
> > > > -    list_del_init(&bo->ddestroy);
> > > >       spin_unlock(&bo->bdev->lru_lock);
> > > > -    ttm_bo_cleanup_memtype_use(bo);
> > > > -
> > > > -    if (unlock_resv)
> > > > -        dma_resv_unlock(bo->base.resv);
> > > > -
> > > > -    ttm_bo_put(bo);
> > > >         return 0;
> > > >   }
> > > >     /*
> > > > - * Traverse the delayed list, and call ttm_bo_cleanup_refs on all
> > > > - * encountered buffers.
> > > > + * Isolate a part of the delayed destroy list and check for idle on all
> > > > + * buffer objects on it. If idle, remove from the list and drop the
> > > > + * delayed destroy refcount. Return all busy buffers to the delayed
> > > > + * destroy list.
> > > >    */
> > > >   bool ttm_bo_delayed_delete(struct ttm_device *bdev, bool remove_all)
> > > >   {
> > > > -    struct list_head removed;
> > > > -    bool empty;
> > > > -
> > > > -    INIT_LIST_HEAD(&removed);
> > > > -
> > > > -    spin_lock(&bdev->lru_lock);
> > > > -    while (!list_empty(&bdev->ddestroy)) {
> > > > -        struct ttm_buffer_object *bo;
> > > > -
> > > > -        bo = list_first_entry(&bdev->ddestroy, struct ttm_buffer_object,
> > > > -                      ddestroy);
> > > > -        list_move_tail(&bo->ddestroy, &removed);
> > > > -        if (!ttm_bo_get_unless_zero(bo))
> > > > -            continue;
> > > > -
> > > > -        if (remove_all || bo->base.resv != &bo->base._resv) {
> > > > -            spin_unlock(&bdev->lru_lock);
> > > > -            dma_resv_lock(bo->base.resv, NULL);
> > > > -
> > > > -            spin_lock(&bdev->lru_lock);
> > > > -            ttm_bo_cleanup_refs(bo, false, !remove_all, true);
> > > > -
> > > > -        } else if (dma_resv_trylock(bo->base.resv)) {
> > > > -            ttm_bo_cleanup_refs(bo, false, !remove_all, true);
> > > > -        } else {
> > > > -            spin_unlock(&bdev->lru_lock);
> > > > +    struct ttm_buffer_object *bo, *next;
> > > > +    struct list_head isolated;
> > > > +    bool empty, idle;
> > > > +
> > > > +    INIT_LIST_HEAD(&isolated);
> > > > +
> > > > +    spin_lock(&bdev->ddestroy_lock);
> > > > +    list_splice_init(&bdev->ddestroy, &isolated);
> > > > +    spin_unlock(&bdev->ddestroy_lock);
> > > > +    list_for_each_entry_safe(bo, next, &isolated, ddestroy) {
> > > > +        WARN_ON_ONCE(!kref_read(&bo->kref) ||
> > > > +                 bo->base.resv != &bo->base._resv);
> > > > +        if (!remove_all)
> > > > +            idle = dma_resv_test_signaled_rcu(bo->base.resv, true);
> > > > +        else
> > > > +            idle = (dma_resv_wait_timeout_rcu(bo->base.resv, true,
> > > > +                              false, 30 * HZ) > 0);
> > > > +        if (idle) {
> > > > +            list_del_init(&bo->ddestroy);
> > > > +            ttm_bo_put(bo);
> > > >           }
> > > > -
> > > > -        ttm_bo_put(bo);
> > > > -        spin_lock(&bdev->lru_lock);
> > > >       }
> > > > -    list_splice_tail(&removed, &bdev->ddestroy);
> > > > +    spin_lock(&bdev->ddestroy_lock);
> > > > +    list_splice(&isolated, &bdev->ddestroy);
> > > >       empty = list_empty(&bdev->ddestroy);
> > > > -    spin_unlock(&bdev->lru_lock);
> > > > +    spin_unlock(&bdev->ddestroy_lock);
> > > >         return empty;
> > > >   }
> > > > @@ -405,10 +357,16 @@ static void ttm_bo_release(struct kref *kref)
> > > >           ret = ttm_bo_individualize_resv(bo);
> > > >           if (ret) {
> > > >               /* Last resort, if we fail to allocate memory for the
> > > > -             * fences block for the BO to become idle
> > > > +             * fences block for the BO to become idle, and then
> > > > +             * we are free to update the resv pointer.
> > > >                */
> > > >               dma_resv_wait_timeout_rcu(bo->base.resv, true, false,
> > > >                             30 * HZ);
> > > > +            if (bo->type != ttm_bo_type_sg) {
> > > > +                bo->base.resv = &bo->base._resv;
> > > > +                /* Please see ttm_bo_individualize_resv() */
> > > > +                smp_mb();
> > > > +            }
> > > >           }
> > > >             if (bo->bdev->funcs->release_notify)
> > > > @@ -422,9 +380,8 @@ static void ttm_bo_release(struct kref *kref)
> > > >           !dma_resv_trylock(bo->base.resv)) {
> > > >           /* The BO is not idle, resurrect it for delayed destroy */
> > > >           ttm_bo_flush_all_fences(bo);
> > > > -        bo->deleted = true;
> > > > -
> > > >           spin_lock(&bo->bdev->lru_lock);
> > > > +        bo->deleted = true;
> > > >             /*
> > > >            * Make pinned bos immediately available to
> > > > @@ -439,8 +396,10 @@ static void ttm_bo_release(struct kref *kref)
> > > >               ttm_bo_move_to_lru_tail(bo, &bo->mem, NULL);
> > > >           }
> > > >   +        spin_lock(&bo->bdev->ddestroy_lock);
> > > >           kref_init(&bo->kref);
> > > >           list_add_tail(&bo->ddestroy, &bdev->ddestroy);
> > > > +        spin_unlock(&bo->bdev->ddestroy_lock);
> > > >           spin_unlock(&bo->bdev->lru_lock);
> > > >             schedule_delayed_work(&bdev->wq,
> > > > @@ -450,9 +409,11 @@ static void ttm_bo_release(struct kref *kref)
> > > >         spin_lock(&bo->bdev->lru_lock);
> > > >       ttm_bo_del_from_lru(bo);
> > > > -    list_del(&bo->ddestroy);
> > > >       spin_unlock(&bo->bdev->lru_lock);
> > > >   +    pr_info("Deleting.\n");
> > > > +    WARN_ON_ONCE(!list_empty_careful(&bo->ddestroy));
> > > > +
> > > >       ttm_bo_cleanup_memtype_use(bo);
> > > >       dma_resv_unlock(bo->base.resv);
> > > > @@ -630,25 +591,26 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
> > > >           list_for_each_entry(bo, &man->lru[i], lru) {
> > > >               bool busy;
> > > >   +            if (!ttm_bo_get_unless_zero(bo))
> > > > +                continue;
> > > > +
> > > >               if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
> > > >                                   &busy)) {
> > > >                   if (busy && !busy_bo && ticket !=
> > > >                       dma_resv_locking_ctx(bo->base.resv))
> > > >                       busy_bo = bo;
> > > > +                ttm_bo_put(bo);
> > > >                   continue;
> > > >               }
> > > >               if (place && !bdev->funcs->eviction_valuable(bo, place)) {
> > > > +                ttm_bo_put(bo);
> > > >                   if (locked)
> > > >                       dma_resv_unlock(bo->base.resv);
> > > >                   continue;
> > > >               }
> > > > -            if (!ttm_bo_get_unless_zero(bo)) {
> > > > -                if (locked)
> > > > -                    dma_resv_unlock(bo->base.resv);
> > > > -                continue;
> > > > -            }
> > > > +
> > > >               break;
> > > >           }
> > > >   @@ -670,8 +632,11 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
> > > >       }
> > > >         if (bo->deleted) {
> > > > +        spin_unlock(&bdev->lru_lock);
> > > >           ret = ttm_bo_cleanup_refs(bo, ctx->interruptible,
> > > > -                      ctx->no_wait_gpu, locked);
> > > > +                      ctx->no_wait_gpu);
> > > > +        if (locked)
> > > > +            dma_resv_unlock(bo->base.resv);
> > > >           ttm_bo_put(bo);
> > > >           return ret;
> > > >       }
> > > > @@ -1168,17 +1133,19 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
> > > >       bool locked;
> > > >       int ret;
> > > >   -    if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked, NULL))
> > > > +    if (!ttm_bo_get_unless_zero(bo))
> > > >           return -EBUSY;
> > > >   -    if (!ttm_bo_get_unless_zero(bo)) {
> > > > -        if (locked)
> > > > -            dma_resv_unlock(bo->base.resv);
> > > > +    if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked, NULL)) {
> > > > +        ttm_bo_put(bo);
> > > >           return -EBUSY;
> > > >       }
> > > >         if (bo->deleted) {
> > > > -        ttm_bo_cleanup_refs(bo, false, false, locked);
> > > > +        spin_unlock(&bo->bdev->lru_lock);
> > > > +        ttm_bo_cleanup_refs(bo, false, false);
> > > > +        if (locked)
> > > > +            dma_resv_unlock(bo->base.resv);
> > > >           ttm_bo_put(bo);
> > > >           return 0;
> > > >       }
> > > > diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> > > > index 9b787b3caeb5..b941989885ed 100644
> > > > --- a/drivers/gpu/drm/ttm/ttm_device.c
> > > > +++ b/drivers/gpu/drm/ttm/ttm_device.c
> > > > @@ -228,6 +228,7 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
> > > >       bdev->vma_manager = vma_manager;
> > > >       INIT_DELAYED_WORK(&bdev->wq, ttm_device_delayed_workqueue);
> > > >       spin_lock_init(&bdev->lru_lock);
> > > > +    spin_lock_init(&bdev->ddestroy_lock);
> > > >       INIT_LIST_HEAD(&bdev->ddestroy);
> > > >       bdev->dev_mapping = mapping;
> > > >       mutex_lock(&ttm_global_mutex);
> > > > diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
> > > > index 7c8f87bd52d3..fa8c4f6a169e 100644
> > > > --- a/include/drm/ttm/ttm_device.h
> > > > +++ b/include/drm/ttm/ttm_device.h
> > > > @@ -279,6 +279,7 @@ struct ttm_device {
> > > >        * Protection for the per manager LRU and ddestroy lists.
> > > >        */
> > > >       spinlock_t lru_lock;
> > > > +    spinlock_t ddestroy_lock;
> > > >       struct list_head ddestroy;
> > > >         /*
> > > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
