[RFC PATCH] drm/ttm: Simplify the delayed destroy locking

Christian König christian.koenig at amd.com
Mon Apr 12 14:44:01 UTC 2021


On 12.04.21 at 16:40, Thomas Hellström wrote:
>
> On 4/12/21 4:21 PM, Christian König wrote:
>>
>>
>> On 12.04.21 at 16:16, Thomas Hellström wrote:
>>> Hi, Christian,
>>>
>>> On 4/12/21 4:01 PM, Christian König wrote:
>>>> Hi Thomas,
>>>>
>>>> Well, in general a good idea, but I've been working on a different
>>>> plan for a while now.
>>>>
>>>> My idea here is that instead of the BO, the resource object is kept
>>>> on a doubly linked lru list.
>>>>
>>>> The resource objects then have a pointer to either the BO or a 
>>>> fence object.
>>>>
>>>> When it is a fence object we can just grab a reference to it and 
>>>> wait for it to complete.
>>>>
>>>> If it is a BO we evict it the same way we currently do.
>>>>
>>>> This allows us to remove the delayed delete, the individualization
>>>> of BOs, ghost objects etc.
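
To make that a bit more concrete, here is a rough sketch of what I
have in mind. All names and the exact layout are illustrative only,
nothing of this is implemented yet:

struct ttm_resource {
	struct list_head lru;		/* per-manager LRU list link */

	/* Exactly one of the two pointers below is non-NULL. */
	struct ttm_buffer_object *bo;	/* owned by a still-live BO */
	struct dma_fence *fence;	/* BO destroyed, resource idles */
};

static int ttm_lru_evict_one(struct ttm_resource *res)
{
	if (res->fence) {
		struct dma_fence *fence = dma_fence_get(res->fence);
		long lret;

		/* The BO is already gone, just wait for the hardware. */
		lret = dma_fence_wait_timeout(fence, true, 30 * HZ);
		dma_fence_put(fence);
		if (lret <= 0)
			return lret ? lret : -EBUSY;

		ttm_resource_free_deleted(res);	/* sketched further down */
		return 0;
	}

	/* Otherwise evict res->bo the way we evict BOs today. */
	return ttm_evict_bo_resource(res->bo, res);	/* hypothetical */
}
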
>>>
>>> Hmm, ok. So in that case, what would trigger the final release of 
>>> the buffer object in the absence of a delayed delete list? Would we 
>>> use a fence callback for that?
>>
>> The key point is that you don't need any final release of the BO any
>> more. When the BO's reference count becomes zero it can be destroyed
>> immediately.
>>
>> Only the resource object is kept around, protected by a fence, until
>> it can finally be released.
>>
>> How that resource object is then destroyed could be handled in
>> different ways: we could use delayed work, a shrinker callback, or
>> just rely on new allocations scanning for deleted objects.
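
For example, the delayed work flavor of that could look roughly like
the following. Again just a sketch: the deleted list, its lock, the
work item and the free helper are all made-up names:

static void ttm_deleted_resource_work(struct work_struct *work)
{
	struct delayed_work *dwork = to_delayed_work(work);
	struct ttm_device *bdev =
		container_of(dwork, struct ttm_device, deleted_work);
	struct ttm_resource *res, *next;
	LIST_HEAD(idle);

	/* Move all idle resources to a private list... */
	spin_lock(&bdev->deleted_lock);
	list_for_each_entry_safe(res, next, &bdev->deleted, lru)
		if (dma_fence_is_signaled(res->fence))
			list_move(&res->lru, &idle);
	spin_unlock(&bdev->deleted_lock);

	/* ...and free them without holding the lock. */
	list_for_each_entry_safe(res, next, &idle, lru) {
		list_del(&res->lru);
		ttm_resource_free_deleted(res);	/* sketched below */
	}
}
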
>
> Hmm, ok, but what about when the object is backed by pages? Would the
> pages be attached to the resource object, and how do we know when to
> free them?

Yes, exactly that's the idea.

Basically the ttm pointer will be moved into the resource object as well.
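
So the final teardown in the sketch above would then be roughly the
following, assuming the resource also keeps a bdev pointer next to
the ttm pointer. Purely illustrative again:

static void ttm_resource_free_deleted(struct ttm_resource *res)
{
	/* Only legal once the fence has signaled. */
	WARN_ON_ONCE(res->fence && !dma_fence_is_signaled(res->fence));

	/* The pages now follow the resource, not the BO. */
	if (res->ttm)
		ttm_tt_destroy(res->bdev, res->ttm);

	dma_fence_put(res->fence);
	kfree(res);
}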

Christian.

>
>>
>>>
>>> Otherwise sounds like a good idea, and if it runs into problems, we 
>>> could possibly resurrect this.
>>
>> Well, the main problem is that I need to change all drivers so that
>> bo->mem can now be a pointer. IIRC you suggested that years ago, and
>> I've been working on the details of that idea ever since.
>
> Hmm, I might have, yes. Can't remember the context, though.
>
> Thanks,
> /Thomas
>
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> /Thomas
>>>
>>>
>>>
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> On 12.04.21 at 15:17, Thomas Hellström wrote:
>>>>> This RFC needs some decent testing on a driver with bos that share
>>>>> reservation objects, and of course a check for whether I missed
>>>>> something obvious.
>>>>>
>>>>> The locking around delayed destroy is rather complex due to the fact
>>>>> that we want to individualize dma_resv pointers before putting the
>>>>> object on the delayed-destroy list. That individualization is
>>>>> currently protected by the lru lock, forcing us to do
>>>>> try-reservations under the lru lock when reserving an object from
>>>>> the lru or ddestroy lists, which complicates moving over to
>>>>> per-manager lru locks.
>>>>>
>>>>> Instead, protect the individualization with the object kref == 0,
>>>>> and ensure kref != 0 before trying to reserve an object from the
>>>>> list. Remove the lru lock around individualization and protect the
>>>>> delayed destroy list with a separate spinlock. Isolate the delayed
>>>>> destroy list similarly to a lockless list before traversing it.
>>>>> Also don't try to drop resource references from the delayed destroy
>>>>> list traversal, but leave that to the final object release. The
>>>>> role of the delayed destroy thread will now simply be to try to
>>>>> idle the object and, when idle, drop the delayed destroy reference.
>>>>>
>>>>> This should pave the way for per-manager lru lists, sleeping
>>>>> eviction locks that are kept, optional drm_mm eviction roster usage
>>>>> and moving over to a completely lockless delayed destroy list, even
>>>>> if the traversal order will then change because there is no
>>>>> llist_add_tail().
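
[ Spelling out, for reference, the lookup side that the kref == 0
  scheme and the smp_mb() below rely on. This is not part of the
  patch, and the list walk is simplified: ]

spin_lock(&bdev->lru_lock);
list_for_each_entry(bo, &man->lru[i], lru) {
	/*
	 * Full memory barrier on success, matching the smp_mb() in
	 * ttm_bo_individualize_resv(). After this we know kref != 0,
	 * so nobody individualizes bo->base.resv concurrently; that
	 * only happens while kref == 0.
	 */
	if (!ttm_bo_get_unless_zero(bo))
		continue;

	/* bo->base.resv is stable here, safe to trylock and evict. */
	break;
}
spin_unlock(&bdev->lru_lock);

[ And the lockless variant mentioned above would isolate the list
  with something like the following, where ddestroy_llist is a
  hypothetical llist_head member of struct ttm_device: ]

struct llist_node *first = llist_del_all(&bdev->ddestroy_llist);

/* llist_add() builds the list LIFO, hence the changed traversal
 * order; llist_reverse_order() can restore insertion order. */
first = llist_reverse_order(first);
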
>>>>>
>>>>> Cc: Christian König <christian.koenig at amd.com>
>>>>> Cc: Daniel Vetter <daniel at ffwll.ch>
>>>>> Signed-off-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>>>>> ---
>>>>>   drivers/gpu/drm/ttm/ttm_bo.c     | 193 +++++++++++++------------------
>>>>>   drivers/gpu/drm/ttm/ttm_device.c |   1 +
>>>>>   include/drm/ttm/ttm_device.h     |   1 +
>>>>>   3 files changed, 82 insertions(+), 113 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> index 6ab7b66ce36d..fd57252bc8fe 100644
>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> @@ -217,6 +217,9 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
>>>>>
>>>>>   static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo)
>>>>>   {
>>>>> +    if (kref_read(&bo->kref))
>>>>> +        dma_resv_assert_held(bo->base.resv);
>>>>> +
>>>>>       if (bo->bdev->funcs->delete_mem_notify)
>>>>>           bo->bdev->funcs->delete_mem_notify(bo);
>>>>>
>>>>> @@ -239,13 +242,18 @@ static int ttm_bo_individualize_resv(struct ttm_buffer_object *bo)
>>>>>           return r;
>>>>>         if (bo->type != ttm_bo_type_sg) {
>>>>> -        /* This works because the BO is about to be destroyed and nobody
>>>>> -         * reference it any more. The only tricky case is the trylock on
>>>>> -         * the resv object while holding the lru_lock.
>>>>> +        /*
>>>>> +         * This works because nobody besides us can lock the bo or
>>>>> +         * assume it is locked while its refcount is zero.
>>>>>            */
>>>>> -        spin_lock(&bo->bdev->lru_lock);
>>>>> +        WARN_ON_ONCE(kref_read(&bo->kref));
>>>>>           bo->base.resv = &bo->base._resv;
>>>>> -        spin_unlock(&bo->bdev->lru_lock);
>>>>> +        /*
>>>>> +         * Don't reorder against kref_init().
>>>>> +         * Matches the implicit full barrier of a successful
>>>>> +         * kref_get_unless_zero() after a lru_list_lookup.
>>>>> +         */
>>>>> +        smp_mb();
>>>>>       }
>>>>>         return r;
>>>>> @@ -274,122 +282,66 @@ static void ttm_bo_flush_all_fences(struct ttm_buffer_object *bo)
>>>>>   }
>>>>>     /**
>>>>> - * function ttm_bo_cleanup_refs
>>>>> - * If bo idle, remove from lru lists, and unref.
>>>>> - * If not idle, block if possible.
>>>>> - *
>>>>> - * Must be called with lru_lock and reservation held, this function
>>>>> - * will drop the lru lock and optionally the reservation lock before returning.
>>>>> + * ttm_bo_cleanup_refs - idle and remove pages and gpu-bindings and remove
>>>>> + * from lru_lists.
>>>>>    *
>>>>>    * @bo:                    The buffer object to clean-up
>>>>>    * @interruptible:         Any sleeps should occur interruptibly.
>>>>> - * @no_wait_gpu:           Never wait for gpu. Return -EBUSY instead.
>>>>> - * @unlock_resv:           Unlock the reservation lock as well.
>>>>> + * @no_wait_gpu:           Never wait for gpu. Return -EBUSY if not idle.
>>>>>    */
>>>>> -
>>>>>   static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,
>>>>> -                   bool interruptible, bool no_wait_gpu,
>>>>> -                   bool unlock_resv)
>>>>> +                   bool interruptible, bool no_wait_gpu)
>>>>>   {
>>>>> -    struct dma_resv *resv = &bo->base._resv;
>>>>>       int ret;
>>>>>
>>>>> -    if (dma_resv_test_signaled_rcu(resv, true))
>>>>> -        ret = 0;
>>>>> -    else
>>>>> -        ret = -EBUSY;
>>>>> -
>>>>> -    if (ret && !no_wait_gpu) {
>>>>> -        long lret;
>>>>> -
>>>>> -        if (unlock_resv)
>>>>> -            dma_resv_unlock(bo->base.resv);
>>>>> -        spin_unlock(&bo->bdev->lru_lock);
>>>>> -
>>>>> -        lret = dma_resv_wait_timeout_rcu(resv, true, interruptible,
>>>>> -                         30 * HZ);
>>>>> -
>>>>> -        if (lret < 0)
>>>>> -            return lret;
>>>>> -        else if (lret == 0)
>>>>> -            return -EBUSY;
>>>>> -
>>>>> -        spin_lock(&bo->bdev->lru_lock);
>>>>> -        if (unlock_resv && !dma_resv_trylock(bo->base.resv)) {
>>>>> -            /*
>>>>> -             * We raced, and lost, someone else holds the reservation now,
>>>>> -             * and is probably busy in ttm_bo_cleanup_memtype_use.
>>>>> -             *
>>>>> -             * Even if it's not the case, because we finished waiting any
>>>>> -             * delayed destruction would succeed, so just return success
>>>>> -             * here.
>>>>> -             */
>>>>> -            spin_unlock(&bo->bdev->lru_lock);
>>>>> -            return 0;
>>>>> -        }
>>>>> -        ret = 0;
>>>>> -    }
>>>>> -
>>>>> -    if (ret || unlikely(list_empty(&bo->ddestroy))) {
>>>>> -        if (unlock_resv)
>>>>> -            dma_resv_unlock(bo->base.resv);
>>>>> -        spin_unlock(&bo->bdev->lru_lock);
>>>>> +    dma_resv_assert_held(bo->base.resv);
>>>>> +    WARN_ON_ONCE(!bo->deleted);
>>>>> +    ret = ttm_bo_wait(bo, interruptible, no_wait_gpu);
>>>>> +    if (ret)
>>>>>           return ret;
>>>>> -    }
>>>>>
>>>>> +    ttm_bo_cleanup_memtype_use(bo);
>>>>> +    spin_lock(&bo->bdev->lru_lock);
>>>>>       ttm_bo_del_from_lru(bo);
>>>>> -    list_del_init(&bo->ddestroy);
>>>>>       spin_unlock(&bo->bdev->lru_lock);
>>>>> -    ttm_bo_cleanup_memtype_use(bo);
>>>>> -
>>>>> -    if (unlock_resv)
>>>>> -        dma_resv_unlock(bo->base.resv);
>>>>> -
>>>>> -    ttm_bo_put(bo);
>>>>>         return 0;
>>>>>   }
>>>>>     /*
>>>>> - * Traverse the delayed list, and call ttm_bo_cleanup_refs on all
>>>>> - * encountered buffers.
>>>>> + * Isolate a part of the delayed destroy list and check for idle on all
>>>>> + * buffer objects on it. If idle, remove from the list and drop the
>>>>> + * delayed destroy refcount. Return all busy buffers to the delayed
>>>>> + * destroy list.
>>>>>    */
>>>>>   bool ttm_bo_delayed_delete(struct ttm_device *bdev, bool remove_all)
>>>>>   {
>>>>> -    struct list_head removed;
>>>>> -    bool empty;
>>>>> -
>>>>> -    INIT_LIST_HEAD(&removed);
>>>>> -
>>>>> -    spin_lock(&bdev->lru_lock);
>>>>> -    while (!list_empty(&bdev->ddestroy)) {
>>>>> -        struct ttm_buffer_object *bo;
>>>>> -
>>>>> -        bo = list_first_entry(&bdev->ddestroy, struct ttm_buffer_object,
>>>>> -                      ddestroy);
>>>>> -        list_move_tail(&bo->ddestroy, &removed);
>>>>> -        if (!ttm_bo_get_unless_zero(bo))
>>>>> -            continue;
>>>>> -
>>>>> -        if (remove_all || bo->base.resv != &bo->base._resv) {
>>>>> -            spin_unlock(&bdev->lru_lock);
>>>>> -            dma_resv_lock(bo->base.resv, NULL);
>>>>> -
>>>>> -            spin_lock(&bdev->lru_lock);
>>>>> -            ttm_bo_cleanup_refs(bo, false, !remove_all, true);
>>>>> -
>>>>> -        } else if (dma_resv_trylock(bo->base.resv)) {
>>>>> -            ttm_bo_cleanup_refs(bo, false, !remove_all, true);
>>>>> -        } else {
>>>>> -            spin_unlock(&bdev->lru_lock);
>>>>> +    struct ttm_buffer_object *bo, *next;
>>>>> +    struct list_head isolated;
>>>>> +    bool empty, idle;
>>>>> +
>>>>> +    INIT_LIST_HEAD(&isolated);
>>>>> +
>>>>> +    spin_lock(&bdev->ddestroy_lock);
>>>>> +    list_splice_init(&bdev->ddestroy, &isolated);
>>>>> +    spin_unlock(&bdev->ddestroy_lock);
>>>>> +    list_for_each_entry_safe(bo, next, &isolated, ddestroy) {
>>>>> +        WARN_ON_ONCE(!kref_read(&bo->kref) ||
>>>>> +                 bo->base.resv != &bo->base._resv);
>>>>> +        if (!remove_all)
>>>>> +            idle = dma_resv_test_signaled_rcu(bo->base.resv, true);
>>>>> +        else
>>>>> +            idle = (dma_resv_wait_timeout_rcu(bo->base.resv, true,
>>>>> +                              false, 30 * HZ) > 0);
>>>>> +        if (idle) {
>>>>> +            list_del_init(&bo->ddestroy);
>>>>> +            ttm_bo_put(bo);
>>>>>           }
>>>>> -
>>>>> -        ttm_bo_put(bo);
>>>>> -        spin_lock(&bdev->lru_lock);
>>>>>       }
>>>>> -    list_splice_tail(&removed, &bdev->ddestroy);
>>>>> +    spin_lock(&bdev->ddestroy_lock);
>>>>> +    list_splice(&isolated, &bdev->ddestroy);
>>>>>       empty = list_empty(&bdev->ddestroy);
>>>>> -    spin_unlock(&bdev->lru_lock);
>>>>> +    spin_unlock(&bdev->ddestroy_lock);
>>>>>         return empty;
>>>>>   }
>>>>> @@ -405,10 +357,16 @@ static void ttm_bo_release(struct kref *kref)
>>>>>           ret = ttm_bo_individualize_resv(bo);
>>>>>           if (ret) {
>>>>>               /* Last resort, if we fail to allocate memory for the
>>>>> -             * fences block for the BO to become idle
>>>>> +             * fences block for the BO to become idle, and then
>>>>> +             * we are free to update the resv pointer.
>>>>>                */
>>>>>               dma_resv_wait_timeout_rcu(bo->base.resv, true, false,
>>>>>                             30 * HZ);
>>>>> +            if (bo->type != ttm_bo_type_sg) {
>>>>> +                bo->base.resv = &bo->base._resv;
>>>>> +                /* Please see ttm_bo_individualize_resv() */
>>>>> +                smp_mb();
>>>>> +            }
>>>>>           }
>>>>>             if (bo->bdev->funcs->release_notify)
>>>>> @@ -422,9 +380,8 @@ static void ttm_bo_release(struct kref *kref)
>>>>>           !dma_resv_trylock(bo->base.resv)) {
>>>>>           /* The BO is not idle, resurrect it for delayed destroy */
>>>>>           ttm_bo_flush_all_fences(bo);
>>>>> -        bo->deleted = true;
>>>>> -
>>>>>           spin_lock(&bo->bdev->lru_lock);
>>>>> +        bo->deleted = true;
>>>>>             /*
>>>>>            * Make pinned bos immediately available to
>>>>> @@ -439,8 +396,10 @@ static void ttm_bo_release(struct kref *kref)
>>>>>               ttm_bo_move_to_lru_tail(bo, &bo->mem, NULL);
>>>>>           }
>>>>>
>>>>> +        spin_lock(&bo->bdev->ddestroy_lock);
>>>>>           kref_init(&bo->kref);
>>>>>           list_add_tail(&bo->ddestroy, &bdev->ddestroy);
>>>>> +        spin_unlock(&bo->bdev->ddestroy_lock);
>>>>>           spin_unlock(&bo->bdev->lru_lock);
>>>>>             schedule_delayed_work(&bdev->wq,
>>>>> @@ -450,9 +409,11 @@ static void ttm_bo_release(struct kref *kref)
>>>>>         spin_lock(&bo->bdev->lru_lock);
>>>>>       ttm_bo_del_from_lru(bo);
>>>>> -    list_del(&bo->ddestroy);
>>>>>       spin_unlock(&bo->bdev->lru_lock);
>>>>>   +    pr_info("Deleting.\n");
>>>>> +    WARN_ON_ONCE(!list_empty_careful(&bo->ddestroy));
>>>>> +
>>>>>       ttm_bo_cleanup_memtype_use(bo);
>>>>>       dma_resv_unlock(bo->base.resv);
>>>>>
>>>>> @@ -630,25 +591,26 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>>>>>           list_for_each_entry(bo, &man->lru[i], lru) {
>>>>>               bool busy;
>>>>>   +            if (!ttm_bo_get_unless_zero(bo))
>>>>> +                continue;
>>>>> +
>>>>>               if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
>>>>>                                   &busy)) {
>>>>>                   if (busy && !busy_bo && ticket !=
>>>>>                     dma_resv_locking_ctx(bo->base.resv))
>>>>>                       busy_bo = bo;
>>>>> +                ttm_bo_put(bo);
>>>>>                   continue;
>>>>>               }
>>>>>                 if (place && !bdev->funcs->eviction_valuable(bo,
>>>>>                                         place)) {
>>>>> +                ttm_bo_put(bo);
>>>>>                   if (locked)
>>>>>                       dma_resv_unlock(bo->base.resv);
>>>>>                   continue;
>>>>>               }
>>>>> -            if (!ttm_bo_get_unless_zero(bo)) {
>>>>> -                if (locked)
>>>>> -                    dma_resv_unlock(bo->base.resv);
>>>>> -                continue;
>>>>> -            }
>>>>> +
>>>>>               break;
>>>>>           }
>>>>>
>>>>> @@ -670,8 +632,11 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>>>>>       }
>>>>>         if (bo->deleted) {
>>>>> +        spin_unlock(&bdev->lru_lock);
>>>>>           ret = ttm_bo_cleanup_refs(bo, ctx->interruptible,
>>>>> -                      ctx->no_wait_gpu, locked);
>>>>> +                      ctx->no_wait_gpu);
>>>>> +        if (locked)
>>>>> +            dma_resv_unlock(bo->base.resv);
>>>>>           ttm_bo_put(bo);
>>>>>           return ret;
>>>>>       }
>>>>> @@ -1168,17 +1133,19 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
>>>>>       bool locked;
>>>>>       int ret;
>>>>>   -    if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked, NULL))
>>>>> +    if (!ttm_bo_get_unless_zero(bo))
>>>>>           return -EBUSY;
>>>>>   -    if (!ttm_bo_get_unless_zero(bo)) {
>>>>> -        if (locked)
>>>>> -            dma_resv_unlock(bo->base.resv);
>>>>> +    if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked, NULL)) {
>>>>> +        ttm_bo_put(bo);
>>>>>           return -EBUSY;
>>>>>       }
>>>>>         if (bo->deleted) {
>>>>> -        ttm_bo_cleanup_refs(bo, false, false, locked);
>>>>> +        spin_unlock(&bo->bdev->lru_lock);
>>>>> +        ttm_bo_cleanup_refs(bo, false, false);
>>>>> +        if (locked)
>>>>> +            dma_resv_unlock(bo->base.resv);
>>>>>           ttm_bo_put(bo);
>>>>>           return 0;
>>>>>       }
>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
>>>>> index 9b787b3caeb5..b941989885ed 100644
>>>>> --- a/drivers/gpu/drm/ttm/ttm_device.c
>>>>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>>>>> @@ -228,6 +228,7 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
>>>>>       bdev->vma_manager = vma_manager;
>>>>>       INIT_DELAYED_WORK(&bdev->wq, ttm_device_delayed_workqueue);
>>>>>       spin_lock_init(&bdev->lru_lock);
>>>>> +    spin_lock_init(&bdev->ddestroy_lock);
>>>>>       INIT_LIST_HEAD(&bdev->ddestroy);
>>>>>       bdev->dev_mapping = mapping;
>>>>>       mutex_lock(&ttm_global_mutex);
>>>>> diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
>>>>> index 7c8f87bd52d3..fa8c4f6a169e 100644
>>>>> --- a/include/drm/ttm/ttm_device.h
>>>>> +++ b/include/drm/ttm/ttm_device.h
>>>>> @@ -279,6 +279,7 @@ struct ttm_device {
>>>>>        * Protection for the per manager LRU and ddestroy lists.
>>>>>        */
>>>>>       spinlock_t lru_lock;
>>>>> +    spinlock_t ddestroy_lock;
>>>>>       struct list_head ddestroy;
>>>>>         /*
>>>>
>>


