[PATCH 3/6] drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock held

Thomas Hellstrom thellstrom at vmware.com
Thu Nov 29 01:42:05 PST 2012


On 11/28/2012 11:26 PM, Maarten Lankhorst wrote:
> On 28-11-12 20:21, Thomas Hellstrom wrote:
>> On 11/28/2012 07:32 PM, Maarten Lankhorst wrote:
>>> On 28-11-12 16:10, Thomas Hellstrom wrote:
>>>> On 11/28/2012 03:46 PM, Maarten Lankhorst wrote:
>>>>> On 28-11-12 15:23, Thomas Hellstrom wrote:
>>>>>> On 11/28/2012 02:55 PM, Maarten Lankhorst wrote:
>>>>>>> On 28-11-12 14:21, Thomas Hellstrom wrote:
>>>>>>>> On 11/28/2012 01:15 PM, Maarten Lankhorst wrote:
>>>>>>>>> On 28-11-12 12:54, Thomas Hellstrom wrote:
>>>>>>>>>> On 11/28/2012 12:25 PM, Maarten Lankhorst wrote:
>>>>>>>>>>> By removing the unlocking of lru and retaking it immediately, a race is
>>>>>>>>>>> removed where the bo is taken off the swap list or the lru list between
>>>>>>>>>>> the unlock and relock. As such the cleanup_refs code can be simplified,
>>>>>>>>>>> it will attempt to call ttm_bo_wait non-blockingly, and if it fails
>>>>>>>>>>> it will drop the locks and perform a blocking wait, or return an error
>>>>>>>>>>> if no_wait_gpu was set.
>>>>>>>>>>>
>>>>>>>>>>> The need for looping is also eliminated, since swapout and evict_mem_first
>>>>>>>>>>> will always follow the destruction path, so no new fence is allowed
>>>>>>>>>>> to be attached. As far as I can see this may already have been the case,
>>>>>>>>>>> but the unlocking / relocking required a complicated loop to deal with
>>>>>>>>>>> re-reservation.
>>>>>>>>>>>
>>>>>>>>>>> The downside is that ttm_bo_cleanup_memtype_use is no longer called with
>>>>>>>>>>> reservation held, so drivers must be aware that move_notify with a null
>>>>>>>>>>> parameter doesn't require a reservation.
>>>>>>>>>> Why can't we unreserve *after* ttm_bo_cleanup_memtype_use? That's not
>>>>>>>>>> immediately clear from this patch.
>>>>>>>>> Because we would hold the reservation while waiting, with the object
>>>>>>>>> still on the swap and lru lists; that would defeat the whole purpose of
>>>>>>>>> keeping the object on multiple lists, plus break current code that
>>>>>>>>> assumes bo's on those lists can always be reserved.
>>>>>>>>>
>>>>>>>>> The if (ret && !no_wait_gpu) path has to drop the reservation and lru lock,
>>>>>>>>> and isn't guaranteed to be able to retake them. Maybe that could be
>>>>>>>>> guaranteed now, but I'm sure it would not be the case if the reservations
>>>>>>>>> were shared across devices.
>>>>>>>> The evict path removes the BO from the LRU lists, drops the LRU lock but hangs on to the reservation,
>>>>>>>> and in case the wait goes wrong, re-adds the bo to the LRU lists and returns an error.
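For reference, the evict-path pattern Thomas describes looks roughly like this; a simplified sketch in the spirit of the ttm_mem_evict_first() / ttm_bo_evict() code of the time, not the literal source:

    spin_lock(&glob->lru_lock);
    ret = ttm_bo_reserve_locked(bo, false, true, false, 0);
    put_count = ttm_bo_del_from_lru(bo);    /* off the LRU lists */
    spin_unlock(&glob->lru_lock);           /* LRU lock dropped ... */
    ttm_bo_list_ref_sub(bo, put_count, true);

    /* ... but the reservation is kept across the wait */
    spin_lock(&bdev->fence_lock);
    ret = ttm_bo_wait(bo, false, interruptible, no_wait_gpu);
    spin_unlock(&bdev->fence_lock);
    if (unlikely(ret != 0)) {
        /* wait went wrong: unreserving re-adds the bo to the LRU lists */
        ttm_bo_unreserve(bo);
        return ret;
    }
    /* still reserved here; proceed with the actual eviction */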
>>>>>>> If you really want to, we could hang on to the reservation in the !no_wait_gpu
>>>>>>> path; the wait shouldn't ever fail there, so I suppose leaving the bo off the lru
>>>>>>> lists, and not re-adding it to any list if the wait fails, is fine. It's still on the
>>>>>>> ddestroy list in that case, so not adding it back to the other lists is harmless.
>>>>>>>
>>>>>> Well, I'm a bit afraid that, theoretically, other callers may have a bo reserved
>>>>>> while cleanup_refs_and_unlock more or less runs the whole destroy path on that
>>>>>> buffer. Sure, we have control over those other reservers, but it may come back
>>>>>> and bite us.
>>>>> That's why I initially moved all the destruction to ttm_bo_release_list, to have
>>>>> all destruction in only one place. But even now it's serialized with the lru lock;
>>>>> while the destruction may not happen right away, it still happens before the last
>>>>> list ref to the bo is dropped.
>>>>>
>>>>> But it's your call, just choose the approach you want and I'll resubmit this. :-)
>>>>>
>>>>>> Also the wait might fail if a signal is hit, so it's definitely possible, and even likely in the case of the X server process.
>>>>>>
>>>>>> Anyway, I'd prefer it if we could try to keep the reservation across the
>>>>>> ttm_bo_cleanup_memtype_use function, and as far as I can tell, the only thing
>>>>>> preventing that is the reservation release in the (!no_wait_gpu) path. So if we
>>>>>> alter that to do the same as the evict path, then, without looking too deeply
>>>>>> into the consequences, I think we should be safe.
>>>>> I think it would be better to return success early, without removing the bo
>>>>> from the ddestroy list, if re-reserving with the lru lock held fails.
>>>>>
>>>>> We completed the wait and attempted to reserve the bo, which failed. Without
>>>>> the lru lock atomicity this can't happen, since the only places that would do it
>>>>> call this with the lru lock held.
>>>>>
>>>>> With the atomicity removal, the only place that could do this is
>>>>> ttm_bo_delayed_delete with remove_all set to true. And even if that happened,
>>>>> the destruction code would run *anyway*, since we already completed the
>>>>> waiting part; it would just not necessarily run from this thread, but that
>>>>> guarantee didn't exist anyway.
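As a minimal sketch of the tryreserve idea, built from the helpers already used in this file (approximate and untested):

    /*
     * The blocking fence wait has completed; retake the lru lock and
     * try to re-reserve the bo.
     */
    spin_lock(&glob->lru_lock);
    ret = ttm_bo_reserve_locked(bo, false, true, false, 0);
    if (unlikely(ret != 0)) {
        /*
         * Someone else holds the reservation. The bo is still on the
         * ddestroy list, so destruction is guaranteed to run, just not
         * necessarily from this thread: report success.
         */
        spin_unlock(&glob->lru_lock);
        return 0;
    }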
>>>>>> Then we should be able to skip patch 2 as well.
>>>>> If my tryreserve approach sounds sane, the second patch should still be skippable. :-)
>>>> Sure, let's go for that approach.
>>> Ok, updated patch below; you only need to check whether you like the approach,
>>> since I haven't tested it yet beyond compiling. The rest of the series (minus
>>> patch 2) should still apply without modification.
>>>
>>>       drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock held, v2
>>>
>>>       By removing the unlocking of lru and retaking it immediately, a race is
>>>       removed where the bo is taken off the swap list or the lru list between
>>>       the unlock and relock. As such the cleanup_refs code can be simplified,
>>>       it will attempt to call ttm_bo_wait non-blockingly, and if it fails
>>>       it will drop the locks and perform a blocking wait, or return an error
>>>       if no_wait_gpu was set.
>>>
>>>       The need for looping is also eliminated, since swapout and evict_mem_first
>>>       will always follow the destruction path, so no new fence is allowed
>>>       to be attached. As far as I can see this may already have been the case,
>>>       but the unlocking / relocking required a complicated loop to deal with
>>>       re-reservation.
>>>
>>>       Changes since v1:
>>>        - Simplify no_wait_gpu case by folding it in with empty ddestroy.
>>>        - Hold a reservation while calling ttm_bo_cleanup_memtype_use again.
>>>
>>>       Signed-off-by: Maarten Lankhorst <maarten.lankhorst at canonical.com>
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 202fc20..e9f01fe 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -488,12 +488,16 @@ static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo)
>>>      ttm_bo_mem_put(bo, &bo->mem);
>>>
>>>      atomic_set(&bo->reserved, 0);
>>> +    wake_up_all(&bo->event_queue);
>>>
>>>      /*
>>> -     * Make processes trying to reserve really pick it up.
>>> +     * Since the final reference to this bo may not be dropped by
>>> +     * the current task we have to put a memory barrier here to make
>>> +     * sure the changes done in this function are always visible.
>>> +     *
>>> +     * This function only needs protection against the final kref_put.
>>>       */
>>> -    smp_mb__after_atomic_dec();
>>> -    wake_up_all(&bo->event_queue);
>>> +    smp_mb__before_atomic_dec();
>>>  }
>>>
>>>  static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
>>> @@ -543,68 +547,95 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
>>>  }
>>>
>>>  /**
>>> - * function ttm_bo_cleanup_refs
>>> + * function ttm_bo_cleanup_refs_and_unlock
>>>   * If bo idle, remove from delayed- and lru lists, and unref.
>>>   * If not idle, do nothing.
>>>   *
>>> + * Must be called with lru_lock and reservation held, this function
>>> + * will drop both before returning.
>>> + *
>>>   * @interruptible         Any sleeps should occur interruptibly.
>>> - * @no_wait_reserve       Never wait for reserve. Return -EBUSY instead.
>>>   * @no_wait_gpu           Never wait for gpu. Return -EBUSY instead.
>>>   */
>>> -static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,
>>> -                   bool interruptible,
>>> -                   bool no_wait_reserve,
>>> -                   bool no_wait_gpu)
>>> +static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
>>> +                      bool interruptible,
>>> +                      bool no_wait_gpu)
>>>  {
>>>      struct ttm_bo_device *bdev = bo->bdev;
>>> +    struct ttm_bo_driver *driver = bdev->driver;
>>>      struct ttm_bo_global *glob = bo->glob;
>>>      int put_count;
>>> -    int ret = 0;
>>> +    int ret;
>>>
>>> -retry:
>>>      spin_lock(&bdev->fence_lock);
>>> -    ret = ttm_bo_wait(bo, false, interruptible, no_wait_gpu);
>>> -    spin_unlock(&bdev->fence_lock);
>>> +    ret = ttm_bo_wait(bo, false, false, true);
>>>
>>> -    if (unlikely(ret != 0))
>>> -        return ret;
>>> +    if (ret && !no_wait_gpu) {
>>> +        void *sync_obj;
>>>
>>> -retry_reserve:
>>> -    spin_lock(&glob->lru_lock);
>>> +        /*
>>> +         * Take a reference to the fence and unreserve,
>>> +         * at this point the buffer should be dead, so
>>> +         * no new sync objects can be attached.
>>> +         */
>>> +        sync_obj = driver->sync_obj_ref(&bo->sync_obj);
>>> +        spin_unlock(&bdev->fence_lock);
>>>
>>> -    if (unlikely(list_empty(&bo->ddestroy))) {
>>> +        put_count = ttm_bo_del_from_lru(bo);
>>> +
>>> +        atomic_set(&bo->reserved, 0);
>>> +        wake_up_all(&bo->event_queue);
>>>          spin_unlock(&glob->lru_lock);
>>> -        return 0;
>>> -    }
>>>
>>> -    ret = ttm_bo_reserve_locked(bo, false, true, false, 0);
>>> +        ttm_bo_list_ref_sub(bo, put_count, true);
>>>
>>> -    if (unlikely(ret == -EBUSY)) {
>>> -        spin_unlock(&glob->lru_lock);
>>> -        if (likely(!no_wait_reserve))
>>> -            ret = ttm_bo_wait_unreserved(bo, interruptible);
>>> -        if (unlikely(ret != 0))
>>> +        ret = driver->sync_obj_wait(sync_obj, false, interruptible);
>>> +        driver->sync_obj_unref(&sync_obj);
>>> +        if (ret) {
>>> +            /*
>>> +             * Either the wait returned -ERESTARTSYS, or -EDEADLK
>>> +             * (radeon lockup) here. No effort is made to re-add
>>> +             * this bo to any lru list. Instead the bo only
>>> +             * appears on the delayed destroy list.
>>> +             */
>>>              return ret;
>>> +        }
>> Actually, we *must* re-add the bo to the LRU lists here, because otherwise when
>> a driver needs to evict a memory type completely, there's a large chance it will
>> miss this bo.
>>
>> So I think either we need to keep the reservation, or keep the bo on the LRU lists.
> The second option is what v1 did, except I never bothered to re-take the reservation. ;-)

Yes, I know ;)

> It shouldn't cause trouble to leave it on the lru lists if we drop the reservation;
> we can keep handling re-reservation failure in the same way as in v2.
>
> In that case, would v3 be the same as v2 of this patch, except with these two
> lines removed from the (ret && !no_wait_gpu) branch:
>
> put_count = ttm_bo_del_from_lru(bo);
> ttm_bo_list_ref_sub(bo, put_count, true);
>
> And of course the comment after sync_obj_wait failure would no longer apply.
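Reconstructed from the v2 diff above with those two lines dropped, the (ret && !no_wait_gpu) branch of a v3 would then look something like this (untested sketch):

    if (ret && !no_wait_gpu) {
        void *sync_obj;

        /*
         * Take a reference to the fence and unreserve; the bo is
         * left on the lru lists so eviction can still find it.
         */
        sync_obj = driver->sync_obj_ref(&bo->sync_obj);
        spin_unlock(&bdev->fence_lock);

        atomic_set(&bo->reserved, 0);
        wake_up_all(&bo->event_queue);
        spin_unlock(&glob->lru_lock);

        ret = driver->sync_obj_wait(sync_obj, false, interruptible);
        driver->sync_obj_unref(&sync_obj);
        if (ret)
            return ret;
        /* ... retake the locks, tryreserve and continue teardown ... */
    }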

Yes, sounds good, although note that if the tryreserve fails in the
!no_wait_gpu case and we give up and return 0, that may cause a similar
problem (ttm_bo_force_list_clean() not *ensuring* that a bo was removed,
at least not by the time the function completes). But if we make sure the
while (!list_empty()) in ttm_bo_force_list_clean() remains, that won't be
a problem either.
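From memory of the code at the time, the loop in question looks roughly like this (simplified; arguments elided):

    spin_lock(&glob->lru_lock);
    while (!list_empty(&man->lru)) {
        spin_unlock(&glob->lru_lock);
        ret = ttm_mem_evict_first(bdev, mem_type, ...);
        if (ret)
            ...;    /* bail out or complain, depending on allow_errors */
        spin_lock(&glob->lru_lock);
    }
    spin_unlock(&glob->lru_lock);

As long as that loop re-checks the list, a bo that stayed on the lru because a tryreserve lost out will simply be picked up on a later iteration.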

/Thomas

>
> ~Maarten


