[PATCH 3/6] drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock held

Maarten Lankhorst maarten.lankhorst at canonical.com
Wed Nov 28 14:26:27 PST 2012


Op 28-11-12 20:21, Thomas Hellstrom schreef:
> On 11/28/2012 07:32 PM, Maarten Lankhorst wrote:
>> Op 28-11-12 16:10, Thomas Hellstrom schreef:
>>> On 11/28/2012 03:46 PM, Maarten Lankhorst wrote:
>>>> Op 28-11-12 15:23, Thomas Hellstrom schreef:
>>>>> On 11/28/2012 02:55 PM, Maarten Lankhorst wrote:
>>>>>> Op 28-11-12 14:21, Thomas Hellstrom schreef:
>>>>>>> On 11/28/2012 01:15 PM, Maarten Lankhorst wrote:
>>>>>>>> Op 28-11-12 12:54, Thomas Hellstrom schreef:
>>>>>>>>> On 11/28/2012 12:25 PM, Maarten Lankhorst wrote:
>>>>>>>>>> By removing the unlocking of lru and retaking it immediately, a race is
>>>>>>>>>> removed where the bo is taken off the swap list or the lru list between
>>>>>>>>>> the unlock and relock. As such the cleanup_refs code can be simplified,
>>>>>>>>>> it will attempt to call ttm_bo_wait non-blockingly, and if it fails
>>>>>>>>>> it will drop the locks and perform a blocking wait, or return an error
>>>>>>>>>> if no_wait_gpu was set.
>>>>>>>>>>
>>>>>>>>>> The need for looping is also eliminated, since swapout and evict_mem_first
>>>>>>>>>> will always follow the destruction path, so no new fence is allowed
>>>>>>>>>> to be attached. As far as I can see this may already have been the case,
>>>>>>>>>> but the unlocking / relocking required a complicated loop to deal with
>>>>>>>>>> re-reservation.
>>>>>>>>>>
>>>>>>>>>> The downside is that ttm_bo_cleanup_memtype_use is no longer called with
>>>>>>>>>> reservation held, so drivers must be aware that move_notify with a null
>>>>>>>>>> parameter doesn't require a reservation.
>>>>>>>>> Why can't we unreserve *after* ttm_bo_cleanup_memtype_use? That's not
>>>>>>>>> immediately clear from this patch.
>>>>>>>> Because we would hold the reservation while waiting, with the object still
>>>>>>>> on the swap and lru lists. That would defeat the whole purpose of keeping
>>>>>>>> the object on multiple lists, plus break current code that assumes bo's on
>>>>>>>> those lists can always be reserved.
>>>>>>>>
>>>>>>>> the if (ret && !no_wait_gpu) path has to drop the reservation and lru lock, and
>>>>>>>> isn't guaranteed to be able to retake it. Maybe it could be guaranteed now, but
>>>>>>>> I'm sure that would not be the case if the reservations were shared across
>>>>>>>> devices.
>>>>>>> The evict path removes the BO from the LRU lists, drops the LRU lock but hangs on to the reservation,
>>>>>>> and in case the wait goes wrong, re-adds the bo to the LRU lists and returns an error.
>>>>>> If you really want to, we could hang on to the reservation in the !no_wait_gpu path; the wait shouldn't ever fail there, so I suppose
>>>>>> leaving it off the lru lists and not re-adding it to any list in case the wait fails is fine. It's still on the ddestroy list in that
>>>>>> case, so not adding it back to the other lists is harmless.
>>>>>>
>>>>> Well I'm a bit afraid that theoretically, other callers may have a bo reserved, while cleanup_refs_and_unlock
>>>>> more or less runs the whole destroy path on that buffer. Sure, we have control over those other reservers,
>>>>> but it may come back and bite us.
>>>> That's why I initially moved all the destruction to ttm_bo_release_list, to have all
>>>> destruction in only one place. But even now it's serialized with the lru lock; while the
>>>> destruction may not happen right away, it still happens before the last list ref to the
>>>> bo is dropped.
>>>>
>>>> But it's your call, just choose the approach you want and I'll resubmit this. :-)
>>>>
>>>>> Also the wait might fail if a signal is hit, so it's definitely possible, and even likely in the case of the X server process.
>>>>>
>>>>> Anyway, I'd prefer it if we could try to keep the reservation across the
>>>>> ttm_bo_cleanup_memtype_use function, and as far as I can tell, the only thing
>>>>> preventing that is the reservation release in the (!no_wait_gpu) path. So if we alter
>>>>> that to do the same as the evict path, then without looking too deeply into the
>>>>> consequences, I think we should be safe.
>>>> I think it would be better to return success early, without removing the bo from the
>>>> ddestroy list, if re-reserving with the lru lock held fails.
>>>>
>>>> We completed the wait and attempted to reserve the bo, which failed. Without the lru
>>>> lock atomicity removal, this can't happen, since the only places that would do it call
>>>> this with the lru lock held.
>>>>
>>>> With the atomicity removal, the only place that could do this is ttm_bo_delayed_delete
>>>> with remove_all set to true. And even if that happened, the destruction code would run
>>>> *anyway*, since we completed the waiting part already; it would just not necessarily
>>>> run from this thread, but that guarantee didn't exist anyway.
>>>>> Then we should be able to skip patch 2 as well.
>>>> If my tryreserve approach sounds sane, the second patch should still be skippable. :-)
>>> Sure, let's go for that approach.
>> Ok, updated patch below; you only need to check whether you like the approach or not,
>> since I haven't tested it yet beyond compiling. The rest of the series (minus patch 2)
>> should still apply without modification.
>>
>>      drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock held, v2
>>
>>      By removing the unlocking of lru and retaking it immediately, a race is
>>      removed where the bo is taken off the swap list or the lru list between
>>      the unlock and relock. As such the cleanup_refs code can be simplified,
>>      it will attempt to call ttm_bo_wait non-blockingly, and if it fails
>>      it will drop the locks and perform a blocking wait, or return an error
>>      if no_wait_gpu was set.
>>
>>      The need for looping is also eliminated, since swapout and evict_mem_first
>>      will always follow the destruction path, so no new fence is allowed
>>      to be attached. As far as I can see this may already have been the case,
>>      but the unlocking / relocking required a complicated loop to deal with
>>      re-reservation.
>>
>>      Changes since v1:
>>       - Simplify no_wait_gpu case by folding it in with empty ddestroy.
>>       - Hold a reservation while calling ttm_bo_cleanup_memtype_use again.
>>
>>      Signed-off-by: Maarten Lankhorst <maarten.lankhorst at canonical.com>
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index 202fc20..e9f01fe 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -488,12 +488,16 @@ static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo)
>>      ttm_bo_mem_put(bo, &bo->mem);
>>
>>      atomic_set(&bo->reserved, 0);
>> +    wake_up_all(&bo->event_queue);
>>
>>      /*
>> -     * Make processes trying to reserve really pick it up.
>> +     * Since the final reference to this bo may not be dropped by
>> +     * the current task we have to put a memory barrier here to make
>> +     * sure the changes done in this function are always visible.
>> +     *
>> +     * This function only needs protection against the final kref_put.
>>       */
>> -    smp_mb__after_atomic_dec();
>> -    wake_up_all(&bo->event_queue);
>> +    smp_mb__before_atomic_dec();
>>  }
>>
>>  static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
>> @@ -543,68 +547,95 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
>>  }
>>
>>  /**
>> - * function ttm_bo_cleanup_refs
>> + * function ttm_bo_cleanup_refs_and_unlock
>>   * If bo idle, remove from delayed- and lru lists, and unref.
>>   * If not idle, do nothing.
>>   *
>> + * Must be called with lru_lock and reservation held, this function
>> + * will drop both before returning.
>> + *
>>   * @interruptible         Any sleeps should occur interruptibly.
>> - * @no_wait_reserve       Never wait for reserve. Return -EBUSY instead.
>>   * @no_wait_gpu           Never wait for gpu. Return -EBUSY instead.
>>   */
>>
>> -static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,
>> -                   bool interruptible,
>> -                   bool no_wait_reserve,
>> -                   bool no_wait_gpu)
>> +static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
>> +                      bool interruptible,
>> +                      bool no_wait_gpu)
>>  {
>>      struct ttm_bo_device *bdev = bo->bdev;
>> +    struct ttm_bo_driver *driver = bdev->driver;
>>      struct ttm_bo_global *glob = bo->glob;
>>      int put_count;
>> -    int ret = 0;
>> +    int ret;
>>
>> -retry:
>>      spin_lock(&bdev->fence_lock);
>> -    ret = ttm_bo_wait(bo, false, interruptible, no_wait_gpu);
>> -    spin_unlock(&bdev->fence_lock);
>> +    ret = ttm_bo_wait(bo, false, false, true);
>>
>> -    if (unlikely(ret != 0))
>> -        return ret;
>> +    if (ret && !no_wait_gpu) {
>> +        void *sync_obj;
>>
>> -retry_reserve:
>> -    spin_lock(&glob->lru_lock);
>> +        /*
>> +         * Take a reference to the fence and unreserve,
>> +         * at this point the buffer should be dead, so
>> +         * no new sync objects can be attached.
>> +         */
>> +        sync_obj = driver->sync_obj_ref(&bo->sync_obj);
>> +        spin_unlock(&bdev->fence_lock);
>>
>> -    if (unlikely(list_empty(&bo->ddestroy))) {
>> +        put_count = ttm_bo_del_from_lru(bo);
>> +
>> +        atomic_set(&bo->reserved, 0);
>> +        wake_up_all(&bo->event_queue);
>>          spin_unlock(&glob->lru_lock);
>> -        return 0;
>> -    }
>>
>> -    ret = ttm_bo_reserve_locked(bo, false, true, false, 0);
>> +        ttm_bo_list_ref_sub(bo, put_count, true);
>>
>> -    if (unlikely(ret == -EBUSY)) {
>> -        spin_unlock(&glob->lru_lock);
>> -        if (likely(!no_wait_reserve))
>> -            ret = ttm_bo_wait_unreserved(bo, interruptible);
>> -        if (unlikely(ret != 0))
>> +        ret = driver->sync_obj_wait(sync_obj, false, interruptible);
>> +        driver->sync_obj_unref(&sync_obj);
>> +        if (ret) {
>> +            /*
>> +             * Either the wait returned -ERESTARTSYS, or -EDEADLK
>> +             * (radeon lockup) here. No effort is made to re-add
>> +             * this bo to any lru list. Instead the bo only
>> +             * appears on the delayed destroy list.
>> +             */
>>              return ret;
>> +        }
> Actually, we *must* re-add the bo to LRU lists here, because otherwise when a driver needs
> to evict a memory type completely, there's a large chance it will miss this bo.
>
> So I think either we need to keep the reservation, or keep the bo on the LRU lists.
The second option is what v1 did, except I never bothered to re-take the reservation. ;-)
It shouldn't cause trouble to leave it on the lru lists if we drop the reservation;
we can keep handling re-reservation failure in the same way as in v2.
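
To spell out "the same way as in v2": a rough sketch of the tryreserve handling after
the blocking wait (untested; this is my reading of how v2 handles a failed re-reserve,
per the discussion above):

	spin_lock(&glob->lru_lock);
	ret = ttm_bo_reserve_locked(bo, false, true, false, 0);
	if (unlikely(ret)) {
		/*
		 * Someone else reserved the bo in the meantime. The wait has
		 * already completed, so destruction is guaranteed to run,
		 * just not necessarily from this thread. The bo is still on
		 * the ddestroy list, so returning success here is safe.
		 */
		spin_unlock(&glob->lru_lock);
		return 0;
	}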

In that case, would v3 be the same as v2 of this patch, except with these two lines
removed from the ret && !no_wait_gpu branch:

put_count = ttm_bo_del_from_lru(bo);
ttm_bo_list_ref_sub(bo, put_count, true);

And of course the comment after sync_obj_wait failure would no longer apply.
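
Roughly, the !no_wait_gpu branch would then read like this (untested sketch: just the
v2 hunk above minus those two lines, with that comment dropped since the bo now stays
on the lru lists):

	if (ret && !no_wait_gpu) {
		void *sync_obj;

		/*
		 * Take a reference to the fence and unreserve,
		 * at this point the buffer should be dead, so
		 * no new sync objects can be attached.
		 */
		sync_obj = driver->sync_obj_ref(&bo->sync_obj);
		spin_unlock(&bdev->fence_lock);

		atomic_set(&bo->reserved, 0);
		wake_up_all(&bo->event_queue);
		spin_unlock(&glob->lru_lock);

		ret = driver->sync_obj_wait(sync_obj, false, interruptible);
		driver->sync_obj_unref(&sync_obj);
		if (ret)
			return ret;

		/* ... retake lru_lock and tryreserve as before ... */
	}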

~Maarten

