[PATCH 3/6] drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock held
Maarten Lankhorst
maarten.lankhorst at canonical.com
Wed Nov 28 10:32:48 PST 2012
On 28-11-12 16:10, Thomas Hellstrom wrote:
> On 11/28/2012 03:46 PM, Maarten Lankhorst wrote:
>> On 28-11-12 15:23, Thomas Hellstrom wrote:
>>> On 11/28/2012 02:55 PM, Maarten Lankhorst wrote:
>>>> On 28-11-12 14:21, Thomas Hellstrom wrote:
>>>>> On 11/28/2012 01:15 PM, Maarten Lankhorst wrote:
>>>>>> On 28-11-12 12:54, Thomas Hellstrom wrote:
>>>>>>> On 11/28/2012 12:25 PM, Maarten Lankhorst wrote:
>>>>>>>> By removing the unlocking of the lru lock and retaking it immediately, a
>>>>>>>> race is removed where the bo is taken off the swap list or the lru list
>>>>>>>> between the unlock and relock. As such the cleanup_refs code can be
>>>>>>>> simplified: it will attempt a non-blocking ttm_bo_wait, and if that fails
>>>>>>>> it will drop the locks and perform a blocking wait, or return an error
>>>>>>>> if no_wait_gpu was set.
>>>>>>>>
>>>>>>>> The need for looping is also eliminated, since swapout and evict_mem_first
>>>>>>>> always follow the destruction path, so no new fence can be attached.
>>>>>>>> As far as I can see this may already have been the case, but the
>>>>>>>> unlocking / relocking required a complicated loop to deal with
>>>>>>>> re-reservation.
>>>>>>>>
>>>>>>>> The downside is that ttm_bo_cleanup_memtype_use is no longer called with
>>>>>>>> the reservation held, so drivers must be aware that a move_notify call
>>>>>>>> with a null parameter can happen without a reservation held.
>>>>>>> Why can't we unreserve *after* ttm_bo_cleanup_memtype_use? That's not
>>>>>>> immediately clear from this patch.
>>>>>> Because we would hold the reservation while waiting, with the object still
>>>>>> on the swap and lru lists. That would defeat the whole purpose of keeping
>>>>>> the object on multiple lists, plus break current code that assumes bo's on
>>>>>> those lists can always be reserved.
>>>>>>
>>>>>> The if (ret && !no_wait_gpu) path has to drop the reservation and lru lock,
>>>>>> and isn't guaranteed to be able to retake them. Maybe it could be guaranteed
>>>>>> now, but I'm sure that would not be the case if the reservations were shared
>>>>>> across devices.
>>>>> The evict path removes the BO from the LRU lists, drops the LRU lock but hangs on to the reservation,
>>>>> and in case the wait goes wrong, re-adds the bo to the LRU lists and returns an error.
>>>> If you really want to, we could hang on to the reservation in the
>>>> !no_wait_gpu path; the wait shouldn't ever fail there, so I suppose leaving
>>>> the bo off the lru lists and not re-adding it to any list if the wait does
>>>> fail is fine. It's still on the ddestroy list in that case, so not adding
>>>> it back to the other lists is harmless.
>>>>
>>> Well, I'm a bit afraid that, theoretically, other callers may have a bo
>>> reserved while cleanup_refs_and_unlock more or less runs the whole destroy
>>> path on that buffer. Sure, we have control over those other reservers,
>>> but it may come back and bite us.
>> That's why I initially moved all the destruction to ttm_bo_release_list, to
>> have all destruction in only one place. But even now it's serialized with the
>> lru lock; while the destruction may not happen right away, it still happens
>> before the last list ref to the bo is dropped.
>>
>> But it's your call; just choose the approach you want and I'll resubmit this. :-)
>>
>>> Also the wait might fail if a signal is hit, so it's definitely possible, and even likely in the case of the X server process.
>>>
>>> Anyway, I'd prefer it if we could try to keep the reservation across the
>>> ttm_cleanup_memtype_use function, and as far as I can tell, the only thing
>>> preventing that is the reservation release in the (!no_wait_gpu) path. So if
>>> we alter that to do the same as the evict path then, without looking too
>>> deeply into the consequences, I think we should be safe.
>> I think it would be better to return success early, without removing the bo
>> from the ddestroy list, if re-reserving with the lru lock held fails.
>>
>> In that case we completed the wait and then attempted to reserve the bo,
>> which failed. Without the lru lock atomicity removal this can't happen, since
>> the only places that could cause it call this function with the lru lock held.
>>
>> With the atomicity removal, the only place that could do this is
>> ttm_bo_delayed_delete with remove_all set to true. And even if that happened,
>> the destruction code would run *anyway*, since we already completed the
>> waiting part; it just wouldn't necessarily run from this thread, but that
>> guarantee didn't exist anyway.
>>> Then we should be able to skip patch 2 as well.
>> If my tryreserve approach sounds sane, the second patch should still be skippable. :-)
>
> Sure, let's go for that approach.
OK, updated patch below; you only need to check whether you like the approach,
since I haven't tested it beyond compiling yet. The rest of the series (minus
patch 2) should still apply without modification.
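
In code form, the tryreserve approach we agreed on looks roughly like this (a
sketch only; the real logic lives in ttm_bo_cleanup_refs_and_unlock in the
patch below). After the blocking fence wait, retake the lru lock and
tryreserve; if that fails, someone else holds the reservation and will finish
the destruction, so we can report success without touching the ddestroy list:

	spin_lock(&glob->lru_lock);
	ret = ttm_bo_reserve_locked(bo, false, true, false, 0);
	if (ret) {
		/*
		 * We raced and lost: another thread holds the reservation
		 * and will complete the destruction. Since the wait already
		 * finished, the delayed destruction is guaranteed to
		 * succeed, so report success.
		 */
		spin_unlock(&glob->lru_lock);
		return 0;
	}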
drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock held, v2
By removing the unlocking of the lru lock and retaking it immediately, a race
is removed where the bo is taken off the swap list or the lru list between
the unlock and relock. As such the cleanup_refs code can be simplified: it
will attempt a non-blocking ttm_bo_wait, and if that fails it will drop the
locks and perform a blocking wait, or return an error if no_wait_gpu was set.
The need for looping is also eliminated, since swapout and evict_mem_first
always follow the destruction path, so no new fence can be attached.
As far as I can see this may already have been the case, but the
unlocking / relocking required a complicated loop to deal with
re-reservation.
Changes since v1:
- Simplify no_wait_gpu case by folding it in with empty ddestroy.
- Hold a reservation while calling ttm_bo_cleanup_memtype_use again.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst at canonical.com>
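
For reference, the calling convention of the new helper, as used by all the
callers changed below (a sketch, not part of the patch itself): the caller
reserves under the lru lock, then hands both the reservation and
glob->lru_lock over to the helper, which drops them before returning.

	spin_lock(&glob->lru_lock);
	ret = ttm_bo_reserve_locked(bo, interruptible, no_wait_reserve, false, 0);
	if (!ret)
		/* drops the reservation and glob->lru_lock */
		ret = ttm_bo_cleanup_refs_and_unlock(bo, interruptible,
						     no_wait_gpu);
	else
		spin_unlock(&glob->lru_lock);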
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 202fc20..e9f01fe 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -488,12 +488,16 @@ static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo)
 	ttm_bo_mem_put(bo, &bo->mem);
 
 	atomic_set(&bo->reserved, 0);
+	wake_up_all(&bo->event_queue);
 
 	/*
-	 * Make processes trying to reserve really pick it up.
+	 * Since the final reference to this bo may not be dropped by
+	 * the current task we have to put a memory barrier here to make
+	 * sure the changes done in this function are always visible.
+	 *
+	 * This function only needs protection against the final kref_put.
 	 */
-	smp_mb__after_atomic_dec();
-	wake_up_all(&bo->event_queue);
+	smp_mb__before_atomic_dec();
 }
 
 static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
@@ -543,68 +547,95 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
 }
 
 /**
- * function ttm_bo_cleanup_refs
+ * function ttm_bo_cleanup_refs_and_unlock
  * If bo idle, remove from delayed- and lru lists, and unref.
  * If not idle, do nothing.
  *
+ * Must be called with lru_lock and reservation held, this function
+ * will drop both before returning.
+ *
  * @interruptible         Any sleeps should occur interruptibly.
- * @no_wait_reserve       Never wait for reserve. Return -EBUSY instead.
  * @no_wait_gpu           Never wait for gpu. Return -EBUSY instead.
  */
 
-static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,
-			       bool interruptible,
-			       bool no_wait_reserve,
-			       bool no_wait_gpu)
+static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
+					  bool interruptible,
+					  bool no_wait_gpu)
 {
 	struct ttm_bo_device *bdev = bo->bdev;
+	struct ttm_bo_driver *driver = bdev->driver;
 	struct ttm_bo_global *glob = bo->glob;
 	int put_count;
-	int ret = 0;
 
-retry:
 	spin_lock(&bdev->fence_lock);
-	ret = ttm_bo_wait(bo, false, interruptible, no_wait_gpu);
-	spin_unlock(&bdev->fence_lock);
+	ret = ttm_bo_wait(bo, false, false, true);
 
-	if (unlikely(ret != 0))
-		return ret;
+	if (ret && !no_wait_gpu) {
+		void *sync_obj;
 
-retry_reserve:
-	spin_lock(&glob->lru_lock);
+		/*
+		 * Take a reference to the fence and unreserve,
+		 * at this point the buffer should be dead, so
+		 * no new sync objects can be attached.
+		 */
+		sync_obj = driver->sync_obj_ref(&bo->sync_obj);
+		spin_unlock(&bdev->fence_lock);
 
-	if (unlikely(list_empty(&bo->ddestroy))) {
+		put_count = ttm_bo_del_from_lru(bo);
+
+		atomic_set(&bo->reserved, 0);
+		wake_up_all(&bo->event_queue);
 		spin_unlock(&glob->lru_lock);
-		return 0;
-	}
 
-	ret = ttm_bo_reserve_locked(bo, false, true, false, 0);
+		ttm_bo_list_ref_sub(bo, put_count, true);
 
-	if (unlikely(ret == -EBUSY)) {
-		spin_unlock(&glob->lru_lock);
-		if (likely(!no_wait_reserve))
-			ret = ttm_bo_wait_unreserved(bo, interruptible);
-		if (unlikely(ret != 0))
+		ret = driver->sync_obj_wait(sync_obj, false, interruptible);
+		driver->sync_obj_unref(&sync_obj);
+		if (ret) {
+			/*
+			 * Either the wait returned -ERESTARTSYS, or -EDEADLK
+			 * (radeon lockup) here. No effort is made to re-add
+			 * this bo to any lru list. Instead the bo only
+			 * appears on the delayed destroy list.
+			 */
 			return ret;
+		}
 
-		goto retry_reserve;
-	}
+		/*
+		 * remove sync_obj with ttm_bo_wait, the wait should be
+		 * finished, and no new wait object should have been added.
+		 */
+		spin_lock(&bdev->fence_lock);
+		ret = ttm_bo_wait(bo, false, false, true);
+		WARN_ON(ret);
+		spin_unlock(&bdev->fence_lock);
+		if (ret)
+			return ret;
 
-	BUG_ON(ret != 0);
+		spin_lock(&glob->lru_lock);
+		ret = ttm_bo_reserve_locked(bo, false, true, false, 0);
 
-	/**
-	 * We can re-check for sync object without taking
-	 * the bo::lock since setting the sync object requires
-	 * also bo::reserved. A busy object at this point may
-	 * be caused by another thread recently starting an accelerated
-	 * eviction.
-	 */
+		/*
+		 * We raced, and lost, someone else holds the reservation now,
+		 * and is probably busy in ttm_bo_cleanup_memtype_use.
+		 *
+		 * Even if it's not the case, because we finished waiting any
+		 * delayed destruction would succeed, so just return success
+		 * here.
+		 */
+		if (ret) {
+			spin_unlock(&glob->lru_lock);
+			return 0;
+		}
+	} else
+		spin_unlock(&bdev->fence_lock);
 
-	if (unlikely(bo->sync_obj)) {
+	if (ret || unlikely(list_empty(&bo->ddestroy))) {
 		atomic_set(&bo->reserved, 0);
 		wake_up_all(&bo->event_queue);
 		spin_unlock(&glob->lru_lock);
-		goto retry;
+		return ret;
 	}
 
 	put_count = ttm_bo_del_from_lru(bo);
@@ -647,9 +678,13 @@ static int ttm_bo_delayed_delete(struct ttm_bo_device *bdev, bool remove_all)
 			kref_get(&nentry->list_kref);
 		}
 
-		spin_unlock(&glob->lru_lock);
-		ret = ttm_bo_cleanup_refs(entry, false, !remove_all,
-					  !remove_all);
+		ret = ttm_bo_reserve_locked(entry, false, !remove_all, false, 0);
+		if (!ret)
+			ret = ttm_bo_cleanup_refs_and_unlock(entry, false,
+							     !remove_all);
+		else
+			spin_unlock(&glob->lru_lock);
+
 		kref_put(&entry->list_kref, ttm_bo_release_list);
 		entry = nentry;
 
@@ -803,9 +838,13 @@ retry:
 	kref_get(&bo->list_kref);
 
 	if (!list_empty(&bo->ddestroy)) {
-		spin_unlock(&glob->lru_lock);
-		ret = ttm_bo_cleanup_refs(bo, interruptible,
-					  no_wait_reserve, no_wait_gpu);
+		ret = ttm_bo_reserve_locked(bo, interruptible, no_wait_reserve, false, 0);
+		if (!ret)
+			ret = ttm_bo_cleanup_refs_and_unlock(bo, interruptible,
+							     no_wait_gpu);
+		else
+			spin_unlock(&glob->lru_lock);
+
 		kref_put(&bo->list_kref, ttm_bo_release_list);
 
 		return ret;
@@ -1799,8 +1838,9 @@ static int ttm_bo_swapout(struct ttm_mem_shrink *shrink)
 		kref_get(&bo->list_kref);
 
 		if (!list_empty(&bo->ddestroy)) {
-			spin_unlock(&glob->lru_lock);
-			(void) ttm_bo_cleanup_refs(bo, false, false, false);
+			ttm_bo_reserve_locked(bo, false, false, false, 0);
+			ttm_bo_cleanup_refs_and_unlock(bo, false, false);
+
 			kref_put(&bo->list_kref, ttm_bo_release_list);
 			spin_lock(&glob->lru_lock);
 			continue;
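
One consequence of v2 worth spelling out: unlike v1, ttm_bo_cleanup_memtype_use
is again called with the reservation held, so a driver's move_notify(bo, NULL)
destruction callback runs with the bo reserved. A hypothetical driver callback
(a sketch; the function name is made up for illustration) could rely on that
again:

static void example_move_notify(struct ttm_buffer_object *bo,
				struct ttm_mem_reg *new_mem)
{
	if (!new_mem) {
		/*
		 * Destruction path: with v2 the bo is reserved here
		 * again, unlike with v1 of this patch.
		 */
	}
}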