[PATCH 2/2] drm/i915: Ignore old error fences in dma-resv

Chris Wilson chris at chris-wilson.co.uk
Fri Mar 5 15:27:53 UTC 2021


Error propagation along fences is used to track the incomplete status of
inflight activity -- if an error occurs and we cannot complete an
operation, the work that depends on that operation must also be
prevented from accessing the void results. dma-resv will then keep those
fences forever (until an explicit prune, or until adding a new fence
overwrites an old one), leaving the error permanently trapped on the
object. On the one hand this is useful, as keeping the error shows that
the last exclusive operation on the object left it in an undefined
state, but at the same time there is no mechanism to revoke that error
status. For critical operations the error must remain trapped until it
is repaired, and so we must maintain the mandatory fences and their
error states separately from the obj->base.resv. By ensuring that
distinction, we can go back to the old assumption that any signaled
fence in the dma-resv is volatile (as any signaled fence may be replaced
by adding a new fence) and that the error status it provides is not
required for defining subsequent behaviour.
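
As a rough illustration of the two policies, here is a minimal userspace
sketch. It is a model only, not kernel code: mock_fence, mock_request,
mock_await_fence() and mock_await_all() are hypothetical names invented
for this example, and the real request and fence machinery is far more
involved.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_fence; not the kernel type. */
struct mock_fence {
	bool signaled;	/* true once the work behind the fence has finished */
	int error;	/* 0, or a negative errno recorded on failure */
};

/* Hypothetical stand-in for the request that awaits the fences. */
struct mock_request {
	int error;		/* first error inherited from a dependency */
	unsigned int pending;	/* unsignaled fences still being awaited */
};

/* Model of awaiting one fence, inheriting errors from signaled fences. */
static void mock_await_fence(struct mock_request *rq,
			     const struct mock_fence *fence)
{
	if (fence->signaled) {
		/* Nothing left to wait for; only the old error is inherited. */
		if (fence->error && !rq->error)
			rq->error = fence->error;
		return;
	}

	rq->pending++;
}

/*
 * Await every fence in the array. With ignore_signaled set, completed
 * fences are skipped entirely, so any error they carry is not inherited.
 */
static void mock_await_all(struct mock_request *rq,
			   const struct mock_fence *fences, size_t count,
			   bool ignore_signaled)
{
	size_t i;

	assert(rq && (fences || !count));

	for (i = 0; i < count; i++) {
		if (ignore_signaled && fences[i].signaled)
			continue;

		mock_await_fence(rq, &fences[i]);
	}
}
```

Given one signaled fence carrying -EIO and one unsignaled fence, calling
mock_await_all() with ignore_signaled=false leaves the request with
error -EIO, while ignore_signaled=true leaves it error free. In both
cases the request still waits on the one unsignaled fence.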

The impact of propagating the fence errors from the dma-resv is that
once a GPU hang is injected into the rendering chain, the error
propagates onto a persistent scanout surface, preventing that scanout
from being updated again until the dma-resv is flushed. The error is
likely propagated onto the back buffer as well, causing each surface to
be frozen and the windowing system to flip between a fixed set of
frames.

This uses the single array output for shared/excl introduced by
commit a35f2f34b5b4 ("dma-buf: make returning the exclusive fence
optional").
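
The single-array form lets both the read and the write path share one
loop: the array pointer either aliases the lone exclusive fence or
points at an allocated array that already contains it. A minimal
userspace sketch of that pattern follows. All of mock_fence, mock_resv,
mock_get_fences() and mock_visit_fences() are hypothetical names made up
for this example; in particular mock_get_fences() only imitates the
"exclusive fence appended to the array" behaviour and is not the dma-resv
API. Unlike the patch, the sketch returns early on allocation failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel types. */
struct mock_fence {
	int refs;			/* reference count */
};

struct mock_resv {
	struct mock_fence *excl;	/* exclusive fence, may be NULL */
	struct mock_fence **shared;	/* shared fences */
	unsigned int shared_count;
};

/*
 * Return a malloc'ed array holding every shared fence plus the exclusive
 * fence (if any), taking a reference on each. *out is NULL when empty.
 */
static int mock_get_fences(struct mock_resv *resv, unsigned int *count,
			   struct mock_fence ***out)
{
	unsigned int n = resv->shared_count + (resv->excl ? 1 : 0);
	struct mock_fence **array = NULL;
	unsigned int i;

	if (n) {
		array = malloc(n * sizeof(*array));
		if (!array)
			return -ENOMEM;
	}

	for (i = 0; i < resv->shared_count; i++)
		array[i] = resv->shared[i];
	if (resv->excl)
		array[i] = resv->excl;

	for (i = 0; i < n; i++)
		array[i]->refs++;

	*count = n;
	*out = array;
	return 0;
}

/* Visit all fences for a write, or only the exclusive fence for a read. */
static int mock_visit_fences(struct mock_resv *resv, bool write,
			     unsigned int *visited)
{
	struct mock_fence *excl, **fences = &excl;
	unsigned int count;
	int ret;

	*visited = 0;

	if (write) {
		ret = mock_get_fences(resv, &count, &fences);
		if (ret)
			return ret;
	} else {
		excl = resv->excl;
		if (excl)
			excl->refs++;
		count = !!excl;	/* one-element "array" aliasing excl */
		ret = 0;
	}

	while (count--) {
		struct mock_fence *fence = fences[count];

		(*visited)++;
		fence->refs--;	/* drop the reference taken above */
	}

	/* Only free when the array was allocated, not when it aliases excl. */
	if (fences != &excl)
		free(fences);

	return ret;
}
```

With one exclusive and two shared fences, a write visits three fences
and a read visits one; every reference taken is dropped again, and
free() is only reached for the allocated array.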

Reported-and-tested-by: Miroslav Bendik
Reported-by: Marcin Slusarz <marcin.slusarz at intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
Fixes: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
Testcase: igt/gem_exec_fence/forgetful-error
Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
Cc: <stable at vger.kernel.org> # v5.6+
---
 drivers/gpu/drm/i915/i915_request.c | 39 +++++++++++------------------
 1 file changed, 15 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index e7b4c4bc41a6..715bf0f8923c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1393,40 +1393,31 @@ i915_request_await_object(struct i915_request *to,
 			  struct drm_i915_gem_object *obj,
 			  bool write)
 {
-	struct dma_fence *excl;
-	int ret = 0;
+	struct dma_fence *excl, **shared = &excl;
+	unsigned int count;
+	int ret;
 
 	if (write) {
-		struct dma_fence **shared;
-		unsigned int count, i;
-
 		ret = dma_resv_get_fences_rcu(obj->base.resv,
-							&excl, &count, &shared);
-		if (ret)
-			return ret;
-
-		for (i = 0; i < count; i++) {
-			ret = i915_request_await_dma_fence(to, shared[i]);
-			if (ret)
-				break;
-
-			dma_fence_put(shared[i]);
-		}
-
-		for (; i < count; i++)
-			dma_fence_put(shared[i]);
-		kfree(shared);
+					      NULL, &count, &shared);
 	} else {
 		excl = dma_resv_get_excl_rcu(obj->base.resv);
+		count = !!excl;
+		ret = 0;
 	}
 
-	if (excl) {
-		if (ret == 0)
-			ret = i915_request_await_dma_fence(to, excl);
+	while (count--) {
+		struct dma_fence *fence = shared[count];
 
-		dma_fence_put(excl);
+		if (ret == 0 && !dma_fence_is_signaled(fence))
+			ret = i915_request_await_dma_fence(to, fence);
+
+		dma_fence_put(fence);
 	}
 
+	if (shared != &excl)
+		kfree(shared);
+
 	return ret;
 }
 
-- 
2.20.1


