[Intel-gfx] [PATCH] drm/i915: Prevent oops on req->engine in rcu-protected peeking

Fri Aug 5 16:37:00 UTC 2016

When only rcu-protected we might peek at a reinitializing request.
Prevent carnage by making sure we don't accidentally chase a NULL
pointer.

The proper fix for this is to drop the memset (with kzalloc) in the
request allocation function, since that avoids both the NULL check in
these fastpaths and makes request allocation a notch lighter. But it
also means we need to careful audit all the paths to make sure nothing
gets upset and runs into garbage. And that's a bit much on a late Friday
with Joonas already on w/e. Also, today is drm-intel-next tag day, and
this will be the tag for the first 4.9 pull request.

Hence this easier to review interim fix, which will be replaced early next
week by the proper fix Chris is working on.

Fixes: 0eafec6d3244 ("drm/i915: Enable lockless lookup of request...")
Cc: Chris Wilson <chris at chris-wilson.co.uk>
Cc: "Goel, Akash" <akash.goel at intel.com>
Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem_request.h | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 6002adc43523..e55492ba20ec 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -244,6 +244,26 @@ i915_gem_request_started(const struct drm_i915_gem_request *req)
 }
 
 static inline bool
+i915_gem_request_completed_rcu(const struct drm_i915_gem_request *req)
+{
+	struct intel_engine_cs *engine = READ_ONCE(req->engine);
+
+	/* When we peek at a request solely under rcu protection, without
+	 * hodling a full reference, the request might be in the process of
+	 * getting freed and reallocated. Make sure we don't stumble over a NULL
+	 * engine in that case.
+	 *
+	 * If we are hitting this race it means that the old request has been
+	 * released, which only happens once it has completed.
+	 */
+	if (!engine)
+		return true;
+
+	return i915_seqno_passed(intel_engine_get_seqno(engine),
+				 req->fence.seqno);
+}
+
+static inline bool
 i915_gem_request_completed(const struct drm_i915_gem_request *req)
 {
 	return i915_seqno_passed(intel_engine_get_seqno(req->engine),
@@ -384,7 +404,7 @@ i915_gem_active_peek_rcu(const struct i915_gem_active *active)
 	struct drm_i915_gem_request *request;
 
 	request = rcu_dereference(active->request);
-	if (!request || i915_gem_request_completed(request))
+	if (!request || i915_gem_request_completed_rcu(request))
 		return NULL;
 
 	return request;
@@ -459,7 +479,7 @@ __i915_gem_active_get_rcu(const struct i915_gem_active *active)
 		struct drm_i915_gem_request *request;
 
 		request = rcu_dereference(active->request);
-		if (!request || i915_gem_request_completed(request))
+		if (!request || i915_gem_request_completed_rcu(request))
 			return NULL;
 
 		request = i915_gem_request_get_rcu(request);
-- 
2.8.1