[Intel-gfx] [PATCH] drm/i915: Restore engine->submit_request before unwedging

Fri Mar 10 13:18:00 UTC 2017

On Fri, Mar 10, 2017 at 12:59:47PM +0000, Tvrtko Ursulin wrote:
> 
> On 09/03/2017 10:22, Chris Wilson wrote:
> >+void i915_gem_unset_wedged(struct drm_i915_private *dev_priv)
> >+{
> >+	struct i915_gem_timeline *tl;
> >+	int i;
> >+
> >+	lockdep_assert_held(&dev_priv->drm.struct_mutex);
> >+	if (!test_bit(I915_WEDGED, &dev_priv->gpu_error.flags))
> >+		return;
> >+
> >+	/* Before unwedging, make sure that all pending operations
> >+	 * are flushed and errored out. No more can be submitted until
> >+	 * we reset the wedged bit.
> >+	 */
> >+	list_for_each_entry(tl, &dev_priv->gt.timelines, link) {
> >+		for (i = 0; i < ARRAY_SIZE(tl->engine); i++) {
> >+			struct drm_i915_gem_request *rq;
> >+
> >+			rq = i915_gem_active_peek(&tl->engine[i].last_request,
> >+						  &dev_priv->drm.struct_mutex);
> >+			if (!rq)
> >+				continue;
> >+
> >+			/* We can't use our normal waiter as we want to
> >+			 * avoid recursively trying to handle the current
> >+			 * reset.
> >+			 */
> >+			dma_fence_default_wait(&rq->fence, false,
> >+					       MAX_SCHEDULE_TIMEOUT);
> 
> Who will signal these since GPU is stuck and this happens before the
> GEM reset calls in i915_reset? Obviously I totally don't understand
> how is this supposed to work.. :)

All in-flight requests are completed by the reset and signaled. Then the
reset installs engine->submit_request = nop_submit_request to catch any
new requests that were waiting on a fence (either ours or a third party)
before being submitted. nop_submit_request() will advance the seqno and
wakeup the signaler who will then signal the fence (and
dma_fence_default_wait will ensure the signaler is armed).

It is a bit of a roundabout route just to ensure those outstanding
fences are indeed signaled. And the loop over all timelines is just as
ugly.

> >+		}
> >+	}
> >+
> >+	/* Undo nop_submit_request */
> >+	if (i915.enable_execlists)
> >+		intel_execlists_enable_submission(dev_priv);
> >+	else
> >+		intel_ringbuffer_enable_submission(dev_priv);
> 
> No need for stop machine as the wedging uses?

No. We prevent all new i915 requests from being queued (by
disallowing execbuf whilst wedged) so having waited for all active
requests above, we know the system is idle and do not have to worry
about a thread being inside engine->submit_request() as we swap over.

> >+
> >+	smp_mb__before_atomic();
> 
> What is this for?

Paranoid brain said that clear_bit() didn't guarantee a barrier.

/**
 * clear_bit - Clears a bit in memory
 * @nr: Bit to clear
 * @addr: Address to start counting from
 *
 * clear_bit() is atomic and may not be reordered.  However, it does
 * not contain a memory barrier, so if it is used for locking purposes,
 * you should call smp_mb__before_atomic() and/or smp_mb__after_atomic()
 * in order to ensure changes are visible on other processors.
 */

Though we have struct_mutex as a barrier right now, this should be on
less surprise later.

> >+	clear_bit(I915_WEDGED, &dev_priv->gpu_error.flags);
> >+}
> >+
> > static void
> > i915_gem_retire_work_handler(struct work_struct *work)
> > {
> >diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >index 4a864f8c9387..753586f6ddbe 100644
> >--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> >+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >@@ -2224,3 +2224,15 @@ int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine)
> >
> > 	return intel_init_ring_buffer(engine);
> > }
> >+
> >+void intel_ringbuffer_enable_submission(struct drm_i915_private *i915)
> >+{
> >+	struct intel_engine_cs *engine;
> >+	enum intel_engine_id id;
> >+
> >+	for_each_engine(engine, i915, id) {
> >+		engine->submit_request = i9xx_submit_request;
> >+		if (IS_GEN6(i915) && id == VCS)
> >+			engine->submit_request = gen6_bsd_submit_request;
> >+	}
> 
> I wonder if it would be worth extracting setting of this vfunc (and
> schedule in execlists case) to a helper so the logic is not
> duplicated. Sounds a bit marginal at the moment, don't know.

I'm not happy with it either at the moment. It is a quick dirty interface,
 that unfortunately with have the tendency to stick around :|

So I'm thinking something like

	engine->enable_submit() or engine->install_submit_request()

?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre