[Intel-gfx] [PATCH v2 01/14] drm/i915: Keep a global seqno per-engine

Chris Wilson chris at chris-wilson.co.uk
Wed Feb 15 21:49:41 UTC 2017


On Wed, Feb 15, 2017 at 05:05:40PM +0000, Tvrtko Ursulin wrote:
> 
> On 14/02/2017 09:54, Chris Wilson wrote:
> >Replace the global device seqno with one for each engine, and account
> >for in-flight seqno on each separately. This is consistent with
> >dma-fence as each timeline has separate fence-contexts for each engine
> >and a seqno is only ordered within a fence-context (i.e.  seqno do not
> >need to be ordered wrt to other engines, just ordered within a single
> >engine). This is required to enable request rewinding for preemption on
> >individual engines (we have to rewind the global seqno to avoid
> >overflow, and we do not have to rewind all engines just to preempt one.)
> >
> >Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >---
> > drivers/gpu/drm/i915/i915_debugfs.c      |  5 +--
> > drivers/gpu/drm/i915/i915_gem_request.c  | 68 +++++++++++++++-----------------
> > drivers/gpu/drm/i915/i915_gem_request.h  |  8 +---
> > drivers/gpu/drm/i915/i915_gem_timeline.h |  4 +-
> > drivers/gpu/drm/i915/intel_breadcrumbs.c | 33 +++++++---------
> > drivers/gpu/drm/i915/intel_engine_cs.c   |  2 -
> > drivers/gpu/drm/i915/intel_ringbuffer.h  |  4 +-
> > 7 files changed, 52 insertions(+), 72 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> >index cda957c674ee..9b636962cab6 100644
> >--- a/drivers/gpu/drm/i915/i915_debugfs.c
> >+++ b/drivers/gpu/drm/i915/i915_debugfs.c
> >@@ -1080,10 +1080,7 @@ static const struct file_operations i915_error_state_fops = {
> > static int
> > i915_next_seqno_get(void *data, u64 *val)
> > {
> >-	struct drm_i915_private *dev_priv = data;
> >-
> >-	*val = 1 + atomic_read(&dev_priv->gt.global_timeline.seqno);
> >-	return 0;
> >+	return -ENODEV;
> 
> I assume reason for leaving this function in this state appears in a
> later patch? gt.global_timeline stays around for something else?

There's no longer a single global seqno, so we tell userspace (igt) it can't
have it.

> > }
> >
> > static int
> >diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> >index 24571a5289a4..2d84da0e2b39 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_request.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_request.c
> >@@ -213,7 +213,9 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
> > 	GEM_BUG_ON(!i915_sw_fence_signaled(&request->submit));
> > 	GEM_BUG_ON(!i915_sw_fence_signaled(&request->execute));
> > 	GEM_BUG_ON(!i915_gem_request_completed(request));
> >+
> > 	GEM_BUG_ON(!request->i915->gt.active_requests);
> >+	GEM_BUG_ON(!request->engine->timeline->active_seqno);
> >
> > 	trace_i915_gem_request_retire(request);
> >
> >@@ -237,6 +239,7 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
> > 				 &request->i915->gt.idle_work,
> > 				 msecs_to_jiffies(100));
> > 	}
> >+	request->engine->timeline->active_seqno--;
> 
> Hm, decrementing the seqno, a new concept in our codebase. There is
> no comment where the structure field is added so I guess it's
> reverse engineering time again. :(

I thought the switch from active_requests to active_seqno would be
apparent.

> Back few minutes later - so this is a count of in flight requests?
> Should you call it active_seqnos then so it becomes obvious it is
> not a seqno as we know it?

Yes, Joonas suggested active_seqnos as well. I was trying to restrict
myself to only talking about seqno wrt timelines.
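
Roughly what I have in mind for the field (a sketch only, trimmed to
the two counters, with the comment I should add in the patch itself):

	struct intel_timeline {
		u32 seqno;	/* last seqno assigned on this engine */

		/* Number of seqno reserved by requests still in flight
		 * (allocated but not yet retired); a count, not a seqno,
		 * used to check whether the next reservation would wrap.
		 */
		u32 active_seqnos;
	};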

> It sucks a bit that the decrement is in the infamous retire worker.
> I wonder how hard would it be to DoS the seqno space with it.

Harder now than the active_requests space, i.e. virtually impossible. We
can only have 2^20 contexts and each context can only support around
2^10 requests (at least for userspace requests; kernel requests can be
more compact, but are still larger than 16 bytes!). That caps us at
roughly 2^30 requests in flight, comfortably inside the 32-bit seqno
space.
 
> Protected by the timeline spinlock?

struct_mutex. (Still thinking about what mutex it will be in the
future.)
 
> Would the design work to assume GuC scheduler backend is in and to
> put in flight seqno accounting close to requests coming in and out
> from the hardware?

No, it is currently designed as a coarse counter that integrates with
the runtime pm framework as well as the mempools.

> > 	/* Walk through the active list, calling retire on each. This allows
> > 	 * objects to track their GPU activity and mark themselves as idle
> >@@ -325,15 +328,19 @@ static int i915_gem_init_global_seqno(struct drm_i915_private *i915, u32 seqno)
> > 	GEM_BUG_ON(i915->gt.active_requests > 1);
> >
> > 	/* If the seqno wraps around, we need to clear the breadcrumb rbtree */
> >-	if (!i915_seqno_passed(seqno, atomic_read(&timeline->seqno))) {
> >-		while (intel_breadcrumbs_busy(i915))
> >-			cond_resched(); /* spin until threads are complete */
> >-	}
> >-	atomic_set(&timeline->seqno, seqno);
> >+	for_each_engine(engine, i915, id) {
> >+		struct intel_timeline *tl = &timeline->engine[id];
> >
> >-	/* Finally reset hw state */
> >-	for_each_engine(engine, i915, id)
> >+		if (!i915_seqno_passed(seqno, tl->seqno)) {
> >+			/* spin until threads are complete */
> >+			while (intel_breadcrumbs_busy(engine))
> >+				cond_resched();
> >+		}
> >+
> >+		/* Finally reset hw state */
> >+		tl->seqno = seqno;
> > 		intel_engine_init_global_seqno(engine, seqno);
> >+	}
> 
> Came back here a bit later. Shouldn't you just handle one engine in
> this function if seqnos are per-engine now?

No. We still have multiple engines listening to the seqno of others
(legacy semaphores). So if we wrap around on RCS we have to idle xCS to
be sure they complete any semaphores (semaphores check for >= value, so
if we set future requests to be smaller, they have to wait for a long
time before a new RCS request overtakes the semaphore value).
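
To model what the hardware wait amounts to (purely illustrative C, not
real code): the semaphore poll is a plain unsigned compare with no wrap
handling,

	static bool semaphore_complete(u32 engine_seqno, u32 wait_value)
	{
		/* stalls "forever" if engine_seqno is rewound below
		 * wait_value, hence idling the waiters first
		 */
		return engine_seqno >= wait_value;
	}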

> > 	list_for_each_entry(timeline, &i915->gt.timelines, link) {
> > 		for_each_engine(engine, i915, id) {
> >@@ -361,34 +368,28 @@ int i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno)
> > 	return i915_gem_init_global_seqno(dev_priv, seqno - 1);
> > }
> >
> >-static int reserve_global_seqno(struct drm_i915_private *i915)
> >+static int reserve_global_seqno(struct intel_engine_cs *engine)
> 
> Rename to reserve_engine_seqno?

Yes.

> >-	u32 active_requests = ++i915->gt.active_requests;
> >-	u32 seqno = atomic_read(&i915->gt.global_timeline.seqno);
> >+	u32 active = ++engine->timeline->active_seqno;
> >+	u32 seqno = engine->timeline->seqno;
> > 	int ret;
> >
> > 	/* Reservation is fine until we need to wrap around */
> >-	if (likely(seqno + active_requests > seqno))
> >+	if (likely(seqno + active > seqno))
> 
> Case for one of the overflows type helpers?

add_overflows

if (likely(!add_overflows(seqno, active)))
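
with the helper being something along the lines of (a sketch; exact
name and home to be decided):

	static inline bool add_overflows(u32 a, u32 b)
	{
		return a + b < a;
	}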

> 
> > 		return 0;
> >
> >-	ret = i915_gem_init_global_seqno(i915, 0);
> >+	ret = i915_gem_init_global_seqno(engine->i915, 0);
> 
> i915_gem_init_engine_seqno(engine) ?

No. Still retains global scope, and definitely wants to keep that
nuisance front and centre.

> > 	if (ret) {
> >-		i915->gt.active_requests--;
> >+		engine->timeline->active_seqno--;
> 
> It would be better for the active seqno count to be managed on the
> same level for readability. By that I mean having the decrement in
> add_request where it was incremented.

It's incremented in this function, so the unwind on error is here as
well.
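
To spell out the symmetry (a sketch of the whole function with the
suggested renames folded in, not the final patch):

	static int reserve_engine_seqno(struct intel_engine_cs *engine)
	{
		u32 active = ++engine->timeline->active_seqnos;
		u32 seqno = engine->timeline->seqno;
		int ret;

		/* Reservation is fine until we need to wrap around */
		if (likely(!add_overflows(seqno, active)))
			return 0;

		ret = i915_gem_init_global_seqno(engine->i915, 0);
		if (ret) {
			/* unwind the increment taken at the top */
			engine->timeline->active_seqnos--;
			return ret;
		}

		return 0;
	}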

> > 		return ret;
> > 	}
> >
> > 	return 0;
> > }
> >
> >-static u32 __timeline_get_seqno(struct i915_gem_timeline *tl)
> >+static u32 timeline_get_seqno(struct intel_timeline *tl)
> > {
> >-	/* seqno only incremented under a mutex */
> >-	return ++tl->seqno.counter;
> >-}
> >-
> >-static u32 timeline_get_seqno(struct i915_gem_timeline *tl)
> >-{
> >-	return atomic_inc_return(&tl->seqno);
> >+	return ++tl->seqno;
> > }
> >
> > void __i915_gem_request_submit(struct drm_i915_gem_request *request)
> >@@ -402,14 +403,10 @@ void __i915_gem_request_submit(struct drm_i915_gem_request *request)
> > 	GEM_BUG_ON(timeline == request->timeline);
> > 	assert_spin_locked(&timeline->lock);
> >
> >-	seqno = timeline_get_seqno(timeline->common);
> >+	seqno = timeline_get_seqno(timeline);
> > 	GEM_BUG_ON(!seqno);
> > 	GEM_BUG_ON(i915_seqno_passed(intel_engine_get_seqno(engine), seqno));
> >
> >-	GEM_BUG_ON(i915_seqno_passed(timeline->last_submitted_seqno, seqno));
> >-	request->previous_seqno = timeline->last_submitted_seqno;
> >-	timeline->last_submitted_seqno = seqno;
> >-
> > 	/* We may be recursing from the signal callback of another i915 fence */
> > 	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
> > 	request->global_seqno = seqno;
> 
> This field could also be renamed to engine_seqno to be more
> self-documenting.

That's going to be a wide-sweeping change, let's see what it looks like,
e.g. i915_gem_request_get_engine_seqno()

On the other hand, I thought I called the timeline "[global]"
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

