[Intel-gfx] [PATCH 49/50] drm/i915/bdw: Help out the ctx switch interrupt handler

Wed Jun 11 15:57:38 CEST 2014

On Wed, Jun 11, 2014 at 12:01:42PM +0000, Mateo Lozano, Oscar wrote:
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of Daniel
> > Vetter
> > Sent: Wednesday, June 11, 2014 12:50 PM
> > To: Mateo Lozano, Oscar
> > Cc: intel-gfx at lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 49/50] drm/i915/bdw: Help out the ctx switch
> > interrupt handler
> > 
> > On Fri, May 09, 2014 at 01:09:19PM +0100, oscar.mateo at intel.com wrote:
> > > From: Oscar Mateo <oscar.mateo at intel.com>
> > >
> > > If we receive a storm of requests for the same context (see
> > > gem_storedw_loop_*) we might end up iterating over too many elements
> > > in interrupt time, looking for contexts to squash together. Instead,
> > > share the burden by giving more intelligence to the queue function. At
> > > most, the interrupt will iterate over three elements.
> > >
> > > Signed-off-by: Oscar Mateo <oscar.mateo at intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/intel_lrc.c | 23 ++++++++++++++++++++---
> > >  1 file changed, 20 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > > b/drivers/gpu/drm/i915/intel_lrc.c
> > > index d9edd10..0aad721 100644
> > > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > > @@ -410,9 +410,11 @@ int gen8_switch_context_queue(struct
> > intel_engine *ring,
> > >  			      struct i915_hw_context *to,
> > >  			      u32 tail)
> > >  {
> > > +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> > >  	struct drm_i915_gem_request *req = NULL;
> > >  	unsigned long flags;
> > > -	bool was_empty;
> > > +	struct drm_i915_gem_request *cursor;
> > > +	int num_elements = 0;
> > >
> > >  	req = kzalloc(sizeof(*req), GFP_KERNEL);
> > >  	if (req == NULL)
> > > @@ -425,9 +427,24 @@ int gen8_switch_context_queue(struct
> > intel_engine
> > > *ring,
> > >
> > >  	spin_lock_irqsave(&ring->execlist_lock, flags);
> > >
> > > -	was_empty = list_empty(&ring->execlist_queue);
> > > +	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
> > > +		if (++num_elements > 2)
> > > +			break;
> > > +
> > > +	if (num_elements > 2) {
> > > +		struct drm_i915_gem_request *tail_req =
> > > +				list_last_entry(&ring->execlist_queue,
> > > +					struct drm_i915_gem_request,
> > execlist_link);
> > > +		if (to == tail_req->ctx) {
> > > +			WARN(tail_req->elsp_submitted != 0,
> > > +					"More than 2 already-submitted reqs
> > queued\n");
> > > +			list_del(&tail_req->execlist_link);
> > > +			queue_work(dev_priv->wq, &tail_req->work);
> > > +		}
> > > +	}
> > 
> > Completely forgotten to mention this: Chris&I discussed this on irc and I
> > guess this issue will disappear if we track contexts instead of requests in the
> > scheduler. I guess this is an artifact of the gen7 scheduler you've based this
> > on, but even for that I think scheduling contexts (with preempt point after
> > each batch) is the right approach. But I haven't dug out the scheduler patches
> > again so might be wrong with that.
> > -Daniel
> 
> Hmmmm... I didn´t really base this on the scheduler. Some kind of queue
> to hold context submissions until the hardware was ready was needed, and
> queuing drm_i915_gem_requests seemed like a good choice at the time (by
> the way, in the next version I am using a new struct
> intel_ctx_submit_request, since I don´t need most of the fields in
> drm_i915_gem_requests, and I have to add a couple of new ones anyway).
> 
> What do you mean by "scheduling contexts"? Notice that the requests I am
> queuing basically just contain the context and the tail at the point it
> was submitted for execution...

Well I've thought we could just throw contexts at the hardware and throw
new ones at it when the old ones get stuck/are completed. But now I've
realized that since we do the cross-engine/ctx depency tracking in
software it's not quite that easy and we can't unconditionally update the
tail-pointer.

Still for the degenerate case of one ctx submitting batches exclusively
I've hoped just updating the tail pointer in the context and telling the
hw to reload the current context should have been enough. Or at least I've
hoped so, and that should take (mostly) care of the insane request
overload case your patch here addresses.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch