[Intel-gfx] [PATCH 49/50] drm/i915/bdw: Help out the ctx switch interrupt handler

Wed Jun 11 14:01:42 CEST 2014

> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Wednesday, June 11, 2014 12:50 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx at lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 49/50] drm/i915/bdw: Help out the ctx switch
> interrupt handler
> 
> On Fri, May 09, 2014 at 01:09:19PM +0100, oscar.mateo at intel.com wrote:
> > From: Oscar Mateo <oscar.mateo at intel.com>
> >
> > If we receive a storm of requests for the same context (see
> > gem_storedw_loop_*) we might end up iterating over too many elements
> > in interrupt time, looking for contexts to squash together. Instead,
> > share the burden by giving more intelligence to the queue function. At
> > most, the interrupt will iterate over three elements.
> >
> > Signed-off-by: Oscar Mateo <oscar.mateo at intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_lrc.c | 23 ++++++++++++++++++++---
> >  1 file changed, 20 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > b/drivers/gpu/drm/i915/intel_lrc.c
> > index d9edd10..0aad721 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -410,9 +410,11 @@ int gen8_switch_context_queue(struct
> intel_engine *ring,
> >  			      struct i915_hw_context *to,
> >  			      u32 tail)
> >  {
> > +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> >  	struct drm_i915_gem_request *req = NULL;
> >  	unsigned long flags;
> > -	bool was_empty;
> > +	struct drm_i915_gem_request *cursor;
> > +	int num_elements = 0;
> >
> >  	req = kzalloc(sizeof(*req), GFP_KERNEL);
> >  	if (req == NULL)
> > @@ -425,9 +427,24 @@ int gen8_switch_context_queue(struct
> intel_engine
> > *ring,
> >
> >  	spin_lock_irqsave(&ring->execlist_lock, flags);
> >
> > -	was_empty = list_empty(&ring->execlist_queue);
> > +	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
> > +		if (++num_elements > 2)
> > +			break;
> > +
> > +	if (num_elements > 2) {
> > +		struct drm_i915_gem_request *tail_req =
> > +				list_last_entry(&ring->execlist_queue,
> > +					struct drm_i915_gem_request,
> execlist_link);
> > +		if (to == tail_req->ctx) {
> > +			WARN(tail_req->elsp_submitted != 0,
> > +					"More than 2 already-submitted reqs
> queued\n");
> > +			list_del(&tail_req->execlist_link);
> > +			queue_work(dev_priv->wq, &tail_req->work);
> > +		}
> > +	}
> 
> Completely forgotten to mention this: Chris&I discussed this on irc and I
> guess this issue will disappear if we track contexts instead of requests in the
> scheduler. I guess this is an artifact of the gen7 scheduler you've based this
> on, but even for that I think scheduling contexts (with preempt point after
> each batch) is the right approach. But I haven't dug out the scheduler patches
> again so might be wrong with that.
> -Daniel

Hmmmm... I didn´t really base this on the scheduler. Some kind of queue to hold context submissions until the hardware was ready was needed, and queuing drm_i915_gem_requests seemed like a good choice at the time (by the way, in the next version I am using a new struct intel_ctx_submit_request, since I don´t need most of the fields in drm_i915_gem_requests, and I have to add a couple of new ones anyway).

What do you mean by "scheduling contexts"? Notice that the requests I am queuing basically just contain the context and the tail at the point it was submitted for execution...