[Intel-gfx] [CI] drm/i915/execlists: Workaround switching back to a complete context

Fri Mar 27 20:33:29 UTC 2020

Chris Wilson <chris at chris-wilson.co.uk> writes:

> In what seems remarkably similar to the w/a required to not reload an
> idle context with HEAD==TAIL, it appears we must prevent the HW from
> switching to an idle context in ELSP[1], while simultaneously trying to
> preempt the HW to run another context and a continuation of the idle
> context (which is no longer idle).
>
> We can achieve this by preventing the context from completing while we
> reload a new ELSP (by applying ring_set_paused(1) across the whole of
> dequeue), except this eventually fails because a lite-restore into a
> waiting semaphore does not generate an ACK. Instead, we try to avoid
> making the GPU do anything too challenging and not submit a new ELSP
> while the interrupts + CSB events appear to have fallen behind the
> completed contexts. We expect it to catch up shortly so we queue another
> tasklet execution and hope for the best.
>
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/1501
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index b12355048501..5f17ece07858 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1915,11 +1915,26 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  	 * of trouble.
>  	 */
>  	active = READ_ONCE(execlists->active);
> -	while ((last = *active) && i915_request_completed(last))
> -		active++;
>  
> -	if (last) {
> +	/*
> +	 * In theory we can skip over completed contexts that have not
> +	 * yet been processed by events (as those events are in flight):
> +	 *
> +	 * while ((last = *active) && i915_request_completed(last))
> +	 *	active++;
> +	 *
> +	 * However, the GPU is cannot handle this as it will ultimately

s/is//

I applaud how straightforward this is compared to the pausing,
although it does seem to have a cost.

But this should be a comparatively rare event?

> +	 * find itself trying to jump back into a context it has just
> +	 * completed and barf.
> +	 */
> +
> +	if ((last = *active)) {
>  		if (need_preempt(engine, last, rb)) {
> +			if (i915_request_completed(last)) {
> +				tasklet_hi_schedule(&execlists->tasklet);
> +				return;
> +			}
> +

I was pondering the lost tracing, and whether you can
still work backwards to this condition.

But I really hope this nails it,
Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>

>  			ENGINE_TRACE(engine,
>  				     "preempting last=%llx:%lld, prio=%d, hint=%d\n",
>  				     last->fence.context,
> @@ -1947,6 +1962,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  			last = NULL;
>  		} else if (need_timeslice(engine, last) &&
>  			   timer_expired(&engine->execlists.timer)) {
> +			if (i915_request_completed(last)) {
> +				tasklet_hi_schedule(&execlists->tasklet);
> +				return;
> +			}
> +
>  			ENGINE_TRACE(engine,
>  				     "expired last=%llx:%lld, prio=%d, hint=%d\n",
>  				     last->fence.context,
> -- 
> 2.20.1
>
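
For anyone following the control flow, here is a minimal user-space sketch of
the deferral decision the patch adds on the preempt path. It is only a model
under stated assumptions: every mock_* name and the dequeue_result enum are
hypothetical stand-ins, not i915 definitions, and the resched counter merely
stands in for the tasklet_hi_schedule() call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request; not the i915 structure. */
struct mock_request {
	bool completed;	/* models the result of i915_request_completed() */
};

/* Hypothetical stand-in for the execlists state. */
struct mock_execlists {
	struct mock_request *const *active;	/* NULL-terminated in-flight ports */
	unsigned int resched;	/* counts modelled tasklet reschedules */
};

enum dequeue_result {
	DEQUEUE_IDLE,		/* nothing in flight */
	DEQUEUE_CONTINUE,	/* no preemption wanted, carry on as before */
	DEQUEUE_DEFERRED,	/* head request already completed, retry later */
	DEQUEUE_PREEMPT,	/* head request still running, preempt it */
};

/*
 * Model of the patched check: look only at the first active request,
 * without skipping over completed ones. If preemption is wanted but that
 * request has already completed, do not submit anything new; count a
 * reschedule and bail out so the pending events can catch up.
 */
static enum dequeue_result
mock_dequeue(struct mock_execlists *el, bool want_preempt)
{
	struct mock_request *last = *el->active;

	if (!last)
		return DEQUEUE_IDLE;

	if (!want_preempt)
		return DEQUEUE_CONTINUE;

	if (last->completed) {
		el->resched++;
		return DEQUEUE_DEFERRED;
	}

	return DEQUEUE_PREEMPT;
}
```

In this model the cost asked about above shows up as one extra pass through
mock_dequeue() each time the head request has completed but has not yet been
retired from the active ports.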