[Intel-gfx] [PATCH] drm/i915/execlists: Use a locked clear_bit() for synchronisation with interrupt

Wed Mar 21 10:14:25 UTC 2018

On 21/03/2018 09:10, Chris Wilson wrote:
> We were relying on the uncached reads when processing the CSB to provide
> ourselves with the serialisation with the interrupt handler (so we could
> detect new interrupts in the middle of processing the old one). However,
> in commit 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD
> from the HWSP") those uncached reads were eliminated (on one path at
> least) and along with them our serialisation. The result is that we
> would very rarely miss notification of a new interrupt and leave a
> context-switch unprocessed, hanging the GPU.
> 
> Fixes: 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD from the HWSP")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 21 ++++++++-------------
>   1 file changed, 8 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 53f1c009ed7b..67b6a0f658d6 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -831,7 +831,8 @@ static void execlists_submission_tasklet(unsigned long data)
>   	struct drm_i915_private *dev_priv = engine->i915;
>   	bool fw = false;
>   
> -	/* We can skip acquiring intel_runtime_pm_get() here as it was taken
> +	/*
> +	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
>   	 * on our behalf by the request (see i915_gem_mark_busy()) and it will
>   	 * not be relinquished until the device is idle (see
>   	 * i915_gem_idle_work_handler()). As a precaution, we make sure
> @@ -840,7 +841,8 @@ static void execlists_submission_tasklet(unsigned long data)
>   	 */
>   	GEM_BUG_ON(!dev_priv->gt.awake);
>   
> -	/* Prefer doing test_and_clear_bit() as a two stage operation to avoid
> +	/*
> +	 * Prefer doing test_and_clear_bit() as a two stage operation to avoid
>   	 * imposing the cost of a locked atomic transaction when submitting a
>   	 * new request (outside of the context-switch interrupt).
>   	 */
> @@ -856,17 +858,10 @@ static void execlists_submission_tasklet(unsigned long data)
>   			execlists->csb_head = -1; /* force mmio read of CSB ptrs */
>   		}
>   
> -		/* The write will be ordered by the uncached read (itself
> -		 * a memory barrier), so we do not need another in the form
> -		 * of a locked instruction. The race between the interrupt
> -		 * handler and the split test/clear is harmless as we order
> -		 * our clear before the CSB read. If the interrupt arrived
> -		 * first between the test and the clear, we read the updated
> -		 * CSB and clear the bit. If the interrupt arrives as we read
> -		 * the CSB or later (i.e. after we had cleared the bit) the bit
> -		 * is set and we do a new loop.
> -		 */
> -		__clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> +		/* Clear before reading to catch new interrupts */
> +		clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> +		smp_mb__after_atomic();
> +

Could theoretically avoid the locked cost in the mmio case by having two 
flavours of bit clearing in the "if" branches below but it doesn't 
sounds like a worthy complication in a wider context.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Now off to apply it in desperation it might affect my weird reset issues...

Regards,

Tvrtko

>   		if (unlikely(execlists->csb_head == -1)) { /* following a reset */
>   			if (!fw) {
>   				intel_uncore_forcewake_get(dev_priv,
>