[Intel-gfx] [PATCH 3/3] drm/i915/execlists: Read the context-status HEAD from the HWSP

Wed Jul 12 23:41:38 UTC 2017

On 12/07/17 15:58, Chris Wilson wrote:
> The engine provides also provides mirror of the CSB write pointer in the
> HWSP, but not of our read pointer. To take advantage of this we need to
> remember where we read up to on the last interrupt and continue off from
> there. This poses a problem following a reset, as we don't know where
> the hw will start writing from, and due to the use of power contexts we
> cannot perform that query during the reset itself. So we continue the
> current modus operandi of delaying the first read of the context-status
> read/write pointers until after the first interrupt. With this we should
> now have eliminated all uncached mmio reads in handling the
> context-status interrupt, though we still have the uncached mmio writes
> for submitting new work, and many uncached mmio reads in the global
> interrupt handler itself. Still a step in the right direction towards
> reducing our resubmit latency, although it appears lost in the noise!
> 
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c        | 20 +++++++++++++++-----
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
>   2 files changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index e413465a552b..db750abb905e 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -562,9 +562,15 @@ static void intel_lrc_irq_handler(unsigned long data)
>   		 * is set and we do a new loop.
>   		 */
>   		__clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> -		head = readl(csb_mmio);
> -		tail = GEN8_CSB_WRITE_PTR(head);
> -		head = GEN8_CSB_READ_PTR(head);
> +		if (unlikely(engine->csb_head == -1)) { /* following a reset */
> +			head = readl(csb_mmio);
> +			tail = GEN8_CSB_WRITE_PTR(head);
> +			head = GEN8_CSB_READ_PTR(head);
> +			engine->csb_head = head;
> +		} else {
> +			head = engine->csb_head;
> +			tail = buf[0xf];

In CNL the tail moves to offset 0x2f of the HWSP (i.e. buf[0x1f]), might 
  be worth considering it immediately since CNL support is being merged.

-Daniele

> +		}
>   		while (head != tail) {
>   			struct drm_i915_gem_request *rq;
>   			unsigned int status;
> @@ -618,8 +624,11 @@ static void intel_lrc_irq_handler(unsigned long data)
>   				   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
>   		}
>   
> -		writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
> -		       csb_mmio);
> +		if (head != engine->csb_head) {
> +			engine->csb_head = head;
> +			writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
> +			       csb_mmio);
> +		}
>   	}
>   
>   	if (execlists_elsp_ready(engine))
> @@ -1246,6 +1255,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
>   
>   	/* After a GPU reset, we may have requests to replay */
>   	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> +	engine->csb_head = -1;
>   
>   	submit = false;
>   	for (n = 0; n < ARRAY_SIZE(engine->execlist_port); n++) {
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index d33c93444c0d..56751413e40c 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -391,6 +391,7 @@ struct intel_engine_cs {
>   	struct rb_root execlist_queue;
>   	struct rb_node *execlist_first;
>   	unsigned int fw_domains;
> +	unsigned int csb_head;
>   
>   	/* Contexts are pinned whilst they are active on the GPU. The last
>   	 * context executed remains active whilst the GPU is idle - the
>