[Intel-gfx] [PATCH 017/190] drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+

Mon Jan 11 06:02:41 PST 2016

On 11/01/16 09:16, Chris Wilson wrote:
> In order to ensure seqno/irq coherency, we current read a ring register.
> We are not sure quite how it works, only that is does. Experiments show
> that e.g. doing a clflush(seqno) instead is not sufficient, but we can
> remove the forcewake dance from the mmio access.
>
> v2: Baytrail wants a clflush too.
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> ---
>   drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++++++++++++--
>   1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 99780b674311..a1d43b2c7077 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1490,10 +1490,21 @@ gen6_ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
>   {
>   	/* Workaround to force correct ordering between irq and seqno writes on
>   	 * ivb (and maybe also on snb) by reading from a CS register (like
> -	 * ACTHD) before reading the status page. */
> +	 * ACTHD) before reading the status page.
> +	 *
> +	 * Note that this effectively effectively stalls the read by the time
> +	 * it takes to do a memory transaction, which more or less ensures
> +	 * that the write from the GPU has sufficient time to invalidate
> +	 * the CPU cacheline. Alternatively we could delay the interrupt from
> +	 * the CS ring to give the write time to land, but that would incur
> +	 * a delay after every batch i.e. much more frequent than a delay
> +	 * when waiting for the interrupt (with the same net latency).
> +	 */
>   	if (!lazy_coherency) {
>   		struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -		POSTING_READ(RING_ACTHD(ring->mmio_base));
> +		POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
> +
> +		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>   	}
>
>   	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);

Well, I generally like this, but my previous questions of 2015-01-05 
were not answered:

> Hmm ... would putting the flush /before/ the POSTING_READ be better?
>
> Depending on how the h/w implements the cacheline invalidation, it
 > might allow some overlap between the cache controller's internal
 > activities and the MMIO cycle ...
>
> Also, previously we only had the flush on BXT, whereas now you're
 > doing it on all gen6+. I think this is probably a good thing, but just
 > wondered whether there's any downside to it?
>
> Also ... are we sure that no-one calls this without having a
> forcewake  in effect at the time, in particular debugfs? Or is it not
 > going to end up going through here once lazy_coherency is abolished?

.Dave.