[Intel-gfx] [PATCH 14/32] drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+

Dave Gordon david.s.gordon at intel.com
Tue Jan 5 04:45:58 PST 2016


On 11/12/15 11:33, Chris Wilson wrote:
> In order to ensure seqno/irq coherency, we current read a ring register.
> We are not sure quite how it works, only that is does. Experiments show
> that e.g. doing a clflush(seqno) instead is not sufficient, but we can
> remove the forcewake dance from the mmio access.
>
> v2: Baytrail wants a clflush too.
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> ---
>   drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++++++++++++--
>   1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 6cecc15ec01b..69dd69e46fa9 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1490,10 +1490,21 @@ gen6_ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
>   {
>   	/* Workaround to force correct ordering between irq and seqno writes on
>   	 * ivb (and maybe also on snb) by reading from a CS register (like
> -	 * ACTHD) before reading the status page. */
> +	 * ACTHD) before reading the status page.
> +	 *
> +	 * Note that this effectively effectively stalls the read by the time
> +	 * it takes to do a memory transaction, which more or less ensures
> +	 * that the write from the GPU has sufficient time to invalidate
> +	 * the CPU cacheline. Alternatively we could delay the interrupt from
> +	 * the CS ring to give the write time to land, but that would incur
> +	 * a delay after every batch i.e. much more frequent than a delay
> +	 * when waiting for the interrupt (with the same net latency).
> +	 */
>   	if (!lazy_coherency) {
>   		struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -		POSTING_READ(RING_ACTHD(ring->mmio_base));
> +		POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
> +
> +		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>   	}
>
>   	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);

Hmm ... would putting the flush before the POSTING_READ be better?

Depending on how the h/w implements the cacheline invalidation, it might 
allow some overlap between the cache controller's internal activities 
and the MMIO cycle ...

Also, previously we only had the flush on BXT, whereas now you're doing 
it on all gen6+. I think this is probably a good thing, but just 
wondered whether there's any downside to it?

Also ... are we sure that no-one calls this without having a forcewake 
in effect at the time, in particular debugfs? Or is it not going to end 
up going through here once lazy_coherency is abolished?

.Dave.


More information about the Intel-gfx mailing list