[Intel-gfx] [PATCH 017/190] drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+

Thu Jan 21 08:27:07 PST 2016

Dave Gordon <david.s.gordon at intel.com> writes:

> On 11/01/16 09:16, Chris Wilson wrote:
>> In order to ensure seqno/irq coherency, we current read a ring register.
>> We are not sure quite how it works, only that is does. Experiments show
>> that e.g. doing a clflush(seqno) instead is not sufficient, but we can
>> remove the forcewake dance from the mmio access.
>>
>> v2: Baytrail wants a clflush too.
>>
>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>> ---
>>   drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++++++++++++--
>>   1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 99780b674311..a1d43b2c7077 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -1490,10 +1490,21 @@ gen6_ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
>>   {
>>   	/* Workaround to force correct ordering between irq and seqno writes on
>>   	 * ivb (and maybe also on snb) by reading from a CS register (like
>> -	 * ACTHD) before reading the status page. */
>> +	 * ACTHD) before reading the status page.
>> +	 *
>> +	 * Note that this effectively effectively stalls the read by the time
>> +	 * it takes to do a memory transaction, which more or less ensures
>> +	 * that the write from the GPU has sufficient time to invalidate
>> +	 * the CPU cacheline. Alternatively we could delay the interrupt from
>> +	 * the CS ring to give the write time to land, but that would incur
>> +	 * a delay after every batch i.e. much more frequent than a delay
>> +	 * when waiting for the interrupt (with the same net latency).
>> +	 */
>>   	if (!lazy_coherency) {
>>   		struct drm_i915_private *dev_priv = ring->dev->dev_private;
>> -		POSTING_READ(RING_ACTHD(ring->mmio_base));
>> +		POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
>> +
>> +		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>>   	}
>>
>>   	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
>
> Well, I generally like this, but my previous questions of 2015-01-05 
> were not answered:
>
>> Hmm ... would putting the flush /before/ the POSTING_READ be better?
>>
>> Depending on how the h/w implements the cacheline invalidation, it
>  > might allow some overlap between the cache controller's internal
>  > activities and the MMIO cycle ...

I thought of the sequence of events like this:

#1: read(acthd9) -> gpu flushes the write (if cpu snoop fails,
we still have the correct on in dram)

#2: flush_status_page() -> is actually going to be invalidate for cpu as
it is only written from cpu side on init. (explained in bxt_a_get_seqno)

#3: read status page will get coherent value from dram.

>>
>> Also, previously we only had the flush on BXT, whereas now you're
>  > doing it on all gen6+. I think this is probably a good thing, but just
>  > wondered whether there's any downside to it?

Perf hit. But when seqno coherence is essential, we need
to be absolutely sure due to way we handle irqs. The only 
place where we force coherence is from __i915_wait_request,
and there only before final decision before going to sleep.
(well hangcheck also does the coherent read but that has no
relevance on perf).

Our irq handling is built on a principle that if
we enable interrupts with waiters, absolutely nothing can get lost.
There is no leeway after irq_get().

If we are going to put task to sleep, we can afford a cacheline
flush. 

So guarantee a coherent read asked to be coherent. And relax
the rules per gen basis, if someone can show passed tests with
improved perf metrics.

>>
>> Also ... are we sure that no-one calls this without having a
>> forcewake  in effect at the time, in particular debugfs? Or is it not
>  > going to end up going through here once lazy_coherency is abolished?
>

I asked about this from Chris in irc. The actual forcewake
status doesn't matter. If it was not on, we just get zero
but the sideffect still happens: seqno is written.

Thanks,
-Mika

> .Dave.
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx