[Intel-gfx] [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno
Mika Kuoppala
mika.kuoppala at linux.intel.com
Wed Jul 1 06:53:49 PDT 2015
Mika Kuoppala <mika.kuoppala at linux.intel.com> writes:
> Imre Deak <imre.deak at intel.com> writes:
>
>> By running igt/store_dword_loop_render on BXT we can hit a coherency
>> problem where the seqno written at GPU command completion time is not
>> seen by the CPU. This results in __i915_wait_request seeing the stale
>> seqno and not completing the request (not considering the lost
>> interrupt/GPU reset mechanism). I also verified that this isn't a case
>> of a lost interrupt, or that the command didn't complete somehow: when
>> the coherency issue occurred I read the seqno via an uncached GTT mapping
>> too. While the cached version of the seqno still showed the stale value
>> the one read via the uncached mapping was the correct one.
>>
>> Work around this issue by clflushing the corresponding CPU cacheline
>> following any store of the seqno and preceding any reading of it. When
>> reading it do this only when the caller expects a coherent view.
>>
>> Testcase: igt/store_dword_loop_render
>> Signed-off-by: Imre Deak <imre.deak at intel.com>
>> ---
>> drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++++
>> drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++
>> 2 files changed, 24 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 9f5485d..88bc5525 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf,
>>
>> static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
>> {
>> + /*
>> + * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM
>> + * storing the completed request's seqno occasionally doesn't
>> + * invalidate the CPU cache. Work around this by clflushing the
>> + * corresponding cacheline whenever the caller wants the coherency to
>> + * be guaranteed. Note that this cacheline is known to be
>> + * clean at this point, since we only write it in gen8_set_seqno(),
>> + * where we also do a clflush after the write. So this clflush in
>> + * practice becomes an invalidate operation.
>> + */
>> + if (IS_BROXTON(ring->dev) & !lazy_coherency)
>
> s/&/&& ?
s//Read The Whole Thread Before Replying
-Mika
> -Mika
>
>> + intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>> +
>> return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
>> }
>>
>> static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
>> {
>> intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
>> +
>> + /* See gen8_get_seqno() explaining the reason for the clflush. */
>> + if (IS_BROXTON(ring->dev))
>> + intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>> }
>>
>> static int gen8_emit_request(struct intel_ringbuffer *ringbuf,
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index 39f6dfc..224a25b 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -352,6 +352,13 @@ intel_ring_sync_index(struct intel_engine_cs *ring,
>> return idx;
>> }
>>
>> +static inline void
>> +intel_flush_status_page(struct intel_engine_cs *ring, int reg)
>> +{
>> + drm_clflush_virt_range(&ring->status_page.page_addr[reg],
>> + sizeof(uint32_t));
>> +}
>> +
>> static inline u32
>> intel_read_status_page(struct intel_engine_cs *ring,
>> int reg)
>> --
>> 2.1.4
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
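
For reference, the s/&/&& question raised above comes down to the difference between bitwise and logical AND in C. The two forms agree whenever the left-hand operand is exactly 0 or 1, and differ when a "true" value has bit 0 clear. Below is a minimal standalone sketch of that difference. The names PLATFORM_FLAG_SET, need_flush_bitwise() and need_flush_logical() are hypothetical and are not part of i915; nothing here assumes what IS_BROXTON() actually evaluates to.

```c
#include <stdbool.h>

/* Hypothetical "true" value with bit 0 clear; NOT the i915 IS_BROXTON() macro. */
#define PLATFORM_FLAG_SET 0x2

/* Bitwise form: ANDs the raw bit patterns of the two operands. */
static bool need_flush_bitwise(int platform_flag, bool lazy_coherency)
{
	/* !lazy_coherency is the int 0 or 1, so only bit 0 of the flag matters. */
	return (platform_flag & !lazy_coherency) != 0;
}

/* Logical form: each operand is reduced to true/false before combining. */
static bool need_flush_logical(int platform_flag, bool lazy_coherency)
{
	return platform_flag && !lazy_coherency;
}
```

With a flag value of 1 both helpers request a flush when lazy_coherency is false. With the hypothetical 0x2 value only the logical form does, which is why && is the safer spelling for a condition like this.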