[Intel-gfx] [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno
Imre Deak
imre.deak at intel.com
Wed Jun 10 07:07:46 PDT 2015
On Tue, 2015-06-09 at 11:21 +0300, Jani Nikula wrote:
> On Mon, 08 Jun 2015, Imre Deak <imre.deak at intel.com> wrote:
> > By running igt/store_dword_loop_render on BXT we can hit a coherency
> > problem where the seqno written at GPU command completion time is not
> > seen by the CPU. This results in __i915_wait_request seeing the stale
> > seqno and not completing the request (not considering the lost
> > interrupt/GPU reset mechanism). I also verified that this isn't a case
> > of a lost interrupt, or that the command didn't complete somehow: when
> > the coherency issue occurred I read the seqno via an uncached GTT mapping
> > too. While the cached version of the seqno still showed the stale value
> > the one read via the uncached mapping was the correct one.
> >
> > Work around this issue by clflushing the corresponding CPU cacheline
> > following any store of the seqno and preceding any reading of it. When
> > reading it do this only when the caller expects a coherent view.
> >
> > Testcase: igt/store_dword_loop_render
> > Signed-off-by: Imre Deak <imre.deak at intel.com>
> > ---
> > drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++++
> > drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++
> > 2 files changed, 24 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 9f5485d..88bc5525 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf,
> >
> >  static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
> >  {
> > +	/*
> > +	 * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM
> > +	 * storing the completed request's seqno occasionally doesn't
> > +	 * invalidate the CPU cache. Work around this by clflushing the
> > +	 * corresponding cacheline whenever the caller wants the coherency to
> > +	 * be guaranteed. Note that this cacheline is known to be
> > +	 * clean at this point, since we only write it in gen8_set_seqno(),
> > +	 * where we also do a clflush after the write. So this clflush in
> > +	 * practice becomes an invalidate operation.
> > +	 */
> > +	if (IS_BROXTON(ring->dev) & !lazy_coherency)
>
> Should be &&.
Thanks for catching it, I'll send a v2 with this fixed if there is no
more feedback.
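
For reference, a minimal standalone sketch of why the operator matters.
This is not driver code: PLATFORM_FLAG and both helpers are hypothetical
names made up for illustration, and nothing in this thread establishes
whether IS_BROXTON() can ever evaluate to something other than 0 or 1.
The sketch only shows the general hazard: a bitwise AND of a nonzero
flag that isn't 1 with the 0/1 result of '!' can be 0, which would
silently skip the flush.

```c
#include <stdbool.h>

/*
 * Hypothetical platform check result: nonzero, but not 1
 * (for example a masked bit). Not taken from the i915 driver.
 */
#define PLATFORM_FLAG 0x2u

/* Bitwise variant: 0x2 & 1 == 0, so the flush would be skipped. */
static bool should_flush_bitwise(unsigned int platform, bool lazy_coherency)
{
	return (platform & !lazy_coherency) != 0;
}

/* Logical variant: any nonzero platform value counts as true. */
static bool should_flush_logical(unsigned int platform, bool lazy_coherency)
{
	return platform && !lazy_coherency;
}
```

With platform values restricted to 0 or 1 both helpers agree, so the
'&' form may well behave correctly in practice; '&&' just removes the
dependency on that assumption.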
>
> BR,
> Jani.
>
> > +		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
> > +
> >  	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
> >  }
> >
> >  static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
> >  {
> >  	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
> > +
> > +	/* See gen8_get_seqno() explaining the reason for the clflush. */
> > +	if (IS_BROXTON(ring->dev))
> > +		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
> >  }
> >
> > static int gen8_emit_request(struct intel_ringbuffer *ringbuf,
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > index 39f6dfc..224a25b 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > @@ -352,6 +352,13 @@ intel_ring_sync_index(struct intel_engine_cs *ring,
> >  	return idx;
> >  }
> >
> > +static inline void
> > +intel_flush_status_page(struct intel_engine_cs *ring, int reg)
> > +{
> > +	drm_clflush_virt_range(&ring->status_page.page_addr[reg],
> > +			       sizeof(uint32_t));
> > +}
> > +
> >  static inline u32
> >  intel_read_status_page(struct intel_engine_cs *ring,
> >  		       int reg)
> > --
> > 2.1.4