[Intel-gfx] [PATCH 2/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaReadAfterWriteHazard'

Fri Mar 21 17:50:18 CET 2014

On Fri, 2014-03-21 at 14:58 +0000, Daniel Vetter wrote:
> On Fri, Mar 21, 2014 at 11:53:40AM +0000, Gupta, Sourab wrote:
> > On Wed, 2014-01-22 at 11:11 +0000, Chris Wilson wrote:
> > > On Wed, Jan 22, 2014 at 12:54:51PM +0200, Ville Syrjälä wrote:
> > > > On Wed, Jan 22, 2014 at 09:15:06AM +0530, akash.goel at intel.com wrote:
> > > > > From: Akash Goel <akash.goel at intel.com>
> > > > >
> > > > > Added a new rendering specific Workaround 'WaReadAfterWriteHazard'.
> > > > > In this WA, need to add 12 MI Store Dword commands to ensure proper
> > > > > flush of h/w pipeline.
> > > > >
> > > > > Signed-off-by: Akash Goel <akash.goel at intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 25 +++++++++++++++++++++++++
> > > > >  1 file changed, 25 insertions(+)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > index 133d273..e8ec536 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > @@ -2167,6 +2167,31 @@ intel_ring_flush_all_caches(struct intel_ring_buffer *ring)
> > > > >
> > > > >     trace_i915_gem_ring_flush(ring, 0, I915_GEM_GPU_DOMAINS);
> > > > >
> > > > > +   if (IS_VALLEYVIEW(ring->dev)) {
> > > > > +           /*
> > > > > +            * WaReadAfterWriteHazard
> > > > > +            * Send a number of Store Data commands here to finish
> > > > > +            * flushing hardware pipeline.This is needed in the case
> > > > > +            * where the next workload tries reading from the same
> > > > > +            * surface that this batch writes to. Without these StoreDWs,
> > > > > +            * not all of the data will actually be flushd to the surface
> > > > > +            * by the time the next batch starts reading it, possibly
> > > > > +            * causing a small amount of corruption.
> > > > > +            */
> > > > > +           int i;
> > > > > +           ret = intel_ring_begin(ring, 4 * 12);
> > > >
> > > > BSpec says 8 is enough. Is Bspec incorrect.
> > > 
> > > No, these are just figures they plucked out of the air. Last I heard
> > > they were using 32...
> > > 
> > > > Also this workaround is also listed for everything SNB+.
> > > 
> > > And we already have a more effective workaround.
> > > -Chris
> > 
> > Can you please let us know whether this workaround patch is required or
> > not. If not, then how is this currently handled.
> 
> We emit XY_SETUP_BLT before certain blt operations to insert a
> sufficiently long stall. The underlying bug this works around is that the
> cache controller of the cpu falls over in certain very specific rwm
> cycles. The official w/a is a full pipeline flush when switching between
> reading and writing to a surface, which has a horribly perf impact on the
> blitter.
> -Daniel

Hi Daniel,
In the kernel code, we're not able to see any such operation during the
batchbuffer submission. Is the XY_SETUP_BLT emit done through the
userspace, in these specific usecases?
Does this mean that this patch is not admissible as such for this
underlying bug?

Regards,
Sourab