[Intel-gfx] [PATCH 3/4] drm/i915: Insert a full mb() before reading the seqno from the status page
Chris Wilson
chris at chris-wilson.co.uk
Sat Jan 19 13:02:20 CET 2013
On Fri, 19 Oct 2012 13:52:49 -0700, Jesse Barnes <jbarnes at virtuousgeek.org> wrote:
> On Fri, 19 Oct 2012 21:40:17 +0100
> Chris Wilson <chris at chris-wilson.co.uk> wrote:
>
> > On Thu, 11 Oct 2012 12:46:00 -0700, Jesse Barnes <jbarnes at virtuousgeek.org> wrote:
> > > On Tue, 9 Oct 2012 19:24:39 +0100
> > > Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > >
> > > > Hopefully this will reduce a few of the missed IRQ warnings.
> > > >
> > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > ---
> > > > drivers/gpu/drm/i915/intel_ringbuffer.c | 8 +++++++-
> > > > drivers/gpu/drm/i915/intel_ringbuffer.h | 2 --
> > > > 2 files changed, 7 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > index e069e69..133beb6 100644
> > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > @@ -704,14 +704,18 @@ gen6_ring_get_seqno(struct intel_ring_buffer *ring, bool lazy_coherency)
> > > > /* Workaround to force correct ordering between irq and seqno writes on
> > > > * ivb (and maybe also on snb) by reading from a CS register (like
> > > > * ACTHD) before reading the status page. */
> > > > - if (!lazy_coherency)
> > > > + if (!lazy_coherency) {
> > > > intel_ring_get_active_head(ring);
> > > > + mb();
> > > > + }
> > > > return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
> > > > }
> > > >
> > > > static u32
> > > > ring_get_seqno(struct intel_ring_buffer *ring, bool lazy_coherency)
> > > > {
> > > > + if (!lazy_coherency)
> > > > + mb();
> > > > return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
> > > > }
> > > >
> > > > @@ -719,6 +723,8 @@ static u32
> > > > pc_render_get_seqno(struct intel_ring_buffer *ring, bool lazy_coherency)
> > > > {
> > > > struct pipe_control *pc = ring->private;
> > > > + if (!lazy_coherency)
> > > > + mb();
> > > > return pc->cpu_page[0];
> > > > }
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > > index 2ea7a31..40b252e 100644
> > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > > @@ -160,8 +160,6 @@ static inline u32
> > > > intel_read_status_page(struct intel_ring_buffer *ring,
> > > > int reg)
> > > > {
> > > > - /* Ensure that the compiler doesn't optimize away the load. */
> > > > - barrier();
> > > > return ring->status_page.page_addr[reg];
> > > > }
> > > >
> > >
> > > This looks a bit more voodoo-y. Theoretically an mb() on the CPU side
> > > should have nothing to do with what the GPU just wrote to the status
> > > page. It'll slow down the read a bit but shouldn't affect coherence at
> > > all... An MMIO read from the GPU otoh should flush any stubborn DMA
> > > buffers.
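For anyone following along, the two mechanisms being contrasted look
roughly like this (sketch only; the helper names are made up, only
I915_GEM_HWS_INDEX and the ACTHD read come from the patch above):

#include <linux/types.h>
#include <linux/io.h>		/* readl() */
#include <asm/barrier.h>	/* mb() */

/* CPU-side barrier only: orders this CPU's own loads and stores, but
 * does not by itself force outstanding GPU writes into memory. */
static u32 seqno_after_mb(const volatile u32 *hws)
{
	mb();
	return hws[I915_GEM_HWS_INDEX];
}

/* Non-posted MMIO read from the GPU (ACTHD in the existing workaround):
 * the read completion cannot overtake writes the device has already
 * posted, so those reach memory before we sample the status page. */
static u32 seqno_after_posting_read(void __iomem *acthd_reg,
				    const volatile u32 *hws)
{
	(void)readl(acthd_reg);
	return hws[I915_GEM_HWS_INDEX];
}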
> >
> > Absolutely convinced? Aren't we more worried here about the view of
> > the shared cache from any particular core? If so, we need to treat
> > this as an SMP programming problem, in which case we do need to worry
> > about memory barriers around dependent reads and writes between processors.
> >
> > But it is definitely more voodoo...
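To spell out the pairing I mean: treat the GPU as if it were just
another CPU writing the status page (which is of course exactly the
assumption in dispute). This is only a sketch; seqno_ready stands in
for the interrupt, and I915_GEM_HWS_INDEX is the real index from the
driver.

#include <linux/types.h>
#include <linux/compiler.h>	/* ACCESS_ONCE() */
#include <asm/barrier.h>	/* smp_wmb()/smp_rmb() */

static int seqno_ready;		/* stands in for "the interrupt fired" */

/* producer: publish the seqno, then raise the flag */
static void writer_publishes(u32 *hws, u32 seqno)
{
	hws[I915_GEM_HWS_INDEX] = seqno;
	smp_wmb();			/* order the data before the flag */
	ACCESS_ONCE(seqno_ready) = 1;
}

/* consumer: see the flag, then read the seqno */
static u32 reader_samples(u32 *hws)
{
	u32 seqno = 0;

	if (ACCESS_ONCE(seqno_ready)) {
		smp_rmb();		/* pairs with the smp_wmb() above */
		seqno = hws[I915_GEM_HWS_INDEX];
	}
	return seqno;
}

The open question is whether the GPU's status-page write behaves like
the writer side here at all, or whether only a register read can force
it out.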
>
> If it's an SMP issue, barriers won't help; we need actual
> synchronization in the form of locks or something.
Glad you agree. How are locks implemented? :)
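Less flippantly: a lock is an atomic operation plus exactly these
barriers. A toy version (nothing like the real arch spinlocks, just to
make the point):

#include <linux/atomic.h>
#include <asm/barrier.h>	/* smp_mb() */
#include <asm/processor.h>	/* cpu_relax() */

static void toy_lock(atomic_t *lock)
{
	while (atomic_cmpxchg(lock, 0, 1) != 0)
		cpu_relax();
	/* the successful cmpxchg implies a full memory barrier, so the
	 * critical section cannot leak ahead of the acquire */
}

static void toy_unlock(atomic_t *lock)
{
	smp_mb();		/* critical section completes ... */
	atomic_set(lock, 0);	/* ... before the lock is released */
}

So if barriers cannot help here, neither can a lock; underneath, that
is all a lock is.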
Irrespective of the contentious patches Daniel thought would be a good
idea, we need the first two to fix the mb() around fences. Poke.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre