[Intel-gfx] [PATCH v3 3/5] drm/i915: Make sprite updates atomic

Thu Feb 6 10:25:17 CET 2014

On Mon, Jan 27, 2014 at 05:03:39PM +0100, Daniel Vetter wrote:
> On Mon, Jan 27, 2014 at 01:06:33PM +0200, Ville Syrjälä wrote:
> > On Sun, Jan 26, 2014 at 06:29:06PM +0100, Daniel Vetter wrote:
> > > On Tue, Jan 21, 2014 at 04:12:38PM +0200, ville.syrjala at linux.intel.com wrote:
> > > > From: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > > >
> > > > Add a mechanism by which we can evade the leading edge of vblank. This
> > > > guarantees that no two sprite register writes will straddle on either
> > > > side of the vblank start, and that means all the writes will be latched
> > > > together in one atomic operation.
> > > >
> > > > We do the vblank evade by checking the scanline counter, and if it's too
> > > > close to the start of vblank (too close has been hardcoded to 100usec
> > > > for now), we will wait for the vblank start to pass. In order to
> > > > eliminate random delayes from the rest of the system, we operate with
> > > > interrupts disabled, except when waiting for the vblank obviously.
> > > >
> > > > Note that we now go digging through pipe_to_crtc_mapping[] in the
> > > > vblank interrupt handler, which is a bit dangerous since we set up
> > > > interrupts before the crtcs. However in this case since it's the vblank
> > > > interrupt, we don't actually unmask it until some piece of code
> > > > requests it.
> > > >
> > > > v2: preempt_check_resched() calls after local_irq_enable() (Jesse)
> > > >     Hook up the vblank irq stuff on BDW as well
> > > > v3: Pass intel_crtc instead of drm_crtc (Daniel)
> > > >     Warn if crtc.mutex isn't locked (Daniel)
> > > >     Add an explicit compiler barrier and document the barriers (Daniel)
> > > >     Note the irq vs. modeset setup madness in the commit message (Daniel)
> > > >
> > > > Reviewed-by: Jesse Barnes <jbarnes at virtuousgeek.org>
> > > > Signed-off-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_irq.c      |  29 ++++++++--
> > > >  drivers/gpu/drm/i915/intel_display.c |   2 +
> > > >  drivers/gpu/drm/i915/intel_drv.h     |   3 +
> > > >  drivers/gpu/drm/i915/intel_sprite.c  | 103 +++++++++++++++++++++++++++++++++++
> > > >  4 files changed, 133 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > > > index ffb56a9..8ef9edb 100644
> > > > --- a/drivers/gpu/drm/i915/i915_irq.c
> > > > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > > @@ -1439,6 +1439,15 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
> > > >   }
> > > >  }
> > > >  
> > > > +static void intel_pipe_handle_vblank(struct drm_device *dev, enum pipe pipe)
> > > > +{
> > > > + struct intel_crtc *intel_crtc =
> > > > + to_intel_crtc(intel_get_crtc_for_pipe(dev, pipe));
> > > > +
> > > > + intel_crtc->vbl_received = true;
> > > > + wake_up(&intel_crtc->vbl_wait);
> > > > +}
> > > > +
> > > >  static irqreturn_t valleyview_irq_handler(int irq, void *arg)
> > > >  {
> > > >   struct drm_device *dev = (struct drm_device *) arg;
> > > > @@ -1479,8 +1488,10 @@ static irqreturn_t valleyview_irq_handler(int irq, void *arg)
> > > >   spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
> > > >  
> > > >   for_each_pipe(pipe) {
> > > > - if (pipe_stats[pipe] & PIPE_START_VBLANK_INTERRUPT_STATUS)
> > > > + if (pipe_stats[pipe] & PIPE_START_VBLANK_INTERRUPT_STATUS) {
> > > >   drm_handle_vblank(dev, pipe);
> > > > + intel_pipe_handle_vblank(dev, pipe);
> > > > + }
> > > >  
> > > >   if (pipe_stats[pipe] & PLANE_FLIPDONE_INT_STATUS_VLV) {
> > > >   intel_prepare_page_flip(dev, pipe);
> > > > @@ -1679,8 +1690,10 @@ static void ilk_display_irq_handler(struct drm_device *dev, u32 de_iir)
> > > >   DRM_ERROR("Poison interrupt\n");
> > > >  
> > > >   for_each_pipe(pipe) {
> > > > - if (de_iir & DE_PIPE_VBLANK(pipe))
> > > > + if (de_iir & DE_PIPE_VBLANK(pipe)) {
> > > >   drm_handle_vblank(dev, pipe);
> > > > + intel_pipe_handle_vblank(dev, pipe);
> > > > + }
> > > >  
> > > >   if (de_iir & DE_PIPE_FIFO_UNDERRUN(pipe))
> > > >   if (intel_set_cpu_fifo_underrun_reporting(dev, pipe, false))
> > > > @@ -1729,8 +1742,10 @@ static void ivb_display_irq_handler(struct drm_device *dev, u32 de_iir)
> > > >   intel_opregion_asle_intr(dev);
> > > >  
> > > >   for_each_pipe(i) {
> > > > - if (de_iir & (DE_PIPE_VBLANK_IVB(i)))
> > > > + if (de_iir & (DE_PIPE_VBLANK_IVB(i))) {
> > > >   drm_handle_vblank(dev, i);
> > > > + intel_pipe_handle_vblank(dev, i);
> > > > + }
> > > >  
> > > >   /* plane/pipes map 1:1 on ilk+ */
> > > >   if (de_iir & DE_PLANE_FLIP_DONE_IVB(i)) {
> > > > @@ -1872,8 +1887,10 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
> > > >   continue;
> > > >  
> > > >   pipe_iir = I915_READ(GEN8_DE_PIPE_IIR(pipe));
> > > > - if (pipe_iir & GEN8_PIPE_VBLANK)
> > > > + if (pipe_iir & GEN8_PIPE_VBLANK) {
> > > >   drm_handle_vblank(dev, pipe);
> > > > + intel_pipe_handle_vblank(dev, pipe);
> > > > + }
> > > >  
> > > >   if (pipe_iir & GEN8_PIPE_FLIP_DONE) {
> > > >   intel_prepare_page_flip(dev, pipe);
> > > > @@ -3161,6 +3178,8 @@ static bool i8xx_handle_vblank(struct drm_device *dev,
> > > >   if (!drm_handle_vblank(dev, pipe))
> > > >   return false;
> > > >  
> > > > + intel_pipe_handle_vblank(dev, pipe);
> > > > +
> > > >   if ((iir & flip_pending) == 0)
> > > >   return false;
> > > >  
> > > > @@ -3344,6 +3363,8 @@ static bool i915_handle_vblank(struct drm_device *dev,
> > > >   if (!drm_handle_vblank(dev, pipe))
> > > >   return false;
> > > >  
> > > > + intel_pipe_handle_vblank(dev, pipe);
> > > > +
> > > >   if ((iir & flip_pending) == 0)
> > > >   return false;
> > > >  
> > > > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > > > index d1cc7ea..b39e445 100644
> > > > --- a/drivers/gpu/drm/i915/intel_display.c
> > > > +++ b/drivers/gpu/drm/i915/intel_display.c
> > > > @@ -10297,6 +10297,8 @@ static void intel_crtc_init(struct drm_device *dev, int pipe)
> > > >   intel_crtc->plane = !pipe;
> > > >   }
> > > >  
> > > > + init_waitqueue_head(&intel_crtc->vbl_wait);
> > > > +
> > > >   BUG_ON(pipe >= ARRAY_SIZE(dev_priv->plane_to_crtc_mapping) ||
> > > >         dev_priv->plane_to_crtc_mapping[intel_crtc->plane] != NULL);
> > > >   dev_priv->plane_to_crtc_mapping[intel_crtc->plane] = &intel_crtc->base;
> > > > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > > > index 318bb4c..e539550 100644
> > > > --- a/drivers/gpu/drm/i915/intel_drv.h
> > > > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > > > @@ -376,6 +376,9 @@ struct intel_crtc {
> > > >   /* watermarks currently being used  */
> > > >   struct intel_pipe_wm active;
> > > >   } wm;
> > > > +
> > > > + wait_queue_head_t vbl_wait;
> > > > + bool vbl_received;
> > > >  };
> > > >  
> > > >  struct intel_plane_wm_parameters {
> > > > diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c
> > > > index ed9fa7c..1ffa7ba 100644
> > > > --- a/drivers/gpu/drm/i915/intel_sprite.c
> > > > +++ b/drivers/gpu/drm/i915/intel_sprite.c
> > > > @@ -37,6 +37,79 @@
> > > >  #include <drm/i915_drm.h>
> > > >  #include "i915_drv.h"
> > > >  
> > > > +static unsigned int usecs_to_scanlines(const struct drm_display_mode *mode, unsigned int usecs)
> > > > +{
> > > > + /* paranoia */
> > > > + if (!mode->crtc_htotal)
> > > > + return 1;
> > > > +
> > > > + return DIV_ROUND_UP(usecs * mode->crtc_clock, 1000 * mode->crtc_htotal);
> > > > +}
> > > > +
> > > > +static void intel_pipe_update_start(struct intel_crtc *crtc)
> > > > +{
> > > > + struct drm_device *dev = crtc->base.dev;
> > > > + const struct drm_display_mode *mode = &crtc->config.adjusted_mode;
> > > > + enum pipe pipe = crtc->pipe;
> > > > + /* FIXME needs to be calibrated sensibly */
> > > > + unsigned int min = mode->crtc_vblank_start - usecs_to_scanlines(mode, 100);
> > > > + unsigned int max = mode->crtc_vblank_start - 1;
> > > > + long timeout = msecs_to_jiffies_timeout(1);
> > > > + unsigned int scanline;
> > > > +
> > > > + WARN_ON(!mutex_is_locked(&crtc->base.mutex));
> > > > +
> > > > + if (WARN_ON(drm_vblank_get(dev, pipe)))
> > > > + return;
> > > > +
> > > > + local_irq_disable();
> > > > +
> > > > + /*
> > > > + * Explicit compiler barrier is used to make sure that vbl_received
> > > > + * store happens before reading the current scanline, otherwise
> > > > + * something like this this might happen:
> > > > + *
> > > > + * 1. scanline = intel_get_crtc_scanline();
> > > > + * 2. vblank irq -> vbl_received = true
> > > > + * 3. vbl_received = false
> > > > + *
> > > > + * Which could make us miss the fact that we already crossed into
> > > > + * vblank even though the scanline indicates that we're somewhere
> > > > + * just before the start of vblank.
> > > > + *
> > > > + * wake_up() vs. wait_event_timeout() imply the necessary memory
> > > > + * barries to make sure wait_event_timeout() sees the correct
> > > > + * value of vbl_received after waking up.
> > > > + */
> > > > + crtc->vbl_received = false;
> > > > + barrier();
> > > > + scanline = intel_get_crtc_scanline(&crtc->base);
> > > 
> > > Ok, took a bit longer but here's it. I think we need a bit more polish
> > > with this one. My thoughts about all this:
> > > 
> > > - Definitely an s/barrier/smp_*/ required. Yeah, on x86 with the total
> > >   store order it's ok, but a big reason for adding barriers and comments
> > >   is documentation. And a big reason for that is to deter people from too
> > >   many lockless tricks ;-)
> > 
> > I still don't see why you'd need an smp_ memory barrier here. It's just
> > one regular store vs. mmio read ordering issue we have here.
> > 
> > If we'd really need to worry about hardware reordering the operations,
> > then it would seem that we'd need a mandatory memory barrier instead
> > of an smp one.
> 
> Hm yeah, full barrier makes more sense. I'd also throw a full barrier in
> _before_ we reset vbl_received in the interrupt handling code.
> 
> Also a style nitpick with barrier comments is to have it a both places and
> reference the place of the other one. So I think we should throw in an
> another comment in intel_pipe_handle_vblank. Actually two to mention that
> the wake_up implicit parrier syncs with the implicit barrier in wait_event
> in intel_pipe_update_start. With that I'll shut up about my nagging ;-)

I still think a compiler barrier is all we need here. We only have one
piece of shared data, and any ordering restriction we have is strictly
within one CPU (vbl_received=false store vs. scanline read), so normal
cache coherency is enough to handle the irq handler vs. waiter situation.

I could be convinced to use mb() between the vbl_received store and
scanline read for paranoia's sake, but adding a mb() to the irq handler
I don't get at all. I suppose I could add it there just for fun, but then
then the comment would have to be:

/* danvet's memory barrier */

since I wouldn't know what else to write ;)

-- 
Ville Syrjälä
Intel OTC