[Intel-gfx] [PATCH v3 3/5] drm/i915: Make sprite updates atomic
Daniel Vetter
daniel at ffwll.ch
Thu Feb 6 10:42:26 CET 2014
On Thu, Feb 06, 2014 at 11:25:17AM +0200, Ville Syrjälä wrote:
> On Mon, Jan 27, 2014 at 05:03:39PM +0100, Daniel Vetter wrote:
> > On Mon, Jan 27, 2014 at 01:06:33PM +0200, Ville Syrjälä wrote:
> > > On Sun, Jan 26, 2014 at 06:29:06PM +0100, Daniel Vetter wrote:
> > > > On Tue, Jan 21, 2014 at 04:12:38PM +0200, ville.syrjala at linux.intel.com wrote:
> > > > > From: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > > > >
> > > > > Add a mechanism by which we can evade the leading edge of vblank. This
> > > > > guarantees that no two sprite register writes will straddle on either
> > > > > side of the vblank start, and that means all the writes will be latched
> > > > > together in one atomic operation.
> > > > >
> > > > > We do the vblank evade by checking the scanline counter, and if it's too
> > > > > close to the start of vblank (too close has been hardcoded to 100usec
> > > > > for now), we will wait for the vblank start to pass. In order to
> > > > > eliminate random delayes from the rest of the system, we operate with
> > > > > interrupts disabled, except when waiting for the vblank obviously.
> > > > >
> > > > > Note that we now go digging through pipe_to_crtc_mapping[] in the
> > > > > vblank interrupt handler, which is a bit dangerous since we set up
> > > > > interrupts before the crtcs. However in this case since it's the vblank
> > > > > interrupt, we don't actually unmask it until some piece of code
> > > > > requests it.
> > > > >
> > > > > v2: preempt_check_resched() calls after local_irq_enable() (Jesse)
> > > > > Hook up the vblank irq stuff on BDW as well
> > > > > v3: Pass intel_crtc instead of drm_crtc (Daniel)
> > > > > Warn if crtc.mutex isn't locked (Daniel)
> > > > > Add an explicit compiler barrier and document the barriers (Daniel)
> > > > > Note the irq vs. modeset setup madness in the commit message (Daniel)
> > > > >
> > > > > Reviewed-by: Jesse Barnes <jbarnes at virtuousgeek.org>
> > > > > Signed-off-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > > > > ---
> > > > > drivers/gpu/drm/i915/i915_irq.c | 29 ++++++++--
> > > > > drivers/gpu/drm/i915/intel_display.c | 2 +
> > > > > drivers/gpu/drm/i915/intel_drv.h | 3 +
> > > > > drivers/gpu/drm/i915/intel_sprite.c | 103 +++++++++++++++++++++++++++++++++++
> > > > > 4 files changed, 133 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > > > > index ffb56a9..8ef9edb 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_irq.c
> > > > > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > > > @@ -1439,6 +1439,15 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
> > > > > }
> > > > > }
> > > > >
> > > > > +static void intel_pipe_handle_vblank(struct drm_device *dev, enum pipe pipe)
> > > > > +{
> > > > > + struct intel_crtc *intel_crtc =
> > > > > + to_intel_crtc(intel_get_crtc_for_pipe(dev, pipe));
> > > > > +
> > > > > + intel_crtc->vbl_received = true;
> > > > > + wake_up(&intel_crtc->vbl_wait);
> > > > > +}
> > > > > +
> > > > > static irqreturn_t valleyview_irq_handler(int irq, void *arg)
> > > > > {
> > > > > struct drm_device *dev = (struct drm_device *) arg;
> > > > > @@ -1479,8 +1488,10 @@ static irqreturn_t valleyview_irq_handler(int irq, void *arg)
> > > > > spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
> > > > >
> > > > > for_each_pipe(pipe) {
> > > > > - if (pipe_stats[pipe] & PIPE_START_VBLANK_INTERRUPT_STATUS)
> > > > > + if (pipe_stats[pipe] & PIPE_START_VBLANK_INTERRUPT_STATUS) {
> > > > > drm_handle_vblank(dev, pipe);
> > > > > + intel_pipe_handle_vblank(dev, pipe);
> > > > > + }
> > > > >
> > > > > if (pipe_stats[pipe] & PLANE_FLIPDONE_INT_STATUS_VLV) {
> > > > > intel_prepare_page_flip(dev, pipe);
> > > > > @@ -1679,8 +1690,10 @@ static void ilk_display_irq_handler(struct drm_device *dev, u32 de_iir)
> > > > > DRM_ERROR("Poison interrupt\n");
> > > > >
> > > > > for_each_pipe(pipe) {
> > > > > - if (de_iir & DE_PIPE_VBLANK(pipe))
> > > > > + if (de_iir & DE_PIPE_VBLANK(pipe)) {
> > > > > drm_handle_vblank(dev, pipe);
> > > > > + intel_pipe_handle_vblank(dev, pipe);
> > > > > + }
> > > > >
> > > > > if (de_iir & DE_PIPE_FIFO_UNDERRUN(pipe))
> > > > > if (intel_set_cpu_fifo_underrun_reporting(dev, pipe, false))
> > > > > @@ -1729,8 +1742,10 @@ static void ivb_display_irq_handler(struct drm_device *dev, u32 de_iir)
> > > > > intel_opregion_asle_intr(dev);
> > > > >
> > > > > for_each_pipe(i) {
> > > > > - if (de_iir & (DE_PIPE_VBLANK_IVB(i)))
> > > > > + if (de_iir & (DE_PIPE_VBLANK_IVB(i))) {
> > > > > drm_handle_vblank(dev, i);
> > > > > + intel_pipe_handle_vblank(dev, i);
> > > > > + }
> > > > >
> > > > > /* plane/pipes map 1:1 on ilk+ */
> > > > > if (de_iir & DE_PLANE_FLIP_DONE_IVB(i)) {
> > > > > @@ -1872,8 +1887,10 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
> > > > > continue;
> > > > >
> > > > > pipe_iir = I915_READ(GEN8_DE_PIPE_IIR(pipe));
> > > > > - if (pipe_iir & GEN8_PIPE_VBLANK)
> > > > > + if (pipe_iir & GEN8_PIPE_VBLANK) {
> > > > > drm_handle_vblank(dev, pipe);
> > > > > + intel_pipe_handle_vblank(dev, pipe);
> > > > > + }
> > > > >
> > > > > if (pipe_iir & GEN8_PIPE_FLIP_DONE) {
> > > > > intel_prepare_page_flip(dev, pipe);
> > > > > @@ -3161,6 +3178,8 @@ static bool i8xx_handle_vblank(struct drm_device *dev,
> > > > > if (!drm_handle_vblank(dev, pipe))
> > > > > return false;
> > > > >
> > > > > + intel_pipe_handle_vblank(dev, pipe);
> > > > > +
> > > > > if ((iir & flip_pending) == 0)
> > > > > return false;
> > > > >
> > > > > @@ -3344,6 +3363,8 @@ static bool i915_handle_vblank(struct drm_device *dev,
> > > > > if (!drm_handle_vblank(dev, pipe))
> > > > > return false;
> > > > >
> > > > > + intel_pipe_handle_vblank(dev, pipe);
> > > > > +
> > > > > if ((iir & flip_pending) == 0)
> > > > > return false;
> > > > >
> > > > > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > > > > index d1cc7ea..b39e445 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_display.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_display.c
> > > > > @@ -10297,6 +10297,8 @@ static void intel_crtc_init(struct drm_device *dev, int pipe)
> > > > > intel_crtc->plane = !pipe;
> > > > > }
> > > > >
> > > > > + init_waitqueue_head(&intel_crtc->vbl_wait);
> > > > > +
> > > > > BUG_ON(pipe >= ARRAY_SIZE(dev_priv->plane_to_crtc_mapping) ||
> > > > > dev_priv->plane_to_crtc_mapping[intel_crtc->plane] != NULL);
> > > > > dev_priv->plane_to_crtc_mapping[intel_crtc->plane] = &intel_crtc->base;
> > > > > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > > > > index 318bb4c..e539550 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_drv.h
> > > > > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > > > > @@ -376,6 +376,9 @@ struct intel_crtc {
> > > > > /* watermarks currently being used */
> > > > > struct intel_pipe_wm active;
> > > > > } wm;
> > > > > +
> > > > > + wait_queue_head_t vbl_wait;
> > > > > + bool vbl_received;
> > > > > };
> > > > >
> > > > > struct intel_plane_wm_parameters {
> > > > > diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c
> > > > > index ed9fa7c..1ffa7ba 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_sprite.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_sprite.c
> > > > > @@ -37,6 +37,79 @@
> > > > > #include <drm/i915_drm.h>
> > > > > #include "i915_drv.h"
> > > > >
> > > > > +static unsigned int usecs_to_scanlines(const struct drm_display_mode *mode, unsigned int usecs)
> > > > > +{
> > > > > + /* paranoia */
> > > > > + if (!mode->crtc_htotal)
> > > > > + return 1;
> > > > > +
> > > > > + return DIV_ROUND_UP(usecs * mode->crtc_clock, 1000 * mode->crtc_htotal);
> > > > > +}
> > > > > +
> > > > > +static void intel_pipe_update_start(struct intel_crtc *crtc)
> > > > > +{
> > > > > + struct drm_device *dev = crtc->base.dev;
> > > > > + const struct drm_display_mode *mode = &crtc->config.adjusted_mode;
> > > > > + enum pipe pipe = crtc->pipe;
> > > > > + /* FIXME needs to be calibrated sensibly */
> > > > > + unsigned int min = mode->crtc_vblank_start - usecs_to_scanlines(mode, 100);
> > > > > + unsigned int max = mode->crtc_vblank_start - 1;
> > > > > + long timeout = msecs_to_jiffies_timeout(1);
> > > > > + unsigned int scanline;
> > > > > +
> > > > > + WARN_ON(!mutex_is_locked(&crtc->base.mutex));
> > > > > +
> > > > > + if (WARN_ON(drm_vblank_get(dev, pipe)))
> > > > > + return;
> > > > > +
> > > > > + local_irq_disable();
> > > > > +
> > > > > + /*
> > > > > + * Explicit compiler barrier is used to make sure that vbl_received
> > > > > + * store happens before reading the current scanline, otherwise
> > > > > + * something like this this might happen:
> > > > > + *
> > > > > + * 1. scanline = intel_get_crtc_scanline();
> > > > > + * 2. vblank irq -> vbl_received = true
> > > > > + * 3. vbl_received = false
> > > > > + *
> > > > > + * Which could make us miss the fact that we already crossed into
> > > > > + * vblank even though the scanline indicates that we're somewhere
> > > > > + * just before the start of vblank.
> > > > > + *
> > > > > + * wake_up() vs. wait_event_timeout() imply the necessary memory
> > > > > + * barries to make sure wait_event_timeout() sees the correct
> > > > > + * value of vbl_received after waking up.
> > > > > + */
> > > > > + crtc->vbl_received = false;
> > > > > + barrier();
> > > > > + scanline = intel_get_crtc_scanline(&crtc->base);
> > > >
> > > > Ok, took a bit longer but here's it. I think we need a bit more polish
> > > > with this one. My thoughts about all this:
> > > >
> > > > - Definitely an s/barrier/smp_*/ required. Yeah, on x86 with the total
> > > > store order it's ok, but a big reason for adding barriers and comments
> > > > is documentation. And a big reason for that is to deter people from too
> > > > many lockless tricks ;-)
> > >
> > > I still don't see why you'd need an smp_ memory barrier here. It's just
> > > one regular store vs. mmio read ordering issue we have here.
> > >
> > > If we'd really need to worry about hardware reordering the operations,
> > > then it would seem that we'd need a mandatory memory barrier instead
> > > of an smp one.
> >
> > Hm yeah, full barrier makes more sense. I'd also throw a full barrier in
> > _before_ we reset vbl_received in the interrupt handling code.
> >
> > Also a style nitpick with barrier comments is to have it a both places and
> > reference the place of the other one. So I think we should throw in an
> > another comment in intel_pipe_handle_vblank. Actually two to mention that
> > the wake_up implicit parrier syncs with the implicit barrier in wait_event
> > in intel_pipe_update_start. With that I'll shut up about my nagging ;-)
>
> I still think a compiler barrier is all we need here. We only have one
> piece of shared data, and any ordering restriction we have is strictly
> within one CPU (vbl_received=false store vs. scanline read), so normal
> cache coherency is enough to handle the irq handler vs. waiter situation.
>
> I could be convinced to use mb() between the vbl_received store and
> scanline read for paranoia's sake, but adding a mb() to the irq handler
> I don't get at all. I suppose I could add it there just for fun, but then
> then the comment would have to be:
>
> /* danvet's memory barrier */
>
> since I wouldn't know what else to write ;)
Wrt the compiler barrier: The subsequent mmio write already clobbers state
and so serves as a compiler barrier iirc. So adding another one won't
change anything. But we not only need to order the write against the mmio
read, we also need to make sure that the write lands in-order wrt the
write from the interrupt handler. Hence smp barriers in both places.
I know that on x86 the hw can't reorder stuff that badly anyway, so this
is purely documentation. Or maybe I'm just paranoid. In any case I think
either no barrier (and the comment still there) or more barriers ...
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
More information about the Intel-gfx
mailing list