[Intel-gfx] [PATCH 2/3] drm/i915: close PM interrupt masking races in the rps work func

Sun Sep 4 22:10:30 CEST 2011

On Sun, Sep 04, 2011 at 07:56:57PM +0000, Ben Widawsky wrote:
> On Sun, Sep 04, 2011 at 09:26:48PM +0200, Daniel Vetter wrote:
> > On Sun, Sep 04, 2011 at 10:08:17AM -0700, Ben Widawsky wrote:
> > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > > index 55518e3..3bc1479 100644
> > > --- a/drivers/gpu/drm/i915/i915_irq.c
> > > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > @@ -415,12 +415,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
> > >         gen6_set_rps(dev_priv->dev, new_delay);
> > >         dev_priv->cur_delay = new_delay;
> > >  
> > > -       /*
> > > -        * rps_lock not held here because clearing is non-destructive. There is
> > > -        * an *extremely* unlikely race with gen6_rps_enable() that is prevented
> > > -        * by holding struct_mutex for the duration of the write.
> > > -        */
> > > -       I915_WRITE(GEN6_PMIMR, pm_imr & ~pm_iir);
> > > +       I915_WRITE(GEN6_PMIMR, pm_imr & dev_priv->pm_iir);
> > >         mutex_unlock(&dev_priv->dev->struct_mutex);
> > >  }
> > 
> > For this to work we'd need to hold the rps_lock (to avoid racing with the
> > irq handler). But imo my approach is conceptually simpler: The work func
> > grabs all oustanding PM interrupts and then enables them again in one go
> > (protected by rps_lock).
> 
> I agree your approach is similar, but I think we should really consider
> whether my approach actually requires the lock. I *think* it doesn't. At
> least in my head my patch should fix the error you spotted. I don't
> know, maybe I need to think some more.

1. rps work reads dev_priv->pm_iir (anew in the line you've added).
2. irq handler runs, adds a new bit to dev_priv->pm_iir and sets PMIMR to
dev_priv->pm_iir (under irqsafe rps_lock).
3. rps work writes crap to PMIMR.

I.e. same race, you've just dramatically reduced the window ;-)

> The reason I worked so hard to avoid doing it the way you did in my
> original implementation is I was trying really hard to not break the
> cardinal rule about minimizing time holding spinlock_irqs. To go with
> the other patch, you probably want a POSTING_READ also before releasing
> the spin_lock (though I think this is being a bit paranoid).

There POSTING_READ was to order the PMIMR write with the PMIIR write (both
in the irq handler). There's no such ordering here (and the irq handler
can't be interrupted) so I think we're save.

-Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48