[Intel-gfx] [PATCH 1/2] drm/i915: fix race when clearing RPS IIR bits

Imre Deak imre.deak at intel.com
Tue Mar 24 13:52:32 PDT 2015


On Tue, 2015-03-24 at 09:24 +0000, Chris Wilson wrote:
> On Tue, Mar 24, 2015 at 10:14:03AM +0100, Daniel Vetter wrote:
> > On Mon, Mar 23, 2015 at 09:10:15PM +0000, Chris Wilson wrote:
> > > On Mon, Mar 23, 2015 at 07:11:34PM +0200, Imre Deak wrote:
> > > > When disabling RPS interrupts there is a race where we disable RPS
> > > > interrupts while the interrupt handler is running and the handler has
> > > > already latched the pending RPS interrupt from the master IIR register.
> > > > Afterwards the disabling path clears the PM IIR bits, making the state
> > > > of pending interrupts inconsistent from the interrupt handler's point of
> > > > view. This triggers the following warning: "The master control interrupt
> > > > lied (PM)!".
> > > > 
> > > > To fix this make sure that any running interrupt handler (which may
> > > > have already latched the master IIR) finishes before clearing the IIR
> > > > bits.
> > > > 
> > > 
> > > Isn't this overkill for what is just a bogus WARN? If the WARN is a
> > > logical consequence of the code, let's just remove the WARN.
> > > 
> > > Or iow can you not find a cheaper way to fix this?
> > 
> > We only run this on suspend/resume afaik, overhead should be acceptable.
> > And we've had that much overhead before we've done all the runtime pm
> > unification, it's still less synchronization than disabling interrupts
> > completely.
> 
> Hmm, I thought this was in conjunction with RPS pm masking (i.e. fired
> every time we no longer expect to receive RPS interrupts).

Yes, it affects only the driver loading (where it's a nop) and the
runtime/system suspend path.

> If it is only the infrequent paths, then yeah I can't complain too much - I
> still think it is slightly fishy, but I can accept that it is just a
> quirk of the buffering the interrupt does.

The reason for sync_irq is to prevent any subsequent
gen6_enable_rps_interrupts() call from triggering spurious interrupts.
Without it we could have:

CPU 0                               CPU 1
gen8_gt_irq_handler()
tmp = I915_READ(GEN8_GT_IIR(2))
<tmp has pending RPS bits>
                                    gen6_disable_rps_interrupts()
                                    gen6_reset_rps_interrupts()
                                    <clear GEN8_GT_IIR(2)>
                                    gen6_enable_rps_interrupts()
                                 
gen6_rps_irq_handler(tmp)
<handle the now stale RPS events>

Now I admit the above is quite unlikely, or even impossible due to what
happens between gen6_disable_rps_interrupts() and
gen6_enable_rps_interrupts() in our code atm. I still thought it's safer
not to rely on those.
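
To make the interleaving above concrete, here is a minimal single-threaded
model of it. This is only a sketch, not the driver code: fake_gt_iir,
rps_epoch, struct latched and all function names below are hypothetical
stand-ins I made up for illustration. The two CPUs are modeled by simply
calling the steps in the order shown in the diagram, and "waiting for the
handler" is modeled by letting the handler run to completion before the
IIR bits are cleared.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for hardware and driver state (not i915 names). */
static uint32_t fake_gt_iir;   /* models the PM bits in GEN8_GT_IIR(2) */
static unsigned rps_epoch;     /* bumped on every disable/reset/enable cycle */

/* What the handler latched, plus the epoch it was latched in. */
struct latched {
	uint32_t bits;
	unsigned epoch;
};

/* First half of the handler: read (latch) the pending bits. */
static struct latched handler_latch(void)
{
	struct latched l = { fake_gt_iir, rps_epoch };
	return l;
}

/*
 * Second half of the handler: process the latched bits.
 * Returns true if the events are stale, i.e. they were latched
 * before the RPS state was reset and re-enabled.
 */
static bool handler_process(struct latched l)
{
	return l.bits != 0 && l.epoch != rps_epoch;
}

/* Models disable + reset (clear IIR) + enable on the other CPU. */
static void disable_reset_enable(void)
{
	fake_gt_iir = 0;
	rps_epoch++;
}

/* The racy ordering from the diagram: the clear lands mid-handler. */
static bool run_without_sync(void)
{
	fake_gt_iir = 0x1;                    /* an RPS event is pending */
	struct latched l = handler_latch();   /* CPU 0 latches it */
	disable_reset_enable();               /* CPU 1 clears and re-enables */
	return handler_process(l);            /* CPU 0 handles stale bits */
}

/* With the wait in place the handler finishes before the clear. */
static bool run_with_sync(void)
{
	fake_gt_iir = 0x1;
	struct latched l = handler_latch();
	bool stale = handler_process(l);      /* handler completes first */
	disable_reset_enable();               /* only then clear/re-enable */
	return stale;
}
```

In this model run_without_sync() reports a stale event and run_with_sync()
does not, which is the whole point of waiting for the handler before
touching the IIR bits.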

One alternative would have been to extend the irq_lock in the interrupt
handler, so that we read GEN8_GT_IIR(2) and handle any events in it in
one critical section, but that would add the overhead where it actually
matters.

--Imre



