[Intel-gfx] [PATCH v4] drm/i915: avoid processing spurious/shared interrupts in low-power states

Daniel Vetter daniel at ffwll.ch
Tue Feb 24 02:42:20 PST 2015


On Tue, Feb 24, 2015 at 09:29:15AM +0000, Chris Wilson wrote:
> On Tue, Feb 24, 2015 at 11:14:30AM +0200, Imre Deak wrote:
> > Atm, it's possible that the interrupt handler is called when the device
> > is in D3 or some other low-power state. It can be due to another device
> > that is still in D0 state and shares the interrupt line with i915, or on
> > some platforms there could be spurious interrupts even without sharing
> > the interrupt line. The latter case was reported by Klaus Ethgen using a
> > Lenovo x61p machine (gen 4). He noticed this issue via a system
> > suspend/resume hang and bisected it to the following commit:
> > 
> > commit e11aa362308f5de467ce355a2a2471321b15a35c
> > Author: Jesse Barnes <jbarnes at virtuousgeek.org>
> > Date:   Wed Jun 18 09:52:55 2014 -0700
> > 
> >     drm/i915: use runtime irq suspend/resume in freeze/thaw
> > 
> > This is a problem, since in low-power states IIR will always read
> > 0xffffffff resulting in an endless IRQ servicing loop.
> > 
> > Fix this by handling interrupts only when the driver explicitly enables
> > them and so it's guaranteed that the interrupt registers return a valid
> > value.
> > 
> > Note that this issue existed even before the above commit, since during
> > runtime suspend/resume we never unregistered the handler.
> > 
> > v2:
> > - clarify the purpose of smp_mb() vs. synchronize_irq() in the
> >   code comment (Chris)
> > 
> > v3:
> > - no need for an explicit smp_mb(), we can assume that synchronize_irq()
> >   and the mmio read/writes in the install hooks provide for this (Daniel)
> > - remove code comment as the remaining synchronize_irq() is self
> >   explanatory (Daniel)
> 
> Why make the assumption though? I thought the comments documenting the
> interaction between the irq enablements, the irq handler and shared
> interrupts was beneficial. At the very least the assumption should be
> made explicit through comments in the code - because I am not convinced
> that a cached write will be flushed by an uncached write to another area
> of memory. In particular, note that on the gen most troubled by rogue
> irqs (gen4), we do not have any memory barriers in the mmio paths.

The synchronize_irq is a fairly massive barrier and I've figured the name
is descriptive enough to make clear what's going on. At least I've felt
any comment on top would be redundant.

Also the hard rule for adding comments to explicit barriers is mostly a
reminder that you always need barriers on both sides, and the comment
then must explain where the other side is in the code. Imo with
synchronize_irq it's clear that the other side is the irq handler already.

What do you want to clarify on top of that in the comment?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


More information about the Intel-gfx mailing list