[Intel-gfx] [PATCH 4/4] drm/i915: Avoid race of intel_crt_detect_hotplug() with HPD interrupt

Daniel Vetter daniel at ffwll.ch
Wed Sep 2 07:46:19 PDT 2015


On Wed, Sep 02, 2015 at 04:19:00PM +0200, Egbert Eich wrote:
> Daniel Vetter writes:
>  > On Tue, Sep 01, 2015 at 10:21:35PM +0200, Egbert Eich wrote:
>  > > A HPD interrupt may fire during intel_crt_detect_hotplug() - especially
>  > > when HPD interrupt storms occur.
>  > > Since the interrupt handler changes the enabled interrupt lines when it
>  > > detects a storm this races with intel_crt_detect_hotplug().
>  > > To avoid this, shiled the rmw cycles with IRQ save spinlocks.
>  > > 
>  > > Signed-off-by: Egbert Eich <eich at suse.de>
>  > 
>  > I think this only reduces one source of such races, but fundamentally we
>  > can't avoid them. E.g. if you're _very_ unlucky you might cause a real hpd
>  > right when the strom code frobs around. Plugging the race with this known
> 
> This is exactly the scenatio I'm getting here. I get HPD interrupts at an 
> order of 10^4 / sec.
> 
>  > interrupt source here doesn't fix the fundamental issue.
>  > 
>  > Also I think the actual interrupt deliver is fairly asynchronous, both in
>  > the hardware and in the sw handling. E.g. spin_lock_irq does not
>  > synchronize with the interrupt handler on SMP systems, it only guarantees
>  > that it's not running concurrently on the same cpu (which would deadlock).
>  > 
>  > I think fundamentally this race is unfixable.
> 
> 
> There is one important race we avoid with this patch: It is that
> we change the PORT_HOTPLUG_EN concurrently in the interrupt handler
> (thru xxx_hpd_irq_setup() and in intel_crt_detect_hotplug()).
> 
> With the test system that I have here the old version of the code
> easily runs into this as the time spent inside intel_crt_detect_hotplug() 
> is quite long - especially when no CRT is connected.
> 
> What happens intel_crt_detect_hotplug() reads the value of PORT_HOTPLUG_EN
> at entry, then frobs around for ages, during this time several HPD interrupts
> occur, the storm is detected, the bit related to the stormy line is unset
> then after ages intel_crt_detect_hotplug() decides to give up and restores
> the value it read on entry.
> 
> To remedy this this patch reads the value of PORT_HOTPLUG_EN whenever 
> it is going to change it and adds the spin locks to make sure the 
> read-modify-write cycles don't happen concurrently.
> 
> PORT_HOTPLUG_EN is only touched in two places: in xxx_hpd_irq_setup()
> and in intel_crt_detect_hotplug(), the former can be called from an
> interrupt handler.
> 
> Not sure why you see a problem here.

Hm I missed that this same register is also accessed by the irq handler
code, and it's not just that touching these bits can cause interrupts. So
yeah we need your patch, but it needs to be clearer in the commit message
that there's also trouble with concurrent register access to
PORT_HOTPLUG_EN.

Also I think a commen in the code why we grab that spinlock would be good.
For that extracting a small helper to manipulate the register (like we do
with other irq mask registers with functions like ilk_update_gt_irq) would
be good - then we have just one place to put that commment.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the Intel-gfx mailing list