[Intel-gfx] [PATCH] drm/i915: make sure power domains don't get disabled during error capture

Daniel Vetter daniel at ffwll.ch
Thu Nov 28 14:34:07 CET 2013


On Thu, Nov 28, 2013 at 12:52:07PM +0200, Imre Deak wrote:
> On Thu, 2013-11-28 at 11:05 +0100, Daniel Vetter wrote:
> > On Thu, Nov 28, 2013 at 10:04:42AM +0200, Imre Deak wrote:
> > > So far we had a race during error capturing where a power domain could
> > > get disabled after we queried its state to be on. Fix this by protecting
> > > the power domain state tracking changes with a lock and holding this
> > > lock during error capturing.
> > > 
> > > Note that this still allows the case where we are halfway in enabling a
> > > power domain and still report it as disabled. This isn't a problem as we
> > > only want to avoid register accesses while the domain is off and that is
> > > now guaranteed.
> > > 
> > > Signed-off-by: Imre Deak <imre.deak at intel.com>
> > > 
> > > [ Applies on top of "drm/i915: add intel_display_power_enabled_sw() for
> > > use in atomic ctx" ]
> > 
> > Tbh I don't see the point for this - the error capture code is inherently
> > racy, we don't grab any of the other modeset locks. We also don't care and
> > furthermore we should try to grab as few locks as possible to increase the
> > chances that we can actually capture the error state successfully. Every
> > lock you add adds another depency and so more ways to end in tears and
> > deadlocks.
> >
> > So if the racy error capture is the only reason I'll reject this.
> 
> Yes it is only the race this fixes, but I'd like to argue for its
> usefulness. It fixes in particular the following two things:
> 
> 1. avoiding unclaimed register accesses, which was the original goal of 
>     
> commit 9d1cb9147dbe45f6e94dc796518ecf67cb64b359
> Author: Paulo Zanoni <paulo.r.zanoni at intel.com>
> Date:   Fri Nov 1 13:32:08 2013 -0200
> 
> drm/i915: avoid unclaimed registers when capturing the error state

Hm, I've thought we won't hit any unclaimed register warnings if there's
no concurrent/racing other thread doing stuff to the power wells?

And as explained I don't care that much about racing hangcheck ...

> 2. avoiding an inconsistent power on/off value in the error state wrt.
> to the captured registers. Without the lock we can have an error state
> showing the power was on, but register values captured with power off,
> with some unpredictable values. This consistency check was the whole
> point of adding the power on/off value to the error state.
> 
> I understand that adding complexity in general is risky, but in this
> case the risk is minimal. The only place we take the lock is to adjust a
> counter and to read HW registers, without any further nested locks, so
> no chance for lock inversions.

The thing is that we already have tons of other similar races, e.g. we
could capture hilariously incosistent modeset state where the pfit is
disabled but pipesrc already set to something that would need upscaling.
And it's not a problem. So I still think we can just ignore those races,
even more we /should/ do so.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch



More information about the Intel-gfx mailing list