[Intel-gfx] [PATCH] drm/i915: make sure power domains don't get disabled during error capture

Imre Deak imre.deak at intel.com
Thu Nov 28 11:52:07 CET 2013


On Thu, 2013-11-28 at 11:05 +0100, Daniel Vetter wrote:
> On Thu, Nov 28, 2013 at 10:04:42AM +0200, Imre Deak wrote:
> > So far we had a race during error capturing where a power domain could
> > get disabled after we queried its state to be on. Fix this by protecting
> > the power domain state tracking changes with a lock and holding this
> > lock during error capturing.
> > 
> > Note that this still allows the case where we are halfway in enabling a
> > power domain and still report it as disabled. This isn't a problem as we
> > only want to avoid register accesses while the domain is off and that is
> > now guaranteed.
> > 
> > Signed-off-by: Imre Deak <imre.deak at intel.com>
> > 
> > [ Applies on top of "drm/i915: add intel_display_power_enabled_sw() for
> > use in atomic ctx" ]
> 
> Tbh I don't see the point for this - the error capture code is inherently
> racy, we don't grab any of the other modeset locks. We also don't care and
> furthermore we should try to grab as few locks as possible to increase the
> chances that we can actually capture the error state successfully. Every
> lock you add adds another depency and so more ways to end in tears and
> deadlocks.
>
> So if the racy error capture is the only reason I'll reject this.

Yes it is only the race this fixes, but I'd like to argue for its
usefulness. It fixes in particular the following two things:

1. avoiding unclaimed register accesses, which was the original goal of 
    
commit 9d1cb9147dbe45f6e94dc796518ecf67cb64b359
Author: Paulo Zanoni <paulo.r.zanoni at intel.com>
Date:   Fri Nov 1 13:32:08 2013 -0200

drm/i915: avoid unclaimed registers when capturing the error state

2. avoiding an inconsistent power on/off value in the error state wrt.
to the captured registers. Without the lock we can have an error state
showing the power was on, but register values captured with power off,
with some unpredictable values. This consistency check was the whole
point of adding the power on/off value to the error state.

I understand that adding complexity in general is risky, but in this
case the risk is minimal. The only place we take the lock is to adjust a
counter and to read HW registers, without any further nested locks, so
no chance for lock inversions.

--Imre





More information about the Intel-gfx mailing list