[Intel-gfx] [PATCH 06/43] drm/i915: protect force_wake_(get|put) with the gt_lock

Daniel Vetter daniel.vetter at ffwll.ch
Wed Jan 4 18:11:18 CET 2012


On Wed, Jan 4, 2012 at 00:33, Keith Packard <keithp at keithp.com> wrote:
> On Tue, 3 Jan 2012 22:49:52 +0100, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
>
>> Nope, current hangcheck blows up, and we have an i-g-t testcase for it
>> (which the commit msg clearly states). There are also numerous bug
>> reports where a dying gpu results in tons of
>> WARN_ON(!mutex_locked(dev->struct_mutex)) noise in dmesg (which drowns
>> out the gpu hang warning). The locking change fixes this.
>
> Ah, ok, that makes sense. Of course, hangcheck *could* have just taken
> struct_mutex were it run in a suitable context.

Nope, we cannot move the hangcheck into process context by using a
delayed work item and then grabbing struct_mutex. If the gpu is dead,
we usually have a task stuck waiting for it and already holding
struct_mutex. It is *absolutely* imperial that the hangcheck and error
state capture code do not block on anything that the i915 gem code
might hold onto.

>> The patch adds the required locking to i915_reset.
>
> No, the spinlock protects the forcewake_count access and not the actual
> register access, which leaves all kinds of potential for races in
> threads not also holding struct_mutex while accessing registers.

Ah, I think I see you're concern: Between the time we reset the gpu
and the time we fix up the forcewake state somebody might sneak in and
see an inconstency between our tracking and the actual hw state, hence
reading garbage. Correct?

> If you want a spinlock to protect the register access, it must surround
> the whole operation.

Between the time the hangcheck declares the gpu dead and the time we
deem it officially resurrected at the end of i915_reset there's no
issue with returning garbage from register writes - after all, the gpu
just went down.

The only thing we have to take care of is that we don't leave behind
an inconsistent state after i915_reset, which the current locking in
my patch takes care of.

Hence I think that no further protection is required.
-Daniel
-- 
Daniel Vetter
daniel.vetter at ffwll.ch - +41 (0) 79 365 57 48 - http://blog.ffwll.ch



More information about the Intel-gfx mailing list