[Intel-gfx] [PATCH 3.0-rc3] i915: Fix gen6 (SNB) GPU stalling

Wed Jun 15 05:24:38 CEST 2011

On 15 June 2011 10:06, Eric Anholt <eric at anholt.net> wrote:
> On Wed, 15 Jun 2011 00:51:47 +0800, Daniel J Blueman <daniel.blueman at gmail.com> wrote:
>> On 14 June 2011 13:23, Eric Anholt <eric at anholt.net> wrote:
>> > On Tue, 14 Jun 2011 12:18:36 +0800, Daniel J Blueman <daniel.blueman at gmail.com> wrote:
>> >> Hi Eric,
>> >>
>> >> The frequent ~1.5s pauses I hit with SNB hardware in the gnome3 UI (eg
>> >> whenever you hit the top-left of the screen to show all windows) are
>> >> nicely addressed by your recent wake patch [1] (ported to -rc3). Thus
>> >> I see no 'missed IRQ' kernel messages.
>> >>
>> >> As this addresses a significant usability regression, are you happy to
>> >> add it to the 3.0-rc queue? I think it has very good value in -stable
>> >> also (assuming correctness). What do you think?
>> >
>> > This one had significant performance impacts, and later hacks in this
>> > series worked around the problem to approximately the same level of
>> > success with less impact, and we don't actually have a justification of
>> > why any of them work.  We were still hoping to come up with some clue,
>> > and haven't yet.
>>
>> True; that is quite heavy handed delay looping.
>>
>> It's a pity the usual Intel font didn't make it to the programmer's
>> reference manuals. Anyway, unmasking the blitter user interrupt in the hardware
>> status mask register addresses the root cause. Out of reset it's FFFFFFFFh,
>> so we don't need to read it here.
>>
>> It would be good to get this into -rc4. -stable probably needs some additional
>> tweaks.
>>
>> Signed-off-by: Daniel J Blueman <daniel.blueman at gmail.com>

> So you're saying that our interrupts at the top-level IMR are triggered
> by the write to the status page of the lower-level ring?  That's
> surprising to me.  Or do you think that this write is just happening to
> trigger serialization so the interrupt comes after the DWORD write of
> the seqno by the ring?  (hw folks just recently indicated that our
> particular code is not expected to serialize the interrupt after the
> seqno store, unless we had an MI_FLUSH_DWORD in between)

When ISR bits not masked by the hardware status mask register change,
a write is generated with the ISR contents to the status page, so I
believe that the blitter command streamer wasn't generating an
interrupt when an MI_USER_INTERRUPT command was issued. The interrupt
handler would kick in for other interrupts, read all the IIRs and
notice the bit set and wake the ring interrupt queue anyway.

I guess we could test this by observing that the BLT user interrupt
IIR bit is always not set on it's own (ie another interrupt woke us)
in the interrupt handler.

Thanks,
  Daniel

> This patch has now passed 7000 iterations of the testcase that had a
> ~.5% failure rate before.
>
> Tested-by: Eric Anholt <eric at anholt.net>
-- 
Daniel J Blueman