On 18 May 2011 19:04, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> On Wed, 18 May 2011 12:38:44 +0100, Daniel J Blueman <daniel.blueman at gmail.com> wrote:
>> Hi Chris et al,
>> On my Sandy Bridge GPU (8086:0126 rev 09) laptop, I often see hangs
>> that are correctly recovered and sometimes ones which aren't (causing
>> X lockup or kernel hard lock), hurting usability.
>> I'm able to reproduce GPU hangs often with the composite tests in
>> rendercheck (may need to restart a few times):
>> $ ./rendercheck -t composite,cacomposite
>> Begin composite mask test on a8
>> <command hang, maybe GPU hang too>
> Reproducing with rendercheck is unusual, as that is something that we do
> run frequently.
> Looking through the i915_error_state made me wince (lots of 1x1 copies
> over the same pixel...) but it does remind me of
>  https://bugzilla.kernel.org/show_bug.cgi?id=27892
> for which the workaround is to flush the caches after every op (Option
> "DebugFlushCaches" "True").

After considerable time with 'DebugFlushCaches' enabled, I still hit a
crash. With just 'DebugFlushBatches', it's solid so far (>20 sigma).
Sometimes, I still see the hangcheck timer get hit [1] but nothing

This is quite a heavy-handed workaround, yet it is instrumental to
stability, so it would be really good to find a more specific
workaround that can be enabled by default.


--- [1]

[   43.015860] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 8471, at 8471], missed IRQ?
[30413.917749] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1390703, at 1390703], missed IRQ?
[33033.624549] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 4186964, at 4186964], missed IRQ?
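
In case it helps anyone compare logs, here is a rough Python sketch for pulling these hangcheck messages out of saved dmesg output. The function name parse_hangchecks and the caught_up field are my own invention, and the regex only assumes the exact message format shown in [1]. Treating "at >= waiting" as "the ring had already reached the awaited sequence number" is my reading of the "missed IRQ?" hint rather than anything taken from the driver, and it ignores counter wraparound.

```python
import re

# Matches the hangcheck message in the form shown in the excerpt above.
HANGCHECK_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm:i915_hangcheck_ring_idle\]\s+"
    r"\*ERROR\*\s+Hangcheck timer\s+elapsed\.\.\.\s+"
    r"(?P<ring>\w+) ring idle\s+"
    r"\[waiting on (?P<waiting>\d+), at (?P<at>\d+)\]"
)


def parse_hangchecks(log_text):
    """Return one dict per hangcheck message found in log_text."""
    # Collapse all whitespace so messages wrapped by a mail client still match.
    flat = " ".join(log_text.split())
    events = []
    for m in HANGCHECK_RE.finditer(flat):
        waiting = int(m.group("waiting"))
        at = int(m.group("at"))
        events.append({
            "time": float(m.group("time")),  # seconds since boot
            "ring": m.group("ring"),         # e.g. "blt"
            "waiting": waiting,              # number after "waiting on"
            "at": at,                        # number after "at"
            # Assumption: at >= waiting means the ring had caught up,
            # ignoring wraparound of the counter.
            "caught_up": at >= waiting,
        })
    return events
```

Feeding it the three messages in [1] should give three events on the blt ring, each with "waiting" equal to "at".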
