[Intel-gfx] Another GPU hang

Sun May 22 00:47:01 CEST 2011

On Sat, May 21, 2011 at 10:45:24PM +0100, Chris Wilson wrote:
> On Sun, 22 May 2011 00:08:10 +0300, Dmitry Nezhevenko <dion at inhex.net> wrote:
> > Hi,
> > 
> > I'm not sure why I'm getting this. I've performed a lot of configuration
> > changes, so can't be sure, which one causes. I've switched to amd64
> > distro and upgraded everything to latest debian unstable. Also I'm on
> > 2.6.39 now.
> > 
> > I'm getting such hang sometimes just after unplugging HDMI cable from
> > laptop. 
> 
> Hmm. Are you absolutely sure? I've an open bug 35576, but nothing that
> indicated a correlation with modeswitching. The hang would appear to be
> due to waiting on a dead pipe then. We've had a long history with such
> bugs and purposely changing modes and so are now careful to flush any
> pending waits from userspace and in KMS before modeswitching. This raises
> the question that maybe there is a window for a hotplug event to turn-off
> the pipe before userspace has finished flushing its queue of pending ops.

I've just tried to reproduce it again. It looks like I can't trigger it when
the external display (DVI, connected via a DVI->HDMI cable) is turned off via
xrandr.

However, if the display is active (LVDS turned off and HDMI1 active), it was
easy enough to reproduce just by playing with the HDMI connector. After
5-6 attempts at plugging/unplugging the monitor I got a hang again:

[38440.632948] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 81
[38440.632956] [drm:drm_edid_block_valid] *ERROR* Raw EDID:
[38440.632962] <3>00 ff ff ff ff ff ff 00 38 a3 8c 67 01 01 01 01  ........8..g....
[38440.632968] <3>07 14 01 03 80 34 20 78 ea fc 85 a4 55 4d 9d 25  .....4 x....UM.%
[38440.632973] <3>12 50 54 bf ef 80 81 c0 81 80 90 40 8b c0 95 00  .PT........@....
[38440.632978] <3>a9 40 b3 00 d1 00 28 3c 80 a0 70 b0 23 40 30 20  .@....(<..p.#@0
[38440.632982] <3>36 00 06 44 21 00 00 1a 00 00 00 fd 00 32 55 1f  6..D!........2U.
[38440.632987] <3>5c 11 00 0a 20 20 20 20 20 20 00 00 00 fc 00 4c  \...      .....L
[38440.632992] <3>43 44 32 34 39 30 57 55 58 69 32 0a 00 00 00 ff  CD2490WUXi2.....
[38440.632997] <3>00 30 32 33 30 32 34 37 31 55 4f ff ff ff ff ff  .02302471UO.....
[38440.633000]
[38461.596072] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[38461.596083] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[38461.597995] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 2888267 at 2888261, next 2888268)
[38462.100045] [drm:i915_reset] *ERROR* Failed to reset chip.
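
For reference, the "remainder is 81" in the first log line can be cross-checked
against the raw dump: a base EDID block is 128 bytes, and the sum of all of them
must be 0 modulo 256. Below is a minimal Python sketch, assuming the eight hex
rows above are the complete block. The names RAW_EDID_HEX and
edid_checksum_remainder are made up for illustration and are not taken from the
kernel source.

```python
# Hex bytes copied from the "Raw EDID" dump above, with the log prefixes
# and the ASCII column removed. Assumed to be the full 128-byte block.
RAW_EDID_HEX = """
00 ff ff ff ff ff ff 00 38 a3 8c 67 01 01 01 01
07 14 01 03 80 34 20 78 ea fc 85 a4 55 4d 9d 25
12 50 54 bf ef 80 81 c0 81 80 90 40 8b c0 95 00
a9 40 b3 00 d1 00 28 3c 80 a0 70 b0 23 40 30 20
36 00 06 44 21 00 00 1a 00 00 00 fd 00 32 55 1f
5c 11 00 0a 20 20 20 20 20 20 00 00 00 fc 00 4c
43 44 32 34 39 30 57 55 58 69 32 0a 00 00 00 ff
00 30 32 33 30 32 34 37 31 55 4f ff ff ff ff ff
"""


def edid_checksum_remainder(block: bytes) -> int:
    """Return the byte sum modulo 256; a valid EDID block yields 0."""
    if len(block) != 128:
        raise ValueError(f"expected 128 bytes, got {len(block)}")
    return sum(block) % 256


# Strip all whitespace before decoding so the multi-line string parses cleanly.
block = bytes.fromhex("".join(RAW_EDID_HEX.split()))
print(edid_checksum_remainder(block))  # prints 81
```

The result agrees with the remainder reported by drm_edid_block_valid, so the
dump in the log is the same data the kernel rejected.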

> The important question is do you see this 0x01820000 at other times?
> -Chris

You're asking about the "IPEHR:" line in the i915_error_state file, right?
This time it was IPEHR: 0x01800002.

I've uploaded the full file to:
http://dion.org.ua/uploads/2011/05/2.6.39_intel_hung_2.txt

As for other hangs, some time ago there were hangs while playing video with
mplayer -vo xv. I asked about it here and you replied that it was fixed in
commit:

commit 23f9b14df7c102c1036134835dd5d1a508059858
Author: Chris Wilson <chris at chris-wilson.co.uk>
Date:   Sat Feb 12 10:42:34 2011 +0000

    i965: Remove broken maximum base addresses from video

So after upgrading to the proper Debian package everything was OK. At least
I currently can't recall any hangs other than on display plug/unplug.

-- 
WBR, Dmitry