[Bug 63914] [hsw hiz] Cycling between GL/X rendering causes a hard hang

bugzilla-daemon at freedesktop.org
Thu Jun 27 14:32:48 PDT 2013


https://bugs.freedesktop.org/show_bug.cgi?id=63914

Paul Berry <stereotype441 at gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|stereotype441 at gmail.com  |intel-gfx-bugs at lists.freedesktop.org

--- Comment #31 from Paul Berry <stereotype441 at gmail.com> ---
Ok, here's what I've found:

- I've reproduced the bug 7 times, with the time to failure varying widely:
3m*, 8m*, 11m*, 16m, 2h48m, and 4h03m, plus one failure for which I did not
record the time.

- Note that the times marked with "*" above occurred today, after I upgraded
the BIOS on my HSW ULT from version 113 to 126 (and upgraded the KSC EC version
from 1.20 to 1.24).  The others occurred earlier in the week.  This makes me
suspect that the problem may be BIOS-related, since the failures seem to be
more frequent since the BIOS upgrade.

- Each time I've reproduced the bug I was running with vblank_mode=0.

- I've reproduced the bug both with the stock Arch kernel (3.9.7-1) and with
drm-intel-nightly (00b224eee).

- I've reproduced the bug with "iommu=off" on the kernel command line.

- Each time the bug occurs, the computer locks up completely: you can't switch
VTs, pressing NumLock fails to toggle the NumLock light, and the machine is
unresponsive via ssh.  This is not a simple GPU hang.

- Contrary to comment 16, DPMS does not seem to be involved: I tried running
"while true; do sleep 10; xset dpms force off; sleep 10; xset dpms force on;
done" in parallel with the script, and it did not noticeably increase the rate
of failure.

- I also investigated whether this might be a thermal issue.  One of my
failures occurred with a very hot CPU (because I had not configured the fan
settings correctly after a BIOS upgrade), but another occurred at a
temperature of 61C, which is well within normal operating range.  So I believe
it is not a thermal issue.
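
As a rough illustration of the before/after-BIOS comparison in the list above,
the recorded times to failure can be converted to minutes and grouped by
whether they carry the "*" marker.  This is only a sketch: the helper names
(to_minutes, split_by_bios) are made up for this example, the one failure with
no recorded time is left out, and three samples per group are far too few to
prove anything.

```python
import re
from statistics import median

# Times to failure as listed in this comment.
# A trailing "*" marks a failure seen after the BIOS upgrade (113 -> 126).
REPORTED = ["3m*", "8m*", "11m*", "16m", "2h48m", "4h03m"]


def to_minutes(label: str) -> int:
    """Convert a label such as '2h48m' or '3m*' to whole minutes (hypothetical helper)."""
    match = re.fullmatch(r"(?:(\d+)h)?(\d+)m\*?", label)
    if match is None:
        raise ValueError(f"unrecognized duration: {label!r}")
    hours, minutes = match.groups()
    return int(hours or 0) * 60 + int(minutes)


def split_by_bios(labels):
    """Return (before_upgrade, after_upgrade) lists of minutes (hypothetical helper)."""
    before = [to_minutes(l) for l in labels if not l.endswith("*")]
    after = [to_minutes(l) for l in labels if l.endswith("*")]
    return before, after


before, after = split_by_bios(REPORTED)
print("before BIOS upgrade (min):", before, "median:", median(before))
print("after BIOS upgrade (min): ", after, "median:", median(after))
```

The split only makes the pattern easier to see: every failure after the
upgrade happened in under a quarter of an hour, while the earlier ones took
between 16 minutes and about four hours.  It does not rule out other changes
between the two sets of runs.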


At this point I believe this is most likely a kernel, BIOS, or hardware bug,
and it needs investigation by someone with kernel expertise.  I haven't seen
any evidence that it's related to Mesa (which is where my expertise lies).  So
I'm reassigning to intel-gfx-bugs at lists.freedesktop.org.
