[Bug 42678] [3.3-rc1] radeon stuck in kernel after lockup

bugzilla-daemon at bugzilla.kernel.org
Fri Feb 24 00:42:57 PST 2012


https://bugzilla.kernel.org/show_bug.cgi?id=42678





--- Comment #8 from Torsten Kaiser <just.for.lkml at googlemail.com>  2012-02-24 08:42:55 ---
I don't think you'll get away with blaming userspace. ;-)

But this issue is rather complicated, because more than one bug /
change is involved.

To summarize the issues:

* a change in xf86-video-ati 6.14.2 -> 6.14.3: this was the initial trigger for
the GPU lockup messages on my system. While this change was partly buggy (this
has now been fixed, but I think the fix has not been released yet), it was
merely the trigger for a kernel bug.
"Proof" that 6.14.3 is to blame for this:
6.14.2 + kernel 3.1 -> no GPU lockup messages
6.14.3 + kernel 3.1 -> first GPU lockup messages
also, after downgrading to 6.14.2, the messages no longer appeared with later kernels
"Proof" that the real bug causing these lockups was a kernel bug:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=1b61925061660009f5b8047f93c5297e04541273
-> with this change 6.14.3 can no longer trigger GPU lockups

* the kernel bug causing GPU lockups -> wrong DESKTOP_HEIGHT setup.
This was probably always triggerable from userspace, but only the changes in
6.14.3 made the bug visible.
It is fixed by the commit 1b61925061660009f5b8047f93c5297e04541273 above.
This bug is not the 3.3-rcX regression, as I have been seeing it since 3.1.

* the first regression in 3.3-rc1: a mutex deadlock that you have already fixed.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=9fc04b503df9a34ec1a691225445c5b7dfd022e7

* a second, still open regression in 3.3-rc1 that had been masked by the first
regression: even with the mutex fix applied to the kernel (i.e. 3.3-rc2), X was
still failing to recover from the GPU lockups. See comment #3.
This is the issue that makes me consider this bug (42678) still open.

And I think this is a kernel regression and not a userspace issue because:
* 6.14.3 (with the GPU lockup trigger) and 3.2 (with the GPU lockup bug)
produce the GPU lockup messages in dmesg, but I did not even notice this was
happening, because X was always able to recover without noticeable
effects.
* the same userspace (6.14.3 with the trigger) and 3.3-rc2 (still with the GPU
lockup bug, but without the mutex deadlock) will trigger the GPU lockup
messages in dmesg, but X will be stuck in the kernel and fail to turn my
monitors back on.
So I think the stuck X process is caused by the kernel changes between 3.2 and
3.3-rc2.

Since 3.3-rc3, X has not gotten stuck again, but only because the underlying
kernel GPU lockup bug has been fixed: there is no longer any need to recover, so
any recovery bug can no longer be triggered.

Does this description of the issues involved make sense to you? Please ask if
I was unclear or muddled my explanation.



More information about the dri-devel mailing list