Radeon 9200 hangs: How to debug?
Randall Nortman
xorglist at wonderclown.org
Thu Nov 11 17:45:13 PST 2004
As suggested, the next time this hang occurred (about an hour ago), I
logged in remotely as root, ran gdb, and attached to the process. A
backtrace showed that it was sitting in ioctl() in libc.so.6. Above
that on the call stack was xf86ioctl(), and everything above that was
obscured because I don't have debugging symbols compiled in. Keep in
mind that CPU utilization is at 100% (all of it in the X process) when
the hang occurs. As soon as I attached to the process in gdb, CPU
utilization dropped to normal levels.
So much for the debugger. I suppose if I compile in debugging symbols
I could get more, but I haven't gotten that far yet.
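For anyone who wants to repeat the attach-and-backtrace step, here is a minimal sketch of it as two shell functions. The function names are made up for illustration, the output path in the usage comment is arbitrary, and it assumes a gdb new enough to accept -ex on the command line and a server binary named "X". It only captures what is described above (process state, CPU share, and a backtrace); without debugging symbols the backtrace will still be mostly unresolved.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and -ex needs a gdb that supports it.

# Record scheduler state, CPU share and kernel wait channel for one PID.
snapshot_state() {
    ps -o pid,stat,pcpu,wchan,comm -p "$1"
}

# Attach to a PID, print a backtrace, then detach so the process keeps running.
dump_backtrace() {
    gdb -batch -p "$1" -ex "bt" -ex "detach"
}

# Example, run as root from an ssh session (assumes the server binary is named "X"):
#   pid=$(pidof X)
#   snapshot_state "$pid"
#   dump_backtrace "$pid" > /tmp/x-hang-backtrace.txt 2>&1
```

Taking the ps snapshot before attaching matters here, since CPU utilization dropped as soon as gdb attached.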
One interesting thing to note is that I was able to get the system
back without rebooting this time. Here's what I did (after detaching
gdb from the X process):
- Kill everything associated with the hung X session (startx, xinit,
etc.). Everything dies except X itself, which will not die even
with kill -9. The screen is still locked and the console is
unusable. (Not even Ctrl-Alt-Del works.)
- Try to run startx via ssh; it complains about a file in /tmp left by
the hung server that needs to be removed.
- Remove that and try again. Screen gets corrupted (weird colors, but
nothing moving).
- Kill that server; it dies cleanly.
- Run startx again. Screen corruption disappears, server starts
normally, desktop comes up as usual. I can now switch to other
virtual consoles and all is well.
- At some point in that process, the original hung X process finally
died, but I didn't notice exactly when.
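The cleanup part of those steps (everything before the first startx) could be sketched roughly as below. This is an illustration under assumptions, not a tested recovery procedure: the function names are hypothetical, the stale file's path is whatever startx reports and has to be passed in by hand, and the script defaults to a dry run that only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch only. DRY_RUN defaults to 1, so nothing is executed unless you opt in.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = path of the stale file in /tmp that startx complained about.
# The path is not named above, so it must be supplied by hand.
clear_hung_session() {
    stale_file=$1
    run pkill -x startx     # wrapper scripts of the hung session
    run pkill -x xinit
    run rm -f "$stale_file" # leftover from the hung server
}

# Example (dry run): clear_hung_session /tmp/name-reported-by-startx
# The remaining steps (startx, kill the new server, startx again) are
# interactive and are left to be done by hand.
```

Set DRY_RUN=0 in the environment only after checking the printed commands.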
Any new ideas? Is any of that surprising to anybody? Do I need to
compile in debugging symbols in order to get anything useful to go on?
I normally wouldn't be so uncouth as to quote my own message in full,
but in case anybody missed the history of this problem, here it is:
On Thu, Nov 04, 2004 at 01:08:47PM -0500, Randall Nortman wrote:
> I've got an Athlon64 system with a Sapphire Radeon 9200 (64MB, using
> DVI output), running X.org version 6.8.0 on Gentoo. (Gentoo has not
> released 6.8.1 packages yet.) I'm using the open-source drivers, not
> the ATI proprietary drivers.
>
> X starts just fine and works most of the time. Occasionally, and
> completely unpredictably, the X server will hang. I cannot switch to
> a virtual console with Ctrl-Alt-F1 as I ought to be able to. There's
> no cursor movement and no windows are updating. Sometimes this
> happens while the screensaver is running; sometimes it happens when
> I'm just typing into a terminal window; sometimes it happens while I'm
> moving the mouse; sometimes it happens while I'm playing ut2k4. It
> doesn't happen at any particular time of day or with any sort of
> regularity, or when system load is particularly high or particularly
> low. It seems to be completely random.
>
> When it happens, I can ssh in from another system. When I do so, I
> always find that the X process is consuming 100% of CPU time. I
> usually cannot kill it, not even with 'kill -9'. (Once, I was able to
> kill it and start a new X server, but most of the time this does not
> work.) The only solution is to reboot the system (remotely, since
> Ctrl-Alt-Del does not work from the console while X is hung). When I
> do so, the screen continues to display whatever was last on screen
> right up until the box physically resets and then the BIOS display
> comes up and the box boots normally.
>
> There is nothing unusual in the logs when this happens. There's no
> core dump, obviously. I would like to be able to create a useful bug
> report about this, but I'm afraid I can't even begin to provide any
> useful information. How do we debug problems like this? Can X be
> compiled with extra debugging code? Would that greatly slow down
> performance? (Since I cannot reproduce the bug at will, I will need
> to just continue to use the system as normal until it hangs. If the
> system is quite slow because of debugging code, that will be quite
> annoying.)
>
> Should I perhaps start turning off features until the system becomes
> stable? You can see in the log that I have quite a few
> modules/extensions loaded, including dbe, XVideo, glx, dri, drm,
> RENDER, COMPOSITE, etc. This is essentially a default configuration,
> as provided by Gentoo; I didn't really mess around much with what's
> being loaded.