Radeon 9200 hangs: How to debug?

Randall Nortman xorglist at wonderclown.org
Thu Nov 11 17:45:13 PST 2004


As suggested, the next time this hang occurred (about an hour ago), I
logged in remotely as root, ran gdb, and attached to the process.  A
backtrace showed that it was sitting in ioctl() in libc.so.6.  Above
that on the call stack was xf86ioctl(), and everything above that was
obscured because I don't have debugging symbols compiled in.  Keep in
mind that CPU utilization is at 100% (all of it in the X process) when
the hang occurs.  As soon as I attached to the process in gdb, CPU
utilization dropped to normal levels.
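The same step can be run non-interactively. This is a minimal sketch, assuming gdb is installed on the affected machine and the script runs as root from the ssh session; the helper names are hypothetical and not part of X.org or gdb. gdb stops a process while it is attached to it, so running `bt` in batch mode and detaching right away keeps the interruption short, and repeating it a few times shows whether the server is always stuck at the same place. Without debugging symbols, the frames above xf86ioctl() will still be unnamed.

```python
import subprocess
from typing import List


def gdb_backtrace_cmd(pid: int) -> List[str]:
    """Build a gdb command line that attaches to `pid`, prints a backtrace, then detaches."""
    return [
        "gdb",
        "-batch",          # run the -ex commands and exit; no interactive prompt
        "-nx",             # ignore ~/.gdbinit so the output is reproducible
        "-p", str(pid),    # attach to the running (hung) X server
        "-ex", "bt",       # print the call stack of the stopped process
        "-ex", "detach",   # release the process explicitly before gdb exits
    ]


def capture_backtrace(pid: int, timeout: float = 60.0) -> str:
    """Run gdb once against `pid` and return everything it printed (stdout + stderr)."""
    result = subprocess.run(
        gdb_backtrace_cmd(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout + result.stderr
```

For example, `print(capture_backtrace(pid))` with the X server's PID as reported by `ps`.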

So much for the debugger.  I suppose if I compile in debugging symbols
I could get more, but I haven't gotten that far yet.
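Even without debugging symbols, the kernel's per-process accounting in /proc/<pid>/stat (Linux, described in proc(5)) can show whether that 100% CPU is spent in the X server's own code (user time) or inside the kernel (system time). The sketch below samples it twice; it is an illustration with hypothetical helper names, not an X.org tool. A mostly-kernel split would be consistent with the server spinning inside the ioctl() call seen in the backtrace rather than in its own code, which seems worth including in a bug report; that reading is an assumption, not something established so far.

```python
import os
import time
from typing import NamedTuple


class ProcSample(NamedTuple):
    pid: int
    comm: str
    state: str   # R = running, S = sleeping, D = uninterruptible sleep, ...
    utime: int   # clock ticks spent in user mode
    stime: int   # clock ticks spent in kernel mode


def parse_stat(line: str) -> ProcSample:
    """Parse one line of /proc/<pid>/stat (Linux format, see proc(5))."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' rather than on whitespace.
    head, tail = line.rsplit(")", 1)
    pid_str, comm = head.split("(", 1)
    fields = tail.split()
    # fields[0] is stat field 3 (state); utime and stime are fields 14 and 15.
    return ProcSample(int(pid_str), comm, fields[0], int(fields[11]), int(fields[12]))


def read_stat(pid: int) -> ProcSample:
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat(f.read())


def cpu_split(pid: int, interval: float = 5.0) -> dict:
    """Sample `pid` twice and report the share of one CPU spent in user vs kernel mode."""
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    a = read_stat(pid)
    time.sleep(interval)
    b = read_stat(pid)
    span = interval * ticks_per_sec
    return {
        "state": b.state,
        "user_pct": 100.0 * (b.utime - a.utime) / span,
        "kernel_pct": 100.0 * (b.stime - a.stime) / span,
    }
```

For example, `cpu_split(pid)` with the hung server's PID returns its state letter and the user/kernel percentages over a five-second window.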

One interesting thing to note is that I was able to get the system
back without rebooting this time.  Here's what I did (after detaching
gdb from the X process):

- Kill everything associated with the hung X session (startx, xinit,
  etc.).  Everything dies except X itself, which won't die even with
  kill -9.  The screen is still locked and the console is unusable.
  (Not even Ctrl-Alt-Del works.)

- Try to run startx via ssh; it complains about a file in /tmp left by
  the hung server that needs to be removed.

- Remove that and try again.  Screen gets corrupted (weird colors, but
  nothing moving).

- Kill that server; it dies cleanly.

- Run startx again.  Screen corruption disappears, server starts
  normally, desktop comes up as usual.  I can now switch to other
  virtual consoles and all is well.

- At some point in that process, the original hung X process finally
  died, but I didn't notice exactly when.
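The startx complaint in the second step is most likely about the X display lock file. Assuming it is the standard one (/tmp/.X0-lock for display :0, which holds the owning server's PID as text), this sketch reports which PID the lock names and whether that process still exists, so the state of the old server can be checked before the file is removed. The function names are hypothetical.

```python
import os
from typing import Optional


def x_lock_path(display: int = 0) -> str:
    # Assumption: the leftover file is the standard X display lock, /tmp/.X<n>-lock.
    return f"/tmp/.X{display}-lock"


def lock_owner(path: str) -> Optional[int]:
    """Return the PID recorded in an X lock file, or None if it is missing or unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def pid_alive(pid: int) -> bool:
    """True if a process with this PID exists (signal 0 only checks; nothing is sent)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def describe_lock(display: int = 0) -> str:
    """One-line summary of who holds the lock for `display`."""
    path = x_lock_path(display)
    pid = lock_owner(path)
    if pid is None:
        return f"{path}: no usable lock file"
    status = "still running" if pid_alive(pid) else "gone (stale lock)"
    return f"{path}: held by PID {pid}, {status}"
```

For example, `print(describe_lock(0))` run over ssh before retrying startx.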

Any new ideas?  Is any of that surprising to anybody?  Do I need to
compile in debugging symbols in order to get anything useful to go on?

I normally wouldn't be so uncouth as to quote my own message in full,
but in case anybody missed the history of this problem, here it is:

On Thu, Nov 04, 2004 at 01:08:47PM -0500, Randall Nortman wrote:
> I've got an Athlon64 system with a Sapphire Radeon 9200 (64MB, using
> DVI output), running X.org version 6.8.0 on Gentoo.  (Gentoo has not
> released 6.8.1 packages yet.)  I'm using the open-source drivers, not
> the ATI proprietary drivers.
> 
> X starts just fine and works most of the time.  Occasionally, and
> completely unpredictably, the X server will hang.  I cannot switch to
> a virtual console with Ctrl-Alt-F1 as I ought to be able to.  There's
> no cursor movement and no windows are updating.  Sometimes this
> happens while the screensaver is running; sometimes it happens when
> I'm just typing into a terminal window; sometimes it happens while I'm
> moving the mouse; sometimes it happens while I'm playing ut2k4.  It
> doesn't happen at any particular time of day, with any regularity, or
> when system load is particularly high or low.  It seems to be
> completely random.
> 
> When it happens, I can ssh in from another system.  When I do so, I
> always find that the X process is consuming 100% of CPU time.  I
> usually cannot kill it, not even with 'kill -9'.  (Once, I was able to
> kill it and start a new X server, but most of the time this does not
> work.)  The only solution is to reboot the system (remotely, since
> Ctrl-Alt-Del does not work from the console while X is hung).  When I
> do so, the screen continues to display whatever was last on screen
> right up until the box physically resets and then the BIOS display
> comes up and the box boots normally.
> 
> There is nothing unusual in the logs when this happens.  There's no
> core dump, obviously.  I would like to be able to create a useful bug
> report about this, but I'm afraid I can't even begin to provide any
> useful information.  How do we debug problems like this?  Can X be
> compiled with extra debugging code?  Would that greatly slow down
> performance?  (Since I cannot reproduce the bug at will, I will need
> to just continue to use the system as normal until it hangs.  If the
> system is quite slow because of debugging code, that will be quite
> annoying.)
> 
> Should I perhaps start turning off features until the system becomes
> stable?  You can see in the log that I have quite a few
> modules/extensions loaded, including dbe, XVideo, glx, dri, drm,
> RENDER, COMPOSITE, etc.  This is essentially a default configuration,
> as provided by Gentoo; I didn't really mess around much with what's
> being loaded.


