X server crash recovery
Paulo Cesar Pereira de Andrade
pcpa at mandriva.com.br
Mon Oct 8 07:44:27 PDT 2007
Pavel Troller wrote:
>> Since most people use some DM, they usually aren't as affected by a
>> "random" crash as a new server will be started, but a switch to a
>> virtual console will be impossible. I am more concerned about some way
>> to "kill" the server from the keyboard when something, like a driver, is
>> spinning in some infinite loop.
> I'm using ACPI for such a purpose. I've written a single script, which is
> called by pressing the POWER button. The script counts how many times the
> button is pressed and then (when there are no more presses for two seconds)
> it performs an action. Currently, the following actions are performed:
> 1 press: STD (suspend to disk/hibernation)
> 2 presses: kill the X server (it tries to use more and more powerful signals
> ending with SIGKILL until it is killed)
> 3 presses: init 6 (reboot)
> 4 presses: init 0 (power off)
> 5 and more presses: Ignored (so you can easily cancel unwanted action by
> issuing at least 5 presses)
> Every button press is signaled by the PC speaker beep with increasing pitch,
> so you have an acoustic feedback that the system is receiving your presses.
> So I can kill X without any help from it, even if it's completely frozen.
> When the system has crashed so deeply that it can't deliver ACPI events (no
> beeps when pressing the power button), you need to use the Reset button; it
> would not be possible to use the network anyway.
> Of course you need the ACPI subsystem compiled in and active.
> With regards,
> Pavel Troller
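
The press-counting scheme Pavel describes could be sketched roughly as
follows. This is not his script, only an illustration in Python: the
names count_burst, action_for and kill_escalating are made up, the
action labels are placeholders, and the choice of SIGTERM followed by
SIGKILL is an assumption (the mail only says the signals get stronger
and end with SIGKILL).

```python
import os
import signal
import time

# Hypothetical action labels, following the press counts listed above.
ACTIONS = {1: "hibernate", 2: "kill_x", 3: "reboot", 4: "poweroff"}


def action_for(presses):
    """Map a press count to an action; anything else (5 or more) is ignored."""
    return ACTIONS.get(presses, "ignore")


def count_burst(timestamps, quiet=2.0):
    """Count presses in the first burst of a sorted list of timestamps.

    The burst ends at the first gap of `quiet` seconds or more.
    """
    if not timestamps:
        return 0
    count = 1
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev >= quiet:
            break
        count += 1
    return count


def kill_escalating(pid, signals=(signal.SIGTERM, signal.SIGKILL), wait=1.0):
    """Send stronger and stronger signals until `pid` no longer exists.

    Returns the last signal sent, or None if the process was already gone.
    Assumes the caller may signal `pid` and is not its parent (an unreaped
    child would still appear to exist after being killed).
    """
    last = None
    for sig in signals:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            return last
        last = sig
        deadline = time.monotonic() + wait
        while time.monotonic() < deadline:
            try:
                os.kill(pid, 0)  # signal 0 only checks for existence
            except ProcessLookupError:
                return last
            time.sleep(0.05)
    return last
```

Presses at 0.0, 0.4 and 0.9 seconds followed by silence would count as
three presses, so action_for(count_burst([0.0, 0.4, 0.9])) selects the
"reboot" label.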
I was considering ACPI as a possible alternative. But I am just
studying and writing some experimental code. X should have at least the
keyboard and video driver on different threads. The "smooth" mouse would
probably also be better in a separate thread, rather than using SIGIO.
A "common crash" seems to be spinning in some system call, usually
in the DRI code, at least for my ATI card. Not sure how well this kind
of thing can be handled. Hopefully it will be possible to take some
action during it.
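
One way to "take some action" while another thread is stuck is a
watchdog thread that expects a periodic heartbeat. The sketch below is
only an illustration in Python, not X server code: the Watchdog class
and its kick/on_stall names are hypothetical, and it assumes the stuck
code runs in a different thread from the watchdog. What the callback
can usefully do when a driver is spinning inside the kernel is a
separate question.

```python
import threading
import time


class Watchdog:
    """Call `on_stall` once if kick() is not called for `timeout` seconds."""

    def __init__(self, timeout, on_stall):
        self.timeout = timeout
        self.on_stall = on_stall
        self._last = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def kick(self):
        """Heartbeat: the monitored loop calls this on every iteration."""
        with self._lock:
            self._last = time.monotonic()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait() returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.timeout / 4):
            with self._lock:
                stalled = time.monotonic() - self._last > self.timeout
            if stalled:
                self.on_stall()
                return
```

The main loop would call kick() once per iteration; on_stall could then
release the keyboard, switch the virtual console, or log the state.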
There are other things that could be done to make X more "failsafe",
like having the "hardware code" absolutely not interfere with the
"abstraction code", so that even if the hardware code crashes, it should
be possible to keep running without the need to close all clients, i.e.
the resources aren't all lost because the hardware crashed. There are
already several "virtual X server" implementations that do something
similar, i.e. one can detach from the screen/display and resume the
session on another computer at a later time, etc. (like the screen program).
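
As a toy illustration of keeping the "hardware code" apart from the
"abstraction code", the sketch below runs a stand-in driver in a child
process while the parent keeps the client resources. Everything here is
hypothetical (the Server class, the line-based request protocol, the
"crash" request); it only shows that the parent's state survives when
the child dies and is restarted.

```python
import subprocess
import sys

# Stand-in for hardware code: dies abruptly on the request "crash".
DRIVER_SRC = """
import os, sys
for line in sys.stdin:
    req = line.strip()
    if req == "crash":
        os._exit(1)  # simulate a crash: no cleanup, no reply
    print("done " + req, flush=True)
"""


class Server:
    """Keeps client resources in the parent; the driver lives in a child."""

    def __init__(self):
        self.resources = {}  # state that clients depend on
        self.restarts = 0
        self._spawn()

    def _spawn(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", DRIVER_SRC],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

    def _reap(self):
        for pipe in (self.proc.stdin, self.proc.stdout):
            try:
                pipe.close()
            except OSError:
                pass
        self.proc.wait()

    def call(self, req):
        """Send one request; return the reply, or None if the driver died."""
        try:
            self.proc.stdin.write(req + "\n")
            self.proc.stdin.flush()
            reply = self.proc.stdout.readline()
        except OSError:
            reply = ""
        if not reply:  # EOF means the child is gone
            self._reap()
            self.restarts += 1
            self._spawn()  # resources in the parent are untouched
            return None
        return reply.strip()

    def close(self):
        self._reap()
```

After a call("crash"), the resources dictionary is unchanged and the
next call goes to a fresh driver process.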
Something like this most likely would require changes to drivers, as
it will show all kinds of race conditions (calls to the driver can/must
be serialized, but the driver accessing X server data, or making X calls
may be tricky to properly handle). But hopefully this can lead to a
better modular interface. And/or fix/implement some things that people
know need to be done, but that are too hard or require too much developer
collaboration, like complete/updated documentation of driver options.
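
The "serialized" part of this can be sketched by funnelling all driver
calls through a single worker thread. This is an illustration only:
SerializedDriver, FakeDriver and fill_rect are hypothetical names, and
the sketch does not address the harder case mentioned above, where the
driver calls back into the server.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeDriver:
    """Stand-in driver that records how many calls overlap."""

    def __init__(self):
        self.active = 0
        self.max_active = 0

    def fill_rect(self, n):
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        time.sleep(0.001)  # pretend to touch the hardware
        self.active -= 1
        return n


class SerializedDriver:
    """Run every driver call on one worker thread, one at a time."""

    def __init__(self, driver):
        self._driver = driver
        self._worker = ThreadPoolExecutor(max_workers=1)

    def call(self, name, *args):
        # Returns a Future; callers block on .result() if they need the value.
        return self._worker.submit(getattr(self._driver, name), *args)

    def shutdown(self):
        self._worker.shutdown(wait=True)


def demo(n_clients=8):
    """Issue calls from several threads; return (max overlap, sorted results)."""
    drv = FakeDriver()
    sd = SerializedDriver(drv)
    results = []
    lock = threading.Lock()

    def client(i):
        value = sd.call("fill_rect", i).result()
        with lock:
            results.append(value)

    threads = [threading.Thread(target=client, args=(i,))
               for i in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    sd.shutdown()
    return drv.max_active, sorted(results)
```

With a single worker, the driver never sees two calls at once, no matter
how many threads submit requests.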
And maybe some "dreams", like intelligent code that can guess what
caused a crash and avoid repeating it in the next execution;
changing module parameters/options at runtime, etc.