X server crash recovery

Paulo Cesar Pereira de Andrade pcpa at mandriva.com.br
Mon Oct 8 07:44:27 PDT 2007


Pavel Troller wrote:
> Hi!
>
>   
>>  Since most people use some DM, they usually aren't as affected by a 
>> "random" crash as a new server will be started, but a switch to a 
>> virtual console will be impossible. I am more concerned about some way 
>> to "kill" the server from the keyboard when something, like a driver, is 
>> spinning in some infinite loop.
>>
>>     
>
> I'm using ACPI for such a purpose. I've written a single script, which is
> called by pressing the POWER button. The script counts how many times the
> button is pressed and then (when there are no more presses for two seconds)
> it performs an action. Currently, the following actions are performed:
>
> 1 press:   STD (suspend to disk/hibernation)
> 2 presses: kill the X server (it tries progressively stronger signals,
>            ending with SIGKILL, until the server is killed)
> 3 presses: init 6 (reboot)
> 4 presses: init 0 (power off)
> 5 or more presses: ignored (so you can easily cancel an unwanted action by 
>            pressing at least 5 times)
>
> Every button press is signaled by the PC speaker beep with increasing pitch,
> so you have an acoustic feedback that the system is receiving your presses.
>
> So I can kill X without any help from it, even if it's completely frozen. When
> the system has crashed so deeply that it can't deliver ACPI events (no beeps
> when pressing the power button), you need to use the Reset button; it would not
> be possible to use the network anyway.
>
> Of course you need the ACPI subsystem compiled in and active.
>
> With regards,
>     Pavel Troller
>   
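
The press-count scheme and the signal escalation described in the quoted
message could look roughly like the sketch below. It is written in Python
for illustration only; the original script is not shown in this thread.
The names ACTIONS, action_for and escalate_kill are hypothetical, the
choice of SIGTERM, SIGINT, SIGKILL as the escalation order is an
assumption (the message only says it ends with SIGKILL), and a POSIX
system is assumed.

```python
import signal
import subprocess

# Actions from the quoted list, keyed by the number of button presses.
ACTIONS = {
    1: "suspend-to-disk",
    2: "kill-x",
    3: "reboot",
    4: "poweroff",
}


def action_for(presses):
    """Map a press count to an action name; anything unlisted is ignored."""
    return ACTIONS.get(presses, "ignore")


def escalate_kill(proc,
                  signals=(signal.SIGTERM, signal.SIGINT, signal.SIGKILL),
                  grace=1.0):
    """Send each signal in turn until the process exits.

    proc is a subprocess.Popen object (an assumption made to keep the
    sketch self-contained). Returns the signal that ended the process,
    or None if it survived all of them.
    """
    for sig in signals:
        proc.send_signal(sig)
        try:
            proc.wait(timeout=grace)   # give the process time to die
            return sig
        except subprocess.TimeoutExpired:
            continue                   # still alive: try the next signal
    return None
```

A real handler would act on the PID of the running X server (for example
with os.kill) rather than on a Popen object, and would be started by the
ACPI event handler on each button press.
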
  I was considering ACPI as a possible alternative, but I am just 
studying and writing some experimental code. X should have at least the 
keyboard and video driver on different threads. The "smooth" mouse would 
probably also be better in a separate thread, rather than using SIGIO.
  A "common crash" seems to be spinning in some system call, usually 
in the DRI code, at least for my ATI card. I am not sure how well this 
kind of thing can be handled. Hopefully it will be possible to take some 
action while it is happening.
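
To illustrate the threading idea: if a driver call runs on its own
thread, the thread that handles the keyboard stays responsive and can
notice that the call has not returned. The sketch below is Python, not
the server's C, and run_with_watchdog is a hypothetical name; it shows
only the detection step, not how a stuck driver would be recovered.

```python
import threading


def run_with_watchdog(driver_call, timeout=0.5):
    """Run driver_call on a separate thread.

    Returns True if the call finished within timeout seconds, False if
    it is still running (possibly spinning) when the timeout expires.
    """
    done = threading.Event()

    def worker():
        driver_call()
        done.set()

    # Daemon thread: a stuck call does not keep the process alive.
    threading.Thread(target=worker, daemon=True).start()
    return done.wait(timeout)
```

On a False result the calling thread is still free to act, for example
by honoring a "kill the server" key sequence.
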

  There are other things that could be done to make X more "failsafe", 
like having the "hardware code" absolutely not interfere with the 
"abstraction code", so that even if the hardware code crashes, it should 
be possible to keep running without the need to close all clients, i.e. 
resources aren't all lost because the hardware crashed. There are 
already several "virtual X server" implementations that do something 
similar, i.e. one can detach from the screen/display and resume the 
session on another computer at a later time (like the screen program).
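
A minimal sketch of that separation, assuming the "hardware code" runs
as a separate process while client resources stay in the process that
owns the session. Session, attach and draw are hypothetical names, and
the one-shot backend command is a simplification; nothing here reflects
how the X server is actually structured.

```python
import subprocess


class Session:
    """Client-visible state lives here, not in the hardware process."""

    def __init__(self, backend_cmd):
        self.backend_cmd = backend_cmd   # argv list of the hardware backend
        self.resources = {}              # resource id -> name; survives crashes
        self.crashes = 0

    def attach(self, backend_cmd):
        """Switch to another backend, much like reattaching with screen."""
        self.backend_cmd = backend_cmd

    def draw(self):
        """Hand the resources to the backend; return False if it crashed."""
        payload = "\n".join(f"{rid} {name}"
                            for rid, name in self.resources.items())
        result = subprocess.run(self.backend_cmd, input=payload,
                                text=True, capture_output=True)
        if result.returncode != 0:
            self.crashes += 1            # backend died; our state is intact
            return False
        return True
```

Because the resource table never lives in the backend process, a failed
draw leaves it untouched, and the session can be attached to a different
backend afterwards.
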

  Something like this would most likely require changes to drivers, as 
it will expose all kinds of race conditions (calls to the driver can/must 
be serialized, but a driver accessing X server data or making X calls 
may be tricky to handle properly). But hopefully this can lead to a 
better modular interface, and/or fix or implement some things that 
people know need to be done but that are too hard or require too much 
developer collaboration, like complete, up-to-date documentation of 
driver options. And maybe some "dreams", like intelligent code that can 
guess what caused a crash and avoid repeating it on the next execution; 
changing module parameters/options at runtime, etc.
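
One way to serialize calls to the driver, as mentioned above, is to
funnel them all through a single worker thread. The sketch below is
Python for illustration, and SerializedDriver is a hypothetical name. It
covers only the "calls into the driver" direction; the harder case of
the driver calling back into the server is not addressed.

```python
import queue
import threading


class SerializedDriver:
    """Run every driver call on one worker thread so calls never overlap."""

    def __init__(self):
        self._calls = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            func, args, reply = self._calls.get()
            try:
                reply.put((True, func(*args)))
            except Exception as exc:       # hand the error back to the caller
                reply.put((False, exc))

    def call(self, func, *args):
        """Queue func(*args) and block until the worker has run it."""
        reply = queue.Queue(maxsize=1)
        self._calls.put((func, args, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value
```

Callers on any thread use call(), and the queue decides the order in
which the driver sees the requests.
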

Paulo



