GSoC: Hot-Replace Server

Wed Apr 6 01:13:58 PDT 2011

On Tue, 2011-04-05 at 11:38 +0200, Mohamed Ikbel Boulabiar wrote:
> 
> 
> Some months ago, I discussed the Wayland crash scenario: how
> windows should reconnect to a newly started Wayland rather than being
> closed and losing the open work.
> 
> 
> I discussed it with zhasha and daniels; below is a summary of
> some points that may be useful:
> 
> 
> ...
> <daniels>
> wayland doesn't really have any state to save, the state to save lies
> in the apps - and as you've said, well-behaved apps like chromium do
> that automatically now
> <zhasha> Wayland needs to save its state somewhere it can be quickly
> retrieved
> <daniels> wayland itself has very little state to save though
> <zhasha> true but it still needs to continuously save it
> ...
> <boulabiar> daniels, they don't need to save them, they need to tell
> apps to do a redraw
> <zhasha> and how the hell does it know its clients surfaces when it
> just lost all its memory+state?
> <daniels> boulabiar: yeah, but wayland doesn't connect to apps, the
> apps connect to wayland.  so, if wayland crashes and restarts, the
> apps are going to know wayland's crashed when they reconnect.
> <zhasha> now you're venturing into the realm of "clients must support
> it" which they can do no matter what you do server-side
> <boulabiar> zhasha, the client memory are they stored in wayland or in
> that client process ?
> <zhasha> the surface is created by the client and stored in the client
> <boulabiar> so
> <zhasha> but how does wayland know it's there if it just lost all
> state?
> <boulabiar> zhasha, how much he need to store as a log, to recover
> that later ?
> <zhasha> you need to do 2 things:
> <zhasha> 1) save state *on every change* somewhere
> <zhasha> 2) make the transport (usually a socket) persist
> <zhasha> then when your server crashes, the clients won't be
> disconnected and the state can be recovered
> <daniels> the only interesting thing you could do is come back up with
> the exact same surfaces
> <daniels> but you probably don't want to do that, as the apps
> themselves may not have saved state
> <zhasha> and personally I think it's more trouble than it's worth,
> when pretty much all crashes (especially after wayland goes stable)
> can pretty much be traced back to the 3D driver
> <daniels> so you might just be throwing up a picture of your browser
> session where your browser has in fact vanished, or is doing something
> else
> <daniels> all the interesting state is in the clients, it's them that
> need to save the state
> <zhasha> daniels: no really, you can freeze the process, or
> alternatively just queue all their protocol requests
> <daniels> if apps were more robust against servers disappearing -
> which they can be with xcb, but no-one cares enough to port their
> toolkits/apps to it - then even x crashing wouldn't be a problem,
> because the apps would just reconnect and carry on
> <boulabiar> daniels, if there would be apps running on wayland, then
> toolkits need first to be ported anyway, we can just say be more
> robust because you're doing the work in all cases
> <daniels> boulabiar: yep.
> <boulabiar> and to automatically restart wayland
> 
> According to this discussion, xcb handles the server disappearing.
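zhasha's first requirement, saving state *on every change*, amounts to a write-ahead journal. The following is a minimal sketch under loud assumptions: the compositor tracks only a position per surface, surface ids are small integers below a fixed bound, and `struct surface_state`, `struct journal_rec`, `record_change()` and `replay()` are hypothetical names, not part of any Wayland API. The record layout is private to this one server, which is why such a journal can only help a restart of the *same* server.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_SURFACES 64   /* hypothetical fixed bound on surface ids */

/* Hypothetical per-surface state a restarted server would need back. */
struct surface_state {
    uint32_t id;
    int32_t x, y;
    int32_t live;        /* 1 if the surface currently exists */
};

/* One journal record: a single change to a single surface. */
struct journal_rec {
    uint32_t id;
    int32_t x, y;
    int32_t destroyed;   /* 1 if this change destroys the surface */
};

/* Apply one record to the in-memory table. */
static void apply(struct surface_state *tab, const struct journal_rec *r)
{
    struct surface_state *s = &tab[r->id];
    if (r->destroyed) {
        memset(s, 0, sizeof *s);   /* slot becomes free again */
        return;
    }
    s->id = r->id;
    s->x = r->x;
    s->y = r->y;
    s->live = 1;
}

/* Write the change to the journal *before* applying it. fflush() hands the
 * data to the kernel, which survives a crash of the server process (though
 * not of the whole machine). */
static int record_change(FILE *journal, struct surface_state *tab,
                         const struct journal_rec *r)
{
    if (r->id >= MAX_SURFACES)
        return -1;
    if (fwrite(r, sizeof *r, 1, journal) != 1 || fflush(journal) != 0)
        return -1;
    apply(tab, r);
    return 0;
}

/* After a restart: rebuild the table by replaying the journal from the start. */
static int replay(FILE *journal, struct surface_state *tab)
{
    struct journal_rec r;
    memset(tab, 0, sizeof(struct surface_state) * MAX_SURFACES);
    rewind(journal);
    while (fread(&r, sizeof r, 1, journal) == 1) {
        if (r.id >= MAX_SURFACES)
            return -1;   /* corrupt or foreign journal */
        apply(tab, &r);
    }
    return 0;
}
```

Replaying the journal yields the same table the crashed process held; it says nothing about whether the clients still agree with that state, which is daniels' objection in the log.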

I would just like to point out that I'm in no way against a hot swap for
the display server, but someone has to mention every little annoying
detail of how it will (or rather, will not) work.
The point I was trying to get across was simple: most Wayland crashes
currently trace directly back to unstable 3D drivers. I personally think
it would make more sense to simply trap the 3D driver and "reboot" it, so
to speak, when it blows up. Hot-replacing the entire server, especially
between different servers, is logistically impossible:

* Wayland servers are in no way restricted by the core protocol. GNOME
might, for example, add interfaces for its specific desktop model so that
Gtk+ applications integrate more seamlessly. Replace that server with,
say, the KDE server (transparently to the clients) and you have a recipe
for disaster: the clients still expect interfaces the new server does
not provide.
* Wayland servers are basically state machines, so you would need
persistent state (and a persistent transport). Persistent CAN mean that
the state is stored in RAM, but once again you are strictly limited to
replacing a server with the same server, which makes this viable ONLY
for crash recovery. This is because different servers, even though they
implement the same protocol, will almost certainly store their state in
different ways.
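The "persistent transport" half can be sketched with ordinary UNIX descriptor passing: a small supervisor process keeps the client connections open and hands them to a restarted server over a UNIX-domain socket using `SCM_RIGHTS`. This is an assumption about how one might do it, not something the thread says any Wayland server implements; `send_fd()` and `recv_fd()` are hypothetical helper names built on the standard `sendmsg()`/`recvmsg()` calls.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a UNIX-domain socket. Returns 0 on success. */
static int send_fd(int chan, int fd)
{
    char byte = 0;   /* at least one byte of real data must accompany it */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;   /* guarantees correct alignment of buf */
    } u;
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    memset(&u, 0, sizeof u);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor. Returns the new descriptor, or -1. */
static int recv_fd(int chan)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    if (recvmsg(chan, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof fd);
    return fd;
}
```

The client never notices the handover, because its socket stays connected the whole time. But the new server receives only a byte stream with no idea what objects exist on it. That is exactly why the transport alone is useless without the matching, server-specific persistent state.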

I maintain that it would be not only easier but also an almost
catch-all solution to skip replacing the entire server and instead
maintain a loose binding between the server and libGL, trap any bad
behavior stemming from the driver, and reload the driver as necessary.
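The "trap and reload" idea can be roughly approximated with a different technique than the in-process libGL binding described above: run each risky driver call in a child process, and reinitialize the driver when that child dies from a signal. This is only a sketch of the control flow. `run_guarded()`, `reload_driver()` and `flaky_render()` are hypothetical names, and a real compositor would also have to get rendering results back from the child (for example through shared memory), which is left out here.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a call into the 3D driver. */
typedef int (*driver_call_fn)(int attempt);

/* Hypothetical reload hook; a real one might re-dlopen() the driver. */
static int reloads = 0;
static void reload_driver(void) { reloads++; }

/* Run one driver call in a child process so that a crash cannot take the
 * server down. Returns the call's exit status (0..255), or -1 if forking
 * failed or every attempt crashed. */
static int run_guarded(driver_call_fn call, int max_attempts)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(call(attempt) & 0xff);   /* child: do the risky work */

        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status))
            return WEXITSTATUS(status);    /* driver call finished normally */

        /* The child was killed (e.g. by SIGSEGV): reload and retry. */
        reload_driver();
    }
    return -1;
}

/* Example driver call that crashes on its first attempt only. */
static int flaky_render(int attempt)
{
    if (attempt == 0)
        raise(SIGSEGV);
    return 0;
}
```

With `run_guarded(flaky_render, 3)`, the first child dies, the driver is "reloaded" once, and the second attempt succeeds. The server process itself never crashes.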

The alternative, of course, would be to implement it in the clients. The
server already "supports" that, in that it is entirely up to the
clients; but then you get absolutely no technical benefit over X11,
where clients (via xcb) can already do the same.