Handing clients among multiple servers

Tue Sep 27 23:33:57 UTC 2022

AFAIK, the handling of server crashes is still mostly an unresolved 
area. While in Arcan it's explicitly handled via SHMIF, there are only 
incompatible attempts to resolve this in Enlightenment and kwin_wayland, 
which let the clients wait for some time while the server is relaunched 
(and it needs to be the same server, since it has to know some 
credentials from the previous run).

I got an idea when reading this: 
https://www.linux.org.ru/forum/talks/16983280?cid=16984372 While the 
exact approach described there (moving clients between nested and 
parent compositors for the purpose of grouping) may sound niche, there 
may be other uses for it. One of them is simplifying the testing of 
Wayland servers: a nested server is launched and a client is moved to 
it temporarily, instead of re-launching test clients every time.

Another, more practical, use is a backup server which the clients 
connect to in an emergency (when the main server has crashed). Such a 
server does not even need to be presented to the user; it may run in 
the background like screen/tmux, and it should be tiny, simple and 
robust. Similar examples:

• backup BIOSes in some motherboards, which can be used for booting if 
the main BIOS was corrupted due to an incorrect upgrade;

• a backup window manager in Windows 7, which is used when DWM.exe 
crashes (due to buggy GPU drivers, GPU hardware problems, etc.);

• compiz-reloaded has a crash handler and launches some other process 
(xterm by default, or any other fallback WM) instead of itself, so the 
session is not rendered unusable (and some display managers quit the 
whole X.Org session if the WM has crashed with no replacement). In the 
X11 world, the X server takes the role of such a backup server, but 
there is no such separate entity in the Wayland world yet.

Handing over should be handled by the clients themselves, with security 
in mind (so that a malicious server cannot steal control over them). It 
seems to me though that the approaches for handing clients between 
multiple user-controlled servers, and between a main and a backup 
server, are fundamentally different.

For multiple user-controlled servers (like parent and nested, or vice 
versa):

1a) a client requests the current server to connect to a new server 
(specified), or

1b) the current server informs the client about a new server (specified) 
to connect to;

2) the current server grants the reconnect;

3) the client freezes its event queue, disconnects from the current 
server and connects to the new one.
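
A rough sketch of the steps above, as a plain Python simulation rather 
than anything written against libwayland. All names here (Server, 
Client, grant_reconnect, hand_over, freeze, thaw) are made up for 
illustration; no such protocol or API exists today, and a real client 
would also have to re-create its server-side objects after the move.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Server:
    """Stand-in for a compositor; not a real Wayland object."""
    name: str
    allow_handover: bool = True
    clients: set = field(default_factory=set)

    def grant_reconnect(self, client_id: str, target: "Server") -> bool:
        # Step 2: the current server decides whether the move is allowed.
        return self.allow_handover and client_id in self.clients


@dataclass
class Client:
    client_id: str
    server: Server
    frozen: bool = False
    pending: deque = field(default_factory=deque)
    delivered: list = field(default_factory=list)

    def __post_init__(self):
        self.server.clients.add(self.client_id)

    def on_event(self, event: str) -> None:
        # While frozen, events are held back instead of being processed.
        if self.frozen:
            self.pending.append(event)
        else:
            self.delivered.append(event)

    def freeze(self) -> None:
        self.frozen = True

    def thaw(self) -> None:
        # Resume processing and replay whatever arrived while frozen.
        self.frozen = False
        while self.pending:
            self.delivered.append(self.pending.popleft())

    def hand_over(self, target: Server) -> bool:
        # Steps 1a and 1b look the same from here: somebody named `target`.
        if not self.server.grant_reconnect(self.client_id, target):
            return False
        self.freeze()                                 # step 3: freeze
        self.server.clients.discard(self.client_id)   # disconnect
        target.clients.add(self.client_id)            # connect to the new one
        self.server = target
        self.thaw()
        return True
```

For example, with main = Server("main") and nested = Server("nested"), 
Client("term", main).hand_over(nested) returns True and leaves the 
client registered only with the nested server, while a server created 
with allow_handover=False refuses the move and the client stays put.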

For a main server and a backup server:

1) the current server informs the clients about the backup server 
beforehand;

2) a client determines that the main server has crashed;

3) the client freezes its event queue and connects to the backup server.
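
The failover case can be sketched in a similar way. This assumes that a 
crashed server shows up on the client side as a closed or reset socket, 
which is what I would expect for a Unix socket whose peer process died, 
but I have not checked how libwayland reports it. FailoverClient, 
server_is_gone and backup_connect are hypothetical names, and the 
"backup server" is just a callable that returns a new connection.

```python
import socket


def server_is_gone(conn: socket.socket) -> bool:
    """Return True if the peer closed the connection (EOF or reset).

    Side effect: switches the socket to non-blocking mode.
    """
    conn.setblocking(False)
    try:
        data = conn.recv(1, socket.MSG_PEEK)   # peek, do not consume data
    except BlockingIOError:
        return False      # nothing to read, but the peer is still there
    except ConnectionError:
        return True       # connection reset or broken pipe
    return data == b""    # an empty read means the peer closed its end


class FailoverClient:
    def __init__(self, main_conn, backup_connect):
        # Step 1: the way to reach the backup is known before any crash.
        self.conn = main_conn
        self.backup_connect = backup_connect
        self.on_backup = False
        self.frozen = False

    def poll(self) -> bool:
        """Check the main server once; fail over if it is gone."""
        if not self.on_backup and server_is_gone(self.conn):   # step 2
            self.frozen = True                 # step 3: freeze the queue
            self.conn.close()
            self.conn = self.backup_connect()  # connect to the backup
            self.on_backup = True
            self.frozen = False
        return self.on_backup
```

With two socket.socketpair() pairs standing in for the main and backup 
connections, poll() returns False while the main server's end is open 
and True after it has been closed, at which point conn refers to the 
backup connection.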

Though, if the main server is always launched by the backup one, both 
cases can be reduced to a common approach: the current (or backup) 
server informs the client about a new (or main) server, and the client 
connects to it whenever needed. All the client needs to know is that 
server A trusts server B. The source of trust is another discussion 
point (should it be merely an executable path, or something more 
complicated?).
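
To make the "A trusts B" part concrete, here is a minimal sketch of the 
bookkeeping on the client side. The server identities are plain strings 
used as placeholders; what they really are and how they would be 
authenticated is exactly the open question, and the sketch does not 
answer it. TrustingClient, on_announce and switch_to are hypothetical 
names.

```python
class TrustingClient:
    def __init__(self, initial: str):
        # The server the client was started under is trusted implicitly.
        self.current = initial
        self.trusted = {initial}

    def on_announce(self, announcer: str, target: str) -> bool:
        # Only the server we are connected to right now may vouch
        # for another server.
        if announcer != self.current:
            return False
        self.trusted.add(target)
        return True

    def switch_to(self, target: str) -> bool:
        # Refuse any server nobody we trusted has vouched for.
        if target not in self.trusted:
            return False
        self.current = target
        return True
```

A client started under "backup" accepts an announcement of "main" from 
"backup", ignores announcements from a server it is not connected to, 
and can later go back to "backup" because that one was trusted from the 
start.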

What's your opinion on it? I'm not a Wayland expert, and I suppose there 
may be significant problems (like dealing with resource pointers). Do 
you have better ideas maybe?