Questions about object ID lifetimes

Sat Sep 16 16:18:35 UTC 2023

Pekka,

After thinking more about what you said, I'm no longer optimistic.

First, you are correct that my observation about the opposite-side case
(an ID from side A's range vs. a destructor on side B) only works for
middleware, and then only if the compositor and clients already handle
their issues properly.

Secondly, when thinking about the case of a message with new_ids in it
that arrives after its object has been deleted, it occurs to me that
this is a special case of a greater problem: the protocol's asynchrony
allows speculative computation on both sides.  Any
time there are at least two messages that don't commute with each other
(and destruction is a case of a message that never commutes with any
other message to the same object) where the two messages can be sent
from opposite sides, at least one of them has to be undone somehow.  And
that undoing has to include state changes that preceded it on its
sending side that didn't take into account the other (non-undone)
message.  This is bad.
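
To make the non-commuting case concrete, here is a toy sketch in Python.
Everything in it is made up for illustration: `apply`, `commutes`, and
the ("destroy" / "request", object ID) tuples are hypothetical and are
not libwayland APIs.  It only checks whether two messages give the same
result in either order.

```python
def apply(live, msg):
    """Apply (op, obj_id) to a frozenset of live IDs.

    Returns (new_live, outcome).  Hypothetical model, not libwayland.
    """
    op, obj = msg
    if obj not in live:
        return live, "error: dead object"
    if op == "destroy":
        return live - {obj}, "destroyed"
    return live, "ok"


def commutes(live, a, b):
    """True if applying a then b matches applying b then a."""
    def run(first, second):
        state, r1 = apply(live, first)
        state, r2 = apply(state, second)
        return state, {first: r1, second: r2}
    return run(a, b) == run(b, a)


live = frozenset({3, 4})
same_object = commutes(live, ("destroy", 3), ("request", 3))
other_object = commutes(live, ("destroy", 3), ("request", 4))
```

In this model `same_object` comes out false, because the request either
succeeds or hits a dead object depending on which side "won", while
`other_object` comes out true.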

It wouldn't be so bad if the protocol used some old-time mutexes or
database read-vs-write transactional consistency preservation
mechanisms. But those require quite a bit of input from higher levels
(above libwayland).  And there's deadlock to deal with.

The easiest fix I can think of is to go full-on half duplex.  Meaning
that each side doesn't send a single message until it has fully
processed all messages sent to it in the order they arrive (thankfully,
sockets preserve message order, else this would be much harder).

Have you considered half duplex?  Certainly, it would mean a loss of
some concurrency, hence a potential performance hit.  But probably not
that much in this case, as most of the message back-and-forth in Wayland
occurs at user-interaction speeds, while the speed-critical stuff
happens through fd sharing and similar things outside the protocol.

I think it can be made mostly backward compatible.  It would probably
require some "all done" interaction between libwayland and higher
levels on each side, but that's probably (hopefully) not too hard.
There may even be a way to automate the "all done" interaction to make
this fully backward compatible, because libwayland knows when there are
no more messages to be processed on the wire, and it can queue up the
messages on each side before placing them on the wire.  It might need
to do things like re-order ping/pong messages with respect to the
others to make sure the pinging side (compositor) doesn't declare the
client dead while waiting.  But that seems minor, as long as all such
ping/pong pairs are opaque to the remainder of the protocol, hence
always commute with other messages.
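
A rough sketch of the turn-taking idea, as a toy Python model rather
than anything libwayland does today.  The `Peer` class and its `queue`
and `take_turn` methods are hypothetical names, and the "all done"
signal is modeled simply as the moment a peer flushes its queued
messages to the other side.

```python
from collections import deque


class Peer:
    """One side of a half-duplex link (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()   # messages received, in arrival order
        self.outbox = []       # messages held back until our turn ends
        self.handled = []      # record of what we processed, in order
        self.peer = None

    def queue(self, msg):
        # Nothing goes on the wire yet; it waits for the end of the turn.
        self.outbox.append(msg)

    def take_turn(self, handler):
        # 1. Fully process everything sent to us, in order.
        while self.inbox:
            msg = self.inbox.popleft()
            self.handled.append(msg)
            handler(self, msg)   # the handler may call self.queue(...)
        # 2. Only then flush our queued messages ("all done").
        self.peer.inbox.extend(self.outbox)
        self.outbox.clear()


client, comp = Peer("client"), Peer("compositor")
client.peer, comp.peer = comp, client

client.queue("surface.attach")
comp.queue("output.gone")        # queued, but not yet on the wire

client.take_turn(lambda p, m: None)
comp.take_turn(lambda p, m: p.queue("reply:" + m))
```

Here the compositor's "output.gone" is held until it has seen the
client's message, so neither side acts on a stale view of the other.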

As for my own middleware project, I think I will try to detect message
decoding issues in all cases by keeping the most recent two types of
each ID, and attempting to decode both ways (most recent first).  There
are fortunately a bunch of internal consistency checks that can be done,
such as length of overall message vs. length of args vs. string length
vs. null string termination, etc.  But if the middleware gets a message
that passes these decoding consistency checks for both of those types,
then depending on what it is trying to do (as in one of my use cases,
securing a sandboxed application), it may have to cut off the client.
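
A minimal sketch of the "decode both ways" check, in Python.  It assumes
the usual wire layout for arguments (32-bit words in host byte order,
strings and arrays as a length word followed by data padded to 4 bytes,
string length counting the terminating NUL, fds carried out of band) and
borrows libwayland's signature letters (i, u, f, s, o, n, a, h).  The
function names `try_decode` and `decode_candidates` are mine, and the
argument bytes are assumed to have already been split from the 8-byte
message header.

```python
import struct


def try_decode(body, signature):
    """Return the decoded args if body fits signature exactly, else None."""
    args, pos = [], 0
    for kind in signature:
        if kind == "h":                 # fd: ancillary data, no wire bytes
            args.append(None)
            continue
        if pos + 4 > len(body):
            return None                 # ran out of message
        (word,) = struct.unpack_from("=I", body, pos)
        if kind in "uon":               # uint, object ID, new_id
            args.append(word)
        elif kind == "i":               # signed int
            args.append(struct.unpack_from("=i", body, pos)[0])
        elif kind == "f":               # signed 24.8 fixed point
            args.append(struct.unpack_from("=i", body, pos)[0] / 256.0)
        elif kind in "sa":              # string or array
            padded = (word + 3) & ~3
            if pos + 4 + padded > len(body):
                return None             # declared length overruns message
            raw = body[pos + 4:pos + 4 + word]
            if kind == "s":
                if word == 0:
                    args.append(None)   # null string
                elif raw[-1] != 0:
                    return None         # missing NUL terminator
                else:
                    args.append(raw[:-1])
            else:
                args.append(raw)
            pos += padded
        else:
            raise ValueError("unknown signature letter: " + kind)
        pos += 4
    # Leftover bytes mean the signature does not match the message size.
    return args if pos == len(body) else None


def decode_candidates(body, signatures):
    """Try each candidate signature (most recent first); keep the fits."""
    fits = []
    for sig in signatures:
        decoded = try_decode(body, sig)
        if decoded is not None:
            fits.append((sig, decoded))
    return fits


body = struct.pack("=II", 5, 3) + b"hi\0\0"      # uint 5, string "hi"
unambiguous = decode_candidates(body, ["us", "su"])
ambiguous = decode_candidates(body, ["us", "uuu"])
```

With this body, only "us" survives against "su", but both "us" and
"uuu" pass, which is exactly the case where the middleware cannot tell
which type the ID has and may have to cut off the client.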