Questions about object ID lifetimes

Tue Sep 19 13:26:37 UTC 2023

On Mon, 18 Sep 2023 11:31:18 -0400
jleivent <jleivent at comcast.net> wrote:

> On Mon, 18 Sep 2023 14:06:51 +0300
> Pekka Paalanen <ppaalanen at gmail.com> wrote:
> 
> > On Sat, 16 Sep 2023 12:18:35 -0400
> > jleivent <jleivent at comcast.net> wrote:
> >   
> > > The easiest fix I can think of is to go full-on half duplex.
> > > Meaning that each side doesn't send a single message until it has
> > > fully processed all messages sent to it in the order they arrive
> > > (thankfully, sockets preserve message order, else this would be
> > > much harder). Have you considered half duplex?    
> > 
> > Never crossed my mind at least. I can't even imagine how it could be
> > implemented through a socket, because both sides must be able to
> > spontaneously send a message at any time.  
> 
> By taking turns.  Each side would, after queuing up a batch of
> messages, add an "Over!" message (from the days of half-duplex
> radio communications) to the end of that queue, and then send the whole
> queue (retaining its sequence).  Neither side would send a message
> until it receives the other side's "Over!" message, and until the
> higher levels above libwayland have had a chance to examine all
> messages prior to "Over!" in order to avoid sending an inconsistent
> message or even committing to a state incompatible with later messages.
> 
> >   
> > > Certainly, it would mean a loss
> > > of some concurrency, hence a potential performance hit.  But
> > > probably not that much in this case, as most of the message
> > > back-and-forth in Wayland occurs at user-interaction speeds, while
> > > the speed-needing stuff happens through fd sharing and similar
> > > things outside the protocol. I    
> > 
> > That user interaction speed can be in the order of a kilohertz, for
> > gaming mice, at least in one direction. In the other direction,
> > surface update rate is also unlimited, games may want to push out
> > frames even if only every tenth gets displayed to reduce latency.
> > Also truly tearing screen updates are being developed.  
> 
> But aren't those fast frame updates done through shared fds?  Hence not
> part of the wire protocol, and would not be impacted by increasing the
> length of messages on the wire?

No. They are messages sent on the wire, telling "there is a new image
on that other fd I shared with you before, use that now", and so on.
That is usually a handful of requests per frame.

Likewise, every pointer motion event is one or multiple wire events.

Shared fds are used for sharing big chunks of data mostly, that is,
shared memory. But we don't use shared memory messaging nor locks. All
messaging is Wayland messages over the socket. After all, the XML files
describe wire messages.

We want everything to be in the same protocol stream as much as
possible to reduce race possibilities. If we had shared memory
messaging in addition to the unix socket, that would be two mutually
async protocol streams in the same direction. That would be quite a
pain, as we've learnt from Xwayland (you have both Wayland and X11
connections between the same two entities; as another matter, libX11 is
really eager to have blocking roundtrips, so if libwayland would also
block for something, a deadlock is practically guaranteed eventually).

> >   
> > > think it can be made mostly backward compatible. It would probably
> > > require some "all done" interaction between libwayland and higher
> > > levels on each side, but that's probably (hopefully) not too hard.
> > > There may even be a way to automate the "all done" interaction to
> > > make this fully backward compatible, because libwayland knows when
> > > there are no more messages to be processed on the wire, and it can
> > > queue-up the messages on each side before placing them on the wire.
> > >  It might need to do things like re-order ping/pong messages with
> > > respect to the others to make sure the pinging side (compositor)
> > > doesn't declare the client dead while waiting.  But that seems
> > > minor, as long as all such ping/pong pairs are opaque to the
> > > remainder of the protocol, hence always commute with other
> > > messages.    
> > 
> > If you mean adding new ping/pong stuff, that doesn't sound very nice,
> > because Wayland also aims to be power efficient: if truly nothing is
> > happening, let the processes sleep. Anyone could still wake up any
> > time, and send a message.  
> 
> Not adding.  Dealing with the already existing (or if any new ones are
> added) ping/pong pairs.  Or any messages that really need to be timely,
> hence can't wait for messages in front of them to be fully processed.

There are no existing mandatory ping/pong messages. Some extensions
have some, but all extensions are by definition optional from the
libwayland point of view.

Wayland messages are strictly ordered per direction; there is zero
expectation or guarantee that anything could be re-ordered at
libwayland level.

> That could apply to any real-time requirements, like the gaming mice
> messages you mentioned above.  But doing this in general is hard unless
> the messages are irrelevant to the rest of the protocol (hence commute
> with everything else), like ping/pong are.
> 
> > 
> > 
> > On Sun, 17 Sep 2023 15:28:04 -0400
> > jleivent <jleivent at comcast.net> wrote:
> >   
> > > Has altering the wire format to contain all the info needed for
> > > unambiguous decoding of each message entirely within libwayland
> > > without needing to know the object ID -> type mapping been
> > > considered?    
> > 
> > Not that I can recall. The wire format is ABI, libwayland is not the
> > only implementation of it, so that would be Wayland 2 material.  
> 
> So no changes to the wire format are possible under any circumstances
> in Wayland 1?

It would be possible to introduce new message argument types, with
consideration to libwayland ABI (the size of 'union wl_argument' is the
most limiting one), but otherwise I don't think so. There is no initial
version negotiation, so something else needs to be used to discover who
supports what, which is essentially wl_registry. That gets awkward for
wire level features.

> > > It would make the messages longer, but this seems like it wouldn't
> > > be very bad for performance because wire message transfer is roughly
> > > aligned with user interaction speeds.    
> > 
> > We need to be able to deal with at least a few thousand messages per
> > second easily.
> > 
> > The overhead seems a bit bad if every message would need to carry its
> > signature.  
> 
> Encoding more into the message is only needed if there are no
> destructor request acks (the equivalent of wl_display::delete_id, but
> in the opposite direction).  But I was wondering why not do it for
> robustness.
> 
> The signature isn't very big, but it's probably not needed even for
> robustness.  What's needed is the target object type/version
> information. Since from that both sides know the signature.  The issue
> is just how to add robustness to the object ID -> type/version
> mapping, which is the source of many problems.  The signatures are not
> ambiguous after that's done.
> 
> The target type/version info could be encoded to be small if you want -
> if both sides agree on an indexing scheme pairing numbers with type
> names & versions, like they already do for opcodes.  It's hard to
> imagine it taking more than 2 bytes (leaving you room for 2^16
> type/version combos), but 4 bytes is certainly plenty.

That would be quite a trick. Maybe through first-time binding order via
wl_registry... but then how to add those to wire packets without
exploding anything.

> If you don't have destructor request acks, you may also need
> "generation" numbers. There's a problem about object IDs we haven't
> spoken about yet, related mostly to the hard case with server-side IDs
> and server-side destructor events.  It's possible (I don't know if there
> are instances in the current protocol) for a series of object creations
> (new_id args) and destructions all originating from the server and
> all involving the same object ID such that the server might receive a
> request from the client involving that object ID and agree with the
> decoding about the type of the object, but not the generation of that
> object.  These steps:
> 
> server sends:
>   event E1 with new_id N for type T  (call the new object O1)
>   ...
>   destructor for id N
>   ...
>   event E2 with new_id N for type T  (call the new object O2)
> 
> at some later point, the server receives:
>   request R involving id N (either as target or object arg) which
>   matches type T based on decoding the request
> 
> The server cannot tell, even if it has all of the decode info available
> (whether encoded in the wire, or because the request R gets decoded
> without issue otherwise), whether the object with id N in R refers to O1
> or O2.  If it refers to O1, the message is what I called "speculative"
> because it was sent prior to the client seeing the destructor for id N
> above.  That message could be a mistake if processed, because the
> current state of the compositor has O2 with id N, and that might be a
> completely distinct usage of object type T that isn't at all related to
> request R.
> 
> A generation number would keep track of which particular instance
> (generation) of the object with id N and type T a message is referring
> to.  A generation number would be associated with each id N, and would
> get incremented each time N got re-used.  It would be added to the wire
> protocol for both the target objects and the object arguments.  That's
> more (probably 4-byte) fields.
> 
> Again, note that this ambiguity between O1 and O2 cannot occur if
> there is some destructor ack request that the client can send and
> libwayland on the server side understands (the equivalent of the
> wl_display::delete_id, but going the other way). I suspect that's
> better to add to the protocol instead of generation numbers.  It
> also cannot occur in the half-duplex scheme, which is a more complete
> solution to many problems.
> 
> I read the Protocol-next issues - so I see you've thought about
> delete_id requests already.
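
The O1/O2 scenario quoted above can be sketched roughly as follows. This
is only an illustration of the generation-number idea, not anything that
exists in libwayland: the IdTable class, its methods, and the id value
are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class IdTable:
    """Hypothetical server-side table: object id -> (type, generation)."""
    live: dict = field(default_factory=dict)         # id -> (type_name, generation)
    generations: dict = field(default_factory=dict)  # id -> last generation handed out

    def create(self, obj_id, type_name):
        # Every (re)use of an id bumps its generation counter.
        gen = self.generations.get(obj_id, 0) + 1
        self.generations[obj_id] = gen
        self.live[obj_id] = (type_name, gen)
        return gen

    def destroy(self, obj_id):
        self.live.pop(obj_id, None)

    def accepts(self, obj_id, type_name, gen):
        # A request is accepted only if type AND generation match the live object.
        return self.live.get(obj_id) == (type_name, gen)


N = 0xFF000000  # illustrative id value only

table = IdTable()
o1_gen = table.create(N, "T")   # event E1: new_id N, object O1
table.destroy(N)                # destructor for id N
o2_gen = table.create(N, "T")   # event E2: new_id N, object O2

# A request sent before the client saw the destructor still names O1.
stale_request_ok = table.accepts(N, "T", o1_gen)
fresh_request_ok = table.accepts(N, "T", o2_gen)
```

Without the generation field, both requests would look identical to the
server, since id and type agree in both cases.
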
> 
> >   
> > > Also, for any compositor/client pair, as long as they both use the
> > > same version of libwayland, the necessary wire format change would
> > > not result in compatibility issues.  It would for static linked
> > > cases, or similar mismatching cases (flatpak, appimage, snap, etc.
> > > unless the host version is mapped in instead of the packaged one
> > > somehow). There also seem to be unused bits in the existing wire
> > > format so that one could detect a compositor/client
> > > incompatibility at least on one end.    
> > 
> > We've never had the requirement for compositor and clients to use the
> > same minor version of libwayland. There are also completely
> > independent Wayland implementations in other languages that expect to
> > be interoperable. Breaking all that seems unacceptable.
> > 
> > What unused bits did you find?  
> 
> For instance, the length field in the header.  It seems unlikely that
> you need all 16 bits.  And, currently, you have hard-coded 4K ring
> buffers in libwayland, so you have a hard limit on 4K (12 bits) for max
> message size. Unless I'm missing some code to handle that somewhere.  I
> think there is a check such that if a message in the wire claims to
> have a length too big for your 4K ring buffer, that's an error.

Right, those "unused" bits are not ignored, they are required to be
zero, which is a good basis for re-purposing them, but we still have
the problem of negotiating that both sides actually understand the new
thing.
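
To make the bit layout concrete, here is a minimal sketch of the message
header: a 32-bit object ID followed by a 32-bit word holding the size in
the upper 16 bits and the opcode in the lower 16 bits, in host byte
order. The 4096-byte cap is taken from the 4K ring buffer mentioned in
the quoted text and is an assumption here; the function names and the
SPARE_SIZE_BITS mask are made up for illustration.

```python
import struct

MAX_MESSAGE_SIZE = 4096   # assumption: the 4K limit discussed above
SPARE_SIZE_BITS = 0xE000  # size bits never set while size <= 4096 (hypothetical name)


def pack_header(object_id, opcode, size):
    """Build the 8-byte header: object id, then (size << 16) | opcode."""
    return struct.pack("=II", object_id, (size << 16) | opcode)


def unpack_header(data):
    """Split an 8-byte header back into (object_id, opcode, size)."""
    object_id, word = struct.unpack("=II", data[:8])
    return object_id, word & 0xFFFF, word >> 16


def spare_bits(size):
    """Return the high size bits that must be zero under the assumed cap."""
    return size & SPARE_SIZE_BITS


header = pack_header(object_id=3, opcode=1, size=12)
decoded = unpack_header(header)
```

Under that assumed cap, a size of 4096 itself still needs bit 12, so only
the top three bits of the size field are always zero. A receiver that does
not know about a new meaning for them would simply see an oversized
message, which is why the negotiation problem remains.
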

> > > I'm not suggesting that unambiguous decoding of all messages is a
> > > sufficient fix, but it is a necessary one.  There are still
> > > speculative computation issues that it wouldn't resolve alone.    
> > 
> > I didn't understand what is speculative. There is no roll-back of any
> > kind on anything, what's computed is final.  
> 
> Right - you have speculation without roll-back, meaning you have broken
> speculation!
> 
> One case of speculation we discussed already is a request containing
> new_ID args, where the target object has already been deleted by the
> server.  That request was speculative, since it was sent by the client
> prior to it seeing the already-processed-on-the-server destruction of
> the target object.  The client "speculated" that the target object will
> still exist to receive the request, and did so incorrectly.

I would argue that "speculative" is not the right word here, it was
never intended.

The Wayland design postulates that every message sent reaches the
receiver, and in the order they were sent, and whatever action they
trigger in the receiver will succeed (or be indistinguishable from
succeeding). Anything to the contrary causes disconnection.

> Although this type of speculation is addressed (partially) by
> the addition of a delete_id request, at least with respect to
> disambiguating message decoding, other cases are not. My reason for
> suggesting going half-duplex was about trying to prevent speculation
> AND message ambiguity together with one fix.
> 
> There may be cases of speculation unrelated to object destruction.
> Consider:
> 
> Side A sends message M1 then message M2 (or at least did other
> state-change things after sending M1).
> 
> Side B processes M1 (but doesn't yet see M2) and sends message M3 which
> is not compatible with what side A did after sending M1.  But side A
> doesn't know based only on its own state that it should ignore message
> M3.  And even if it does ignore M3, side B may have done processing
> after sending M3 that changes its state, such that its new state
> is incompatible with side A ignoring message M3.
> 
> tl;dr: protocol asynchrony leads to speculation that can result in the
> two sides disagreeing about the correct state of the world.  

We avoid that with careful protocol design in XML. There is exactly
that kind of situation in the xdg-family of extensions and it is solved
by sending a serial with the events and acking that serial when the
client acts on the events.

It's a known caveat.
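
As a rough sketch of that serial/ack pattern (loosely modelled on how a
configure event carries a serial that the client later acks; the class
and method names below are hypothetical and not any real API):

```python
class Compositor:
    """Hypothetical compositor side of a serial/ack exchange."""

    def __init__(self):
        self.next_serial = 1
        self.pending = {}  # serial -> state proposed in that event

    def send_configure(self, state):
        # Each event carries a fresh serial so the client can refer back to it.
        serial = self.next_serial
        self.next_serial += 1
        self.pending[serial] = state
        return serial

    def handle_ack(self, serial):
        # The ack tells the compositor exactly which event the client acted on.
        if serial not in self.pending:
            raise ValueError(f"unknown or already acked serial {serial}")
        state = self.pending[serial]
        # The acked event and anything older are no longer pending.
        self.pending = {s: st for s, st in self.pending.items() if s > serial}
        return state


comp = Compositor()
s1 = comp.send_configure("800x600")
s2 = comp.send_configure("1024x768")  # sent before the client reacted to s1

# The client acts on the first event only; the ack makes that unambiguous.
acked_state = comp.handle_ack(s1)
still_pending = sorted(comp.pending)
```

The point is that the compositor never has to guess which of its events
the client's next state change responds to.
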

> BTW - I don't recommend adding rollbacks!  It's a huge can of worms
> unless the speculation is very limited (as with machine instruction
> speculation in modern CPUs).
> 
> You can prevent speculation for each individual case within the protocol
> by having the protocol include acks, mutexes or transactional boundary
> messages specific to these cases.  Or you could design the protocol so
> that such M3 messages (and other sending-side state changes) must
> always be compatible with (commute with) the receiving state - that's
> probably very hard.
> 
> Or, you could go half-duplex, and require that neither side "commits"
> to any state changes, including sending messages, until it has processed
> ALL messages prior to the "Over!" message that was sent by the other
> side. That prevents speculation by taking all client-vs.-server
> asynchrony out of the protocol.  This half-duplex idea, as draconian
> as it sounds, has one key point in its favor: it's the easiest to get
> right.

Half-duplex seems like it could cost too much in performance and/or
battery life. It would probably also ruin the non-blocking nature of
libwayland ABI: it tries to be careful to never block the caller unless
the caller explicitly wants to block. Just waiting for a roundtrip is
too much of a risk to do unexpectedly.
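
A toy model of the proposed turn-taking shows where the waiting comes
from. This is only a sketch of the "Over!" idea from the quoted text; the
Side class and its methods are hypothetical, and real socket I/O is
replaced by plain lists.

```python
OVER = "Over!"


class Side:
    """Hypothetical endpoint in a half-duplex exchange."""

    def __init__(self, has_turn):
        self.has_turn = has_turn
        self.outbox = []

    def queue(self, msg):
        self.outbox.append(msg)

    def flush(self):
        # Nothing may reach the wire unless this side holds the turn.
        if not self.has_turn:
            return []
        batch = self.outbox + [OVER]
        self.outbox = []
        self.has_turn = False
        return batch

    def receive(self, batch):
        # Process everything before "Over!", then take the turn.
        processed = []
        for msg in batch:
            if msg == OVER:
                self.has_turn = True
            else:
                processed.append(msg)
        return processed


client = Side(has_turn=True)
server = Side(has_turn=False)

server.queue("pointer motion")
blocked = server.flush()            # server has something to say but must wait

client.queue("attach")
client.queue("commit")
wire = client.flush()
seen_by_server = server.receive(wire)
reply = server.flush()              # only now can the motion event go out
```

In this model the pointer motion event is held back until the client
hands over the turn, which is the latency and wake-up cost described
above.
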

> Or, you can hope that cases of speculation don't arise.  Or that when
> they do, the resulting mistakes aren't too severe.
> 

Thanks,
pq