Questions about object ID lifetimes

Thu Sep 14 19:10:48 UTC 2023

On Thu, 14 Sep 2023 16:32:06 +0300
Pekka Paalanen <ppaalanen at gmail.com> wrote:

> ...
> 
> congratulations, I think you may have found everything that is not
> quite right in the fundamental Wayland protocol design. :-)

Oh, you flatter me.  I'm sure there's more!

> 
> As an aside, we collect unfixable issues under
> https://gitlab.freedesktop.org/wayland/wayland/-/issues/?label_name%5B%5D=Protocol-next
> These are issues that are either impossible or very difficult or
> annoying to fix while keeping backward compatibility with both servers
> and clients.

Only 7 of them?

> 
> ------
> 
> Object ID re-use is what I would call "aggressive": in the libwayland
> C implementation, the object ID last freed is the first one to be
> allocated next. There are two separate allocation ranges each with its
> own free list: server and client allocated IDs.

After I sent the initial post, I realized that the two separate
ID ranges help in the following way:

For any object ID in the allocation range of side A, a destructor
message from side B does not need acknowledgement.  This is because B
can't introduce a new object bound to that ID, only A can.  Hence, any
new_id arg for that ID is an acknowledgement of the destruction.
However, B has to be careful to ignore messages containing that ID
until it sees one with the ID as a new_id arg.  After the destructor
message from B but before a subsequent new_id for that ID from A, B
should not use the ID as an argument to other messages (and attempts to
do so can be dropped).  And this can be automated provided the
destructor tag can be relied on.
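
A minimal sketch of that rule, in Python for brevity, seen from side
B.  The class and method names are hypothetical; the only value taken
from libwayland is the start of the server ID range (0xff000000,
WL_SERVER_ID_START).  It assumes destructor tags are reliable, so
that send_destructor() is called for every destructor message.

```python
SERVER_ID_START = 0xFF000000  # libwayland's WL_SERVER_ID_START


def owner_of(object_id):
    """Return which side allocates this ID: 'client' or 'server'."""
    return "server" if object_id >= SERVER_ID_START else "client"


class PeerView:
    """Side B's view of object IDs that are allocated by the peer (A)."""

    def __init__(self, side):
        self.side = side      # 'client' or 'server'
        self.live = set()     # IDs currently bound to an object
        self.dead = set()     # IDs B destroyed, awaiting reuse by A

    def on_new_id(self, object_id):
        # A new_id from the owning side doubles as the acknowledgement.
        self.dead.discard(object_id)
        self.live.add(object_id)

    def send_destructor(self, object_id):
        # This sketch only covers IDs from the peer's allocation range.
        assert owner_of(object_id) != self.side
        self.live.remove(object_id)
        self.dead.add(object_id)

    def should_dispatch(self, object_id):
        # Incoming messages naming a dead ID are stale and can be dropped.
        return object_id in self.live

    def may_send(self, object_id):
        # B must not use a dead ID as a target or argument.
        return object_id in self.live
```

The dead set is only there to make the intermediate state explicit;
an implementation could just as well treat "not live" as "ignore".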

> 
> The C implementation also poses an additional restriction: a new ID
> can be at most the largest ever allocated ID + 1.
> 
> All this is to keep the ID map as compact as possible without a hash
> table. These details are in the implementation of the private 'struct
> wl_map' in libwayland.

Obviously, that helps middleware as well, for the same reasons.  It also
makes more automatic error detection possible.
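
To make the allocation rules concrete, here is a rough sketch of one
ID range with last-freed-first-reused allocation and the "largest
ever allocated ID + 1" restriction.  It is not the wl_map code; the
class and method names are made up, and accept_from_peer() is my
guess at the kind of check that middleware could apply to IDs chosen
by the side that owns the range.

```python
class IdRange:
    """Compact ID allocation for one side, loosely modelled on the
    behaviour described above (not on the wl_map implementation)."""

    def __init__(self, base):
        self.next_fresh = base   # largest ever allocated ID + 1
        self.free = []           # stack: last freed is first reused

    def allocate(self):
        if self.free:
            return self.free.pop()
        object_id = self.next_fresh
        self.next_fresh += 1
        return object_id

    def release(self, object_id):
        self.free.append(object_id)

    def accept_from_peer(self, object_id):
        """Validate an ID chosen by the peer that owns this range."""
        if object_id in self.free:
            self.free.remove(object_id)
            return True
        if object_id == self.next_fresh:
            self.next_fresh += 1
            return True
        return False   # skips ahead, or names an ID still in use
```

With client IDs starting at 1, allocating three IDs, releasing 1 and
then 2, and allocating again hands out 2 first, then 1, then 4.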

> ...
> 
> Your whole above analysis is completely correct!

I was rather hoping things would turn out less complex than they
seemed...

> 
> > However, the other cases are not as easy to identify.
> > 
> > The other cases are:
> > 1. an object created by a client request that has destructor events
> > 2. an object created by the compositor
> > 
> > It might be true that case 1 does not exist.  Is there a general
> > rule against that such cases would never be considered in future
> > expansions of the Wayland protocol?  
> 
> Destructor events do exist. Tagging them as such in the XML was not
> done from the beginning though, it was added later in a
> backward-compatible manner which makes the tag more informational than
> something libwayland could automatically process. The foremost example
> is wl_callback.done event. This is only safe because it is guaranteed
> that the client cannot be sending a request on wl_callback at the same
> time the server is sending 'done' and destroying the object:
> wl_callback has no requests defined at all.

Fortunately, my point above about the advantage of the separate ID
ranges helps here.  If wl_callback is created by the client, then a
wl_callback.done event tagged as a destructor does not need
acknowledgement AND is always safe provided that messages involving the
wl_callback ID (other than its eventual reuse as a new_id arg) are
ignored above libwayland.

But again, this means the destructor tag is important and not merely
informational.

I did notice that the destructor tagging was added mostly (or
solely) to help with code generation by wayland-scanner implementations
in programming languages where destructors require some specific
syntactic notation.

But maybe destructor tagging is even better than that?  Maybe it would
allow libwayland to automate more in a more robust way AND also allow
for middleware that doesn't have to simulate all of the semantic level
interactions induced by protocol messages in order to merely keep track
of how to decode messages.

> 
> It also requires that nothing passes an existing wl_callback object as
> an argument in any request. We have been merely lucky that no-one has
> done that. It's really hard to imagine a use case where you would want
> to pass an existing wl_callback to anything.

Again, the point above about separate ID ranges addresses this, I think.

> 
> Extensions may have similar objects that only deliver some one-off
> events and then "self-destruct" by the final event. All this is simply
> documented and not marked in the XML.

That's what I was hoping to avoid.  If there are object types where
object lifetime can only be understood by simulating all of the
relevant semantic content of the messages involved, then that's not
good for middleware.  Isn't it also problematic towards the goals of
libwayland, because it makes it impossible for libwayland to ensure
that messages are properly decoded without trusting that the client
and/or compositor have implemented everything properly?  Because
demarshaling a message incorrectly seems like it would be bad news.

> 
> The asymmetry in the fundamental protocol bookkeeping messages
> (wl_display.delete_id) is unfortunate in hindsight. Client created and
> client destroyed objects are best supported, anything else will
> require very careful messaging design to avoid a race pitfall. We
> have a proper tear-down sequence for a client destroying an object,
> but we do not have a similar sequence for the server destroying an
> object, where the client would ack the destruction before the object
> ID is re-used.

If you add one, it would be compatible with giving importance to
destructor tagging if done as follows:  The initial event proposing the
removal of the object is NOT tagged as a destructor, but the
acknowledgement from the client IS tagged as a destructor.  That would
become a case like the ones above where a destructor is issued by the
side opposite the one owning the object ID range: such destructors can
be the final message and don't themselves need further acknowledgement.
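
As a sketch of what I mean, here is the server side of such a
handshake as a small state machine.  The message names
"removal_proposed" and "ack_removal" are invented for illustration;
nothing like them exists in the protocol today.

```python
from enum import Enum


class State(Enum):
    LIVE = 1
    REMOVAL_PROPOSED = 2
    DEAD = 3


class ServerObject:
    """Server-side state for an object whose ID is in the server range."""

    def __init__(self, object_id):
        self.object_id = object_id
        self.state = State.LIVE

    def propose_removal(self):
        # Hypothetical event, NOT tagged as a destructor: the ID stays
        # valid, so requests already in flight still decode normally.
        assert self.state is State.LIVE
        self.state = State.REMOVAL_PROPOSED
        return ("event", self.object_id, "removal_proposed")

    def on_request(self, name):
        """Handle a client request; return True once the ID is reusable."""
        if self.state is State.DEAD:
            raise ValueError("request on destroyed object")
        if name == "ack_removal":   # hypothetical destructor request
            self.state = State.DEAD
            return True
        return False                # any other request: object stays alive
```

Because the destructor comes from the client and the ID belongs to
the server's range, receiving it is enough for the server to reuse
the ID; no further acknowledgement is needed.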

> 
> > For objects created by the compositor, there are 2 subcases:
> > 
> > 2a. objects with only destructor events
> > 2b. objects with destructor requests
> > 
> > Again, it might be the case that 2b does not exist, as it is
> > analogous to case 1 above.  But, is there a general rule against
> > such future cases as well?  Combining 1 and 2b, is there a general
> > rule that says that only the object creator can initiate an
> > object's destruction (unprovoked by the other side of the
> > protocol)?  
> 
> There is no such rule, but nowadays the recommendation is to keep to
> client created and client destroyed objects unless there are pressing
> reasons to design something more delicate/fragile.
> 
> 2b exists as wl_data_offer, created by wl_data_device.data_offer
> event.

Again, saved by my observation above about the safety added by separate
object ID ranges for clients vs. compositor.

> 
> I'm not aware of 2a existing in the wild, but they could I guess.
> 
> Generally we try to avoid new_id in events, because it causes a race
> by default: what if the server is sending a new_id event at the same
> time the client destroys the originating object? The result is a
> mismatch in object ID tracking between server and client, causing
> unexpected failures perhaps much later.

This could occur with the sides swapped, couldn't it?  If a new_id arg
appears in a message such that the destination side for that message can
initiate destruction of the message's target object, then if things are
not handled carefully, the new_id leaks (isn't ever reused).  To
prevent this leak, you either cannot allow that side to initiate
destruction of the target object, or you have to keep at least the type
information for the target object around until the destruction is
acknowledged.  If you did keep the type information, you could continue
to decode the messages and send something like delete_id responses for
each such new_id.
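
Roughly, keeping the type information around could look like the
sketch below.  The interface name, message names and signature table
are all hypothetical, and the returned list stands in for whatever
delete_id-like response would actually be sent.

```python
# Hypothetical signature table: (interface, message) -> argument types.
SIGNATURES = {
    ("example_device", "offer"): ["new_id"],
    ("example_device", "moved"): ["int", "int"],
}


class PendingDestruction:
    """Type info kept for objects destroyed locally but not yet acked."""

    def __init__(self):
        self.interface_of = {}   # object ID -> interface name

    def destroy(self, object_id, interface):
        # Remember the type so late messages can still be decoded.
        self.interface_of[object_id] = interface

    def acknowledged(self, object_id):
        # The peer has seen the destruction; the type info can go.
        self.interface_of.pop(object_id, None)

    def handle_stale(self, object_id, message, args):
        """Return the new IDs to release for a message aimed at a
        destroyed object, instead of silently dropping the message."""
        types = SIGNATURES[(self.interface_of[object_id], message)]
        return [arg for kind, arg in zip(types, args) if kind == "new_id"]
```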

> 
> When a client (or server) calls the respective API to destroy a
> protocol object, libwayland guarantees that the object's
> listeners/callbacks won't be called even if messages are still
> received on that object until it is properly torn down. That makes
> the user code easier to write, but it also throws the afterwards
> received messages away. If that message was to create a new protocol
> object, that won't happen, the ID won't be allocated. This causes the
> mismatch.

[I noticed the use of "zombies" in the client, but no parallel in the
server.  Also it seemed from a cursory analysis of the client code that
the zombies are mostly to take care of cases that arise in multithreaded
clients.  Is that correct?  Or are zombies related to this issue
instead?]

There is a domino effect as well: Messages involving that object as an
argument that also have a different new_id argument have to be involved
in the above processing as well, with delete_id sent for those new_id
arguments.  And so on for messages involving those new_id arguments.
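
The propagation can be sketched as a single pass over the decoded
message stream.  The tuple layout (target, object arguments, new_id
arguments) is an assumption made for the example, not a libwayland
data structure.

```python
def process(messages, dropped):
    """Propagate 'dropped' status through a stream of decoded messages.

    Each message is (target, object_args, new_ids).  Returns the IDs
    that need a delete_id-style response, in the order encountered.
    """
    dropped = set(dropped)
    to_release = []
    for target, object_args, new_ids in messages:
        if target in dropped or dropped.intersection(object_args):
            # The message is discarded, so every object it would have
            # created is dropped too and must be released.
            for new_id in new_ids:
                dropped.add(new_id)
                to_release.append(new_id)
    return to_release


msgs = [
    (5, [], [10]),     # targets destroyed object 5, creates 10
    (7, [10], [11]),   # live target, but passes 10 and creates 11
    (11, [], [12]),    # targets 11, creates 12
    (7, [], [13]),     # unrelated: 13 is created normally
]
released = process(msgs, {5})   # [10, 11, 12]
```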

> 
> Libwayland cannot automatically destroy objects either, because it has
> no information on how to do that properly and safely. The XML does not
> tell for sure.

Is it a goal for libwayland to be able to at least manage object IDs
properly on its own?  Because that would align with the needs of
middleware.

Also, I think it might not be too late (in backward compatibility
terms) to do something about this.

> ...
> 
> > 
> > A. Ownership vs. destruction: An object created by a client can only
> > have destructor requests, and an object created by the compositor
> > can only have destructor events.  
> 
> False, violated in the wild. Examples: wl_callback created by client,
> destroyed by server; wl_data_offer created by server, destroyed by
> client.

Again, that case is addressed by the observation about separate ID ranges
way above.

> 
> > B. All destructor requests are followed by wl_display::delete_id
> > events as acknowledgements  
> 
> True, as this is built in to libwayland.
> 
> > C. All destructor events are themselves acknowledgements of
> > (implicit) destruction requests.  
> 
> Close enough I suppose, assuming the events are correctly tagged as
> destructors. As mentioned, the tag is heavily recommended but not
> technically mandatory for the libwayland C bindings, because it was
> introduced several years too late.

If the destructor tag is necessary for other language bindings, that's
sufficient for my purposes, I think.  I can make sure I'm
parsing protocol XML files that will work for languages that require
destructor notation.

But, if it isn't "technically mandatory", doesn't that risk the
introduction of protocol for which wayland-scanner does not generate
proper code for a language that requires destructor notation?
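
For reference, picking the tags out of a protocol XML file is simple.
The sketch below uses a made-up two-interface protocol in the same
format; only the type="destructor" attribute on request and event
elements is taken from the real XML schema.

```python
import xml.etree.ElementTree as ET

# Made-up protocol, for illustration only.
SAMPLE = """
<protocol name="example">
  <interface name="example_callback" version="1">
    <event name="done" type="destructor">
      <arg name="data" type="uint"/>
    </event>
  </interface>
  <interface name="example_offer" version="1">
    <request name="accept"/>
    <request name="destroy" type="destructor"/>
    <event name="action"/>
  </interface>
</protocol>
"""


def destructors(xml_text):
    """Map interface name -> sorted list of (kind, message) destructors."""
    result = {}
    for interface in ET.fromstring(xml_text).iter("interface"):
        found = [
            (message.tag, message.get("name"))
            for message in interface
            if message.tag in ("request", "event")
            and message.get("type") == "destructor"
        ]
        result[interface.get("name")] = sorted(found)
    return result
```

An interface that maps to an empty list is one whose lifetime can
only be learned from its documentation, which is exactly the case I
would like to rule out.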

> 
> > D. No object can be the target or argument of a message issued by
> > one side after that side has issued a destructor request (explicit
> > or implicit).  
> 
> Correct, I believe. An actual destructor request should be enforced
> like that in libwayland. Allowing otherwise would defeat the point of
> a destructor request or a non-destructor request intended to trigger a
> destructor event. Nothing enforces the latter though.
> 
> > Is this correct?  Are these actual requirements that are enforced
> > for the current protocol and future expansions?
> > 
> > Can there be cases of objects created by the compositor, where the
> > compositor proposes their destruction without any prior (implicit or
> > explicit) request to do so from the client?  If so, how are these  
> 
> I don't know of anything forbidding them.

This again is the case where, as long as the ack from the client is
what is tagged as being the "destructor", all is still OK.

> Thanks,
> pq

I am still hopeful, especially after realizing the benefits of the
separate ID ranges.  The only case this leaves uncovered is object IDs
in the compositor ID range where the compositor wants to initiate
destruction of that object.  And I believe that case can be handled
automatically as well, possibly even in a backward-compatible way.  The
domino effect case with chains of new_ids is complex, but I think there
is a way to handle it at least semi-automatically.

But, only if destructor tags (or something similar) can be relied on in
future versions and extensions to the wayland protocol.

My interests are in the possibility of robust middleware, but I
think there is a convergence between three areas such that the same
hopefully backward-compatible additions yield positive results for all.
These are: middleware, wayland-scanner bindings generation for
languages with destructor notations, and managing the "meta" aspects
of the protocol completely within libwayland.  These meta aspects are
about proper decoding (demarshalling), proper reuse of object IDs and
preventing ID leakage, and the prevention of bugs and/or excessive
complexity that might be induced by attempts to manage these things
above libwayland, as well as the ability to automatically check for
errors in these cases and prevent their spread.

Thanks,
Jonathan