cross-client surface references

Wed Jul 8 04:47:28 PDT 2015

Hi,

On 7 July 2015 at 21:15, Bill Spitzak <spitzak at gmail.com> wrote:
> On Mon, Jul 6, 2015 at 11:53 PM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
>> Multiple "handles" to the same underlying server-side object from
>> completely asynchronous contexts (different processes). I can't see
>> that ending well at all, considering that *nothing* we have ever
>> designed for Wayland accounts for that.
>
>
> I think you already have that for all the global objects. Two different
> clients can create a proxy for the same global object.

No, this is absolutely not the case.

Binding in response to a global advertisement creates a totally new
object from scratch, which shares nothing with any other clients
anywhere. There are no 'global objects', only what are effectively
global extension/interface advertisements, which are invitations to
create an object.

They do not share state, they do not have any requirement to
interleave request processing, and they do not have any requirement to
multiplex event delivery.
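To make the distinction concrete, here is a toy model in C. It is not libwayland, and every name in it (toy_bind, toy_destroy, the two structs) is hypothetical. An advertisement is just a name plus an interface, and each bind hands back a fresh per-client object with its own private state:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model, not libwayland: a "global" is only an
 * advertisement inviting clients to create an object. */
struct advertisement {
    unsigned name;          /* the numeric name announced to clients */
    const char *interface;  /* e.g. "wl_compositor" */
};

/* Each bind creates a brand-new, per-client object. */
struct client_object {
    int client_id;
    const struct advertisement *source;
    int state;              /* private to this client; never shared */
};

/* Hypothetical stand-in for the server handling one client's bind. */
static struct client_object *toy_bind(const struct advertisement *ad,
                                      int client_id)
{
    struct client_object *obj = calloc(1, sizeof *obj);
    if (obj != NULL) {
        obj->client_id = client_id;
        obj->source = ad;
    }
    return obj;
}

/* Destroying one client's object leaves every other client untouched. */
static void toy_destroy(struct client_object *obj)
{
    free(obj);
}
```

Two clients binding the same advertisement get two unrelated objects: changing one object's state, or destroying it, has no effect on the other.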

I started to enumerate the ways in which this falls apart, but there
are just so many that it's not of much use.

> Yes this also creates
> multiple C structures in the server, to keep track of the actual client
> communication, but for most documentation you act as though there is one
> "object" that all the clients are "sharing".

Please point out that document so it can be burned to the ground and/or fixed.

>> Destruction cannot "go" anywhere. The server cannot choose to destroy a
>> protocol object of one client at arbitrary times, even if it was
>> requested by another client. We simply do not have the messaging for
>> this, nor the extensive machinery required to solve the races.
>
> There is the wl_registry.global_remove event. However you are right there is
> no matching destroyed event for objects that are created by clients.

This has nothing to do with object destruction, though: it is the
withdrawal of the advertisement which invites you to create objects,
together with a suggestion that any objects created from that
advertisement be destroyed.

> If one was added I think it could be done the same way: rather than adding a
> destroyed event to every object, an event similar to global_remove could be
> added, perhaps to the same interface that creates and uses "keys".

Per above, the entire premise of this paragraph is wrong, so I'll just leave it.

>> What does "destruction" even mean? Destroying the protocol object of
>> one client, or the underlying server-side object? Can you even define
>> that universally?
>
> Destruction of the underlying server-side object. As you point out, it is
> not possible for clients to destroy global objects such as the
> wl_compositor.

It is totally possible for the client to destroy global objects: send,
e.g., wl_compositor_destroy(). Great, object gone.

It is not possible for the clients to destroy global _advertisements_,
because they are absolutely not in any way the same thing as objects.
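The two lifetimes can be separated in a small toy model in C. It is not libwayland, and all names in it (toy_bind, toy_global_remove, toy_destroy) are hypothetical. Withdrawing an advertisement, which is what wl_registry.global_remove announces, only stops new binds, while an already-bound object lives until its own client destroys it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy model, not libwayland. */
struct advertisement {
    unsigned name;
    const char *interface;
    bool withdrawn;     /* set once the removal has been announced */
};

struct client_object {
    int client_id;
    const struct advertisement *source;
};

/* Binding creates a new per-client object. This toy simply refuses
 * binds after withdrawal; a real server must also cope with binds
 * that race the removal, which is ignored here. */
static struct client_object *toy_bind(const struct advertisement *ad,
                                      int client_id)
{
    if (ad->withdrawn)
        return NULL;
    struct client_object *obj = calloc(1, sizeof *obj);
    if (obj != NULL) {
        obj->client_id = client_id;
        obj->source = ad;
    }
    return obj;
}

/* Models global_remove: it withdraws the invitation only. No client
 * object is touched; clients are merely expected to destroy theirs. */
static void toy_global_remove(struct advertisement *ad)
{
    ad->withdrawn = true;
}

/* Models a client destroying its own object (cf. wl_compositor_destroy()).
 * The advertisement itself is unaffected. */
static void toy_destroy(struct client_object *obj)
{
    free(obj);
}
```

Client 1 can destroy its object without withdrawing anything, and after the advertisement is withdrawn, client 2's existing object is still there until client 2 chooses to destroy it.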

>> If the server-side object disappears, what happens? How could you ever
>> define that in a way that would work for everything?
>
> It works like global objects do after the global_remove event is sent. The
> client can send requests for these objects before it gets the event, so the
> server must be prepared to handle those requests after destruction. I think
> this has already been discussed (though on the client end) as "zombie"
> objects, and the compositor is already doing this in a few cases for objects
> that depend on input devices that can disappear.

Again, the premise and reasoning behind this paragraph are incorrect.

>> Let's take this relatively simple case of one client providing a parent
>> surface and another client providing a dialog surface that should be
>> related to the parent. What should happen if the parent disappears?
>> Should the client with the dialog be notified?
>
> Ideally yes, though if there is no such event the child will probably be
> able to figure out what happened. I would expect if it made a child surface
> that the surface would unmap when the parent was unmapped.

What happens when the child is trying to reattach, but the parent has
just been destroyed?

You'll also note that we don't send any events when a window becomes
unmapped, since unmapping is always done by the one client that owns
the surface. So in reality, the client would just hang indefinitely
waiting for a frame callback that never comes, because its surface is
now offscreen.

>> Or what if the server-side object gets destroyed only when the last
>> protocol object associated with it is destroyed? A client shares an
>> xdg_surface, stuff happens, the client destroys the xdg_surface
>> assuming the window gets unmapped, but oops, someone else is holding a
>> ref to it, so it actually doesn't unmap.
>
> Yes you are correct, the simplistic "reference count" implementation where
> the destruction only happens after all clients send the destroy request will
> cause problems. Instead the first destroy request has to work.

This in itself is full of races.

>> What then, if the xdg_surface is shared but the related wl_surface is
>> not, and the wl_surface gets destroyed? Xdg-shell defines what happens
>> in a non-shared case, but how does that translate to the other client?
>
> I can't seem to find any description of what should happen if you destroy a
> wl_surface and do not destroy the xdg_surface first. I am guessing what
> happens is a protocol error for the client making the destroy request on the
> wl_surface. To avoid this the client must destroy the xdg_surface first,
> which means the destruction must work even if another client has the
> surface. So this is also an example where simple reference-counting won't
> work.

None of them will really work.

This thread has sadly degenerated into: 'what if Wayland's object
model was totally different? what if some of its explicit core design
principles were thrown out the window?'. Realistically, that is not
something that will happen in the Wayland 1.x timeframe, if ever.
Where this thread started was, 'what's a good sandboxing model for
clients which must be explicitly separated for security reasons?', and
the answer to that is the same as the answer to the same issue in WebKit2
(where the UI process distrusts the render processes), which is that your more
trusted client itself becomes a Wayland compositor. Which is exactly
what Jasper did with Wakefield, almost exactly for this usecase, a few
months ago. If there are issues with that design, then great, let's
chase that up.

But the answer isn't to redesign the entire object lifetime model on
the back of a bar coaster, especially when I would not say that an
overabundance of clarity is the biggest issue with our object lifetime
model right now. If you want to do that as an intellectual exercise,
then that's great, but I would recommend doing what Kristian did when
he started Wayland, which is to create a git repository and have a
play around with various models until something seems to stick. If
you're making such fundamental changes, then there will be a whole
host of assumptions around that which will no longer hold true, and
the overall model will make less and less sense. 'Why is thing A like
this?' 'Well, because thing B used to do this.' 'But it doesn't.' 'Hmm
... oh right, that's because thing C was designed like this. But now
it's totally different.' Even just adding event queues right before
1.0 was almost a stretch too far, as it raised a bunch of issues that
it took us quite some time to even _understand_, let alone work
through.

I don't expect this to happen, and I more or less expect to repeat the
'please don't derail' speech in the next massively-derailed thread,
but at least let the record show that I tried.

Now let's all go back to not wasting our time.

Cheers,
Daniel