[RFC wayland] Add wl_proxy destruction callbacks

Tue Jun 7 13:54:44 UTC 2016

On Tue, 7 Jun 2016 10:42:11 +0200
Miguel Angel Vico <mvicomoya at nvidia.com> wrote:

> Hi,
> 
> Sorry for the slow response. Inline.
> 
> On Tue, 31 May 2016 11:18:25 +0300
> Pekka Paalanen <ppaalanen at gmail.com> wrote:
> 
> > * PGP Signed by an unknown key
> > 
> > On Mon, 30 May 2016 13:10:42 +0200
> > Miguel Angel Vico <mvicomoya at nvidia.com> wrote:
> >   
> > > Hi all,
> > > 
> > > A few days ago, I had a little chat over IRC with Pekka about
> > > addition of proxy objects destruction callbacks to the wayland
> > > client protocol.
> > > 
> > > 
> > > Summing up, we recently ran into some applications where native
> > > objects (wl_surface, wl_egl_window, wl_display) used by EGL are
> > > destroyed/made invalid before destroying the corresponding EGL
> > > objects. This sometimes causes crashes of the EGL driver, which is
> > > not nice. We have seen this with the NVIDIA EGL implementation, but
> > > I assume the Mesa EGL implementation is similarly exposed.
> > > 
> > > I agree this is in fact an application bug, but the EGL spec states
> > > that functions such as makeCurrent or swapBuffers should return
> > > error (not crash) if the native objects become invalid. I also
> > > agree the spec should have been clearer and probably allowed
> > > "undefined behavior", but it is not the case.
> > > 
> > > Having an objects destruction notification mechanism such as
> > > destruction callbacks would allow us to satisfy the spec.
> > > 
> > > Also, such functionality would also make life way easier under
> > > certain circumstances. I'm basically thinking about multi-threaded
> > > applications, where several threads make use of the same native
> > > objects, and for some reason one of the threads has to destroy one
> > > or more of them due to some sort of error happening.
> > > 
> > > Of course, this can still be considered an application bug, and the
> > > application could still make sure none of the threads is going to
> > > use the native objects before destroying them, but again, specs
> > > allow users to do many non-recommended things.
> > > 
> > > I think we should try to do our best to gracefully handle those
> > > non-desirable API usages, and avoid crashes whenever is possible.
> > > 
> > > 
> > > Pekka did not see this as something crazy to have, but wanted to
> > > hear from some of the toolkits guys before making the decision of
> > > whether changing the wl_proxy ABI is a good idea.    
> > 
> > Hi,
> > 
> > indeed. I think we could do such an addition to the wl_proxy ABI
> > safely, if we want to. I just don't know if we want to.
> > 
> > I'm not so sure about the point of helping multithreaded applications.
> > Applications use toolkits, and toolkits' job is to support whatever
> > multi-threading ways they want, because they probably have to
> > explicitly support threading anyway. The issue is use cases where
> > there is no toolkit in between: EGL.  
> 
> True, but I think there are enough EGL applications out there to
> warrant improving wayland in such a way that lets us be nice with those
> applications as well.
> 
> > 
> > Another point that diminishes the multi-threaded use case is that one
> > cannot have multiple listeners on a wl_proxy anyway. If you need to
> > process events for the same object in more than one thread, you must
> > have the support code outside of libwayland-client.  
> 
> I didn't really gave so much thought about how the implementation would
> look like, but I think I was somehow hoping we could add some mechanism
> to have multiple listeners. However, having multiple listeners
> could cause many headaches and looking at how wayland is designed,
> maybe I was aiming too high :-(

Hi,

yes, I don't think it is feasible.

Currently, a wl_proxy is associated with exactly one wl_event_queue.
The app/toolkit is supposed to use one wl_event_queue for each
different execution context (e.g. locks held) and thread. When an event
is received, it is stored in the designated event queue. The program or
library explicitly dispatches events from a queue, separately from
receiving, so that it can control exactly in which thread and context
the callbacks will be called.

Adding multiple listeners would imply multiple wl_event_queue
assignments, so that you could ensure the calling context.

Then you'd probably need (to expose) reference counting on wl_proxy to
address lifetime and queue assignment issues.

I don't think that is a rabbit hole we want to dig for ourselves. Just
adding "simple" threading support to libwayland-client kept producing
new failure modes years after it was supposed to work, and now we have
deprecated ABIs to maintain since the original could not be fixed under
the hood, and so on.

> > Also, the thread
> > assignment with listeners (wl_event_queue) must be done on the object
> > creation. It cannot be done afterwards without potential races. Again
> > EGL is somewhat special, since it does not need to listen on
> > externally created objects.  
> 
> This is an important point I didn't consider either. I've seen the
> client API defines proxy_wrappers to make queue assignments
> thread-safe. IIUIC, they act as regular proxies in all situations but
> queue assignment, and children objects created with such wrapper will
> use the wrapper event queue.
> 
> I was wondering if wrappers could be extended in such a way that
> destruction listeners could be set, and whenever the actual proxy
> object is destroyed, send a notification to all wrappers, which would
> call into their specific destruction callback if set.

That's certainly an... "interesting" proposal. I am quite tempted to
just say no for such complexity as there are possible alternative
solutions, including the one where we say there is no problem to begin
with.

> > 
> > Then there is the question of what the locking should look like.
> > 
> > The EGL specification language is troublesome. I guess it was written
> > with X11 in mind, where anyone even outside of the current process can
> > go and destroy the native window at any time, without any
> > coordination. Obviously it is desireable that it does not lead to a
> > crash. However, that is not possible on Wayland. No external actor
> > can go and destroy the wl_display or wl_surface from behind your
> > back, only the application itself can do that. Therefore, I think if
> > the application deliberately shoots its own foot, a crash is ok, just
> > like if you pass a bad pointer through EGL API in the first place. I
> > am very tempted to say that this should be raised as a EGL spec bug,
> > to allow undefined behaviour or such. EGL platform specifications
> > could then make their own additional requirements, like on X11 you
> > must not crash if Window disappears.  
> 
> Agreed.
> 
> > 
> > Those were the cons, then the pros not yet mentioned:
> > 
> > Currently there is a problem with identifying Wayland protocol objects
> > referenced in incoming events, in case there are multiple components
> > creating them. See Section "Multiple input handlers" in
> > http://ppaalanen.blogspot.fi/2013/11/sub-surfaces-now.html
> > for an example of a problematic case. It is not the only one.
> > 
> > If wl_proxy had destroy listeners like wl_resource does, one could use
> > a wl_signal_get()/wl_resource_get_destroy_listener() kind of an
> > approach to fetch a component-specific struct, rather than assuming
> > who happens to own the singular "user data" pointer and risk a crash.  
> 
> Thanks for sharing your notes, Pekka. I've been reading through your
> article and it's really helpful.
> 
> I understand if we end up adding destruction callbacks, as a bonus we
> would get a better solution for the "Multiple input handlers" problem,
> but it doesn't seem like a strong point in favor of adding destruction
> callbacks. The "Multiple input handlers" problem could be resolved in a
> very different way not related to destruction callbacks at all.

True, it would be just a bonus, and not enough to justify on its own.

> >   
> > > 
> > > As an alternative, destruction callbacks could be hung off of
> > > wl_egl_window. In a similar way we support wl_egl_window_resize
> > > callbacks, we could support wl_egl_window_destroy callbacks.    
> > 
> > At the moment I would slightly favour the wl_egl_window approach if it
> > allows to circumvent the EGL wording and the wording cannot be fixed.  
> 
> If no one else is interested in adding wl_proxy destruction callbacks,
> I'll just go with this alternative and send an appropriate request for
> review so it can be integrated in Mesa drivers as well. I wouldn't like
> this to be a NVIDIA-specific feature.

That is a very good idea.

Would you not try to file a bug against the EGL spec, though?

> > > However, this isn't as foolproof as adding wl_proxy destruction
> > > callbacks, since destruction of wl_surface or wl_display objects
> > > before wl_egl_window would lead to same issues.
> > > 
> > > 
> > > I really think adding destruction callbacks to wl_proxy would be an
> > > improvement worth making, but others' thoughts must be heard
> > > first.    

Thanks,
pq
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 811 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/wayland-devel/attachments/20160607/383b9a3b/attachment-0001.sig>