[RFC wayland] Add wl_proxy destruction callbacks

Tue Jun 7 08:42:11 UTC 2016

Hi,

Sorry for the slow response. Inline.

On Tue, 31 May 2016 11:18:25 +0300
Pekka Paalanen <ppaalanen at gmail.com> wrote:

> * PGP Signed by an unknown key
> 
> On Mon, 30 May 2016 13:10:42 +0200
> Miguel Angel Vico <mvicomoya at nvidia.com> wrote:
> 
> > Hi all,
> > 
> > A few days ago, I had a little chat over IRC with Pekka about
> > addition of proxy objects destruction callbacks to the wayland
> > client protocol.
> > 
> > 
> > Summing up, we recently ran into some applications where native
> > objects (wl_surface, wl_egl_window, wl_display) used by EGL are
> > destroyed/made invalid before destroying the corresponding EGL
> > objects. This sometimes causes crashes of the EGL driver, which is
> > not nice. We have seen this with the NVIDIA EGL implementation, but
> > I assume the Mesa EGL implementation is similarly exposed.
> > 
> > I agree this is in fact an application bug, but the EGL spec states
> > that functions such as makeCurrent or swapBuffers should return
> > error (not crash) if the native objects become invalid. I also
> > agree the spec should have been clearer and probably allowed
> > "undefined behavior", but it is not the case.
> > 
> > Having an objects destruction notification mechanism such as
> > destruction callbacks would allow us to satisfy the spec.
> > 
> > Also, such functionality would also make life way easier under
> > certain circumstances. I'm basically thinking about multi-threaded
> > applications, where several threads make use of the same native
> > objects, and for some reason one of the threads has to destroy one
> > or more of them due to some sort of error happening.
> > 
> > Of course, this can still be considered an application bug, and the
> > application could still make sure none of the threads is going to
> > use the native objects before destroying them, but again, specs
> > allow users to do many non-recommended things.
> > 
> > I think we should try to do our best to gracefully handle those
> > non-desirable API usages, and avoid crashes whenever is possible.
> > 
> > 
> > Pekka did not see this as something crazy to have, but wanted to
> > hear from some of the toolkits guys before making the decision of
> > whether changing the wl_proxy ABI is a good idea.  
> 
> Hi,
> 
> indeed. I think we could do such an addition to the wl_proxy ABI
> safely, if we want to. I just don't know if we want to.
> 
> I'm not so sure about the point of helping multithreaded applications.
> Applications use toolkits, and toolkits' job is to support whatever
> multi-threading ways they want, because they probably have to
> explicitly support threading anyway. The issue is use cases where
> there is no toolkit in between: EGL.

True, but I think there are enough EGL applications out there to
warrant improving wayland in such a way that lets us be nice with those
applications as well.

> 
> Another point that diminishes the multi-threaded use case is that one
> cannot have multiple listeners on a wl_proxy anyway. If you need to
> process events for the same object in more than one thread, you must
> have the support code outside of libwayland-client.

I didn't really gave so much thought about how the implementation would
look like, but I think I was somehow hoping we could add some mechanism
to have multiple listeners. However, having multiple listeners
could cause many headaches and looking at how wayland is designed,
maybe I was aiming too high :-(

> Also, the thread
> assignment with listeners (wl_event_queue) must be done on the object
> creation. It cannot be done afterwards without potential races. Again
> EGL is somewhat special, since it does not need to listen on
> externally created objects.

This is an important point I didn't consider either. I've seen the
client API defines proxy_wrappers to make queue assignments
thread-safe. IIUIC, they act as regular proxies in all situations but
queue assignment, and children objects created with such wrapper will
use the wrapper event queue.

I was wondering if wrappers could be extended in such a way that
destruction listeners could be set, and whenever the actual proxy
object is destroyed, send a notification to all wrappers, which would
call into their specific destruction callback if set.

> 
> Then there is the question of what the locking should look like.
> 
> The EGL specification language is troublesome. I guess it was written
> with X11 in mind, where anyone even outside of the current process can
> go and destroy the native window at any time, without any
> coordination. Obviously it is desireable that it does not lead to a
> crash. However, that is not possible on Wayland. No external actor
> can go and destroy the wl_display or wl_surface from behind your
> back, only the application itself can do that. Therefore, I think if
> the application deliberately shoots its own foot, a crash is ok, just
> like if you pass a bad pointer through EGL API in the first place. I
> am very tempted to say that this should be raised as a EGL spec bug,
> to allow undefined behaviour or such. EGL platform specifications
> could then make their own additional requirements, like on X11 you
> must not crash if Window disappears.

Agreed.

> 
> Those were the cons, then the pros not yet mentioned:
> 
> Currently there is a problem with identifying Wayland protocol objects
> referenced in incoming events, in case there are multiple components
> creating them. See Section "Multiple input handlers" in
> http://ppaalanen.blogspot.fi/2013/11/sub-surfaces-now.html
> for an example of a problematic case. It is not the only one.
> 
> If wl_proxy had destroy listeners like wl_resource does, one could use
> a wl_signal_get()/wl_resource_get_destroy_listener() kind of an
> approach to fetch a component-specific struct, rather than assuming
> who happens to own the singular "user data" pointer and risk a crash.

Thanks for sharing your notes, Pekka. I've been reading through your
article and it's really helpful.

I understand if we end up adding destruction callbacks, as a bonus we
would get a better solution for the "Multiple input handlers" problem,
but it doesn't seem like a strong point in favor of adding destruction
callbacks. The "Multiple input handlers" problem could be resolved in a
very different way not related to destruction callbacks at all.

> 
> > 
> > As an alternative, destruction callbacks could be hung off of
> > wl_egl_window. In a similar way we support wl_egl_window_resize
> > callbacks, we could support wl_egl_window_destroy callbacks.  
> 
> At the moment I would slightly favour the wl_egl_window approach if it
> allows to circumvent the EGL wording and the wording cannot be fixed.

If no one else is interested in adding wl_proxy destruction callbacks,
I'll just go with this alternative and send an appropriate request for
review so it can be integrated in Mesa drivers as well. I wouldn't like
this to be a NVIDIA-specific feature.

> 
> > However, this isn't as foolproof as adding wl_proxy destruction
> > callbacks, since destruction of wl_surface or wl_display objects
> > before wl_egl_window would lead to same issues.
> > 
> > 
> > I really think adding destruction callbacks to wl_proxy would be an
> > improvement worth making, but others' thoughts must be heard
> > first.  
> 
> 
> Thanks,
> pq
> 
> * Unknown Key
> * 0x11AAAAA7

Thanks,

-- 
Miguel