[PATCH] core: implement a safe wl_signal_emit

Thu Feb 22 19:36:32 UTC 2018

On 2018/02/22 12:31, Derek Foreman wrote:
> On 2018-02-22 10:48 AM, Markus Ongyerth wrote:
> > On 2018/02/22 09:34, Derek Foreman wrote:
> > > On 2018-02-22 08:58 AM, Daniel Stone wrote:
> > > > Hi,
> > > > 
> > > > On 22 February 2018 at 14:14, Markus Ongyerth <wl at ongy.net> wrote:
> > > > > > It seems that this patch makes that assumption invalid, and we would
> > > > > > need patches to weston, enlightenment, and mutter to prevent a
> > > > > > use-after-free during the signal emit?  Now I'm seeing valgrind errors
> > > > > > on E and weston during buffer destroy.
> > > > > > 
> > > > > > Personally, I don't think we should change this assumption and declare
> > > > > > the existing code that's worked for years suddenly buggy. :/
> > > > > 
> > > > > The code was buggy the whole time. Just because it was never triggered, does
> > > > > not imply it's not a bug.
> > > > > free()ing these struct wl_list without removing them from the signal list
> > > > > leaves other struct wl_list that are outside the control of the current code
> > > > > in an invalid, prone to use-after-free, state.
> > > > 
> > > > There's a difference between something being 'buggy' and a design with
> > > > non-obvious details you might not like. If destroy handlers not
> > > > removing their list elements were buggy, we would be seeing bugs from
> > > > that. But instead it's part of the API contract: when a destroy signal
> > > > is invoked, you are guaranteed that this will be the first and only
> > > > access to your list member. This implies that anyone trying to remove
> > > > their link from the list (accessing other listeners in the list) is
> > > > buggy.
> > > > 
> > > > > Suddenly allowing this is a breaking API change (*some* struct wl_list inside
> > > > > a wl_listener) can suddenly become invalid for reasons outside the users
> > > > > control.
> > > > 
> > > > I don't know if I've quite parsed this right, but as above, not
> > > > removing elements of a destroy listener list, when the listener is
> > > > invoked, is our current API.
> > > > 
> > > > > Related to this entire thing:
> > > > > In [1] you added tests for this and promote something, that is in essence, a
> > > > > breaking change.
> > > > 
> > > > It's not a breaking change though: it's the API we've pushed on everyone so far.
> > > 
> > > Also, it doesn't prevent external libraries from doing whichever they want
> > > if they have complete control of the destroy listener list contents.
> > 
> > So you suggest we break a now mandated api and expose ourselves to funny
> > implementation detail changes that are now justified, because *we break API*?
> 
> I'm sorry, I'm having a hard time parsing this.
> 
> The suggested mandate is that libwayland internals won't touch the listener
> after you receive the notification.
Libwayland internals?
That would be fine. Then effectively nobody can rely on this either way, if 
they ever want to have code that can be integrated with other consumers of 
libwayland.

The problem is, that a single library that relies on this will force anyone 
that uses it to adhere to it.
If we now have a listener that does things properly it will use-after-free if 
it ever shares a signal list with that library.
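
To make the mechanism concrete, here is a minimal sketch of why a well-behaved listener gets hurt. It does not use libwayland; `struct node`, `list_insert` and `list_remove` are hypothetical stand-ins that mirror how a circular doubly linked list like `wl_list` is usually unlinked. The point is only that removing one element writes into both of its neighbours, so if a neighbour was free()d without being removed first, that write lands in freed memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct wl_list: a circular doubly linked list. */
struct node {
	struct node *prev;
	struct node *next;
};

static void list_init(struct node *head)
{
	head->prev = head;
	head->next = head;
}

/* Insert elm directly after pos. */
static void list_insert(struct node *pos, struct node *elm)
{
	elm->prev = pos;
	elm->next = pos->next;
	pos->next = elm;
	elm->next->prev = elm;
}

/* Unlinking elm writes into BOTH of its neighbours. */
static void list_remove(struct node *elm)
{
	elm->prev->next = elm->next; /* write into the previous element */
	elm->next->prev = elm->prev; /* write into the next element */
	elm->prev = NULL;
	elm->next = NULL;
}

/*
 * Returns 1 if removing b rewrote pointers stored inside a and c.
 * If a or c had been freed without being unlinked, those same writes
 * would be a use-after-free.
 */
static int remove_writes_to_neighbours(void)
{
	struct node head, a, b, c;

	list_init(&head);
	list_insert(&head, &a);
	list_insert(&a, &b);
	list_insert(&b, &c); /* head -> a -> b -> c -> head */

	list_remove(&b);

	return a.next == &c && c.prev == &a;
}
```

Here everything lives on the stack, so nothing is actually freed; in the scenario above, `a` or `c` would belong to the library that frees its listener without unlinking it.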

> 
> That on receipt of a destroy notification you can free your stuff without
> removing your listener from the list.

Which is exactly what we don't want. Since that implies whenever we share a 
destroy signal list with a listener from somewhere that's not inherently our 
code, we can't rely on us being allowed to remove ourselves from the list 
(that's everything from libwayland btw.).
And if we do, we'd have to take the blame for any integration that fails, 
since *WE* break API now.

> doing whichever they want
So if you suggest that we just break API here, you actually suggest we do 
something that will break as soon as someone wants to integrate a library that 
also works with another codebase that relies on this mandate.

> 
> > > 
> > > What is prevented is libwayland's destroy notifier list walk accessing an
> > > element again after it is potentially freed by external code.
> > 
> > Which could be fixed by said node removing itself from a list, instead of
> > leaving a list in invalid states for assumed behaviour.
> 
> And, of course, break the whole external world in the process.
> 
> No.
These users break interop with qt. If we ever expect a library that uses qt 
(maybe in a plugin for weston?) to be used together with the codebase then 
things break.
> 
> We're not going to break years of working code built on what seems to have
> been a quite reasonable assumption.
> 
And the assumption that memory that's pointed into (by pointers that are 
*really* easy to fix) stays valid sounds pretty reasonable as well.

I point out Qt here (which afaik implies KDE) to also have a few years of history on my side, though 
I don't see a reason why 5 year old code that behaves badly would be worth 
more than 3 month old code that (may or may not) behave badly, so I could also 
talk about our implementations.

While this mandate does not directly cause crashes in those (which admittedly 
is a bit better), it does result in code that's no longer correct. So if 
anyone cares about code correctness, this code is now effectively broken.

You effectively can't choose to not break stuff. You have to pick which current 
usecase you want to break.

I don't think there's a good measure, which consumers should get priority.
(Of course we should ;) )

> > > 
> > > We can completely replace the internal data structures in libwayland with
> > > whatever we want, but we must preserve that behaviour.
> > 
> > Why can we change one implementation detail, but have to keep another one?
> 
> One is API, the other is implementation detail, so the question is
> irrelevant?

Could you kindly point me towards the point that made the implementation 
detail of not touching a certain (currently rather unspecified) subset of 
wl_list elements part of the API?
I can't find the point that guarantees it. Since that would also imply, that I 
can't remove my own destruction listeners from the notify callback and I have 
been doing that so far.

> 
> I understand what you're saying, I really do, but it's not pragmatic. Again,
> we can't break all external users of our library for very little real
> benefit.

Again, you break users either way. I do see the difference between 
use-after-free and just making code incorrect, but both of those are breaking 
changes.
And iirc further down in my previous mail, I pointed out a point in the 
current api docs, that would have to get fixed, in a breaking way.

> 
> > > 
> > > It does not mean we can never rework the destroy signal emit path in
> > > libwayland to allow some items to be removed by the notification handlers
> > > and others just freed, or to allow a destroy notifier to touch the list.
> > 
> > There is no destroy signal emit path.
> 
> Sure there is.  It's currently an implementation detail that it happens to
> be the exact same path as other signal emitters.

:) I wouldn't mind to cooperate on this, since destroy signals are really the 
thing we worry about (with this commit).
If we have a way to specify something as destroy listener, with the added 
semantics, that every listener is expected to be removed from the list after 
the emit, we can probably get something done that works for both usecases.

This is certainly the smaller breaking of listener semantics. It may require a 
bit of a hacky thing to do, but it should be doable. I wouldn't mind having a 
whack at that.
That should probably split what we currently have into:
`wl_signal_emit`, `wl_signal_emit_safe`, `wl_signal_emit_safe_destroy`

`wl_signal_emit` would be deprecated, but kept to not surprise library users.
`wl_signal_emit_safe` would be this patch's version of emit
`wl_signal_emit_safe_destroy` would make sure to not touch the memory of 
already called listeners while keeping the semantics that allow us to remove 
arbitrary listeners from the signal list.
I could probably hack compat between the two behaviours into that one as well, 
thinking about what it could look like.
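
As a rough illustration of what a destroy-specific emit could look like, here is a minimal sketch. It is not libwayland code and not necessarily what any patch does: `struct link`, `struct listener`, `struct signal` and `signal_emit_final` are hypothetical stand-ins for `wl_list`, `wl_listener`, `wl_signal` and the proposed `wl_signal_emit_safe_destroy`. The idea is to unlink each listener and self-link it before calling it, and to never touch it afterwards. That way a handler may free its listener without removing it, remove it first, or remove other listeners, and the walk stays valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct link { struct link *prev, *next; };

struct listener {
	struct link link; /* first member, so a link pointer can be cast back */
	void (*notify)(struct listener *self, void *data);
};

struct signal { struct link listeners; };

static void link_init(struct link *l) { l->prev = l; l->next = l; }

static void link_insert(struct link *pos, struct link *elm)
{
	elm->prev = pos;
	elm->next = pos->next;
	pos->next = elm;
	elm->next->prev = elm;
}

static void link_remove(struct link *elm)
{
	elm->prev->next = elm->next;
	elm->next->prev = elm->prev;
}

static void signal_init(struct signal *s) { link_init(&s->listeners); }

/* Append at the tail, so listeners run in the order they were added. */
static void signal_add(struct signal *s, struct listener *l)
{
	link_insert(s->listeners.prev, &l->link);
}

/* Hypothetical one-shot emit for destroy signals. */
static void signal_emit_final(struct signal *s, void *data)
{
	while (s->listeners.next != &s->listeners) {
		struct link *pos = s->listeners.next;
		struct listener *l = (struct listener *)pos;

		link_remove(pos);
		link_init(pos); /* self-linked: a later link_remove() is harmless */
		l->notify(l, data);
		/* l may be freed by now; it is never touched again */
	}
}

/* Style 1: free without unlinking, as weston/E/mutter are said to do. */
static void free_without_remove(struct listener *self, void *data)
{
	++*(int *)data;
	free(self);
}

/* Style 2: unlink first, then free. */
static void remove_then_free(struct listener *self, void *data)
{
	++*(int *)data;
	link_remove(&self->link);
	free(self);
}

/* Mixes both styles on one signal; returns how many handlers ran. */
static int demo_final_emit(void)
{
	void (*handlers[3])(struct listener *, void *) = {
		free_without_remove, remove_then_free, free_without_remove
	};
	struct signal sig;
	int calls = 0;

	signal_init(&sig);
	for (size_t i = 0; i < 3; i++) {
		struct listener *l = malloc(sizeof *l);
		if (l == NULL)
			return -1;
		l->notify = handlers[i];
		signal_add(&sig, l);
	}
	signal_emit_final(&sig, &calls);
	return calls;
}
```

The cost of this approach is that the list is empty after the emit, which matches the "every listener is expected to be removed from the list after the emit" semantics, but makes it unsuitable for signals that fire more than once.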

> 
> > There is a signal emit path, which may be called on destroy signals, and
> > suddenly has to follow different semantics because of what exactly?
> 
> Because a large amount of software will break if we change the currently
> expected behavior.
> 
> I realize this leaves me open to all manner of ridiculous slippery slope
> arguments, such as "if my software depended on a bug in an authentication
> system that forgot to ask for a password...", so all snark aside can we back
> away from that ledge now?
I didn't want to drive it that far. No worries, I think I can justify my 
opinion with actual arguments.
> 
> As library authors we have to be pragmatic.  We have to avoid surprising our
> callers.  While this API constraint is annoying, it is harmless, and

hehehe Thanks for that one.
> demanding all old code suddenly conform to a new, different constraint that
> was never enforced before is too onerous.
Then why is that the fix you propose?

I think you have the same basic misunderstanding (from my point of view of 
course) as Daniel, namely that this change does not force anyone to change 
their current code.
I think I laid out my reasoning why I disagree in this mail (and the one to 
Daniel) at multiple points.

> 
> > And how exactly do we expose that to our listeners? By the name of the signal
> > in the struct?
> 
> Why do we need to?  A notification callback written to be used as a destroy
> notifier will be written differently than one intended to be used for other
> things.
We have to in the context of sane language bindings. It is exceedingly 
annoying, if something silly like this prevents me from forcing proper 
behaviour of the wl_list type on GC.
In the context of C alone it's a triviality, in the context of languages that 
actually try to get some safety into types or the runtime, it's horrible.
> 
> Most notably, a non-destruction notifier will be capable of being called
> more than once, or a crash could occur (right now).
> 
> > > 
> > > It just makes it easy to verify that attempts to do that don't break
> > > guarantees we've always made.
> > 
> > Again, kindly point me to the point where an implementation detail made it
> > towards a guarantee.
> > And for which signals exactly. Under which circumstances, oh and what
> > guarantee exactly.
> 
> We seem to be talking in circles - as above, we can't break all that
> software.  I don't really like it either, but it's where we're at right now.
> 
> > 
> > It would be nice if I had a proper proposal to take apart for this to be
> > explicitly added to the API.
> > Otherwise I can provide a (intentionally snarky) one :)
> 
> Oh, that's good, I was hoping we could turn up the snark at some point. ;)
> 
> > Part one amend [1] (wl_listener) with:
> > "The wl_list inside a wl_listener can be invalid (pointer towards free'd
> > memory) at any time the listener notify is called. For further details see
> > wl_signal.
> 
> NAK.
> 
Make a better one, that properly allows what you intend to allow :)
I would never suggest to actually take this up, but that's the change you are 
intending to make from my viewpoint.
It is better constrained, but this is the change to the listeners, if we just 
officially allow to free them inside a signal list.

> > Part two amend [2] (wl_signal) with:
> > Signals with the name `*_destroy` have special semantics.
> > If they are currently emitted, any wl_signal_add/wl_signal_get on the signal
> > or wl_list_remove on the link of any listener in it is invalid.
> > This is also the cause for invalid struct wl_list entries in wl_listener.
> 
> NAK.
> 
> HTH.
HTH?
> 
> Seriously though, your #2 change is defining behaviour that currently
> crashes to continue to crash.  That's an implementation detail, and nobody
> can possibly be relying on it.  We can *fix* that instead of trying to
> leverage it to start a fight on the internet. ;)
If it currently crashes (in the context of libwayland, not 
weston/gnome/whatever) then it should probably be added to the documentation.
Since breaking wl_list_get was one of the points discussed about this patch on 
irc before, so I would expect the wl_*_list_get functions to be valid at any 
point.
If it is something that crashes in weston/gnome and should now be allowed due 
to your mandate, I don't see how you can claim it's not a breaking change.

> 
> > [1] https://wayland.freedesktop.org/docs/html/apc.html#Server-structwl__listener
> > [2] https://wayland.freedesktop.org/docs/html/apc.html#Server-structwl__signal
> > 
> > Sure, the exact way they are specified here is a bit funny.
> > We could also add that to the various `wl_*_add_destroy_listener` functions.
> > Then we'd have the (from libwayland side breaking) change that e.g.
> > `wl_event_loop_get_destroy_listener` can't be called anymore under certain
> > circumstances.
> 
> I'll happily review a patch that mentions libwayland won't attempt to access
> the listeners in the destroy list more than once though.  Should probably
> write one myself.
> 
> Thanks,
> Derek
> 
> > > 
> > > Thanks,
> > > Derek
> > > 
> > > > > It also makes good wrapper implementations into managed languages annoying.
> > > > > For example (admittedly my own) [2] ensures a wl_listener can never be lost
> > > > > and leak memory. It is freed when the Handle is GC'd.
> > > > > To prevent any use-after-free into this wl_listener, it removes the listener
> > > > > from the list beforehand.
> > > > > I would very much like to keep this code (since it is perfectly valid on the
> > > > > current ABI) and is good design in managed languages.
> > > > 
> > > > Sure, that is annoying. In hindsight, it probably wasn't a good API
> > > > for particularly the new generation of managed languages. In the
> > > > meantime, probably the easiest way to do this, and come into line with
> > > > all the other users, would be to define a separate destroy-listener
> > > > type which intentionally leaks its wl_listener link after being
> > > > signaled, rather than removing it.
> > > > 
> > > > Cheers,
> > > > Daniel
> > > > 
> > > 
> 