[PATH] core: implement a safe wl_signal_emit

Thu Feb 22 22:04:12 UTC 2018

On 2018-02-22 01:36 PM, Markus Ongyerth wrote:
> On 2018/2月/22 12:31, Derek Foreman wrote:
>> On 2018-02-22 10:48 AM, Markus Ongyerth wrote:
>>> On 2018/2月/22 09:34, Derek Foreman wrote:
>>>> On 2018-02-22 08:58 AM, Daniel Stone wrote:
>>>>> Hi,
>>>>>
>>>>> On 22 February 2018 at 14:14, Markus Ongyerth <wl at ongy.net> wrote:
>>>>>>> It seems that this patch makes that assumption invalid, and we would
>>>>>>> need patches to weston, enlightenment, and mutter to prevent a
>>>>>>> use-after-free during the signal emit?  Now I'm seeing valgrind errors
>>>>>>> on E and weston during buffer destroy.
>>>>>>>
>>>>>>> Personally, I don't think we should change this assumption and declare
>>>>>>> the existing code that's worked for years suddenly buggy. :/
>>>>>>
>>>>>> The code was buggy the whole time. Just because it was never triggered, does
>>>>>> not imply it's not a bug.
>>>>>> free()ing these struct wl_list without removing them from the signal list
>>>>>> leaves other struct wl_list that are outside the control of the current code
>>>>>> in an invalid, prone to use-after-free, state.
>>>>>
>>>>> There's a difference between something being 'buggy' and a design with
>>>>> non-obvious details you might not like. If destroy handlers not
>>>>> removing their list elements were buggy, we would be seeing bugs from
>>>>> that. But instead it's part of the API contract: when a destroy signal
>>>>> is invoked, you are guaranteed that this will be the first and only
>>>>> access to your list member. This implies that anyone trying to remove
>>>>> their link from the list (accessing other listeners in the list) is
>>>>> buggy.
>>>>>
>>>>>> Suddenly allowing this is a breaking API change (*some* struct wl_list inside
>>>>>> a wl_listener) can suddenly become invalid for reasons outside the users
>>>>>> control.
>>>>>
>>>>> I don't know if I've quite parsed this right, but as above, not
>>>>> removing elements of a destroy listener list, when the listener is
>>>>> invoked, is our current API.
>>>>>
>>>>>> Related to this entire thing:
>>>>>> In [1] you added tests for this and promote something, that is in essence, a
>>>>>> breaking change.
>>>>>
>>>>> It's not a breaking change though: it's the API we've pushed on everyone so far.
>>>>
>>>> Also, it doesn't prevent external libraries from doing whichever they want
>>>> if they have complete control of the destroy listener list contents.
>>>
>>> So you suggest we break a now mandated api and expose ourselves to funny
>>> implementation detail changes that are now justified, because *we break API*?
>>
>> I'm sorry, I'm have a hard time parsing this.
>>
>> The suggested mandate is that libwayland internals won't touch the listener
>> after you receive the notification.
> Libwayland internals?
> That would be fine. Then effectivly nobody can rely on this either way, if
> they ever want to have code that can be integrated with other consumers of
> libwayland.
> 
> The problem is, that a single library that relies on this will force anyone
> that uses it to adhere to it.
> If we now have a listener that does things properly it will use-after-free if
> it ever shares a signal list with that library.

I've just sent an RFC patch to the list that resolves that issue.

>>
>> That on receipt of a destroy notification you can free your stuff without
>> removing your listener from the list.
> 
> Which is exactly what we don't want. Since that implies whenever we share a
> destroy signal list with a listener from somewhere that's not inherently our
> code, we can't rely on us being allowed to remove ourselves form the list
> (that's everything from libwayland btw.).
> And if we do, we'd have to take the blame for any integration that fails,
> since *WE* break API now.

See above.

>> doing whichever they want
> So if you suggest that we jsut break api here, you actually suggest we do
> something that will break as soon as someone wants to integrate a library that
> also works with another codebase that relies on this mandate.

See above.

>>
>>>>
>>>> What is prevented is libwayland's destroy notifier list walk accessing an
>>>> element again after it is potentially freed by external code.
>>>
>>> Which could be fixed by said node removing itself from a list, instead of
>>> leaving a list in invalid states for asumed behaviour.
>>
>> And, of course, break the whole external world in the process.
>>
>> No.
> These users break interop with qt. If we ever expect a library that uses qt
> (maybe in a plugin for weston?) to be used together with the codebase then
> things break.

See above.

>>
>> We're not going to break years of working code built on what seems to have
>> been a quite reasonable assumption.
>>
> And the asumption that memory that's pointed into (by pointers that are
> *really* easy to fix) sounds pretty reasonable as well.

The assumption that this is really easy to fix is incorrect.

Yeah, it's a one liner in every destroy handler everywhere in code we 
don't follow.  That part is trivial.

Making sure every distro that packages both wayland and something that 
calls wayland updates all those callers to new versions before they 
update wayland to something that violates these assumptions is less trivial.

While I generally despise the word "ecosystem" when used to describe 
anything that doesn't contain fish, we'd really be doing ours harm if we 
broke a basic assumption that... right, you know where I'm going with 
this as I've said it 37 times by now. ;)

> I point out qt here (which afaik implies KDE) to also get a few years, though
> I don't see a reason why 5 year old code that behaves badly would be worth
> more than 3 month old code that (may or may not) behave badly, so I could also
> talk about our implementations.

This is disingenuous though, what you're complaining about isn't 
breaking Qt, it's breaking an interoperation in a single client between 
Qt code and <other library> that makes the assumption some of us are 
claiming should be considered API.

That is, it breaks no existent code - because all that code would 
currently be crashing and someone would've complained to the list about 
it with a concrete example.

> While this mandate does not directly cause crashes in those (which admittably
> is a bit better), it does result in code that's no longer correct. So if
> anyone cares about code correctness, this code is now effectivly broken.

Disagree, it will continue to execute as it always has.  Also, see above ;)

> You effectivly can't choose to not break stuff. You have to pick which current
> usecase you want to break.

See above.

> I don't think there's a good measure, which consumers should get priority.
> (Of course we should ;) )

Unfortunately, when you see my patch you'll notice how I've assigned 
priorities. :(

What I don't see a way to do is allow a walk of the signal notifier list 
during a destruction handler - (for the folks at home, that's something 
that's never been possible, but this thread is a direct result of a 
patch that tried to change that)

>>>>
>>>> We can completely replace the internal data structures in libwayland with
>>>> whatever we want, but we must preserve that behaviour.
>>>
>>> Why can we change one implementation detail, but have to keep another one?
>>
>> One is API, the other is implementation detail, so the question is
>> irrelevant?
> 
> Cuold you kindly point me towards the point that made the implementation
> detail of not touching a certain (currently rather unsepcified) subset of
> wl_list elements part of the API?

No such requirement exists and nobody is proposing making that a 
requirement.  You're claiming it follows directly from other actions 
that it doesn't.

> I can't find the point that guarantees it. Since that would also imply, that I
> can't remove my own destruction listeners from the notify callback and I have
> been doing that so far.

These are not mutually exclusive requirements.  That is an 
implementation detail.  See above.

>>
>> I understand what you're saying, I really do, but it's not pragmatic. Again,
>> we can't break all external users of our library for very little real
>> benefit.
> 
> Again, you break users either way. I do see the difference between
> use-after-free and just making code incorrect, but both of those are breaking
> changes.
> And iirc further down in my previous mail, I pointed out a point in the
> current api docs, that would have to get fixed, in a breaking way.

No, we don't.

BTW, if you think I've flippantly dismissed a concerns that weren't 
isomorphic with "See above." do let me know, but I feel the entropy in 
this discussion is dropping rapidly.  I worried if I trimmed too 
savagely at this point you'd think I dismissed something out of hand.

But I think at this point we're spiraling around the same points over 
and over.

>>
>>>>
>>>> It does not mean we can never rework the destroy signal emit path in
>>>> libwayland to allow some items to be removed by the notification handlers
>>>> and others just freed, or to allow a destroy notifier to touch the list.A
>>>
>>> There is no destroy signal emit path.
>>
>> Sure there is.  It's currently an implementation detail that it happens to
>> be the exact same path as other signal emitters.o
> 
> :) I wouldn't mind to cooperate on this, since destroy signals are really the
> thing we worry about (with this commit).
> If we have a way to specify something as destroy listener, with the added
> semantics, that every listener is expected to be removed from the list after
> the emit, we can probably get something done that works for both usecases.

I think we can perhaps do that going forward - but it would have to be 
in addition to something like the patch I've already proposed, which 
would allow all currently functioning destroy signal users to continue 
to function and interoperate.  (and passes my proposed test case)

At that point the new api becomes more about solving the specific 
problem of allowing list walks from a destroy notifier (for presumably a 
subset of listeners that used the new api)

This turns into a class of problems that can be entirely solved in 
different ways, so it may be hard to gain traction for a solution if 
it's deemed invasive...

> This is certainly the smaller breaking of listener semantics. It may require a
> bit of a hacky thing to do, but it should be doable. I wouldn't mind having a
> whack at that.
> That should probably split what we currently have into:
> `wl_signal_emit`, `wl_signal_emit_safe`, `wl_signal_emit_safe_destroy`
> 
> `wl_signal_emit` would be deprecated, but kept to not surprise library users.
> `wl_signal_emit_safe` would be this patch's version of emit
> `wl_signal_emit_safe_destroy` would make sure to not touch the memory of
> already called listeners while keeping the semantics that allow us to remove
> arbitrary listeners from the signal list.
> I could probably hack compat between the two behavious into that one as well,
> thinking about what it could look like.

If you can, I'll happily review patches - as long as we don't require 
any existing applications/toolkits to suddenly be broken.

I'd like to see this resolved too.

I have a hard time thinking about a solution to the arbitrary removal 
case that doesn't break previous usage.  A separate destroy listener 
list that allows arbitrary walks (with an emit like emersion's code) 
seems like it would resolve that, provided we still had a separate api 
for old style users...

Whether that would land or not requires more opinions than my own.

>>
>>> There is a signal emit path, which may be called on destroy signals, and
>>> suddenly has to follow different semantics because of what exactly?
>>
>> Because a large amount of software will break if we change the currently
>> expected behavior.
>>
>> I realize this leaves me open to all manner of ridiculous slippery slope
>> arguments, such as "if my software depended on a bug in an authentication
>> system that forgot to ask for a password...", so all snark aside can we back
>> away from that ledge now?
> I didn't want to drive it that far. No worries, I think I can justify my
> opinion with actual arguments.

Much appreciated. :D

>>
>> As library authors we have to be pragmatic.  We have to avoid surprising our
>> callers.  While this API constraint is annoying, it is harmless, and
> 
> hehehe Thanks for that one.
>> demanding all old code suddenly conform to a new, different constraint that
>> was never enforced before is too onerous.
> Then why is that the fix you propose?

See above. ;)

> I think you have the same basic misunderstanding (from my point of view of
> course) as Daniel.
> That this change does not force anyone to change their current code.
> I think I layed out my reasoning why I disagree in this (and the mail to
> daniel) at multiple points.

As I hope my patch illustrates, existing code can continue to run, and 
even be made to interoperate, with no surprises.

>>
>>> And how exactly do we expose that to our listeners? By the name of the signal
>>> in the struct?
>>
>> Why do we need to?  A notification callback written to be used as a destroy
>> notifier will be written differently than one intended to be used for other
>> things.
> We have to in the context of sane language bindings. It is exceedingly
> annoying, if something silly like this prevents me from forcing proper
> behaviour of the wl_list type on GC.
> In the context of C alone it's a triviality, in the context of languages that
> actually try to get some safety into types or the runtime, it's horrible.

In the context of a C library that's been in development for almost a 
decade, there are decisions that have been made a long time ago without 
the input of developers of different languages, and we're stuck with 
some of them now. :(

>>> Part one ammend [1] (wl_listener) with:
>>> "The wl_list inside a wl_listener can be invalid (pointer towards free'd
>>> memory) at any time the listener notify is called. For further details see
>>> wl_signal.
>>
>> NAK.
>>
> Make a better one, that properly allows what you intend to allow :)

After we debate the patch I just sent out we can address that again.

> I would never suggest to actualy take this up, but that's the chagne you are
> intending to make form my viewpoint.

I feel your viewpoint was based on assuming certain implementation 
details were set in stone.

> It is better constrained, but this is the change to the listeners, if we just
> officially allow to free them inside a signal list.

Obviously we can't do that in a general sense, only for destruction 
listeners.

>>> Part two ammend [2] (wl_signal) with:
>>> Signals with the name `*_destroy` have special semantics.
>>> If they are currently emitted, any wl_signal_add/wl_signal_get on the signal
>>> or wl_list_remove on the link of any listener in it is invalid.
>>> This is also the cause for invalid struct wl_list entries in wl_listener.
>>
>> NAK.
>>
>> HTH.
> HTH?

Hope That Helps.  bonus snark for your obviously inflammatory 
suggestions. ;)

>>
>> Seriously though, your #2 change is defining behaviour that currently
>> crashes to continue to crash.  That's an implementation detail, and nobody
>> can possibly be relying on it.  We can *fix* that instead of trying to
>> leverage it to start a fight on the internet. ;)
> If it currently crahses (in the context of libwayland, not
> weston/gnome/whatever) then it should probably be added to the documentation.
> Since breaking wl_list_get was one of the points discussed about this patch on
> irc before, so I would expect the wl_*_list_get functions to be valid at any
> point.
> If it is something that crashes in weston/gnome and should now be allowed do
> to your mandate, I don't see how you can claim it's not a breaking change.

By this definition adding new API is a breaking change, as it enables 
something that didn't previously work.

Except it's not a breaking change because nothing previously knew about 
it or could depend on it and function.

I think the key issues we need to address right now are:
1) whether callers can continue to free listeners in destroy handlers 
without removing them first.
2) whether callers can continue to remove their listeners from destroy 
handlers.
3) how to make these interoperate.

I think with my proposed patch we're at yes, yes, and they do.

After that we can worry about list walks from destroy notifiers - 
something that's never ever been possible...

Thanks,
Derek

> 
>>
>>> [1] https://wayland.freedesktop.org/docs/html/apc.html#Server-structwl__listener
>>> [2] https://wayland.freedesktop.org/docs/html/apc.html#Server-structwl__signal
>>>
>>> Sure, the exact way they are specified here is a bit funny.
>>> We could also add that to the various `wl_*_add_destroy_listener` functions.
>>> Then we'd have the (from libwayland side breaking) change that e.g.
>>> `wl_event_loop_get_destroy_listener` can't be called anymore under certain
>>> circumstances.
>>
>> I'll happily review a patch that mentions libwayland won't attempt to access
>> the listeners in the destroy list more than once though.  Should probably
>> write one myself.
>>
>> Thanks,
>> Derek
>>
>>>>
>>>> Thanks,
>>>> Derek
>>>>
>>>>>> It also makes good wrapper implementations into managed languages annoying.
>>>>>> For example (admittedly my own) [2] ensures a wl_listener can never be lost
>>>>>> and leak memory. It is freed when the Handle is GC'd.
>>>>>> To prevent any use-after-free into this wl_listener, it removes the listener
>>>>>> from the list beforehand.
>>>>>> I would very much like to keep this code (since it is perfectly valid on the
>>>>>> current ABI) and is good design in managed languages.
>>>>>
>>>>> Sure, that is annoying. In hindsight, it probably wasn't a good API
>>>>> for particularly the new generation of managed languages. In the
>>>>> meantime, probably the easiest way to do this, and come into line with
>>>>> all the other users, would be to define a separate destroy-listener
>>>>> type which intentionally leaks its wl_listener link after being
>>>>> signaled, rather than removing it.
>>>>>
>>>>> Cheers,
>>>>> Daniel
>>>>> _______________________________________________
>>>>> wayland-devel mailing list
>>>>> wayland-devel at lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/wayland-devel
>>>>>
>>>>
>>
>>
>>
>> _______________________________________________
>> wayland-devel mailing list
>> wayland-devel at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/wayland-devel