[PATH] core: implement a safe wl_signal_emit

Thu Feb 22 18:31:13 UTC 2018

On 2018-02-22 10:48 AM, Markus Ongyerth wrote:
> On 2018/2月/22 09:34, Derek Foreman wrote:
>> On 2018-02-22 08:58 AM, Daniel Stone wrote:
>>> Hi,
>>>
>>> On 22 February 2018 at 14:14, Markus Ongyerth <wl at ongy.net> wrote:
>>>>> It seems that this patch makes that assumption invalid, and we would
>>>>> need patches to weston, enlightenment, and mutter to prevent a
>>>>> use-after-free during the signal emit?  Now I'm seeing valgrind errors
>>>>> on E and weston during buffer destroy.
>>>>>
>>>>> Personally, I don't think we should change this assumption and declare
>>>>> the existing code that's worked for years suddenly buggy. :/
>>>>
>>>> The code was buggy the whole time. Just because it was never triggered, does
>>>> not imply it's not a bug.
>>>> free()ing these struct wl_list without removing them from the signal list
>>>> leaves other struct wl_list that are outside the control of the current code
>>>> in an invalid, prone to use-after-free, state.
>>>
>>> There's a difference between something being 'buggy' and a design with
>>> non-obvious details you might not like. If destroy handlers not
>>> removing their list elements were buggy, we would be seeing bugs from
>>> that. But instead it's part of the API contract: when a destroy signal
>>> is invoked, you are guaranteed that this will be the first and only
>>> access to your list member. This implies that anyone trying to remove
>>> their link from the list (accessing other listeners in the list) is
>>> buggy.
>>>
>>>> Suddenly allowing this is a breaking API change (*some* struct wl_list inside
>>>> a wl_listener) can suddenly become invalid for reasons outside the users
>>>> control.
>>>
>>> I don't know if I've quite parsed this right, but as above, not
>>> removing elements of a destroy listener list, when the listener is
>>> invoked, is our current API.
>>>
>>>> Related to this entire thing:
>>>> In [1] you added tests for this and promote something, that is in essence, a
>>>> breaking change.
>>>
>>> It's not a breaking change though: it's the API we've pushed on everyone so far.
>>
>> Also, it doesn't prevent external libraries from doing whichever they want
>> if they have complete control of the destroy listener list contents.
> 
> So you suggest we break a now mandated api and expose ourselves to funny
> implementation detail changes that are now justified, because *we break API*?

I'm sorry, I'm have a hard time parsing this.

The suggested mandate is that libwayland internals won't touch the 
listener after you receive the notification.

That on receipt of a destroy notification you can free your stuff 
without removing your listener from the list.

>>
>> What is prevented is libwayland's destroy notifier list walk accessing an
>> element again after it is potentially freed by external code.
> 
> Which could be fixed by said node removing itself from a list, instead of
> leaving a list in invalid states for asumed behaviour.

And, of course, break the whole external world in the process.

No.

We're not going to break years of working code built on what seems to 
have been a quite reasonable assumption.

>>
>> We can completely replace the internal data structures in libwayland with
>> whatever we want, but we must preserve that behaviour.
> 
> Why can we change one implementation detail, but have to keep another one?

One is API, the other is implementation detail, so the question is 
irrelevant?

I understand what you're saying, I really do, but it's not pragmatic. 
Again, we can't break all external users of our library for very little 
real benefit.

>>
>> It does not mean we can never rework the destroy signal emit path in
>> libwayland to allow some items to be removed by the notification handlers
>> and others just freed, or to allow a destroy notifier to touch the list.A
> 
> There is no destroy signal emit path.

Sure there is.  It's currently an implementation detail that it happens 
to be the exact same path as other signal emitters.

> There is a signal emit path, which may be called on destroy signals, and
> suddenly has to follow different semantics because of what exactly?

Because a large amount of software will break if we change the currently 
expected behavior.

I realize this leaves me open to all manner of ridiculous slippery slope 
arguments, such as "if my software depended on a bug in an 
authentication system that forgot to ask for a password...", so all 
snark aside can we back away from that ledge now?

As library authors we have to be pragmatic.  We have to avoid surprising 
our callers.  While this API constraint is annoying, it is harmless, and 
demanding all old code suddenly conform to a new, different constraint 
that was never enforced before is too onerous.

> And how exactly do we expose that to our listeners? By the name of the signal
> in the struct?

Why do we need to?  A notification callback written to be used as a 
destroy notifier will be written differently than one intended to be 
used for other things.

Most notably, a non-destruction notifier will be capable of being called 
more than once, or a crash could occur (right now).

>>
>> It just makes it easy to verify that attempts to do that don't break
>> guarantees we've always made.
> 
> Again, kindly point me to the point where an implementation detail made it
> towards a guarantee.
> And for which signals exactly. Under which circumstances, oh and what
> guarantee exactly.

We seem to be talking in circles - as above, we can't break all that 
software.  I don't really like it either, but it's where we're at right now.

> 
> It would be nice if I had a proper proposal to take apart for this to be
> explictly added to the API.
> Otherwise I can provide a (intentionaly snarky) one :)

Oh, that's good, I was hoping we could turn up the snark at some point. ;)

> Part one ammend [1] (wl_listener) with:
> "The wl_list inside a wl_listener can be invalid (pointer towards free'd
> memory) at any time the listener notify is called. For further details see
> wl_signal.

NAK.

> Part two ammend [2] (wl_signal) with:
> Signals with the name `*_destroy` have special semantics.
> If they are currently emitted, any wl_signal_add/wl_signal_get on the signal
> or wl_list_remove on the link of any listener in it is invalid.
> This is also the cause for invalid struct wl_list entries in wl_listener.

NAK.

HTH.

Seriously though, your #2 change is defining behaviour that currently 
crashes to continue to crash.  That's an implementation detail, and 
nobody can possibly be relying on it.  We can *fix* that instead of 
trying to leverage it to start a fight on the internet. ;)

> [1] https://wayland.freedesktop.org/docs/html/apc.html#Server-structwl__listener
> [2] https://wayland.freedesktop.org/docs/html/apc.html#Server-structwl__signal
> 
> Sure, the exact way they are specified here is a bit funny.
> We could also add that to the various `wl_*_add_destroy_listener` functions.
> Then we'd have the (from libwayland side breaking) change that e.g.
> `wl_event_loop_get_destroy_listener` can't be called anymore under certain
> circumstances.

I'll happily review a patch that mentions libwayland won't attempt to 
access the listeners in the destroy list more than once though.  Should 
probably write one myself.

Thanks,
Derek

>>
>> Thanks,
>> Derek
>>
>>>> It also makes good wrapper implementations into managed languages annoying.
>>>> For example (admittedly my own) [2] ensures a wl_listener can never be lost
>>>> and leak memory. It is freed when the Handle is GC'd.
>>>> To prevent any use-after-free into this wl_listener, it removes the listener
>>>> from the list beforehand.
>>>> I would very much like to keep this code (since it is perfectly valid on the
>>>> current ABI) and is good design in managed languages.
>>>
>>> Sure, that is annoying. In hindsight, it probably wasn't a good API
>>> for particularly the new generation of managed languages. In the
>>> meantime, probably the easiest way to do this, and come into line with
>>> all the other users, would be to define a separate destroy-listener
>>> type which intentionally leaks its wl_listener link after being
>>> signaled, rather than removing it.
>>>
>>> Cheers,
>>> Daniel
>>> _______________________________________________
>>> wayland-devel mailing list
>>> wayland-devel at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/wayland-devel
>>>
>>