[PATH] core: implement a safe wl_signal_emit

Thu Feb 22 16:53:36 UTC 2018

Hi ongy,

On 22 February 2018 at 16:03, Markus Ongyerth <wl at ongy.net> wrote:
> On 2018/2月/22 02:58, Daniel Stone wrote:
>> On 22 February 2018 at 14:14, Markus Ongyerth <wl at ongy.net> wrote:
>> > The code was buggy the whole time. Just because it was never triggered, does
>> > not imply it's not a bug.
>> > free()ing these struct wl_list without removing them from the signal list
>> > leaves other struct wl_list that are outside the control of the current code
>> > in an invalid, prone to use-after-free, state.
>>
>> There's a difference between something being 'buggy' and a design with
>> non-obvious details you might not like. If destroy handlers not
>> removing their list elements were buggy, we would be seeing bugs from
>> that. But instead it's part of the API contract: when a destroy signal
>> is invoked, you are guaranteed that this will be the first and only
>> access to your list member.
>
> Could you kindly point me towards the point that states this?
> Also the point that properly distinguishes two kinds of wl_signal_emit
> semantics for wl_listener?
>
> I don't want to say it can't be made part of the API, I'm saying it is not.
> And at the current point it's a bug. If the decision is to fix these bugs, by
> changing the API or the code, is another issue.

I can't point you to a document where it is written in stone, no. But
it would break Weston, WebKit, Clutter, and some others. Changing that
would be a break of our effective API: we've exposed behaviour which
people have come to rely on, regardless of whether or not it's
documented.

Equally, you could say that wl_display_get_fd() is 'buggy' because it
should follow the established convention of duplicating FDs returned
by a library, so users can be in control of lifetime. You could 'fix'
that by having wl_display_get_fd() duplicate the FDs it returns, and
when every Wayland client using an event loop complains that we're now
causing them to leak FDs, tell them it's their own fault for relying
on behaviour which was not formally documented as API/ABI. But that's
really not a helpful answer, and it's a very good way to get people to
stop using your software.

> (duplicated)
>>when a destroy signal
>> is invoked, you are guaranteed that this will be the first and only
>> access to your list member.
>
> This is wrong btw. This would prevent us from adding to a listener to a
> signal that already has listeners.
> Or use wl_signal_get on the destroy signal as soon as it has at least one
> listener.
>
> The semantics you want to state is, that it's the last time it is accessed.
> And even that gets iffy if someone wants to wl_signal_get a destroy listener
> at any point.

Thanks, your correction is accurate.

>>This implies that anyone trying to remove
>> their link from the list (accessing other listeners in the list) is
>> buggy.
>
> This would be rather intersting.
> It would be a fair point to add this to the emit callback of destroy
> listeners, but do we have any way to determine if a wl_signal_emit is
> currently in flight for a signal?
> What about the case where we have a struct that contains multiple destroy
> listeners to different components? How can we determine, the destroy event
> isn't a transitive event of another listener we have?

Right, generally speaking, that would be something you need to
explicitly communicate between those components.

> We could have a listener on a client destroy signal to clean ourself up, and
> one on a seats.
> If such a seat is created by a virtual keyboard (I'm going on a limb here, but
> whatever), it would naturally be destroyed with the client.
> Now if we are in the seat destroy signal, how do we determine what to do with
> our client listener?
> If this is caused by a client destruction, we can't remove ourselves form the
> client anymore.
> If this is not caused by a client destruction, we have to remove ourselves.

That's arguably somewhat contrived: you're not concerned about the
lifetime of the client per se, but just the lifetime of the object. So
you can just keep your destroy listener on the object, and know that
if the client gets destroyed then the object will as well. The same
goes for related objects, e.g. wl_surface -> xdg_surface. But it is a
good point, and definitely shows reason to be really careful about
what we do here.

>> > Related to this entire thing:
>> > In [1] you added tests for this and promote something, that is in essence, a
>> > breaking change.
>>
>> It's not a breaking change though: it's the API we've pushed on everyone so far.
>
> It is a breaking change in libwayland.
> Or could you kindly point me to what currently forces me to not remove my
> listeners in libwayland?
> Current libwayland allows both versions as implementation detail, but one
> follows the wl_listener semantics, the other makes asumptions.

That's true, though in practice you have to have the entire stack of
listeners for that resource completely locked down so they do the same
thing. Given the prevalence of implementations which _don't_, we
effectively chose one to encode which precludes the other.

>> > It also makes good wrapper implementations into managed languages annoying.
>> > For example (admittedly my own) [2] ensures a wl_listener can never be lost
>> > and leak memory. It is freed when the Handle is GC'd.
>> > To prevent any use-after-free into this wl_listener, it removes the listener
>> > from the list beforehand.
>> > I would very much like to keep this code (since it is perfectly valid on the
>> > current ABI) and is good design in managed languages.
>>
>> Sure, that is annoying. In hindsight, it probably wasn't a good API
>> for particularly the new generation of managed languages. In the
>> meantime, probably the easiest way to do this, and come into line with
>> all the other users, would be to define a separate destroy-listener
>> type which intentionally leaks its wl_listener link after being
>> signaled, rather than removing it.
>
> Could you kindly point me to the point where this became an API over an
> implementation detail?
> I can't find any point in either the spec or the library documentation that
> guarantees any of the asumptions made for this to be valid.

If we change it, multiple external consumers of this API entrypoint
crash. For me, by default that enshrines something as API rather than
an implementation detail.

Cheers,
Daniel