libwayland-server ABI change: [PATCH] wayland-server: Destroy Clients upon wl_display_destroy()

Mon Jan 23 13:40:11 UTC 2017

On Tue, 20 Dec 2016 18:40:13 +0530
viveka <vivekananth20 at gmail.com> wrote:

> From: "Vivek.A" <vivekananth20 at gmail.com>
> 
>     Without this, client socket file descriptors are being kept open until the process
>     hosting the compositor gets terminated / exit. This causes compositor termination
>     goes undetected through its fd. This could be reproduced by having compositor as
>     one of the threads in a process.
> 
>     https://bugs.freedesktop.org/show_bug.cgi?id=99142
> 
>     Signed-off-by: Vivek.A <vivekananth20 at gmail.com>
> ---
>  src/wayland-server.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/src/wayland-server.c b/src/wayland-server.c
> index 9d7d9c1..8195064 100644
> --- a/src/wayland-server.c
> +++ b/src/wayland-server.c
> @@ -979,11 +979,16 @@ wl_socket_alloc(void)
>  WL_EXPORT void
>  wl_display_destroy(struct wl_display *display)
>  {
> +	struct wl_client *c, *cnext;
>  	struct wl_socket *s, *next;
>  	struct wl_global *global, *gnext;
>  
>  	wl_signal_emit(&display->destroy_signal, display);
>  
> +	wl_list_for_each_safe(c, cnext, &display->client_list, link) {
> +		wl_client_destroy(c);
> +	}
> +
>  	wl_list_for_each_safe(s, next, &display->socket_list, link) {
>  		wl_socket_destroy(s);
>  	}

Hi,

this needs an update on the documentation to say that existing clients
are destroyed.

Previously it has been illegal to have existing clients when destroying
a wl_display, because destroying a wl_client afterwards would have
potentially accessed freed memory: wl_display::client_list list head.
However, we failed to document this.

As far as I can tell, Weston also got it wrong, and never destroyed the
clients before the wl_display.

Now starts the self-rant, which could be just TL;DR and conclude as: a
good patch, fine by me if you add docs, but Weston is broken anyway.

---

(A counter-proposition to this patch could be to just unlink the client
list head, and leave the clients to be destroyed by the compositor. I
have hard time imagining when that would be useful (Graceful shutdown
waiting for clients to exit themselves? This is not a web server.) and
it would be awkward to use, too. Also, destroying the wl_display
removes the only supported way of polling and servicing the clients.
Therefore I reject the counter-proposition.)

So I started wondering what happens in Weston, as I should really see
the access to freed memory in valgrind, but I don't. Then I realized
that nothing frees the remaining clients in Weston on shutdown. Ok, I
should see a reduction in leaked memory then. But the results are
depressing.

Start weston on x11:

$ valgrind -v --leak-check=full --show-reachable=yes --track-origins=yes --num-callers=30 weston --use-pixman

Then click (twice... why twice...) the icon to start weston_terminal,
and then close weston while the terminal window is open. This is needed
to have a client remain, because the shell helper clients get torn down
anyway.

Before patch:

==23476== LEAK SUMMARY:
==23476==    definitely lost: 232 bytes in 3 blocks
==23476==    indirectly lost: 22,890 bytes in 47 blocks
==23476==      possibly lost: 163 bytes in 4 blocks
==23476==    still reachable: 72,450 bytes in 181 blocks
==23476==         suppressed: 0 bytes in 0 blocks
==23476== 
==23476== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 0 from 0)

After patch:

==26025== LEAK SUMMARY:
==26025==    definitely lost: 968 bytes in 4 blocks
==26025==    indirectly lost: 1,584 bytes in 6 blocks
==26025==      possibly lost: 163 bytes in 4 blocks
==26025==    still reachable: 72,450 bytes in 181 blocks
==26025==         suppressed: 0 bytes in 0 blocks
==26025== 
==26025== ERROR SUMMARY: 23 errors from 21 contexts (suppressed: 0 from 0)

The indirectly lost amount goes down as expected, but the number of
errors grows badly.

I suppose the thing is that weston has already pretty much shut down by
the time it calls wl_display_destroy(). It's not prepared to resources
still existing after shutdown, so when wl_display_destroy() goes
destroying the remaining clients, their resources get destroyed, which
then accesses freed memory because the tracking infrastructure has
already been freed and the resources are full of stale pointers.

But that is Weston's problem.

One might think to move wl_display_destroy() earlier, but it feels
fundamentally wrong, and would likely screw up so many places that
assume the display exists. So I won't go there.

Weston will need fixing in any case, but the question is how.

Do we apply this patch which automatically destroys remaining clients?
The alternative is to declare the opposite.

If we do, then Weston has to either destroy the clients manually before
shutting down or make all the framework proofed against clients getting
destroyed after they were shut down(*).

If we don't, Weston still leaks, and all compositors should manually
destroy all remaining clients.

As (*) would be pretty hard and probably not worth the effort, plus
kind of counter-intuitive, we need to fix weston to manually destroy
remaining clients anyway. And that is going to be more interesting than
it sounds, because there are shell helper clients that get
automatically respanwed if they disconnect. Maybe.

The conclusion: I am in favour of this patch.

It is an ABI change though, and while it was crashy to write programs
that would break due to this change, the probability of the problem
going unnoticed is high. Proof: Weston.

Second opinions would be appreciated.

Thanks,
pq
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 801 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/wayland-devel/attachments/20170123/3563a32c/attachment-0001.sig>