libwayland-server ABI change: [PATCH] wayland-server: Destroy Clients upon wl_display_destroy()

Wed Jan 25 03:08:26 UTC 2017

On Mon, Jan 23, 2017 at 03:40:11PM +0200, Pekka Paalanen wrote:
> On Tue, 20 Dec 2016 18:40:13 +0530
> viveka <vivekananth20 at gmail.com> wrote:
> 
> > From: "Vivek.A" <vivekananth20 at gmail.com>
> > 
> >     Without this, client socket file descriptors are being kept open until the process
> >     hosting the compositor gets terminated / exit. This causes compositor termination
> >     goes undetected through its fd. This could be reproduced by having compositor as
> >     one of the threads in a process.
> > 
> >     https://bugs.freedesktop.org/show_bug.cgi?id=99142
> > 
> >     Signed-off-by: Vivek.A <vivekananth20 at gmail.com>
> > ---
> >  src/wayland-server.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/src/wayland-server.c b/src/wayland-server.c
> > index 9d7d9c1..8195064 100644
> > --- a/src/wayland-server.c
> > +++ b/src/wayland-server.c
> > @@ -979,11 +979,16 @@ wl_socket_alloc(void)
> >  WL_EXPORT void
> >  wl_display_destroy(struct wl_display *display)
> >  {
> > +	struct wl_client *c, *cnext;
> >  	struct wl_socket *s, *next;
> >  	struct wl_global *global, *gnext;
> >  
> >  	wl_signal_emit(&display->destroy_signal, display);
> >  
> > +	wl_list_for_each_safe(c, cnext, &display->client_list, link) {
> > +		wl_client_destroy(c);
> > +	}
> > +
> >  	wl_list_for_each_safe(s, next, &display->socket_list, link) {
> >  		wl_socket_destroy(s);
> >  	}
> 
> Hi,
> 
> this needs an update on the documentation to say that existing clients
> are destroyed.
> 
> Previously it has been illegal to have existing clients when destroying
> a wl_display, because destroying a wl_client afterwards would have
> potentially accessed freed memory: wl_display::client_list list head.
> However, we failed to document this.
> 
> As far as I can tell, Weston also got it wrong, and never destroyed the
> clients before the wl_display.
> 
> Now starts the self-rant, which could be just TL;DR and conclude as: a
> good patch, fine by me if you add docs, but Weston is broken anyway.
> 
> ---
> 
> (A counter-proposition to this patch could be to just unlink the client
> list head, and leave the clients to be destroyed by the compositor. I
> have hard time imagining when that would be useful (Graceful shutdown
> waiting for clients to exit themselves? This is not a web server.) and
> it would be awkward to use, too. Also, destroying the wl_display
> removes the only supported way of polling and servicing the clients.
> Therefore I reject the counter-proposition.)
> 
> So I started wondering what happens in Weston, as I should really see
> the access to freed memory in valgrind, but I don't. Then I realized
> that nothing frees the remaining clients in Weston on shutdown. Ok, I
> should see a reduction in leaked memory then. But the results are
> depressing.
> 
> Start weston on x11:
> 
> $ valgrind -v --leak-check=full --show-reachable=yes --track-origins=yes --num-callers=30 weston --use-pixman
> 
> Then click (twice... why twice...) the icon to start weston_terminal,
> and then close weston while the terminal window is open. This is needed
> to have a client remain, because the shell helper clients get torn down
> anyway.
> 
> 
> Before patch:
> 
> ==23476== LEAK SUMMARY:
> ==23476==    definitely lost: 232 bytes in 3 blocks
> ==23476==    indirectly lost: 22,890 bytes in 47 blocks
> ==23476==      possibly lost: 163 bytes in 4 blocks
> ==23476==    still reachable: 72,450 bytes in 181 blocks
> ==23476==         suppressed: 0 bytes in 0 blocks
> ==23476== 
> ==23476== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 0 from 0)
> 
> After patch:
> 
> ==26025== LEAK SUMMARY:
> ==26025==    definitely lost: 968 bytes in 4 blocks
> ==26025==    indirectly lost: 1,584 bytes in 6 blocks
> ==26025==      possibly lost: 163 bytes in 4 blocks
> ==26025==    still reachable: 72,450 bytes in 181 blocks
> ==26025==         suppressed: 0 bytes in 0 blocks
> ==26025== 
> ==26025== ERROR SUMMARY: 23 errors from 21 contexts (suppressed: 0 from 0)
> 
> The indirectly lost amount goes down as expected, but the number of
> errors grows badly.
> 
> I suppose the thing is that weston has already pretty much shut down by
> the time it calls wl_display_destroy(). It's not prepared to resources
> still existing after shutdown, so when wl_display_destroy() goes
> destroying the remaining clients, their resources get destroyed, which
> then accesses freed memory because the tracking infrastructure has
> already been freed and the resources are full of stale pointers.
> 
> But that is Weston's problem.
> 
> One might think to move wl_display_destroy() earlier, but it feels
> fundamentally wrong, and would likely screw up so many places that
> assume the display exists. So I won't go there.
> 
> Weston will need fixing in any case, but the question is how.
> 
> Do we apply this patch which automatically destroys remaining clients?
> The alternative is to declare the opposite.
> 
> If we do, then Weston has to either destroy the clients manually before
> shutting down or make all the framework proofed against clients getting
> destroyed after they were shut down(*).
> 
> If we don't, Weston still leaks, and all compositors should manually
> destroy all remaining clients.
> 
> As (*) would be pretty hard and probably not worth the effort, plus
> kind of counter-intuitive, we need to fix weston to manually destroy
> remaining clients anyway. And that is going to be more interesting than
> it sounds, because there are shell helper clients that get
> automatically respanwed if they disconnect. Maybe.
> 
> The conclusion: I am in favour of this patch.
> 
> It is an ABI change though, and while it was crashy to write programs
> that would break due to this change, the probability of the problem
> going unnoticed is high. Proof: Weston.
> 
> Second opinions would be appreciated.

I'm fairly neutral. FWIW, mutter doesn't handle tear down gracefully yet
anyway, so from that perspective, it doesn't matter. But from the
valgrind summary, it looks like it *could* introduce new segfaults not
seen before on non-weston, where it was just a leak before. Which is
worse, a leak or a segfault on tear-down? That being said, cleaning up
does make sense from an API perspective.

Jonas

> 
> 
> Thanks,
> pq

> _______________________________________________
> wayland-devel mailing list
> wayland-devel at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/wayland-devel