libwayland-server ABI change: [PATCH] wayland-server: Destroy Clients upon wl_display_destroy()

Pekka Paalanen ppaalanen at gmail.com
Mon Feb 27 16:08:08 UTC 2017


On Wed, 25 Jan 2017 13:29:08 +0200
Pekka Paalanen <ppaalanen at gmail.com> wrote:

> On Wed, 25 Jan 2017 11:08:26 +0800
> Jonas Ã…dahl <jadahl at gmail.com> wrote:
> 
> > On Mon, Jan 23, 2017 at 03:40:11PM +0200, Pekka Paalanen wrote:  
> > > On Tue, 20 Dec 2016 18:40:13 +0530
> > > viveka <vivekananth20 at gmail.com> wrote:
> > >     
> > > > From: "Vivek.A" <vivekananth20 at gmail.com>
> > > > 
> > > >     Without this, client socket file descriptors are being kept open until the process
> > > >     hosting the compositor gets terminated / exit. This causes compositor termination
> > > >     goes undetected through its fd. This could be reproduced by having compositor as
> > > >     one of the threads in a process.
> > > > 
> > > >     https://bugs.freedesktop.org/show_bug.cgi?id=99142
> > > > 
> > > >     Signed-off-by: Vivek.A <vivekananth20 at gmail.com>
> > > > ---
> > > >  src/wayland-server.c | 5 +++++
> > > >  1 file changed, 5 insertions(+)
> > > > 
> > > > diff --git a/src/wayland-server.c b/src/wayland-server.c
> > > > index 9d7d9c1..8195064 100644
> > > > --- a/src/wayland-server.c
> > > > +++ b/src/wayland-server.c
> > > > @@ -979,11 +979,16 @@ wl_socket_alloc(void)
> > > >  WL_EXPORT void
> > > >  wl_display_destroy(struct wl_display *display)
> > > >  {
> > > > +	struct wl_client *c, *cnext;
> > > >  	struct wl_socket *s, *next;
> > > >  	struct wl_global *global, *gnext;
> > > >  
> > > >  	wl_signal_emit(&display->destroy_signal, display);
> > > >  
> > > > +	wl_list_for_each_safe(c, cnext, &display->client_list, link) {
> > > > +		wl_client_destroy(c);
> > > > +	}
> > > > +
> > > >  	wl_list_for_each_safe(s, next, &display->socket_list, link) {
> > > >  		wl_socket_destroy(s);
> > > >  	}    
> > > 
> > > Hi,
> > > 
> > > this needs an update on the documentation to say that existing clients
> > > are destroyed.
> > > 
> > > Previously it has been illegal to have existing clients when destroying
> > > a wl_display, because destroying a wl_client afterwards would have
> > > potentially accessed freed memory: wl_display::client_list list head.
> > > However, we failed to document this.
> > > 
> > > As far as I can tell, Weston also got it wrong, and never destroyed the
> > > clients before the wl_display.
> > > 
> > > Now starts the self-rant, which could be just TL;DR and conclude as: a
> > > good patch, fine by me if you add docs, but Weston is broken anyway.
> > > 
> > > ---
> > > 
> > > (A counter-proposition to this patch could be to just unlink the client
> > > list head, and leave the clients to be destroyed by the compositor. I
> > > have hard time imagining when that would be useful (Graceful shutdown
> > > waiting for clients to exit themselves? This is not a web server.) and
> > > it would be awkward to use, too. Also, destroying the wl_display
> > > removes the only supported way of polling and servicing the clients.
> > > Therefore I reject the counter-proposition.)
> > > 
> > > So I started wondering what happens in Weston, as I should really see
> > > the access to freed memory in valgrind, but I don't. Then I realized
> > > that nothing frees the remaining clients in Weston on shutdown. Ok, I
> > > should see a reduction in leaked memory then. But the results are
> > > depressing.
> > > 
> > > Start weston on x11:
> > > 
> > > $ valgrind -v --leak-check=full --show-reachable=yes --track-origins=yes --num-callers=30 weston --use-pixman
> > > 
> > > Then click (twice... why twice...) the icon to start weston_terminal,
> > > and then close weston while the terminal window is open. This is needed
> > > to have a client remain, because the shell helper clients get torn down
> > > anyway.
> > > 
> > > 
> > > Before patch:
> > > 
> > > ==23476== LEAK SUMMARY:
> > > ==23476==    definitely lost: 232 bytes in 3 blocks
> > > ==23476==    indirectly lost: 22,890 bytes in 47 blocks
> > > ==23476==      possibly lost: 163 bytes in 4 blocks
> > > ==23476==    still reachable: 72,450 bytes in 181 blocks
> > > ==23476==         suppressed: 0 bytes in 0 blocks
> > > ==23476== 
> > > ==23476== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 0 from 0)
> > > 
> > > After patch:
> > > 
> > > ==26025== LEAK SUMMARY:
> > > ==26025==    definitely lost: 968 bytes in 4 blocks
> > > ==26025==    indirectly lost: 1,584 bytes in 6 blocks
> > > ==26025==      possibly lost: 163 bytes in 4 blocks
> > > ==26025==    still reachable: 72,450 bytes in 181 blocks
> > > ==26025==         suppressed: 0 bytes in 0 blocks
> > > ==26025== 
> > > ==26025== ERROR SUMMARY: 23 errors from 21 contexts (suppressed: 0 from 0)
> > > 
> > > The indirectly lost amount goes down as expected, but the number of
> > > errors grows badly.
> > > 
> > > I suppose the thing is that weston has already pretty much shut down by
> > > the time it calls wl_display_destroy(). It's not prepared to resources
> > > still existing after shutdown, so when wl_display_destroy() goes
> > > destroying the remaining clients, their resources get destroyed, which
> > > then accesses freed memory because the tracking infrastructure has
> > > already been freed and the resources are full of stale pointers.
> > > 
> > > But that is Weston's problem.
> > > 
> > > One might think to move wl_display_destroy() earlier, but it feels
> > > fundamentally wrong, and would likely screw up so many places that
> > > assume the display exists. So I won't go there.
> > > 
> > > Weston will need fixing in any case, but the question is how.
> > > 
> > > Do we apply this patch which automatically destroys remaining clients?
> > > The alternative is to declare the opposite.
> > > 
> > > If we do, then Weston has to either destroy the clients manually before
> > > shutting down or make all the framework proofed against clients getting
> > > destroyed after they were shut down(*).
> > > 
> > > If we don't, Weston still leaks, and all compositors should manually
> > > destroy all remaining clients.
> > > 
> > > As (*) would be pretty hard and probably not worth the effort, plus
> > > kind of counter-intuitive, we need to fix weston to manually destroy
> > > remaining clients anyway. And that is going to be more interesting than
> > > it sounds, because there are shell helper clients that get
> > > automatically respanwed if they disconnect. Maybe.
> > > 
> > > The conclusion: I am in favour of this patch.
> > > 
> > > It is an ABI change though, and while it was crashy to write programs
> > > that would break due to this change, the probability of the problem
> > > going unnoticed is high. Proof: Weston.
> > > 
> > > Second opinions would be appreciated.    
> > 
> > I'm fairly neutral. FWIW, mutter doesn't handle tear down gracefully yet
> > anyway, so from that perspective, it doesn't matter. But from the
> > valgrind summary, it looks like it *could* introduce new segfaults not
> > seen before on non-weston, where it was just a leak before. Which is
> > worse, a leak or a segfault on tear-down? That being said, cleaning up
> > does make sense from an API perspective.  
> 
> Since the alpha release just happened, I'm leaning on postponing this
> patch after the final release, to the next development cycle. There is
> evidently a considerable risk of regressions in users, IMHO. Sorry for
> the delay.

Hi,

the development cycle has started, and I'm reconsidering this. It seems
the fundamental question is how should the wl_display API be used.

First, it is indeed ok to destroy wl_clients at almost any time while
the compositor is running, because socket events may cause a wl_client
to be destroyed already.

wl_display has a destroy signal - should we destroy the remaining
wl_clients before or after emitting the signal?

If the signal is used to tear down helper clients, then destroying
wl_clients before emitting the signal may cause the helper clients to be
immediately respawned.

But, if like in this patch the destroy_signal is emitted first, then
there is no way to tear any supporting framework down after clients are
gone but before the wl_display is destroyed. That means that the
framework needs to hook up to the wl_display destroy_signal and destroy
all clients itself first.

Therefore I think the wl_clients should be destroyed *before* emitting
the wl_display destroy_signal (and clearly documented). This gives the
compositor an opportunity to do any clean-up necessary before it calls
wl_display_destroy(), then wl_display will destroy all the remaining
wl_clients, and then wl_display destroy_signal gets called which will
clean up anything that was left. Finally wl_display_destroy() frees the
wl_display itself.

That is the sequence that leaves the most freedom for the compositor to
choose what it does in which step.

Unfortunately, at least Weston will need fixing to work right with it.

Btw. the order I propose is the opposite from how wl_client_destroy()
destroyes the wl_resources. wl_client_destroy() emits the destroy
signal first, and then destroys all the wl_resources. I think the
difference is justified because wl_client_destroy() is usually called
from within libwayland-server while wl_display_destroy() is always
called by the compositor.



The wl_client destroy loop in wl_display_destroy() cannot simply use
wl_list_for_each_safe(), because callbacks from wl_client destroy
signals might create new clients. I'd recommend moving the whole client
list onto a temporary head first, initializing the original head,
iterating through the temporary head, and finally logging a loud error
message if the list at the original head is not empty. This avoids a
potential crash or busy-loop-freeze if new clients are added when they
shouldn't, while still pointing out the mistake in the compositor.



It still feels quite risky, and the regressions at downstream might be
nasty, even though it is only about shutdown. Automatically destroying
wl_clients at wl_display_destroy() is an ABI change, no matter how I
look at it. I think it breaks older Westons and we cannot have that.

How about instead of destroying wl_clients in wl_display_destroy(), we
add a wl_display_destroy_all_clients()?

Actually, that could already be implemented with
wl_display_get_client_list() from outside of libwayland-server, but it
cannot do the temporary head trick to catch adding new clients at a bad
time.


I am still somewhat confused about what would be the best course of
action, so other opinions would be valuable. I think
wl_display_destroy_all_clients() would be the safest way forward, with
a warning message from wl_display_destroy() if any wl_clients remain.


Thanks,
pq
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 801 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/wayland-devel/attachments/20170227/284f873b/attachment.sig>


More information about the wayland-devel mailing list