Catastrophic blocking

Pekka Paalanen ppaalanen at gmail.com
Wed Feb 29 00:58:56 PST 2012


On Tue, 28 Feb 2012 14:32:21 -0500
Kristian Hoegsberg <hoegsberg at gmail.com> wrote:

> On Mon, Feb 27, 2012 at 04:57:42PM +0100, Samuel Rødal wrote:
> > Ignore previous patch, here's the correct version.
> 
> > From 4e1bedaaf05b576f5191f8fe3a34904ab9707414 Mon Sep 17 00:00:00 2001
> > From: =?UTF-8?q?Samuel=20R=C3=B8dal?= <samuel.rodal at nokia.com>
> > Date: Mon, 27 Feb 2012 15:17:20 +0100
> > Subject: [PATCH] Allow update function to not be set in wl_display_get_fd
> > 
> > The same check is done in connection_update, and now with
> > wl_display_flush() there's less need for the client to need to know the
> > connection mask.
> 
> Yeah, ok, looks good.  If you're paranoid about blocking on write,
> you need to poll for write of course, but for non-broken
> apps/compositors the write should never block.

About blocking... broken apps are a fact we must tolerate. If an app
gets stuck and does not read its fd anymore, does that make the
corresponding fd in the server count as not writable? Or will it become
non-writable only after possible kernel buffers have been filled?

Is a writable fd, as indicated by epoll, guaranteed to not block on
sendmsg()? I don't know, but I wouldn't think it is without O_NONBLOCK,
since the kernel cannot cache an arbitrary amount of data, can it?
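
One way to probe this outside of Weston is a small sketch like the one below. It is not taken from Wayland or Weston; the helper name fill_until_full() is made up for illustration. It creates a socketpair whose peer never reads, writes with MSG_DONTWAIT until the kernel refuses more data, and then asks poll() whether the fd is still reported as writable. The assumption is a Unix stream socket, similar to what the server uses to talk to a client.

```c
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Writes to one end of a socketpair whose other end is never read.
 * Returns the number of bytes the kernel accepted before reporting
 * EAGAIN, or -1 on error. *writable_after is set to 1 if poll() still
 * reports POLLOUT once the socket is full, 0 otherwise.
 */
static long
fill_until_full(int *writable_after)
{
	int sv[2];
	char chunk[4096];
	long total = 0;
	struct pollfd pfd;
	int ret;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	memset(chunk, 0, sizeof chunk);

	for (;;) {
		/* MSG_DONTWAIT makes only this call non-blocking. */
		ssize_t n = send(sv[0], chunk, sizeof chunk, MSG_DONTWAIT);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				break;	/* kernel buffers are full */
			close(sv[0]);
			close(sv[1]);
			return -1;
		}
		total += n;
	}

	pfd.fd = sv[0];
	pfd.events = POLLOUT;
	pfd.revents = 0;
	ret = poll(&pfd, 1, 0);	/* timeout 0: do not wait */
	*writable_after = (ret > 0 && (pfd.revents & POLLOUT)) ? 1 : 0;

	close(sv[0]);
	close(sv[1]);
	return total;
}
```

The byte count it returns depends on the kernel's socket buffer sizes, so it will differ between systems. The point is only that the count is finite: once that much data is queued for a peer that does not read, a plain blocking sendmsg() on the same fd would have nothing left to do but wait.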

I also don't see any of the fds getting O_NONBLOCK anywhere.

Could the whole server be blocked by a single client not reading its
fd, making the whole computer appear frozen to a user?

So, I made a test with the current Weston. I added the following patch:

--- a/clients/clickdot.c
+++ b/clients/clickdot.c
@@ -28,6 +28,7 @@
 #include <cairo.h>
 #include <math.h>
 #include <assert.h>
+#include <unistd.h>
 
 #include <linux/input.h>
 #include <wayland-client.h>
@@ -104,6 +105,10 @@ key_handler(struct window *window, struct input *input, uint32_t time,
        case XK_Escape:
                display_exit(clickdot->display);
                break;
+       case XK_h:
+               while (1)
+                       sleep(1);
+               break;
        }
 }

I started Weston under X along with clickdot, made sure everything
worked, then pressed 'h' to make clickdot hang.

For a while, everything seemed good. Only clickdot was stuck and Weston
still worked. After a few minutes of mouse-waving, Weston got stuck.
The backtrace of Weston was:

#0  0x00007f13fa9ad4d0 in __sendmsg_nocancel () from /lib64/libc.so.6
#1  0x00007f13fc60e3a1 in wl_connection_data (connection=0x2776360, mask=2) at connection.c:272
#2  0x00007f13fc60e654 in wl_connection_write (connection=0x2776360, data=0x277aa80, count=20) at connection.c:335
#3  0x00007f13fc60f9cc in wl_closure_send (closure=0x277a910, connection=0x2776360) at connection.c:743
#4  0x00007f13fc609f2c in wl_resource_post_event (resource=0x26cb4f0, opcode=0) at wayland-server.c:107
#5  0x00007f13fc60ab41 in default_grab_motion (grab=0x2655f98, time=3360222569, x=292, y=213) at wayland-server.c:457
#6  0x0000000000409cc7 in notify_motion (device=0x2655f00, time=3360222569, x=380, y=305) at compositor.c:1487
#7  0x00007f13f8ce74d3 in x11_compositor_handle_event (fd=9, mask=1, data=0x25b9960) at compositor-x11.c:604
#8  0x00007f13fc60d16e in wl_event_source_fd_dispatch (source=0x27034a0, ep=0x7fffcecd9c00) at event-loop.c:76
#9  0x00007f13fc60dbeb in wl_event_loop_dispatch (loop=0x25b8900, timeout=-1) at event-loop.c:462
#10 0x00007f13fc60b7c1 in wl_display_run (display=0x25b88b0) at wayland-server.c:837
#11 0x000000000040c3cd in main (argc=3, argv=0x7fffcecda018) at compositor.c:2518

Just as I assumed, Weston is now blocked in sendmsg(). When I killed
clickdot, Weston came back to life.

We really need the file descriptors in the server to be non-blocking.
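
As a rough sketch of what that could mean in practice (not a patch against Wayland; set_nonblocking() and send_or_defer() are hypothetical names), the flag can be set once per fd with fcntl(), after which a full socket shows up as EAGAIN instead of a hang:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to the fd's existing status flags.
 * Returns 0 on success, -1 on error. */
static int
set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;

	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: try to send without ever blocking.
 * Returns the number of bytes sent, 0 if the socket is full and the
 * caller has to retry later, or -1 on any other error.
 * Assumes fd was already passed to set_nonblocking().
 */
static ssize_t
send_or_defer(int fd, const void *data, size_t len)
{
	ssize_t n;

	do {
		n = send(fd, data, len, 0);
	} while (n < 0 && errno == EINTR);

	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 0;	/* would have blocked: defer instead */

	return n;
}
```

What the server should do when it gets 0 back (queue the data and wait for the fd to become writable, or disconnect the client) is a separate policy question that this sketch does not try to answer.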


Thanks,
pq

