[PATCH 3/4] tests: add display test that highlights another bug in read_events

Tue Sep 9 04:10:48 PDT 2014

On 4 September 2014 15:17, Pekka Paalanen <ppaalanen at gmail.com> wrote:

> On Fri, 29 Aug 2014 11:21:30 +0200
> Marek Chalupa <mchqwerty at gmail.com> wrote:
>
> > When wl_display_read_events() returns with errno == EAGAIN, we
> > naturally try to call it again. But this next call results in deadlock.
> >
> > Signed-off-by: Marek Chalupa <mchqwerty at gmail.com>
> > ---
> >  tests/display-test.c | 88
> ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 88 insertions(+)
> >
> > diff --git a/tests/display-test.c b/tests/display-test.c
> > index 1289866..e1cf830 100644
> > --- a/tests/display-test.c
> > +++ b/tests/display-test.c
> > @@ -32,6 +32,7 @@
> >  #include <sys/types.h>
> >  #include <sys/stat.h>
> >  #include <pthread.h>
> > +#include <poll.h>
> >
> >  #include "wayland-private.h"
> >  #include "wayland-server.h"
> > @@ -480,6 +481,93 @@ TEST(threading_cancel_read_tst)
> >       display_destroy(d);
> >  }
> >
> > +/* set this to 1 when it is right that
> > + * the thread is woken up */
> > +static int threads_can_be_woken_up = 0;
> > +
> > +static void *
> > +thread_eagain_func(void *data) {
> > +     struct client *c = data;
> > +
> > +     register_reading(c->wl_display);
> > +
> > +     c->display_stopped = 1;
> > +     assert(wl_display_read_events(c->wl_display) == 0
> > +             && errno != EAGAIN);
> > +     /* if thread was sleeping in read_events, this assert
> > +      * must pass now */
> > +     assert(threads_can_be_woken_up && "Thread woke up too early");
> > +
> > +     pthread_exit(NULL);
> > +}
> > +
> > +static void
> > +threading_read_eagain(void)
> > +{
> > +     struct client *c = client_connect();
> > +     pthread_t th1, th2, th3;
> > +     struct pollfd fds;
> > +     int i;
> > +
> > +     register_reading(c->wl_display);
> > +
> > +     th1 = create_thread(c, thread_eagain_func);
> > +     th2 = create_thread(c, thread_eagain_func);
> > +     th3 = create_thread(c, thread_eagain_func);
> > +
> > +     /* All the threads are sleeping, waiting until read or cancel
> > +      * is called. Since we have no data on socket waiting,
> > +      * the wl_connection_read should end up with error and set errno
> > +      * to EAGAIN => wl_display_read_events() will return 0 and set
> errno
> > +      * to EAGAIN. It should not wake up the sleeping threads, so that
> > +      * this thread can try to read again */
> > +     alarm(3);
> > +     assert(wl_display_read_events(c->wl_display) == 0);
> > +     assert(errno == EAGAIN);
>
> I think there is a problem here. wl_display_read_events() returning 0
> means success, and in that case errno may not be set if it was a real
> success. There is no EAGAIN to be handled, it's already done inside
> read_events().
>
> In this test it won't be a real success, but you cannot code clients to
> check for errno after getting 0 from wl_display_read_events().
>

> > +
> > +     /* the same should happen next times - but there was a bug when
> > +      * the main thread got blocked! */
> > +     for (i = 0; i < 5; ++i) {
> > +             assert(wl_display_read_events(c->wl_display) == 0);
>
> Therefore you are here calling wl_display_read_events() again after it
> succeeded! And that IMO is a developer error.
>

yes

> Besides, I cannot see anywhere where it says that it is ok to call
> wl_display_read_events() again even if it failed, without calling
> wl_display_prepare_read() first.
>
>
Since original piece of code returned 0 on EAGAIN, then I suppose the author
considered this as a success, so yes, wl_display_read_events() is not
probably meant
to be called again. But consider this:
if the thread that got EAGAIN (the only thread that is not sleeping) calls
wl_display_prepare_read()
+ wl_display_read_events() again, it works. Problem is that it does not
have to call it
right after the failed wl_display_read_events and the other threads are
still sleeping.
Moreover, if there are not events on the socket for some time, this one
thread is looping in cycle
while the other threads are sleeping.

What I meant to do was to give the programmer the choice whether to loop or
let
all threads try it again (cancel read). But as you said before, 0 is wrong
return value for that.
See end of the e-mail for conclusions.

> But I see something else: wl_display_read_events() does not decrement
> reader_count if it returns due to last_error. I think that needs a fix
> to guarantee the reader_count is decremented, so that the side-effects
> are consistent.
>

Yes, we should fix this. There are cases that can cause a problem.

> > +             assert(errno == EAGAIN);
> > +     }
> > +
> > +     /* generate something to read and wait until the event comes */
> > +     wl_proxy_marshal((struct wl_proxy *) c->tc,
> > +                      0 /* STOP DISPLAY */, 1 /* number */);
> > +     assert(wl_display_flush(c->wl_display) > 0);
> > +
> > +     fds.fd = wl_display_get_fd(c->wl_display);
> > +     fds.events = POLLIN;
> > +     assert(poll(&fds, 1, 3) == 1);
>
> Is timeout of 3 milliseconds intended?
>

Nope, my bad.

>
> > +
> > +     /* threads can be woken up now */
> > +     threads_can_be_woken_up = 1;
> > +
> > +     errno = 0;
> > +     assert(wl_display_read_events(c->wl_display) == 0);
> > +     assert(errno != EAGAIN);
> > +     assert(wl_display_dispatch_pending(c->wl_display) == 1);
> > +
> > +     pthread_join(th1, NULL);
> > +     pthread_join(th2, NULL);
> > +     pthread_join(th3, NULL);
> > +
> > +     client_disconnect(c);
> > +}
> > +
> > +TEST(threading_read_eagain_tst)
> > +{
> > +     struct display *d = display_create();
> > +
> > +     client_create(d, threading_read_eagain);
> > +
> > +     display_run(d);
> > +     display_resume(d);
> > +
> > +     display_destroy(d);
> > +}
> > +
> >  static void *
> >  thread_prepare_and_read2(void *data)
> >  {
>
>
> Thanks,
> pq
>

First, thanks for review.

Summing it up, in my point of view we have these options:

* Let it as it is:
     this can lead to starving of other threads for some time. Moreover, if
the thread that
     got EAGAIN decides to exit (or simply not to read again) instead of
another trial to read, the other threads will block.
* return -1 and set EAGAIN
    programmer can either try to read again, or cancel read and let all
threads do another cycle. Good thing is
    that programmer knows that there was an error and can cancel read in
the case that the thread is exiting
    instead of another cycle.
 * wake up threads on EAGAIN
    this solution will force all threads to do another cycle (or whatever
they want) and should not have any side effects. It can lead to
    big overhead if there are no events on the socket for some time (waking
up threads again and again in cycle)

+ fix the bug with last_error, of course

Is there something that I'm missing?

Thanks,
Marek
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/wayland-devel/attachments/20140909/970ef343/attachment.html>