[PATCH 3/4] tests: add display test that highlights another bug in read_events

Tue Sep 9 06:39:35 PDT 2014

On 9 September 2014 13:10, Marek Chalupa <mchqwerty at gmail.com> wrote:

>
>
> On 4 September 2014 15:17, Pekka Paalanen <ppaalanen at gmail.com> wrote:
>
>> On Fri, 29 Aug 2014 11:21:30 +0200
>> Marek Chalupa <mchqwerty at gmail.com> wrote:
>>
>> > When wl_display_read_events() returns with errno == EAGAIN, we
>> > naturally try to call it again. But this next call results in deadlock.
>> >
>> > Signed-off-by: Marek Chalupa <mchqwerty at gmail.com>
>> > ---
>> >  tests/display-test.c | 88
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >  1 file changed, 88 insertions(+)
>> >
>> > diff --git a/tests/display-test.c b/tests/display-test.c
>> > index 1289866..e1cf830 100644
>> > --- a/tests/display-test.c
>> > +++ b/tests/display-test.c
>> > @@ -32,6 +32,7 @@
>> >  #include <sys/types.h>
>> >  #include <sys/stat.h>
>> >  #include <pthread.h>
>> > +#include <poll.h>
>> >
>> >  #include "wayland-private.h"
>> >  #include "wayland-server.h"
>> > @@ -480,6 +481,93 @@ TEST(threading_cancel_read_tst)
>> >       display_destroy(d);
>> >  }
>> >
>> > +/* set this to 1 when it is right that
>> > + * the thread is woken up */
>> > +static int threads_can_be_woken_up = 0;
>> > +
>> > +static void *
>> > +thread_eagain_func(void *data) {
>> > +     struct client *c = data;
>> > +
>> > +     register_reading(c->wl_display);
>> > +
>> > +     c->display_stopped = 1;
>> > +     assert(wl_display_read_events(c->wl_display) == 0
>> > +             && errno != EAGAIN);
>> > +     /* if thread was sleeping in read_events, this assert
>> > +      * must pass now */
>> > +     assert(threads_can_be_woken_up && "Thread woke up too early");
>> > +
>> > +     pthread_exit(NULL);
>> > +}
>> > +
>> > +static void
>> > +threading_read_eagain(void)
>> > +{
>> > +     struct client *c = client_connect();
>> > +     pthread_t th1, th2, th3;
>> > +     struct pollfd fds;
>> > +     int i;
>> > +
>> > +     register_reading(c->wl_display);
>> > +
>> > +     th1 = create_thread(c, thread_eagain_func);
>> > +     th2 = create_thread(c, thread_eagain_func);
>> > +     th3 = create_thread(c, thread_eagain_func);
>> > +
>> > +     /* All the threads are sleeping, waiting until read or cancel
>> > +      * is called. Since we have no data on socket waiting,
>> > +      * the wl_connection_read should end up with error and set errno
>> > +      * to EAGAIN => wl_display_read_events() will return 0 and set
>> errno
>> > +      * to EAGAIN. It should not wake up the sleeping threads, so that
>> > +      * this thread can try to read again */
>> > +     alarm(3);
>> > +     assert(wl_display_read_events(c->wl_display) == 0);
>> > +     assert(errno == EAGAIN);
>>
>> I think there is a problem here. wl_display_read_events() returning 0
>> means success, and in that case errno may not be set if it was a real
>> success. There is no EAGAIN to be handled, it's already done inside
>> read_events().
>>
>> In this test it won't be a real success, but you cannot code clients to
>> check for errno after getting 0 from wl_display_read_events().
>>
>
>> > +
>> > +     /* the same should happen next times - but there was a bug when
>> > +      * the main thread got blocked! */
>> > +     for (i = 0; i < 5; ++i) {
>> > +             assert(wl_display_read_events(c->wl_display) == 0);
>>
>> Therefore you are here calling wl_display_read_events() again after it
>> succeeded! And that IMO is a developer error.
>>
>
> yes
>
>
>> Besides, I cannot see anywhere where it says that it is ok to call
>> wl_display_read_events() again even if it failed, without calling
>> wl_display_prepare_read() first.
>>
>>
> Since original piece of code returned 0 on EAGAIN, then I suppose the
> author
> considered this as a success, so yes, wl_display_read_events() is not
> probably meant
> to be called again. But consider this:
> if the thread that got EAGAIN (the only thread that is not sleeping) calls
> wl_display_prepare_read()
> + wl_display_read_events() again, it works. Problem is that it does not
> have to call it
> right after the failed wl_display_read_events and the other threads are
> still sleeping.
> Moreover, if there are not events on the socket for some time, this one
> thread is looping in cycle
> while the other threads are sleeping.
>
> What I meant to do was to give the programmer the choice whether to loop
> or let
> all threads try it again (cancel read). But as you said before, 0 is wrong
> return value for that.
> See end of the e-mail for conclusions.
>
>
>> But I see something else: wl_display_read_events() does not decrement
>> reader_count if it returns due to last_error. I think that needs a fix
>> to guarantee the reader_count is decremented, so that the side-effects
>> are consistent.
>>
>
> Yes, we should fix this. There are cases that can cause a problem.
>

Actually, I can not make up any use-case where it is problem... do you have
any?

> > +             assert(errno == EAGAIN);
>> > +     }
>> > +
>> > +     /* generate something to read and wait until the event comes */
>> > +     wl_proxy_marshal((struct wl_proxy *) c->tc,
>> > +                      0 /* STOP DISPLAY */, 1 /* number */);
>> > +     assert(wl_display_flush(c->wl_display) > 0);
>> > +
>> > +     fds.fd = wl_display_get_fd(c->wl_display);
>> > +     fds.events = POLLIN;
>> > +     assert(poll(&fds, 1, 3) == 1);
>>
>> Is timeout of 3 milliseconds intended?
>>
>
> Nope, my bad.
>
>
>>
>> > +
>> > +     /* threads can be woken up now */
>> > +     threads_can_be_woken_up = 1;
>> > +
>> > +     errno = 0;
>> > +     assert(wl_display_read_events(c->wl_display) == 0);
>> > +     assert(errno != EAGAIN);
>> > +     assert(wl_display_dispatch_pending(c->wl_display) == 1);
>> > +
>> > +     pthread_join(th1, NULL);
>> > +     pthread_join(th2, NULL);
>> > +     pthread_join(th3, NULL);
>> > +
>> > +     client_disconnect(c);
>> > +}
>> > +
>> > +TEST(threading_read_eagain_tst)
>> > +{
>> > +     struct display *d = display_create();
>> > +
>> > +     client_create(d, threading_read_eagain);
>> > +
>> > +     display_run(d);
>> > +     display_resume(d);
>> > +
>> > +     display_destroy(d);
>> > +}
>> > +
>> >  static void *
>> >  thread_prepare_and_read2(void *data)
>> >  {
>>
>>
>> Thanks,
>> pq
>>
>
> First, thanks for review.
>
> Summing it up, in my point of view we have these options:
>
> * Let it as it is:
>      this can lead to starving of other threads for some time. Moreover,
> if the thread that
>      got EAGAIN decides to exit (or simply not to read again) instead of
> another trial to read, the other threads will block.
> * return -1 and set EAGAIN
>     programmer can either try to read again, or cancel read and let all
> threads do another cycle. Good thing is
>     that programmer knows that there was an error and can cancel read in
> the case that the thread is exiting
>     instead of another cycle.
>  * wake up threads on EAGAIN
>     this solution will force all threads to do another cycle (or whatever
> they want) and should not have any side effects. It can lead to
>     big overhead if there are no events on the socket for some time
> (waking up threads again and again in cycle)
>
> + fix the bug with last_error, of course
>
> Is there something that I'm missing?
>
> Thanks,
> Marek
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/wayland-devel/attachments/20140909/187551ce/attachment.html>