[PATCH 3/4] tests: add display test that highlights another bug in read_events
Marek Chalupa
mchqwerty at gmail.com
Wed Sep 10 02:32:03 PDT 2014
On 10 September 2014 11:17, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> On Tue, 9 Sep 2014 13:10:48 +0200
> Marek Chalupa <mchqwerty at gmail.com> wrote:
>
> > On 4 September 2014 15:17, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> >
> > > On Fri, 29 Aug 2014 11:21:30 +0200
> > > Marek Chalupa <mchqwerty at gmail.com> wrote:
> > >
> > > > When wl_display_read_events() returns with errno == EAGAIN, we
> > > > naturally try to call it again. But this next call results in
> > > > deadlock.
> > > >
> > > > Signed-off-by: Marek Chalupa <mchqwerty at gmail.com>
> > > > ---
> > > > tests/display-test.c | 88
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > 1 file changed, 88 insertions(+)
> > > >
> > > > diff --git a/tests/display-test.c b/tests/display-test.c
> > > > index 1289866..e1cf830 100644
> > > > --- a/tests/display-test.c
> > > > +++ b/tests/display-test.c
> > > > @@ -32,6 +32,7 @@
> > > > #include <sys/types.h>
> > > > #include <sys/stat.h>
> > > > #include <pthread.h>
> > > > +#include <poll.h>
> > > >
> > > > #include "wayland-private.h"
> > > > #include "wayland-server.h"
> > > > @@ -480,6 +481,93 @@ TEST(threading_cancel_read_tst)
> > > > display_destroy(d);
> > > > }
> > > >
> > > > +/* set this to 1 when it is right that
> > > > + * the thread is woken up */
> > > > +static int threads_can_be_woken_up = 0;
> > > > +
> > > > +static void *
> > > > +thread_eagain_func(void *data) {
> > > > + struct client *c = data;
> > > > +
> > > > + register_reading(c->wl_display);
> > > > +
> > > > + c->display_stopped = 1;
> > > > + assert(wl_display_read_events(c->wl_display) == 0
> > > > + && errno != EAGAIN);
> > > > + /* if thread was sleeping in read_events, this assert
> > > > + * must pass now */
> > > > + assert(threads_can_be_woken_up && "Thread woke up too
> > > > early"); +
> > > > + pthread_exit(NULL);
> > > > +}
> > > > +
> > > > +static void
> > > > +threading_read_eagain(void)
> > > > +{
> > > > + struct client *c = client_connect();
> > > > + pthread_t th1, th2, th3;
> > > > + struct pollfd fds;
> > > > + int i;
> > > > +
> > > > + register_reading(c->wl_display);
> > > > +
> > > > + th1 = create_thread(c, thread_eagain_func);
> > > > + th2 = create_thread(c, thread_eagain_func);
> > > > + th3 = create_thread(c, thread_eagain_func);
> > > > +
> > > > + /* All the threads are sleeping, waiting until read or
> > > > cancel
> > > > + * is called. Since we have no data on socket waiting,
> > > > + * the wl_connection_read should end up with error and set
> > > > errno
> > > > + * to EAGAIN => wl_display_read_events() will return 0 and
> > > > set
> > > errno
> > > > + * to EAGAIN. It should not wake up the sleeping threads,
> > > > so that
> > > > + * this thread can try to read again */
> > > > + alarm(3);
> > > > + assert(wl_display_read_events(c->wl_display) == 0);
> > > > + assert(errno == EAGAIN);
> > >
> > > I think there is a problem here. wl_display_read_events() returning
> > > 0 means success, and in that case errno may not be set if it was a
> > > real success. There is no EAGAIN to be handled, it's already done
> > > inside read_events().
> > >
> > > In this test it won't be a real success, but you cannot code
> > > clients to check for errno after getting 0 from
> > > wl_display_read_events().
> > >
> >
> > > > +
> > > > + /* the same should happen next times - but there was a bug
> > > > when
> > > > + * the main thread got blocked! */
> > > > + for (i = 0; i < 5; ++i) {
> > > > + assert(wl_display_read_events(c->wl_display) == 0);
> > >
> > > Therefore you are here calling wl_display_read_events() again after
> > > it succeeded! And that IMO is a developer error.
> > >
> >
> > yes
> >
> >
> > > Besides, I cannot see anywhere where it says that it is ok to call
> > > wl_display_read_events() again even if it failed, without calling
> > > wl_display_prepare_read() first.
> > >
> > >
> > Since original piece of code returned 0 on EAGAIN, then I suppose the
> > author considered this as a success, so yes, wl_display_read_events()
> > is not probably meant
> > to be called again. But consider this:
> > if the thread that got EAGAIN (the only thread that is not sleeping)
> > calls wl_display_prepare_read()
> > + wl_display_read_events() again, it works. Problem is that it does
> > not have to call it
> > right after the failed wl_display_read_events and the other threads
> > are still sleeping.
>
> EAGAIN is not a failure here. It is only a spurious wakeup, and all
> threads should go back sleeping on whatever they need to sleep on,
> usually poll().
>
> > Moreover, if there are not events on the socket for some time, this
> > one thread is looping in cycle
> > while the other threads are sleeping.
>
> It's not a busy-loop, because the thread would be sleeping in poll() or
> equivalent.
>
> But yes, I see that the other threads sleeping inside their
> read_events() are not woken up, so they do not do a cycle. I'm not sure
> if that is a problem or not. I suppose waking everyone up would not
> hurt, but not waking up would risk deadlocking if the thread with
> EAGAIN never read again.
>
> > What I meant to do was to give the programmer the choice whether to
> > loop or let
> > all threads try it again (cancel read). But as you said before, 0 is
> > wrong return value for that.
> > See end of the e-mail for conclusions.
>
> Yeah, it seems to be wrong to require cancel_read() after calling
> read_events(), no matter whether it succeeded or not.
>
> > Summing it up, in my point of view we have these options:
> >
> > * Let it as it is:
> > this can lead to starving of other threads for some time.
> > Moreover, if the thread that
> > got EAGAIN decides to exit (or simply not to read again) instead
> > of another trial to read, the other threads will block.
>
> Yup, I agree with the analysis here.
>
> > * return -1 and set EAGAIN
> > programmer can either try to read again, or cancel read and let
> > all threads do another cycle. Good thing is
> > that programmer knows that there was an error and can cancel read
> > in the case that the thread is exiting
> > instead of another cycle.
>
> EAGAIN is not an error, which would make this solution an ABI break.
>
> > * wake up threads on EAGAIN
> > this solution will force all threads to do another cycle (or
> > whatever they want) and should not have any side effects. It can lead
> > to big overhead if there are no events on the socket for some time
> > (waking up threads again and again in cycle)
>
> I think we should do this. I don't really see the overhead at all,
> since every thread should have poll() in its loop, so they will not be
> busy-looping.
>
If there is a poll, then the overhead is solved :)
>
> After all, the sleep inside read_events() is only for synchronizing
> reading threads, not for waiting new data to appear.
>
> EAGAIN in read_events() is a success, even though there was nothing
> received. On success (and on failure!) all threads sleeping inside
> read_events() should be woken up. IOW, whatever thread hits the
> reader_count==0 case in read_events() should *always* end up calling
> display_wakeup_threads() before returning.
>
> Does this make sense?
>
It does and I agree this is the most safe solution.
I'll rebase the old
http://lists.freedesktop.org/archives/wayland-devel/2014-August/016932.html
that was created exactly because of this reasoning and send it here.
>
> In fact, I think wl_display_read_event() failing on last_error should
> be equivalent to calling wl_display_cancel_read(). It would be the same
> if the app manually first checked for error and then canceled.
>
>
Sounds reasonable, I'll add a patch for it to the previous one.
>
> Thanks,
> pq
>
Thanks,
Marek
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/wayland-devel/attachments/20140910/dccb6a57/attachment.html>
More information about the wayland-devel
mailing list