[PATCH] client: Add wl_display_prepare_read() API to thread model assumptions

Wed May 1 09:24:23 PDT 2013

On 01.05.2013 17:52, Kristian Høgsberg wrote:
> On Tue, Apr 30, 2013 at 11:45:15PM +0200, Uli Schlachter wrote:
>> On 30.04.2013 22:06, Kristian Høgsberg wrote:
>> [...]
>>> diff --git a/src/connection.c b/src/connection.c
>>> index dca134b..b26402f 100644
>>> --- a/src/connection.c
>>> +++ b/src/connection.c
>>> @@ -324,7 +324,7 @@ wl_connection_read(struct wl_connection *connection)
>>>  	msg.msg_flags = 0;
>>>  
>>>  	do {
>>> -		len = wl_os_recvmsg_cloexec(connection->fd, &msg, 0);
>>> +		len = wl_os_recvmsg_cloexec(connection->fd, &msg, MSG_DONTWAIT);
>>>  	} while (len < 0 && errno == EINTR);
>>
>> Uhm, what? Why?
>>
>> If I understand this correctly, this will make wl_connection_read()
>> fail with EAGAIN or EWOULDBLOCK if nothing can be read. Thus I
>> looked at a random caller of this and the first one that grep found
>> was dispatch_queue() in wayland-client.c. This one will call
>> display_fatal_error() if reading fails.
>>
>> Looking at this patch, this behavior isn't changed by this patch.
>>
>> So: Uhm, what? Why?
> 
> Yea, this is a little out-of-the blue.  The reason is that
> wl_display_read_events() can't block in wl_connetion_read().  One of
> the problems this approach also solves is that we no longer blocks
> with the display lock held.  Before, if we end up blocking in
> wl_display_dispatch(), we do so with the lock held, meaning other
> threads can't send requests.  Without this change, a thread could call
> wl_display_prepare_read() and then immediately call
> wl_display_read_events(), which would then block in recvmsg(), with
> the lock held.
> 
> wl_connection_read() is not a public entry point and read_events() is
> the only caller.  The only place this change might be visible is in
> wl_display_dispatch(), which now instead drops the lock and blocks in
> poll().

Well, according to grep, wl_connection_read() is also used by some tests and
src/wayland-server.c. I didn't look at the tests and wayland-server.c looks like
it should never manage to call wl_connection_read().

Hm... Let's just hope the best.

>>> diff --git a/src/wayland-client.c b/src/wayland-client.c
>>> index 7847370..310d054 100644
>>> --- a/src/wayland-client.c
>>> +++ b/src/wayland-client.c
>>> @@ -21,6 +21,8 @@
>>>   * OF THIS SOFTWARE.
>>>   */
>>>  
>>> +#define _GNU_SOURCE
>>> +
>>>  #include <stdlib.h>
>>>  #include <stdint.h>
>>>  #include <stddef.h>
>>> @@ -83,6 +85,9 @@ struct wl_display {
>>>  	struct wl_event_queue queue;
>>>  	struct wl_list event_queue_list;
>>>  	pthread_mutex_t mutex;
>>> +
>>> +	int reader_count;
>>> +	pthread_cond_t reader_cond;
>>>  };
>>>  
>>>  /** \endcond */
>>> @@ -522,6 +527,8 @@ wl_display_connect_to_fd(int fd)
>>>  	wl_event_queue_init(&display->queue, display);
>>>  	wl_list_init(&display->event_queue_list);
>>>  	pthread_mutex_init(&display->mutex, NULL);
>>> +	pthread_cond_init(&display->reader_cond, NULL);
>>> +	display->reader_count = 0;
>>>  
>>>  	wl_map_insert_new(&display->objects, WL_MAP_CLIENT_SIDE, NULL);
>>>  
>>> @@ -537,14 +544,19 @@ wl_display_connect_to_fd(int fd)
>>>  	display->proxy.refcount = 1;
>>>  
>>>  	display->connection = wl_connection_create(display->fd);
>>> -	if (display->connection == NULL) {
>>> -		wl_map_release(&display->objects);
>>> -		close(display->fd);
>>> -		free(display);
>>> -		return NULL;
>>> -	}
>>> +	if (display->connection == NULL)
>>> +		goto err_connection;
>>>  
>>>  	return display;
>>> +
>>> + err_connection:
>>> +	pthread_mutex_destroy(&display->mutex);
>>> +	pthread_cond_destroy(&display->reader_cond);
>>> +	wl_map_release(&display->objects);
>>> +	close(display->fd);
>>> +	free(display);
>>> +
>>> +	return NULL;
>>>  }
>>>  
>>>  /** Connect to a Wayland display
>>> @@ -599,6 +611,7 @@ wl_display_disconnect(struct wl_display *display)
>>>  	wl_map_release(&display->objects);
>>>  	wl_event_queue_release(&display->queue);
>>>  	pthread_mutex_destroy(&display->mutex);
>>> +	pthread_cond_destroy(&display->reader_cond);
>>>  	if (display->fd > 0)
>>>  		close(display->fd);
>>>  
>>> @@ -851,65 +864,202 @@ dispatch_event(struct wl_display *display, struct wl_event_queue *queue)
>>>  }
>>>  
>>>  static int
>>> -dispatch_queue(struct wl_display *display,
>>> -	       struct wl_event_queue *queue, int block)
>>> +read_events(struct wl_display *display)
>>>  {
>>> -	int len, size, count, ret;
>>> +	int len, size;
>>>  
>>> -	pthread_mutex_lock(&display->mutex);
>>> -
>>> -	if (display->last_error)
>>> -		goto err_unlock;
>>> -
>>> -	ret = wl_connection_flush(display->connection);
>>> -	if (ret < 0 && errno != EAGAIN) {
>>> -		display_fatal_error(display, errno);
>>> -		goto err_unlock;
>>> -	}
>>> -
>>> -	if (block && wl_list_empty(&queue->event_list) &&
>>> -	    pthread_equal(display->display_thread, pthread_self())) {
>>> +	display->reader_count--;
>>> +	if (display->reader_count == 0) {
>>
>> Code like this always makes me think "what happens when the reader_count becomes
>> negative?". Without having looked much at the rest of this patch yet, I would
>> suggest making reader_count an unsigned int and adding an
>> assert(display->reader_count > 0); before it is decremented.
> 
> Yeah, I thought about some kind of 'ticket' approach where
> wl_display_prepare_read() gives you some kind of ticket or token that
> you then have to pass to wl_display_read_events().  That way it's
> clear (from the types in the API) that you have to call
> wl_display_prepare_read() exactly once before calling
> wl_display_read_events() (or cancel the read).  But at the end of the
> day all we need to do is track the number of threads trying to read.
> So anything more than the current counter is just for debugging/diagnostics.

Yeah, but an assert(display->reader_count > 0); is IMHO quite cheap and if it
ever triggers, it might make it a lot easier to figure out the reason for the
following deadlock. So any reason why these assertions shouldn't be added?

> Just allocating a small struct to give back in
> wl_display_prepare_read() wouldn't be a big deal.  Or maybe allocating
> bits out of a bit mask would work well enough.

Ah, yeah. I would suggest integers (perhaps hidden behind some typedef?). From
what I read below, you are thinking about adding serial numbers to this anyway,
so those could also be used here.

I am thinking about:

  unsigned int serial_tail = 0;
  unsigned int serial_head = 0;
  (unsigned int reader_count = 0; might still be needed...)

(serial_head - serial_tail) is the number of threads which called
prepare_read(). prepare_read() just increments serial_head and hands out the old
value as the ticket. When a thread does some actual reading from the fd, it sets
serial_tail to serial_head. In read_events(), the loop around
pthread_cond_wait() could then compare serial_tail against its own ticket. At
least this is what I came up with without a lot of thinking.

Plus point is that these are really cheap to "allocate" and to me this feels
better than malloc() or some bit-mask (for speed reasons and perhaps even
scalability ones).

It should be sufficiently unlikely for this unsigned int to wrap around, so that
case can safely(?) be ignored.

For extra "API misuse protection", it might make sense to enforce that all
serials/tickets which were handed out were actually returned. However, right now
I can't come up with a sane, non-ugly way to enforce this. Only idea right now
would be: Only the thread which has ticket serial_tail can increment serial_tail
and all the threads involved have to do this in turn until serial_tail hits
serial_head again. But this is ugly and slow...

(Dunno if "serial" is a good name for this or if "ticket" would be better)

However, feel free to ignore the above, I am just thinking out loud. :-)

>>>  		len = wl_connection_read(display->connection);
>>>  		if (len == -1) {
>>>  			display_fatal_error(display, errno);
>>> -			goto err_unlock;
>>> -		} else if (len == 0) {
>>> -			display_fatal_error(display, EPIPE);
>>> -			goto err_unlock;
>>> +			return -1;
>>>  		}
>>>  		while (len >= 8) {
>>>  			size = queue_event(display, len);
>>>  			if (size == -1) {
>>>  				display_fatal_error(display, errno);
>>> -				goto err_unlock;
>>> +				return -1;
>>>  			} else if (size == 0) {
>>>  				break;
>>>  			}
>>>  			len -= size;
>>>  		}
>>> -	} else if (block && wl_list_empty(&queue->event_list)) {
>>> -		pthread_cond_wait(&queue->cond, &display->mutex);
>>> -		if (display->last_error)
>>> -			goto err_unlock;
>>> +
>>> +		pthread_cond_broadcast(&display->reader_cond);
>>> +	} else {
>>> +		while (display->reader_count > 0)
>>> +			pthread_cond_wait(&display->reader_cond,
>>> +					  &display->mutex);
>>
> 
>> Hm. Nothing makes sure that all the waiting-for-some-reader threads
>> get out of this loop before reader_count gets increased again. In
>> other words:
>>
>> Let's say we have two threads that want to read. Thread 1 ends up
>> waiting on the condition, thread 2 does all the work. Thus, thread 1
>> reads something that might be interesting to thread 2 from the fd,
>> puts it in the queue, and calls wl_display_prepare_read() again.
>>
>> Thus, at this point, reader_count is greater than 0 and thread 1
>> will spin again, but some event that is interesting to this thread
>> is waiting in the queue.
>>
>> No idea how likely this race is and dunno how ugly it would be to fix this...
> 
> Ah, good point, I see... so maybe what we can do is to have a
> read_serial, that we increment before the pthread_cond_broadcast().
> Then change the while loop to:
> 
>         serial = display->read_serial
> 	while (display->read_serial == serial)
> 		pthread_cond_wait(&display->reader_cond,
> 				  &display->mutex);
> 
>>>  	}
>>>  
>>> +	return 0;
>>> +}
>>> +
>>> +/** Read events from display file descriptor
>>> + *
>>> + * \param display The display context object
>>> + * \return 0 on success or -1 on error.  In case of error errno will
>>> + * be set accordingly
>>> + *
>>> + * This will read events from the file descriptor for the display.
>>> + * This function does not dispatch events, it only reads and queues
>>> + * events into their corresponding event queues.  If no data is
>>> + * avilable on the file descriptor, wl_display_read_events() returns
>>> + * immediately.  To dispatch events that may have been queued, call
>>> + * wl_display_dispatch_pending() or
>>> + * wl_display_dispatch_queue_pending().
>>> + *
>>> + * Before calling this function, wl_display_prepare_read() must be
>>> + * called first.
>>> + *
>>> + * \memberof wl_display
>>> + */
>>> +WL_EXPORT int
>>> +wl_display_read_events(struct wl_display *display)
>>> +{
>>> +	int ret;
>>> +
>>> +	pthread_mutex_lock(&display->mutex);
>>
>> ...if you don't like the above assert, then I would at least check for that
>> error here. However, I guess an assert() wouldn't be correct here, since this
>> wouldn't check an internal condition, but an error in the user of the API.
> 
> Yeah... with the bitmask idea it's harder to decrement twice, but it's
> still possible to call wl_display_read_events() with an old bit value
> and cancel somebody else wl_display_prepare_read().  Allocating a
> struct and putting it on a list in the display and then looking it up
> in wl_display_read_events() is the most fail-safe API, but also the
> most work (for nothing).

Yeah, I agree that this wouldn't be nice. I grow more and more fond of my
integer-tickets where this could easily be done. :-)

>>> +	ret = read_events(display);
>>> +
>>> +	pthread_mutex_unlock(&display->mutex);
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static int
>>> +dispatch_queue(struct wl_display *display, struct wl_event_queue *queue)
>>> +{
>>> +	int count;
>>> +
>>> +	if (display->last_error)
>>> +		goto err;
>>> +
>>>  	for (count = 0; !wl_list_empty(&queue->event_list); count++) {
>>>  		dispatch_event(display, queue);
>>>  		if (display->last_error)
>>> -			goto err_unlock;
>>> +			goto err;
>>>  	}
>>>  
>>> -	pthread_mutex_unlock(&display->mutex);
>>> -
>>>  	return count;
>>>  
>>> -err_unlock:
>>> +err:
>>>  	errno = display->last_error;
>>> -	pthread_mutex_unlock(&display->mutex);
>>>  
>>>  	return -1;
>>>  }
>>>  
>>> +WL_EXPORT int
>>> +wl_display_prepare_read_queue(struct wl_display *display,
>>> +			      struct wl_event_queue *queue)
>>
>> The new wl_display_prepare_read() gets a long documentation, but
>> this function doesn't? After thinking about this for a moment, I
>> think I know why this is needed (only the "main thread" should use
>> the "main queue", other threads have to synchronize with other
>> queues), but it would still be nice to have docs.
> 
> The two functions are the same, except wl_display_prepare_read() works
> on the main display queue.  So the documentation and uses cases are
> the same, but we should probably put that in the doxygen comment.
> 
>>
>>> +{
>>> +	int ret;
>>> +
>>> +	pthread_mutex_lock(&display->mutex);
>>> +
>>> +	if (!wl_list_empty(&queue->event_list)) {
>>> +		errno = EAGAIN;
>>> +		ret = -1;
>>> +	} else {
>>> +		display->reader_count++;
>>> +		ret = 0;
>>> +	}
>>> +
>>> +	pthread_mutex_unlock(&display->mutex);
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +/** Prepare to read events after polling file descriptor
>>> + *
>>> + * \param display The display context object
>>> + * \return 0 on success or -1 if event queue was not empty
>>> + *
>>> + * This function must be called before reading from the file
>>> + * descriptor using wl_display_read_events().  Calling
>>> + * wl_display_prepare_read() announces the calling threads intention
>>> + * to read and ensures that until the thread is ready to read and
>>> + * calls wl_display_read_events(), no other thread will read from the
>>> + * file descriptor.  This only succeeds if the event queue is empty
>>> + * though, and if there are undispatched events in the queue, -1 is
>>> + * returned and errno set to EBUSY.
>>> + *
>>> + * If a thread successfully calls wl_display_prepare_read(), it must
>>> + * either call wl_display_read_events() when it's ready or cancel the
>>> + * read intention by calling wl_display_cancel_read().
>>> + *
>>> + * Use this function before polling on the display fd or to integrate
>>> + * the fd into a toolkit event loop in a race-free way.  Typically, a
>>> + * toolkit will call wl_display_dispatch_pending() before sleeping, to
>>> + * make sure it doesn't block with unhandled events.  Upon waking up,
>>> + * it will assume the file descriptor is readable and read events from
>>> + * the fd by calling wl_display_dispatch().  Simplified, we have:
>>> + *
>>> + *   wl_display_dispatch_pending(display);
>>> + *   wl_display_flush(display);
>>> + *   poll(fds, nfds, -1);
>>> + *   wl_display_dispatch(display);
>>> + *
>>> + * There are two races here: first, before blocking in poll(), the fd
>>> + * could become readable and another thread reads the events.  Some of
>>> + * these events may be for the main queue and the other thread will
>>> + * queue them there and then the main thread will go to sleep in
>>> + * poll().  This will stall the application, which could be waiting
>>> + * for a event to kick of the next animation frame, for example.
>>> + *
>>> + * The other race is immediately after poll(), where another thread
>>> + * could preempt and read events before the main thread calls
>>> + * wl_display_dispatch().  This call now blocks and starves the other
>>> + * fds in the event loop.
>>> + *
>>> + * A correct sequence would be:
>>> + *
>>> + *   while (wl_display_prepare_read(display) != 0)
>>> + *           wl_display_dispatch_pending(display);
>>> + *   wl_display_flush(display);
>>> + *   poll(fds, nfds, -1);
>>> + *   wl_display_read_events(display);
>>> + *   wl_display_dispatch_pending(display);
>>
>> I do like the more generic approach that this patch aims
>> for. However, this API looks quite complicated. There is lots of
>> implicit "this function grabs a lock for you". This part was way
>> more clear in the previous versions.
> 
> Yes, it's a complex issue and it's hard to come up with a simple
> solution.  I think the idea of returning some kind of cookie from
> wl_display_prepare_read() to be passed to wl_display_read_events()
> will make the usage clearer.
> 
>> However, the wayland-internal part of this really is a lot simpler.
>>
>> I am not sure what my conclusion is, but I just wanted to mention this...
>>
>>> + * Here we call wl_display_prepare_read(), which ensures that between
>>> + * returning from that call and eventually calling
>>> + * wl_display_read_events(), no other thread will read from the fd and
>>> + * queue events in our queue.  If the call to
>>> + * wl_display_prepare_read() fails, we dispatch the pending events and
>>> + * try again until we're successful.
>>> + *
>>> + * \memberof wl_display
>>> + */
>>> +WL_EXPORT int
>>> +wl_display_prepare_read(struct wl_display *display)
>>> +{
>>> +	return wl_display_prepare_read_queue(display, &display->queue);
>>> +}
>>> +
>>> +/** Release exclusive access to display file descriptor
>>> + *
>>> + * \param display The display context object
>>> + *
>>> + * This releases the exclusive access.  Useful for canceling the lock
>>> + * when a timed out poll returns fd not readable and we're not going
>>> + * to read from the fd anytime soon.
>>> + *
>>> + * \memberof wl_display
>>> + */
>>> +WL_EXPORT void
>>> +wl_display_cancel_read(struct wl_display *display)
>>> +{
>>> +	pthread_mutex_lock(&display->mutex);
>>> +
>>> +	display->reader_count--;
>>
>> Possible underflow here, too. (As above)
>> Let's hope no one ever has to debug why this counter gets a wrong value. ;-)
> 
> Right, we could improve this like described above.
> 
>>> +	if (display->reader_count == 0)
>>> +		pthread_cond_broadcast(&display->reader_cond);
>>
>> Uhm. So in this case, no reading was actually done and the
>> assumption is that the various readers will try again
>> later. Hm. This should actually be safe since the usual assumption
>> is that the fd is not readable, when this gets called....  Ok,
>> ignore this paragraph. :-)
> 
> Yeah, this is the case where everybody is waiting for this guy to come
> and do the read, but he then decides to cancel.  We need to unblock
> the threads in that case (and update read_serial).  Calling
> wl_display_read_events() is not guaranteed to deliver any events to
> your queue, so if you're waiting for a specific event, you have to
> spin until you get it.

Ah, yeah. Somehow I didn't think of that.

>>> +	pthread_mutex_unlock(&display->mutex);
>>> +}
>>> +
>>>  /** Dispatch events in an event queue
>>>   *
>>>   * \param display The display context object
>>> @@ -930,7 +1080,50 @@ WL_EXPORT int
>>>  wl_display_dispatch_queue(struct wl_display *display,
>>>  			  struct wl_event_queue *queue)
>>>  {
>>> -	return dispatch_queue(display, queue, 1);
>>> +	struct pollfd pfd[2];
>>> +	int ret;
>>> +
>>> +	pthread_mutex_lock(&display->mutex);
>>> +
>>> +	ret = dispatch_queue(display, queue);
>>> +	if (ret == -1)
>>> +		goto err_unlock;
>>> +	if (ret > 0) {
>>> +		pthread_mutex_unlock(&display->mutex);
>>> +		return ret;
>>> +	}
>>> +
>>> +	display->reader_count++;
>>> +
>>> +	ret = wl_connection_flush(display->connection);
>>> +	if (ret < 0 && errno != EAGAIN) {
>>> +		display_fatal_error(display, errno);
>>> +		goto err_unlock;
>>> +	}
>>> +
>>> +	pthread_mutex_unlock(&display->mutex);
>>> +
>>> +	pfd[0].fd = display->fd;
>>> +	pfd[0].events = POLLIN;
>>> +	if (poll(pfd, 1, -1) == -1)
>>> +		return -1;
>>> +
>>> +	pthread_mutex_lock(&display->mutex);
>>> +
>>> +	if (read_events(display) == -1)
>>> +		goto err_unlock;
>>> +
>>> +	ret = dispatch_queue(display, queue);
>>> +	if (ret == -1)
>>> +		goto err_unlock;
>>> +
>>> +	pthread_mutex_unlock(&display->mutex);
>>> +
>>> +	return ret;
>>> +
>>> + err_unlock:
>>> +	pthread_mutex_unlock(&display->mutex);
>>> +	return -1;
>>>  }
>>>  
>>>  /** Dispatch pending events in an event queue
>>> @@ -950,7 +1143,18 @@ WL_EXPORT int
>>>  wl_display_dispatch_queue_pending(struct wl_display *display,
>>>  				  struct wl_event_queue *queue)
>>>  {
>>> -	return dispatch_queue(display, queue, 0);
>>> +	pthread_mutex_lock(&display->mutex);
>>> +
>>> +	if (dispatch_queue(display, queue) == -1)
>>> +		goto err_unlock;
>>> +
>>> +	pthread_mutex_unlock(&display->mutex);
>>> +
>>> +	return 0;
>>> +
>>> + err_unlock:
>>> +	pthread_mutex_unlock(&display->mutex);
>>> +	return -1;
>>>  }
>>>  
>>>  /** Process incoming events
>>> @@ -969,7 +1173,8 @@ wl_display_dispatch_queue_pending(struct wl_display *display,
>>>   * or not. For dispatching main queue events without blocking, see \ref
>>>   * wl_display_dispatch_pending().
>>>   *
>>> - * \note Calling this makes the current thread the main one.
>>> + * \note Calling this will release the display file descriptor if this
>>> + * thread acquired it using wl_display_acquire_fd().
>>>   *
>>>   * \sa wl_display_dispatch_pending(), wl_display_dispatch_queue()
>>>   *
>>> @@ -978,9 +1183,7 @@ wl_display_dispatch_queue_pending(struct wl_display *display,
>>>  WL_EXPORT int
>>>  wl_display_dispatch(struct wl_display *display)
>>>  {
>>> -	display->display_thread = pthread_self();
>>> -
>>> -	return dispatch_queue(display, &display->queue, 1);
>>> +	return wl_display_dispatch_queue(display, &display->queue);
>>>  }
>>>  
>>>  /** Dispatch main queue events without reading from the display fd
>>> @@ -1024,9 +1227,7 @@ wl_display_dispatch(struct wl_display *display)
>>>  WL_EXPORT int
>>>  wl_display_dispatch_pending(struct wl_display *display)
>>>  {
>>> -	display->display_thread = pthread_self();
>>> -
>>> -	return dispatch_queue(display, &display->queue, 0);
>>> +	return wl_display_dispatch_queue_pending(display, &display->queue);
>>>  }
>>>  
>>>  /** Retrieve the last error occurred on a display
>>> diff --git a/src/wayland-client.h b/src/wayland-client.h
>>> index 578fa7e..216773a 100644
>>> --- a/src/wayland-client.h
>>> +++ b/src/wayland-client.h
>>> @@ -152,6 +152,12 @@ int wl_display_flush(struct wl_display *display);
>>>  int wl_display_roundtrip(struct wl_display *display);
>>>  struct wl_event_queue *wl_display_create_queue(struct wl_display *display);
>>>  
>>> +int wl_display_prepare_read_queue(struct wl_display *display,
>>> +				  struct wl_event_queue *queue);
>>> +int wl_display_prepare_read(struct wl_display *display);
>>> +void wl_display_cancel_read(struct wl_display *display);
>>> +int wl_display_read_events(struct wl_display *display);
>>> +
>>>  void wl_log_set_handler_client(wl_log_func_t handler);
>>
>> I will try thinking about this patch and the new functions a little
>> more tomorrow. Perhaps even draw some nice pictures. But for now,
>> this looks quite nice to me!
> 
> Cool, thanks for taking a look.
> 
> Kristian
> _______________________________________________
> wayland-devel mailing list
> wayland-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/wayland-devel

-- 
"In the beginning the Universe was created. This has made a lot of
 people very angry and has been widely regarded as a bad move."