Wayland fd watching

Maksim Sisov msisov at igalia.com
Thu Jan 13 08:05:56 UTC 2022


+ wayland-devel ML.

Hi Pekka, 

Thanks for your answers! Please see my questions inlined. 

On 2022-01-12 16:16, Pekka Paalanen wrote:

> On Mon, 10 Jan 2022 08:53:50 +0200
> Maksim Sisov <msisov at igalia.com> wrote:
> 
>> Hi Pekka, 
>> 
>> Thanks for answering all my previous questions before.
> 
> Hi Maksim,
> 
> looks like that was a year ago. I had forgot. :-) 

Time flies!

> 
>> I came up with a new question and wondered what was your opinion. 
>> 
>> In Chromium, we have two modes - one is normal when a GPU service runs
>> in a separate sandboxed process and uses surfaceless path with libgbm
>> (we pass dmabuf to the browser process where Wayland connection is...),
>> and another one is --in-process-gpu when the GPU service runs in the
>> same browser process, but in a different thread. That also uses
>> surfaceless path by default. 
>> 
>> However, when either libgbm is not available or a system in question
>> doesn't support drm render nodes, the browser may fall back to use
>> Wayland EGL. However, it is only used if the browser runs with
>> --in-process-gpu flag as we need to access a wl_surface somehow. That
>> means the GPU service on a separate thread that use Wayland EGL and the
>> main UI thread that actually does all the UI stuff, manages the
>> connection, etc. 
>> 
>> Long time ago, we figured out there was a problem with the
>> --in-process-gpu + Wayland EGL. But given both Chromium as a Wayland
>> client and Wayland EGL that is another client prepare/read/dispatch
>> events (each own event queue, of course), we started to experience
>> deadlocks.
> 
> Surely they are the same client (connection)?

Yes, they are.

> 
>> The deadlock happened when Chromium was closing a native window and our
>> event loop had already prepared to read for the display, but it then
>> didn't read as the thread was closing the GPU thread, which was calling
>> eglSwapBuffers, which internally also called prepare and then read, but
>> it didn't read, but rather was blocked as another thread already
>> prepared to read, but didn't read either. So, the UI thread could make a
>> blocking wait until the GPU thread is done doing its stuff before
>> tearing it down. 
>> 
>> As you can see from my description, the UI thread prepared to read the
>> wayland display, then switched to tearing down the GPU thread, which was
>> busy with processing our gl commands, and it had to wait until its done.
>> The GPU thread used Wayland EGL, which also prepared to read and the
>> read call was blocked because there were other clients willing to read.
> 
> I see. Yes, there is the assumption that if a thread prepares to read,
> then it will either read or cancel the read in finite time (ASAP).
> 
>> To overcome this problem, it was decided to use a separate polling
>> thread. What it does now is that the thread prepares/polls/reads (we use
>> libevent or libglib, btw. this decision is based on how chromium is
>> configured), while the UI thread calls dispatch events whenever our
>> "polling" thread asks. This way, the polling thread can always call
>> wl_display_read_events and the deadlock doesn't ever happen.
> 
> Right, I can see that working.
> 
>> I wonder if this is the right way to do so. I also wonder if doing this
>> stuff on a separate thread gives any performance benefits?
> 
> It works, but it's not a "proposed" design in my opinion. Doing
> all of prepare/read/dispatch in each thread is the "intended" usage.
> 
> I think the core of the problem is the thread that prepares to read and
> then doesn't read but goes to do something else. That is basically a
> violation of the API contract of libwayland-client. Unfortunately in
> this case it leads to a solid deadlock. The documentation is very clear
> about it:
> 
> * This function (or wl_display_prepare_read()) must be called before reading
> * from the file descriptor using wl_display_read_events(). Calling
> * wl_display_prepare_read_queue() announces the calling thread's intention to
> * read and ensures that until the thread is ready to read and calls
> * wl_display_read_events(), no other thread will read from the file descriptor.
> 
> You could also call wl_display_cancel_read() instead:
> 
> * After a thread successfully called wl_display_prepare_read() it must
> * either call wl_display_read_events() or wl_display_cancel_read().
> * If the threads do not follow this rule it will lead to deadlock.
> 
> But if you cancel, then you must not read without preparing again first.
> 
> The separate polling thread does not sound bad though. It is a correct
> design. Now that I think of it, wl_display_read_events() has a
> pthread_cond_wait() in it that blocks all other reading threads until
> the last reading thread actually calls read (or cancel). The separate
> polling thread, like you described it, might avoid some of that
> blocking.
> 
> Perhaps the polling thread design is better when it is possible. You
> can use it in your own code, but it's not really possible for things
> like EGL implementations since EGL does not define an API for
> cross-thread wake-ups.

I see. The only only problem here is that we don't use poll, but rather
libevent, which is wrapped around Chromium's MessagePumpLibevent. It
constantly notifies us that a fd is ready to be read.

Here is a diagram that shows the flow without a dedicated thread -
https://drive.google.com/file/d/1Jq8aX4cWszwL5fleE8ayGOeond70xChS/view?usp=sharing
.

As you can see, it returns from OnCanRead and waits until libevent
notifies us the next time that we can actually read events (if prepared
to read previously). And if the main thread blocks because of the above
mentioned reason (libevent didn't have a chance to notify us), the
events will never be read.

> 
> Btw. shouldn't we perhaps discuss these on the Wayland mailing list?
> Feel free to CC all this there.
> 
> Thanks,
> pq


-- 
Best Regards,
Maksim Sisov
* Usual work time - 08:00-16:00 (EEST).


More information about the wayland-devel mailing list