gtk3-demo dies with EAGAIN when running under Weston

Jonas Ådahl jadahl at gmail.com
Fri Mar 20 02:03:14 PDT 2015


On Fri, Mar 20, 2015 at 10:53:42AM +0200, Pekka Paalanen wrote:
> On Thu, 19 Mar 2015 23:45:00 -0400
> Lyude <thatslyude at gmail.com> wrote:
> 
> > On Fri, 2015-03-20 at 11:37 +0800, Jonas Ådahl wrote:
> > > 
> > > Try to apply this patch http://patchwork.freedesktop.org/patch/44994/ .
> > > 
> > > 
> > > Jonas
> > 
> > I just tried the patch and it fixed the issue. Thanks a ton for the
> > quick reply to my e-mail and the patch :). Now I can finally get back to
> > work.
> 
> Now that seems odd.
> 
> AFAIU, basically you get EAGAIN when the kernel doesn't have the space
> to buffer your sent data. This means that either the receiver (the
> compositor) is not reading the socket, or the client is flooding the
> socket.
> 
> I would suggest there may be two bugs in GTK+ here:
> 
> 1. Flooding the socket to begin with. I really don't understand why
>    "input: Make setting the same pointer cursor state again a no-op"
>    would be fixing this issue. Does GTK+ send tons of requests for
>    every pointer enter/leave or something?

When weston got a wl_pointer.set_cursor with the same cursor and the
same hot spot, it unmapped the surface, sent wl_surface.leave on the
cursor surface, then mapped it again in the exact same position, sending
a wl_surface.enter to that same surface. When GTK+ receives an
enter/leave on a cursor surface, it calculates the buffer scale it should
use and lazily just sets the same cursor again, because it doesn't
keep track of that itself. In other words, we'd get a feedback loop that
eventually fills the buffer. One could argue that GTK+ shouldn't be as
lazy, but I don't think setting a cursor surface with an identical state
should make the cursor leave the output and then enter immediately
either.
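
To illustrate the idea behind the patch ("make setting the same pointer
cursor state again a no-op"), here is a rough sketch in C. The struct and
function names are hypothetical stand-ins and not weston's actual code;
remap_count exists only to make the unmap/map cycle (the leave/enter pair)
observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not weston's real types. */
struct cursor_surface {
    int id;
};

struct pointer_state {
    struct cursor_surface *sprite; /* current cursor surface, NULL if none */
    int32_t hotspot_x;
    int32_t hotspot_y;
    unsigned remap_count;          /* number of unmap/map (leave/enter) cycles */
};

/*
 * Apply a set_cursor request. Returns false and does nothing if the
 * surface and hotspot are identical to the current state, so that no
 * leave/enter pair is generated and the client is not prompted to send
 * the same request again.
 */
bool pointer_set_cursor(struct pointer_state *pointer,
                        struct cursor_surface *surface,
                        int32_t x, int32_t y)
{
    assert(pointer != NULL);

    if (pointer->sprite == surface &&
        pointer->hotspot_x == x &&
        pointer->hotspot_y == y)
        return false; /* identical state: no-op */

    /* State changed: this is where the unmap/map would happen. */
    pointer->sprite = surface;
    pointer->hotspot_x = x;
    pointer->hotspot_y = y;
    pointer->remap_count++;
    return true;
}
```

With a check like this, a client that re-sends the same cursor on every
enter/leave no longer triggers another enter/leave, which breaks the loop.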

> 
> 2. Not dealing with EAGAIN. See the documentation for
>    wl_display_flush[1]. When hitting EAGAIN there, the event loop
>    should poll for writable and wait before issuing more Wayland
>    requests. Does GTK+ do this already?

AFAICS GTK+ doesn't check for EAGAIN, but in this particular situation
we'd just get an eternal busy loop instead of an abort.
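
For reference, the "poll for writable and wait" pattern can be sketched
with plain POSIX calls on a non-blocking fd. This is only an illustration
under assumptions: write_all_waiting() and set_nonblocking() are made-up
helper names, and a real Wayland client would call wl_display_flush() on
the fd from wl_display_get_fd() instead of write(), and would return to
its event loop rather than block in the helper.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: write len bytes to a non-blocking fd. On EAGAIN,
 * wait up to timeout_ms for the fd to become writable, then retry.
 * Returns 0 once everything is written, or -1 with errno set. An errno
 * of EAGAIN means the fd was still not writable after the timeout.
 */
int write_all_waiting(int fd, const char *buf, size_t len, int timeout_ms)
{
    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n >= 0) {
            buf += n;
            len -= (size_t)n;
            continue;
        }
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1; /* a real error, not a full buffer */

        /* Kernel buffer is full: wait for writability, do not spin. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (ready == 0) {
            errno = EAGAIN; /* timed out, receiver still not reading */
            return -1;
        }
        if (pfd.revents & (POLLERR | POLLHUP)) {
            errno = EPIPE; /* peer went away */
            return -1;
        }
    }
    return 0;
}
```

The point is that EAGAIN is treated as "wait until writable" and not as a
fatal error or as a reason to retry immediately.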

Jonas

> 
> Now, 2. is if the error comes back from wl_display_flush(). The other
> case is when calling a request function attempts to buffer the message
> but wl_closure_send() fails because wl_connection_flush() fails. These
> are libwayland-client internal functions. When wl_connection_flush()
> fails, instead of failing everything, I think we should just allocate
> more space until implicit flush succeeds or we can return EAGAIN from
> wl_display_flush().
> 
> Can you check if the failure comes from wl_display_flush() or the
> internal failure of the implicit flush? A backtrace would tell.
> 
> In any case, I think it might be good to look into removing the abort()
> from wl_proxy_marshal_array_constructor(), unless someone makes a case
> for the app being very broken if it sends that much data without spinning
> the event loop.
> 
> However, growing the send buffer unlimited is not a good idea, because
> the bigger it gets, it means the more behind the app (the compositor
> actually) is, and at some point that starts to indicate an app bug.
> 
> 
> Thanks,
> pq
> 
> [1]
> http://wayland.freedesktop.org/docs/html/apb.html#Client-classwl__display_1a8463b6e5f4cf9a2a3ad2d543aedcf429

