gtk3-demo dies with EAGAIN when running under Weston

Jonas Ådahl jadahl at gmail.com
Fri Mar 20 03:09:07 PDT 2015


On Fri, Mar 20, 2015 at 11:53:24AM +0200, Pekka Paalanen wrote:
> On Fri, 20 Mar 2015 17:03:14 +0800
> Jonas Ådahl <jadahl at gmail.com> wrote:
> 
> > On Fri, Mar 20, 2015 at 10:53:42AM +0200, Pekka Paalanen wrote:
> > > On Thu, 19 Mar 2015 23:45:00 -0400
> > > Lyude <thatslyude at gmail.com> wrote:
> > > 
> > > > On Fri, 2015-03-20 at 11:37 +0800, Jonas Ådahl wrote:
> > > > > 
> > > > > Try to apply this patch http://patchwork.freedesktop.org/patch/44994/ .
> > > > > 
> > > > > 
> > > > > Jonas
> > > > 
> > > > I just tried the patch and it fixed the issue. Thanks a ton for the
> > > > quick reply to my e-mail and the patch :). Now I can finally get back to
> > > > work.
> > > 
> > > Now that seems odd.
> > > 
> > > AFAIU, basically you get EAGAIN when the kernel doesn't have the space
> > > to buffer your sent data. This means that either the receiver (the
> > > compositor) is not reading the socket, or the client is flooding the
> > > socket.
> > > 
> > > I would suggest there may be two bugs in GTK+ here:
> > > 
> > > 1. Flooding the socket to begin with. I really don't understand why
> > >    "input: Make setting the same pointer cursor state again a no-op"
> > >    would be fixing this issue. Does GTK+ send tons of requests for
> > >    every pointer enter/leave or something?
> > 
> > When weston got a wl_pointer.set_cursor with the same cursor and the
> > same hot spot, it unmapped the surface, sent wl_surface.leave on the
> > cursor surface, then mapped it again in the exact same position, sending
> > a wl_surface.enter to that same surface. When GTK+ receives an
> > enter/leave on a cursor surface it calculates the buffer scale it should
> > use and lazily just sets the same cursor again because it doesn't
> > keep track of that itself. In other words, we'd get a feedback loop that
> > eventually fills the buffer. One could argue that GTK+ shouldn't be as
> > lazy, but I don't think setting a cursor surface with an identical state
> > should make the cursor leave the output and then enter immediately
> > either.
> 
> I am not questioning your patch at all, it sounds totally the right
> thing to do.
> 
> But doesn't that feedback loop require a roundtrip on every cycle? How
> can that fill the send buffer?
> 
> Is it like some cascade where one no-op change causes one leave/enter,
> which then causes two no-op changes, which causes two leave/enter and
> so on exploding exponentially?

That would be my theory as well.
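
A rough sketch of the idea behind the patch title quoted above (treating a
repeated set_cursor with the same surface and hot spot as a no-op). This is
not the actual Weston code: struct cursor_state, struct pointer and
pointer_set_cursor() are hypothetical names used only for illustration, and
the remaps counter stands in for the unmap/map cycle that triggers
wl_surface.leave and wl_surface.enter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical compositor-side record of the current cursor. */
struct cursor_state {
    const void *surface;   /* stand-in for the cursor's wl_surface */
    int32_t hotspot_x;
    int32_t hotspot_y;
};

struct pointer {
    struct cursor_state cursor;
    unsigned int remaps;   /* counts unmap/map cycles (leave + enter) */
};

/* Returns true if the cursor was actually changed (and remapped),
 * false if the request carried an identical state and was ignored. */
static bool pointer_set_cursor(struct pointer *p, const void *surface,
                               int32_t hotspot_x, int32_t hotspot_y)
{
    if (p->cursor.surface == surface &&
        p->cursor.hotspot_x == hotspot_x &&
        p->cursor.hotspot_y == hotspot_y)
        return false;      /* identical state: no leave/enter is sent */

    p->cursor.surface = surface;
    p->cursor.hotspot_x = hotspot_x;
    p->cursor.hotspot_y = hotspot_y;
    p->remaps++;           /* here the surface would be unmapped and mapped */
    return true;
}
```

With a guard like this, a client that re-sends the same cursor in response to
enter/leave gets no new enter/leave back, so the loop stops after one round.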

> 
> Btw. how can you ever get enter or leave on a *cursor* surface?
> Is that another Weston bug?

Why wouldn't you want enter or leave on a cursor surface? They need to
know what buffer scale they should render in to get a perfect result.
They could piggy-back on whichever output the focused surface is on, but nothing
in the protocol stops us from just using the wl_surface of the pointer
cursor for this.

Jonas

> 
> > > 
> > > 2. Not dealing with EAGAIN. See the documentation for
> > >    wl_display_flush[1]. When hitting EAGAIN there, the event loop
> > >    should poll for writable and wait before issuing more Wayland
> > >    requests. Does GTK+ do this already?
> > 
> > AFAICS GTK+ doesn't check for EAGAIN, but in this particular situation
> > we'd just get an eternal busy loop instead of an abort.
> 
> Still, an item to add in GTK+'s todo.

True.
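
To illustrate the "poll for writable and wait" pattern quoted above, here is a
minimal POSIX-only sketch. It uses a plain non-blocking socket rather than
libwayland, so all helper names (set_nonblocking, flood_until_eagain, drain,
wait_writable) are made up for the example. In a real client the same shape
would presumably apply to wl_display_flush() failing with EAGAIN and polling
the fd from wl_display_get_fd() for POLLOUT.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Keep writing until the kernel refuses more data with EAGAIN.
 * Returns the number of bytes accepted, or -1 on any other error. */
static long flood_until_eagain(int fd)
{
    char chunk[4096] = { 0 };
    long total = 0;

    for (;;) {
        ssize_t n = write(fd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;   /* send buffer is full: stop and wait */
        return -1;
    }
}

/* Read everything currently queued on a non-blocking fd.
 * Returns the number of bytes read, or -1 on error. */
static long drain(int fd)
{
    char chunk[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0)
            return total;   /* peer closed the connection */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;   /* nothing left to read right now */
        return -1;
    }
}

/* Wait until fd is writable. Returns 1 if writable, 0 on timeout,
 * -1 on error or hangup. */
static int wait_writable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int r;

    do {
        r = poll(&pfd, 1, timeout_ms);
    } while (r < 0 && errno == EINTR);

    if (r <= 0)
        return r;
    return (pfd.revents & POLLOUT) ? 1 : -1;
}
```

On a socketpair() with both ends non-blocking, flood_until_eagain() on one end
stops once the peer is not reading; after drain() on the other end,
wait_writable() reports that sending can resume instead of busy-looping.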


Jonas

> 
> 
> Thanks,
> pq
> 
> > > Now, 2. is if the error comes back from wl_display_flush(). The other
> > > case is when calling a request function attempts to buffer the message
> > > but wl_closure_send() fails because wl_connection_flush() fails. These
> > > are libwayland-client internal functions. When wl_connection_flush()
> > > fails, instead of failing everything, I think we should just allocate
> > > more space until implicit flush succeeds or we can return EAGAIN from
> > > wl_display_flush().
> > > 
> > > Can you check if the failure comes from wl_display_flush() or the
> > > internal failure of the implicit flush? A backtrace would tell.
> > > 
> > > In any case, I think it might be good to look into removing the abort()
> > > from wl_proxy_marshal_array_constructor(), unless someone makes a case
> > > for the app being very broken if it sends that much data without
> > > spinning the event loop.
> > > 
> > > However, growing the send buffer without limit is not a good idea,
> > > because the bigger it gets, the further behind the app (the compositor,
> > > actually) is, and at some point that starts to indicate an app bug.
> > > 
> > > 
> > > Thanks,
> > > pq
> > > 
> > > [1]
> > > http://wayland.freedesktop.org/docs/html/apb.html#Client-classwl__display_1a8463b6e5f4cf9a2a3ad2d543aedcf429
> 
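
The "allocate more space, but not without limit" idea quoted above could look
roughly like the following. This is only a sketch under assumptions, not
libwayland code: struct send_buf, its max_cap field and send_buf_append() are
hypothetical, and reporting EAGAIN when the cap is reached is one possible
choice among several.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable send buffer with a hard upper bound.
 * Invariant: len <= cap <= max_cap. */
struct send_buf {
    char *data;
    size_t len;      /* bytes currently queued */
    size_t cap;      /* bytes currently allocated */
    size_t max_cap;  /* never grow beyond this */
};

/* Queue n bytes. Returns 0 on success. Returns -1 with errno set to
 * EAGAIN if the cap would be exceeded (the caller should flush and wait),
 * or ENOMEM if allocation fails. The buffer is unchanged on failure. */
static int send_buf_append(struct send_buf *b, const void *src, size_t n)
{
    if (n == 0)
        return 0;

    if (n > b->max_cap - b->len) {
        errno = EAGAIN;
        return -1;
    }

    size_t need = b->len + n;
    if (need > b->cap) {
        size_t new_cap = b->cap ? b->cap : 64;

        /* Double until large enough, clamping at max_cap. */
        while (new_cap < need) {
            if (new_cap > b->max_cap / 2) {
                new_cap = b->max_cap;
                break;
            }
            new_cap *= 2;
        }
        if (new_cap > b->max_cap)
            new_cap = b->max_cap;

        char *p = realloc(b->data, new_cap);
        if (!p) {
            errno = ENOMEM;
            return -1;
        }
        b->data = p;
        b->cap = new_cap;
    }

    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}
```

The cap keeps a runaway client from queueing data forever while still letting
a short burst through without aborting.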