Wayland Message Behaviour
Turner, LewisX
lewisx.turner at intel.com
Thu Nov 9 13:17:34 UTC 2017
Hi pq,
I will respond again once we have done some further investigation regarding your comments.
Regards,
Lewis
-----Original Message-----
From: Pekka Paalanen [mailto:ppaalanen at gmail.com]
Sent: Thursday, November 9, 2017 8:29 AM
To: Turner, LewisX <lewisx.turner at intel.com>
Cc: wayland-devel at lists.freedesktop.org
Subject: Re: Wayland Message Behaviour
On Wed, 8 Nov 2017 16:02:11 +0000
"Turner, LewisX" <lewisx.turner at intel.com> wrote:
> Hi all,
>
> I have come across some strange behavior with Wayland and I wondered
> if you could share your thoughts on it.
>
> The trace below is a sanitized version of what I captured of the
> behavior of Wayland messages between our client and Weston. Towards
> the end of the trace, we see an encoded buffer handle message sent out
> once but for some reason the client ends up receiving this message
> twice. At this point, the bridge between Weston and the client is
> broken and although both can send messages to each other, the messages
> themselves are never getting to their destinations.
>
> We only see this after calling wl_client_flush() within Weston
> immediately after sending a message to a client. On top of this, the
> client can run at 60FPS for anywhere between 2-30 seconds before
> encountering it. The wl_client_flush() is called immediately after the
> call to encoded_buffer_handle() on every frame. There is no obvious
> difference between the frame on which the issue occurs and the other
> frames that work fine.
>
> [472508.841] -> hmi at 3.encoded_buffer_handle(10, 2073600, 409148, 8,
> 3559793324) [472509.269] hmi @3.encoded_buffer_handle(10, 2073600,
> 409148, 8, 3559793324) [472517.952] -> hmi
> @3.release_buffer_handle(8, 0, 1) [472518.146] hmi
> @3.release_buffer_handle(8, 0, 1) [473005.451] -> hmi
> @3.encoded_buffer_handle(10, 2073600, 427086, 8, 3559838348)
> [473005.848] hmi @3.encoded_buffer_handle(10, 2073600, 427086, 8,
> 3559838348) [473014.914] -> hmi @3.release_buffer_handle(8, 0, 1)
> [473015.112] hmi @3.release_buffer_handle(8, 0, 1) [473505.149] ->
> hmi @3.encoded_buffer_handle(10, 2073600, 558179, 8, 3559883345)
> [473506.794] hmi @3.encoded_buffer_handle(10, 2073600, 558179, 8,
> 3559883345) [473517.277] -> hmi @3.release_buffer_handle(8, 0, 1)
> [473517.427] hmi @3.release_buffer_handle(8, 0, 1) [473517.524] hmi
> @3.encoded_buffer_handle(10, 2073600, 558179, 8, 3559883345)
> [474002.087] -> hmi @3.encoded_buffer_handle(10, 2073600, 400377, 8,
> 3559928007) [474507.144] -> hmi @3.encoded_buffer_handle(7, 2073600,
> 391711, 4, 3559972997) [475002.930] -> hmi
> @3.encoded_buffer_handle(11, 2073600, 399621, 5, 3560018014)
> [475503.604] -> hmi @3.encoded_buffer_handle(12, 2073600, 408610, 6,
> 3560063010) [476002.867] -> hmi @3.encoded_buffer_handle(13, 2073600,
> 422290, 7, 3560107997)
>
Hi,
ok, I see the duplicated message. It could be a bug in libwayland.
How exactly did you get this trace? It's not obvious which one is client and which one is server as I don't know your protocol extension.
Have you tried a middle-man tool to print the protocol traces?
See https://wayland.freedesktop.org/extras.html and please read the "is not enough" link for the caveats of WAYLAND_DEBUG dumping.
Explicitly calling wl_client_flush() should never be necessary in Weston, because wl_display_run() does it internally when control returns to the main event loop. Maybe you are not returning the main event loop often enough? Or is the server side using wl_resource_queue_event()?
But that is also irrelevant to the message duplication.
> Also, when this happens it's worth nothing that
> wl_display_roundtrip() reports an error, and wl_display_get_error()
> returns an error code. The documentation online suggests errors here
> are fatal and we cannot use the display again.
Yes, it is fatal. Have you hooked up the client side logging from libwayland? That should print the error, but the error should also be present in the protocol dump if it is a protocol error. But it might be a socket error as well.
> Also note that commenting out the call to wl_client_flush() makes the
> issue go away. However, we then see an unacceptable delay before the
> event is received on the client side.
That seems like an unrelated issue worth looking into as well. Weston should be quite fast in returning to its main event loop.
Are you sure you are not calling wl_client_flush() on a destroyed wl_client, btw?
If you ever store a wl_client pointer anywhere, you must also register a destroy listener for it, because libwayland-server will automatically destroy clients, e.g. due to protocol or socket errors.
> The main question at this point is whether anyone else has seen
> similar behavior. We do not want to spend more time investigating this
> if the cause is a known issue.
No, I have never seen a duplicated message before.
Maybe it's triggered by a failure path, since you say the connection fails.
A common way to cause arbitrary connection failures is a client (or
server) that never throttles. A client that continuously spams requests without waiting for a single event from the compositor in between is likely to overflow the message buffers, causing a fatal error and disconnection. You will not know about the error until you call a client function that can return it, e.g. roundtrip, dispatch, etc.
Request sending functions never return errors even if they raise one, because it would be far too much for the caller to check every request call.
Now, if anything I said was news to you, I would be happy to hear suggestions on where you would have expected the thing to have been documented. I know our documentation is somewhat sparse.
Thanks,
pq
Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
More information about the wayland-devel
mailing list