Benchmark of Wayland

Wed Nov 17 09:50:35 PST 2010

2010/11/13 Chris Wilson <chris at chris-wilson.co.uk>:
> Take a step back. It's time to review the system architecture once more.
> Wayland is a input/output [de-]multiplexer. It does no rendering on the
> behalf of the client, only compositing the many clients onto the scanouts.
> The clients must prepare for themselves the shared memory buffers that
> they pass to Wayland for compositing. (Under GEM those shared memory
> buffers are merely GEM objects and therefore can also be used with
> hardware accelerated rendering.)

That's right, we are talking about compositing, not rendering. If I
understand correctly, the Wayland server should start compositing a
window whenever a client passes a GEM buffer to it (via some IPC such
as a Unix domain socket?). However, the client and the Wayland server
are two different processes. The compositing is not performed until
the Linux scheduler switches context from whatever process is running
to the Wayland server process.

As a result, there is a delay between the client sending the GEM
buffer and the Wayland server actually compositing it. This is one
source of latency in a client-server architecture. In my opinion, the
delay should depend on the kernel scheduling policy and the number of
processes running on the system. I guess that the more processes are
running, the longer the delay will be, since more processes need more
context switches to give each process a chance to run. The delay could
be several milliseconds or even more. That means when a client wants
to update its window content, it may have to wait several milliseconds
before the change can be seen on the screen.
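
One rough way to put a number on this delay is to time a ping-pong
between two processes over a Unix domain socket. The sketch below is
only an illustration under my own assumptions: it does not use Wayland
or GEM at all, the child process merely stands in for the compositor,
and the function name measure_round_trip is made up for this example.
It measures a full round trip (client to server and back), so a one-way
hand-off would be roughly half of that, and the result depends heavily
on system load and scheduling policy.

```python
import os
import socket
import time


def measure_round_trip(iterations=1000):
    """Return round-trip times in microseconds for a 1-byte ping-pong
    between two processes over a Unix domain socket pair."""
    client_sock, server_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: stands in for the compositor process (an assumption,
        # not real Wayland code). Echo every byte back to the client.
        client_sock.close()
        try:
            while True:
                data = server_sock.recv(1)
                if not data:  # the parent closed its end
                    break
                server_sock.sendall(data)
        finally:
            os._exit(0)

    # Parent: stands in for the client handing over a buffer.
    server_sock.close()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        client_sock.sendall(b"x")
        client_sock.recv(1)  # blocks until the other process has run
        samples.append((time.perf_counter_ns() - start) / 1000.0)
    client_sock.close()
    os.waitpid(pid, 0)
    return samples


if __name__ == "__main__":
    samples = sorted(measure_round_trip())
    print("median %.1f us, max %.1f us"
          % (samples[len(samples) // 2], samples[-1]))
```

Running it once on an idle machine and again with several CPU-bound
processes in the background would show whether the delay really grows
with the number of runnable processes.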

Recall that the stated goal of Wayland is:

"every frame is perfect, by which I mean that applications will be
able to control the rendering enough that we'll never see tearing,
lag, redrawing or flicker"

However, if this latency exists, we should see lag in a client-server
architecture, which violates the stated goal.

This kind of latency due to OS scheduling could be eliminated by a
direct procedure call. That is, the client passes the GEM buffer to
the server just as it would pass an argument to a function. In this
approach, the compositor is not a process. Rather, the compositor is
just a function that lives in the client program, so there are no
context switches.
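
To make the idea concrete, here is a minimal sketch of the
direct-call approach, again under my own assumptions: composite and
client_update are hypothetical names, a bytearray stands in for both
the GEM buffer and the scanout, and the "compositing" is just a copy.
The point is only that the hand-off is an ordinary function call within
one process, so the scheduler is not involved.

```python
import time


def composite(buffer, screen):
    """Hypothetical in-process compositor: copy the client buffer
    into the start of the screen buffer."""
    screen[:len(buffer)] = buffer


def client_update(screen, iterations=1000):
    """Call the compositor directly and return per-call times in
    microseconds."""
    buffer = bytearray(b"\x01" * 64)  # stand-in for a rendered buffer
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        composite(buffer, screen)  # plain function call, no IPC
        samples.append((time.perf_counter_ns() - start) / 1000.0)
    return samples


if __name__ == "__main__":
    samples = sorted(client_update(bytearray(64)))
    print("median %.1f us, max %.1f us"
          % (samples[len(samples) // 2], samples[-1]))
```

This toy ignores everything a real compositor has to do across
multiple clients; it only shows the call path with no second process
in it.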