Should Weston wait for client buffers to finish rendering?

Thu Jan 24 09:47:30 UTC 2019

On Wed, 23 Jan 2019 18:43:52 +0000
"Singh, Satyeshwar" <satyeshwar.singh at intel.com> wrote:

> Hey guys,
> As you know, Weston doesn't wait for client buffers to finish
> rendering. That is typically left as an exercise for the kernel mode
> graphics driver. I am wondering if anyone knows why this policy
> decision was made? More importantly, is there any harm (or any side
> effect) that I am not thinking of if this policy were to change such
> that the compositor indeed started waiting for client buffers to
> finish rendering first and only used those buffers for composition
> that had finished?

Hi,

the reason is simple: no other option existed when Weston's DRM backend
was written, and no-one has got around to implementing anything else
yet.

Implicit fencing existed first, because it was required for Xorg and
X11 apps AFAIU. Display/GPU drivers were monolithic, so everything was
driver-internal and implicit fencing was easy to do.

Then came systems where the display controller and the GPU are separate
devices under separate drivers. The resulting need to communicate
between drivers led to the design of dmabufs and explicit fencing.

Nowadays explicit fencing is not required in order to wait for
rendering to finish, but I think dmabuf might well be.

Making the compositor wait for client buffers to finish rendering will
need some infrastructure work in the compositor, because the compositor
cannot stop to wait: it must keep on running and updating the displays
with the previous client content that is ready.
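
To illustrate the "keep running with the previous content" requirement,
here is a minimal Python sketch. It is a model, not Weston code. The
Surface class, its method names, and the use of a pipe as a stand-in
for a fence file descriptor are all assumptions made for the example.
The only real mechanism used is a zero-timeout select(), which asks
whether a file descriptor is readable without ever blocking.

```python
import select


def fence_signaled(fd: int) -> bool:
    """Return True if fd is readable right now; never blocks."""
    readable, _, _ = select.select([fd], [], [], 0)  # timeout 0: poll only
    return bool(readable)


class Surface:
    """Hypothetical per-surface state: shown content plus queued commits."""

    def __init__(self, initial):
        self.current = initial  # content already known to be ready
        self.pending = []       # (buffer_name, fence_fd), oldest first

    def commit(self, name, fence_fd):
        """Client attached a new buffer whose rendering may be unfinished."""
        self.pending.append((name, fence_fd))

    def latch_for_repaint(self):
        """Pick the newest pending buffer whose fence has signaled.

        If none is ready, keep the previous content so that the repaint
        loop can proceed immediately.
        """
        newest_ready = -1
        for i, (_, fd) in enumerate(self.pending):
            if fence_signaled(fd):
                newest_ready = i
        if newest_ready >= 0:
            self.current = self.pending[newest_ready][0]
            # Older commits are superseded by the one just latched.
            del self.pending[:newest_ready + 1]
        return self.current
```

In this model, a surface that committed buffers B and C keeps showing A
until B's stand-in fence becomes readable, and the repaint path never
sleeps while waiting.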

There is also a timing trade-off to consider. The compositor must start
its own compositing job on the GPU well before vblank if it aims to
make it for the vblank. With an implicit fencing policy, a client's GPU
job is allowed to be unfinished at the time the compositor queues the
compositing job, which means the client has more time to finish its GPU
job. OTOH, with a wait-for-finished policy, the client's GPU job must
have finished already at the time the compositor decides to start the
composition for the next vblank.

The timing aspect is fairly complicated; this blog post offers some
examples:
https://ppaalanen.blogspot.com/2015/02/weston-repaint-scheduling.html

The things to consider are the time available for a client to draw, and
the client's latency to output.
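
As a rough worked example of the trade-off, the following Python sketch
computes the latest time by which a client's GPU job must finish under
each policy. It is a simplified model under stated assumptions: the
function name, the parameters, and all the numbers are hypothetical,
the GPU is assumed to run jobs strictly in order, and the compositing
job is assumed to need to complete by vblank.

```python
def client_gpu_deadline_us(vblank_us, repaint_lead_us, composite_gpu_us,
                           policy):
    """Latest finish time for the client's GPU job, in microseconds.

    vblank_us        -- time of the targeted vblank
    repaint_lead_us  -- how long before vblank the compositor starts
                        its repaint (hypothetical figure)
    composite_gpu_us -- GPU time the compositing job itself needs
    """
    if policy == "implicit":
        # The client job may still be running when the compositing job
        # is queued; it only has to leave room for compositing.
        return vblank_us - composite_gpu_us
    if policy == "wait-for-finished":
        # The client job must be done when the compositor decides what
        # to composite.
        return vblank_us - repaint_lead_us
    raise ValueError(f"unknown policy: {policy}")


# Made-up numbers: 60 Hz refresh, repaint starts 7 ms before vblank,
# compositing takes 2 ms of GPU time.
implicit = client_gpu_deadline_us(16667, 7000, 2000, "implicit")
waiting = client_gpu_deadline_us(16667, 7000, 2000, "wait-for-finished")
extra_client_time_us = implicit - waiting
```

With these made-up numbers the client gets 5 ms more GPU time per frame
under implicit fencing, at the cost of the compositor's frame being
hostage to a client job that overruns.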

> Imagine a benchmark case where the client renders for example 800
> frames and attaches their buffer ids to a surface, the compositor
> uses the last one that came in before its repaint cycle started for
> composition and display on the screen. This buffer may not have been
> rendered by the GPU yet because it is working on previous buffers.
> However, it may not finish before the next Vblank and if it doesn't
> finish, then the compositor's scan out buffer also isn't going to be
> displayed by the kernel driver. If we change the policy for the
> compositor to always use the last finished buffer, then at least the
> compositor's scan out buffer will be displayed for the next Vblank
> even if it's not showing the last frame from the client. Thoughts?

You are correct. This is an existing caveat.

Mind that the wait-for-finished policy is not the only factor here in
practice. Another big factor is the task scheduling for the GPU. If the
GPU has 800 jobs from the client in its queue and then the compositor
adds its compositing job on top as number 801, it is still quite likely
that the GPU will have to crunch through all those 800 before getting
to the compositing job. This is where the GPU driver task scheduling
needs more features: it needs to know that the compositor's job is more
important than the 800 other ones (there are e.g. EGL extensions for
this). Another thing is that instead of the 800 client jobs, there
could be just one client job that takes 2 seconds to finish; in this
case the GPU driver and hardware would need to support pre-emption to
get the compositing job running, which is a relatively new feature.
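
The two scenarios can be put into numbers with a toy Python model. It
is only a sketch: the function and its parameters are hypothetical, job
durations are invented, and it assumes a single GPU queue where all
client jobs are already queued, and the first one already executing,
when the compositing job is submitted at time zero.

```python
def compositor_finish_ms(client_jobs_ms, composite_ms,
                         prioritized=False, preemption=False):
    """Time at which the compositing job completes, in milliseconds."""
    if not client_jobs_ms:
        return composite_ms
    if not prioritized:
        # Plain FIFO: every queued client job runs first.
        return sum(client_jobs_ms) + composite_ms
    if preemption:
        # The running client job is interrupted right away.
        return composite_ms
    # Priority without pre-emption: only the job that is already
    # executing has to finish first.
    return client_jobs_ms[0] + composite_ms


many_small = [1] * 800   # 800 client jobs of 1 ms each
one_huge = [2000]        # a single client job of 2 seconds

fifo_many = compositor_finish_ms(many_small, 2)
prio_many = compositor_finish_ms(many_small, 2, prioritized=True)
prio_huge = compositor_finish_ms(one_huge, 2, prioritized=True)
preempt_huge = compositor_finish_ms(one_huge, 2, prioritized=True,
                                    preemption=True)
```

In this model, priority alone rescues the 800-small-jobs case, while the
single 2-second job is only helped by pre-emption.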

Solutions to these issues would be good to have, but the timing
trade-off probably needs investigation and maybe some extensions here
and there to reach an optimal solution.

Thanks,
pq