[Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

Tue Apr 20 18:15:34 UTC 2021

On Tue, 20 Apr 2021 at 19:00, Christian König <
ckoenig.leichtzumerken at gmail.com> wrote:

> Am 20.04.21 um 19:44 schrieb Daniel Stone:
>
> But winsys is something _completely_ different. Yes, you're using the GPU
> to do things with buffers A, B, and C to produce buffer Z. Yes, you're
> using vkQueuePresentKHR to schedule that work. Yes, Mutter's composition
> job might depend on a Chromium composition job which depends on GTA's
> render job which depends on GTA's compute job which might take a year to
> complete. Mutter's composition job needs to complete in 'reasonable'
> (again, FSVO) time, no matter what. The two are compatible.
>
> How? Don't lump them together. Isolate them aggressively, and
> _predictably_ in a way that you can reason about.
>
> What clients do in their own process space is their own business. Games
> can deadlock themselves if they get wait-before-signal wrong. Compute jobs
> can run for a year. Their problem. Winsys is not that, because you're
> crossing every isolation boundary possible. Process, user, container, VM -
> every kind of privilege boundary. Thus far, dma_fence has protected us from
> the most egregious abuses by guaranteeing bounded-time completion; it also
> acts as a sequencing primitive, but from the perspective of a winsys person
> that's of secondary importance, which is probably one of the bigger
> disconnects between winsys people and GPU driver people.
>
>
> Finally somebody who understands me :)
>
> Well the question is then how do we get winsys and your own process space
> together then?
>

It's a jarring transition. If you take a very narrow view and say 'it's all
GPU work in shared buffers so it should all work the same', then
client<->winsys looks the same as client<->client gbuffer. But this is a
trap.

Just because you can mmap() a file on an NFS server in New Zealand doesn't
mean that you should have the same expectations of memory access to that
file as you do to of a pointer from alloca(). Even if the primitives look
the same, you are crossing significant boundaries, and those do not come
without a compromise and a penalty.

> Anyway, one of the great things about winsys (there are some! trust me) is
> we don't need to be as hopelessly general as for game engines, nor as
> hyperoptimised. We place strict demands on our clients, and we literally
> kill them every single time they get something wrong in a way that's
> visible to us. Our demands on the GPU are so embarrassingly simple that you
> can run every modern desktop environment on GPUs which don't have unified
> shaders. And on certain platforms who don't share tiling formats between
> texture/render-target/scanout ... and it all still runs fast enough that
> people don't complain.
>
>
> Ignoring everything below since that is the display pipeline I'm not
> really interested in. My concern is how to get the buffer from the client
> to the server without allowing the client to get the server into trouble?
>
> My thinking is still to use timeouts to acquire texture locks. E.g. when
> the compositor needs to access texture it grabs a lock and if that lock
> isn't available in less than 20ms whoever is holding it is killed hard and
> the lock given to the compositor.
>
> It's perfectly fine if a process has a hung queue, but if it tries to send
> buffers which should be filled by that queue to the compositor it just gets
> a corrupted window content.
>

Kill the client hard. If the compositor has speculatively queued sampling
against rendering which never completed, let it access garbage. You'll have
one frame of garbage (outdated content, all black, random pattern; the
failure mode is equally imperfect, because there is no perfect answer),
then the compositor will notice the client has disappeared and remove all
its resources.

It's not possible to completely prevent this situation if the compositor
wants to speculatively pipeline work, only ameliorate it. From a
system-global point of view, just expose the situation and let it bubble
up. Watch the number of fences which failed to retire in time, and destroy
the context if there are enough of them (maybe 1, maybe 100). Watch the
number of contexts the file description get forcibly destroyed, and destroy
the file description if there are enough of them. Watch the number of
descriptions which get forcibly destroyed, and destroy the process if there
are enough of them. Watch the number of processes in a cgroup/pidns which
get forcibly destroyed, and destroy the ... etc. Whether it's the DRM
driver or an external monitor such as systemd/Flatpak/podman/Docker doing
that is pretty immaterial, as long as the concept of failure bubbling up
remains.

(20ms is objectively the wrong answer FWIW, because we're not a hard RTOS.
But if our biggest point of disagreement is 20 vs. 200 vs. 2000 vs. 20000
ms, then this thread has been a huge success!)

Cheers,
Daniel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20210420/cabe884d/attachment.htm>