<div dir="ltr"><div dir="ltr"><span style="">On Tue, 20 Apr 2021 at 19:00, Christian König <<a href="mailto:ckoenig.leichtzumerken@gmail.com">ckoenig.leichtzumerken@gmail.com</a>> wrote:</span><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div><div>Am 20.04.21 um 19:44 schrieb Daniel
Stone:</div><blockquote type="cite"><div dir="ltr"><div class="gmail_quote">
<div>But winsys is something _completely_ different. Yes,
you're using the GPU to do things with buffers A, B, and C
to produce buffer Z. Yes, you're using vkQueuePresentKHR to
schedule that work. Yes, Mutter's composition job might
depend on a Chromium composition job which depends on GTA's
render job which depends on GTA's compute job which might
take a year to complete. Mutter's composition job needs to
complete in 'reasonable' (again, FSVO) time, no matter what.
The two are compatible.</div>
<div><br>
</div>
<div>How? Don't lump them together. Isolate them
aggressively, and _predictably_ in a way that you can reason
about.</div>
<div><br>
</div>
<div>What clients do in their own process space is their own
business. Games can deadlock themselves if they get
wait-before-signal wrong. Compute jobs can run for a year.
Their problem. Winsys is not that, because you're crossing
every isolation boundary possible. Process, user, container,
VM - every kind of privilege boundary. Thus far, dma_fence
has protected us from the most egregious abuses by
guaranteeing bounded-time completion; it also acts as a
sequencing primitive, but from the perspective of a winsys
person that's of secondary importance, which is probably one
of the bigger disconnects between winsys people and GPU
driver people.</div>
</div>
</div>
</blockquote>
<br>
Finally somebody who understands me :)<br>
<br>
Well, the question then is how do we get winsys and your own process
space together?<br></div></blockquote><div><br></div><div>It's a jarring transition. If you take a very narrow view and say 'it's all GPU work in shared buffers so it should all work the same', then client<->winsys looks the same as client<->client gbuffer. But this is a trap.</div><div><br></div><div>Just because you can mmap() a file on an NFS server in New Zealand doesn't mean that you should have the same expectations of memory access to that file as you do of a pointer from alloca(). Even if the primitives look the same, you are crossing significant boundaries, and those do not come without a compromise and a penalty.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div>Anyway, one of the great things about winsys (there are
some! trust me) is we don't need to be as hopelessly general
as for game engines, nor as hyperoptimised. We place strict
demands on our clients, and we literally kill them every
single time they get something wrong in a way that's visible
to us. Our demands on the GPU are so embarrassingly simple
that you can run every modern desktop environment on GPUs
which don't have unified shaders. And on certain platforms
which don't share tiling formats between
texture/render-target/scanout ... and it all still runs fast
enough that people don't complain.</div>
</div>
</div>
</blockquote>
<br>
Ignoring everything below since that is the display pipeline, which I'm
not really interested in. My concern is: how do we get the buffer from
the client to the server without allowing the client to get the server
into trouble?<br>
<br>
My thinking is still to use timeouts to acquire texture locks. E.g.
when the compositor needs to access a texture it grabs a lock, and if
that lock isn't available in less than 20ms, whoever is holding it is
killed hard and the lock is given to the compositor.<br>
<br>
It's perfectly fine if a process has a hung queue, but if it tries
to send buffers which should be filled by that queue to the
compositor, it just gets corrupted window content.<br></div></blockquote><div><br></div><div>Kill the client hard. If the compositor has speculatively queued sampling against rendering which never completed, let it access garbage. You'll have one frame of garbage (outdated content, all black, random pattern; every failure mode is equally imperfect, because there is no perfect answer), then the compositor will notice the client has disappeared and remove all its resources.</div><div><br></div><div>It's not possible to completely prevent this situation if the compositor wants to speculatively pipeline work, only ameliorate it. From a system-global point of view, just expose the situation and let it bubble up. Watch the number of fences which failed to retire in time, and destroy the context if there are enough of them (maybe 1, maybe 100). Watch the number of contexts on the file description which get forcibly destroyed, and destroy the file description if there are enough of them. Watch the number of descriptions which get forcibly destroyed, and destroy the process if there are enough of them. Watch the number of processes in a cgroup/pidns which get forcibly destroyed, and destroy the ... etc. Whether it's the DRM driver or an external monitor such as systemd/Flatpak/podman/Docker doing that is pretty immaterial, as long as the concept of failure bubbling up remains.</div><div><br></div><div>(20ms is objectively the wrong answer FWIW, because we're not a hard RTOS. But if our biggest point of disagreement is 20 vs. 200 vs. 2000 vs. 20000 ms, then this thread has been a huge success!)</div><div><br></div><div>Cheers,</div><div>Daniel </div></div></div>
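<div dir="ltr"><div><br></div><div>P.S. To make "bubbling up" concrete, here is a rough C sketch of how the counting could compose. It is purely illustrative: none of these names (escalation_node, record_failure, destroy_level) exist in DRM or anywhere else, and the thresholds are just the "maybe 1, maybe 100" policy knobs from above, not a proposal for actual kernel code or uAPI.</div><div><br></div>
<pre>
/*
 * Illustrative sketch only: each level counts forced teardowns of the
 * level below it, and is itself torn down (and reported one level up)
 * once its own threshold is crossed.  All names and thresholds are
 * made up for the example.
 */
#include <stdio.h>

struct escalation_node {
        const char *name;               /* "context", "file description", ... */
        int failures;                   /* forced teardowns seen so far        */
        int threshold;                  /* policy knob: maybe 1, maybe 100     */
        struct escalation_node *parent; /* next privilege boundary up          */
};

static void destroy_level(struct escalation_node *node);

/* Called when something one level down was forcibly destroyed (or, at
 * the lowest level, when a fence failed to retire in time). */
static void record_failure(struct escalation_node *node)
{
        if (!node)
                return;                 /* nothing above the top level */
        if (++node->failures >= node->threshold)
                destroy_level(node);
}

static void destroy_level(struct escalation_node *node)
{
        printf("destroying %s after %d forced teardown(s) below it\n",
               node->name, node->failures);
        /* ...tear down the real resources here, then bubble up... */
        record_failure(node->parent);
}

int main(void)
{
        struct escalation_node cgroup  = { "cgroup/pidns",     0, 3, NULL };
        struct escalation_node process = { "process",          0, 2, &cgroup };
        struct escalation_node fd      = { "file description", 0, 2, &process };
        struct escalation_node context = { "context",          0, 1, &fd };

        record_failure(&context);       /* a fence on this context blew its deadline  */
        record_failure(&fd);            /* a second context on the same fd was killed */
        return 0;
}
</pre>
<div><br></div><div>The only point of the shape is that each boundary counts failures below it and escalates past its own threshold; whether the thing doing the counting is the DRM driver or systemd/Flatpak/podman/Docker is, as above, immaterial.</div></div>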