<div dir="ltr"><div dir="ltr"><span style="">On Tue, 20 Apr 2021 at 19:00, Christian König &lt;<a href="mailto:ckoenig.leichtzumerken@gmail.com">ckoenig.leichtzumerken@gmail.com</a>&gt; wrote:</span><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <div><div>On 20.04.21 at 19:44, Daniel Stone wrote:</div><blockquote type="cite"><div dir="ltr"><div class="gmail_quote"> <div>But winsys is something _completely_ different. Yes, you're using the GPU to do things with buffers A, B, and C to produce buffer Z. Yes, you're using vkQueuePresentKHR to schedule that work. Yes, Mutter's composition job might depend on a Chromium composition job which depends on GTA's render job which depends on GTA's compute job which might take a year to complete. Mutter's composition job needs to complete in 'reasonable' (again, FSVO) time, no matter what. The two are compatible.</div> <div><br> </div> <div>How? Don't lump them together. Isolate them aggressively, and _predictably_ in a way that you can reason about.</div> <div><br> </div> <div>What clients do in their own process space is their own business. Games can deadlock themselves if they get wait-before-signal wrong. Compute jobs can run for a year. Their problem. Winsys is not that, because you're crossing every isolation boundary possible. Process, user, container, VM - every kind of privilege boundary. 
Thus far, dma_fence has protected us from the most egregious abuses by guaranteeing bounded-time completion; it also acts as a sequencing primitive, but from the perspective of a winsys person that's of secondary importance, which is probably one of the bigger disconnects between winsys people and GPU driver people.</div> </div> </div> </blockquote> <br> Finally somebody who understands me :)<br> <br> Well the question is then how do we get winsys and your own process space together then?<br></div></blockquote><div><br></div><div>It's a jarring transition. If you take a very narrow view and say 'it's all GPU work in shared buffers so it should all work the same', then client&lt;-&gt;winsys looks the same as client&lt;-&gt;client gbuffer. But this is a trap.</div><div><br></div><div>Just because you can mmap() a file on an NFS server in New Zealand doesn't mean that you should have the same expectations of memory access to that file as you do of a pointer from alloca(). Even if the primitives look the same, you are crossing significant boundaries, and those do not come without a compromise and a penalty.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><blockquote type="cite"> <div dir="ltr"> <div class="gmail_quote"> <div>Anyway, one of the great things about winsys (there are some! trust me) is we don't need to be as hopelessly general as for game engines, nor as hyperoptimised. We place strict demands on our clients, and we literally kill them every single time they get something wrong in a way that's visible to us. Our demands on the GPU are so embarrassingly simple that you can run every modern desktop environment on GPUs which don't have unified shaders. And on certain platforms who don't share tiling formats between texture/render-target/scanout ... 
and it all still runs fast enough that people don't complain.</div> </div> </div> </blockquote> <br> Ignoring everything below since that is the display pipeline I'm not really interested in. My concern is how to get the buffer from the client to the server without allowing the client to get the server into trouble?<br> <br> My thinking is still to use timeouts to acquire texture locks. E.g. when the compositor needs to access texture it grabs a lock and if that lock isn't available in less than 20ms whoever is holding it is killed hard and the lock given to the compositor.<br> <br> It's perfectly fine if a process has a hung queue, but if it tries to send buffers which should be filled by that queue to the compositor it just gets a corrupted window content.<br></div></blockquote><div><br></div><div>Kill the client hard. If the compositor has speculatively queued sampling against rendering which never completed, let it access garbage. You'll have one frame of garbage (outdated content, all black, random pattern; the failure mode is equally imperfect, because there is no perfect answer), then the compositor will notice the client has disappeared and remove all its resources.</div><div><br></div><div>It's not possible to completely prevent this situation if the compositor wants to speculatively pipeline work, only ameliorate it. From a system-global point of view, just expose the situation and let it bubble up. Watch the number of fences which failed to retire in time, and destroy the context if there are enough of them (maybe 1, maybe 100). Watch the number of contexts on the file description which get forcibly destroyed, and destroy the file description if there are enough of them. Watch the number of descriptions which get forcibly destroyed, and destroy the process if there are enough of them. Watch the number of processes in a cgroup/pidns which get forcibly destroyed, and destroy the ... etc. 
Whether it's the DRM driver or an external monitor such as systemd/Flatpak/podman/Docker doing that is pretty immaterial, as long as the concept of failure bubbling up remains.</div><div><br></div><div>(20ms is objectively the wrong answer FWIW, because we're not a hard RTOS. But if our biggest point of disagreement is 20 vs. 200 vs. 2000 vs. 20000 ms, then this thread has been a huge success!)</div><div><br></div><div>Cheers,</div><div>Daniel </div></div></div>