<div dir="ltr"><div>Thanks, everybody. The initial proposal is dead. Here are some thoughts on how to do it differently.<br></div><div><br></div><div>I think we can have direct command submission from userspace via memory-mapped queues ("user queues") without changing window systems.</div><div><br></div><div>Memory management doesn't have to use GPU page faults the way HMM does. Instead, it can wait for the user queues of a specific process to go idle and then unmap the queues, so that userspace can't submit anything. Buffer evictions, pinning, etc. can be executed when all queues are unmapped (suspended). Thus, neither BO fences nor page faults are needed.<br></div><div><br></div><div>Inter-process synchronization can use timeline semaphores. Userspace will query the wait and signal values for a shared buffer from the kernel. The kernel will keep a history of those queries to know which process is responsible for signalling which buffer. The only remaining issue is wait timeouts and how to identify the culprit. One solution is to have the GPU report all signal commands and all timed-out wait commands to the kernel driver via an interrupt, so that the driver can monitor and validate userspace behavior. With that, the kernel can tell whether the culprit is the waiting process or the signalling process, and which process it is. Invalid signal/wait parameters can also be detected. 
The kernel can force-signal only the semaphores that time out, and punish the processes that caused the timeout or used invalid signal/wait parameters.</div><div><br></div><div>The question is whether this synchronization solution is robust enough for dma_fence and whatever the kernel and window systems need.<br></div><div><br></div><div>Marek<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Apr 20, 2021 at 4:34 PM Daniel Stone &lt;<a href="mailto:daniel@fooishbar.org" target="_blank">daniel@fooishbar.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">Hi,</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 20 Apr 2021 at 20:30, Daniel Vetter &lt;<a href="mailto:daniel@ffwll.ch" target="_blank">daniel@ffwll.ch</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The thing is, you can't do this in drm/scheduler. At least not without<br> splitting up the dma_fence in the kernel into separate memory fences<br> and sync fences</blockquote><div> </div><div><span>I'm starting to think this thread needs its own glossary ...</span></div><div><span><br></span></div><div><span>I propose we use 'residency fence' for execution fences which enact memory-residency operations, e.g. faulting in a page ultimately depending on GPU work retiring.</span></div><div><span><br></span></div><div><span>And 'value fence' for the pure-userspace model suggested by timeline semaphores, i.e. 
fences being (*addr == val) rather than being able to look at ctx seqno.</span></div><div><span><br></span></div><div><span>Cheers,</span></div><div><span>Daniel</span></div></div></div></blockquote></div>
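<div>A rough sketch of the timeline-semaphore bookkeeping proposed at the top of this message: the kernel hands out signal values, records which process owns each one, and uses that history to pick a culprit when a wait times out. Every name here (struct timeline, query_signal_value, find_culprit, force_signal) is hypothetical and not an existing kernel or Mesa interface. The sketch also assumes a monotonically increasing counter, so the value-fence check is written as >= rather than the strict (*addr == val) form quoted above, and it models the shared memory location as a plain struct field.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this is not a real kernel or Mesa API. */
#define HISTORY_MAX 16
#define PID_NONE (-1)

struct signal_record {
    uint64_t val; /* signal value the kernel handed out */
    int pid;      /* process that promised to signal it */
};

struct timeline {
    uint64_t counter;  /* stands in for the shared *addr */
    uint64_t next_val; /* last signal value handed out (0 = none yet) */
    struct signal_record history[HISTORY_MAX];
    size_t count;
};

/* Value-fence check. Assumption: the counter only grows, so >= is used. */
static int fence_signalled(const struct timeline *t, uint64_t val)
{
    assert(t != NULL);
    return t->counter >= val;
}

/* Hand out the next signal value and record its owner.
 * Returns 0 if the history table is full. */
static uint64_t query_signal_value(struct timeline *t, int pid)
{
    assert(t != NULL);
    if (t->count == HISTORY_MAX)
        return 0;
    t->next_val++;
    t->history[t->count].val = t->next_val;
    t->history[t->count].pid = pid;
    t->count++;
    return t->next_val;
}

/* On a wait timeout, return the pid owning the oldest unsignalled value
 * that the wait depends on. PID_NONE means no signaller is to blame:
 * either wait_val was never handed out (invalid wait parameter, so the
 * waiter is at fault) or the value has already been signalled. */
static int find_culprit(const struct timeline *t, uint64_t wait_val)
{
    assert(t != NULL);
    if (wait_val == 0 || wait_val > t->next_val)
        return PID_NONE;
    for (size_t i = 0; i < t->count; i++) {
        if (t->history[i].val > t->counter && t->history[i].val <= wait_val)
            return t->history[i].pid;
    }
    return PID_NONE;
}

/* Force-signal a timed-out value so that waiters can make progress. */
static void force_signal(struct timeline *t, uint64_t val)
{
    assert(t != NULL);
    if (t->counter < val)
        t->counter = val;
}
```

<div>For example, if process 100 is handed value 1 and process 200 is handed value 2, a timed-out wait on 2 while the counter is still 0 blames process 100; once the counter reaches 1, the same wait blames process 200.</div>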