<div dir="ltr"><div dir="ltr"><span style="">On Tue, 20 Apr 2021 at 15:58, Christian König &lt;<a href="mailto:ckoenig.leichtzumerken@gmail.com">ckoenig.leichtzumerken@gmail.com</a>&gt; wrote:</span></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div> <div>On 20.04.21 at 16:53, Daniel Stone wrote:</div><blockquote type="cite"> <div dir="ltr"> <div class="gmail_quote"> <div dir="ltr" class="gmail_attr">On Mon, 19 Apr 2021 at 11:48, Marek Olšák &lt;<a href="mailto:maraeo@gmail.com" target="_blank">maraeo@gmail.com</a>&gt; wrote:</div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <div dir="ltr"> <div><span>Deadlock mitigation to recover from segfaults:</span><br> </div> <div>- The kernel knows which process is obliged to signal which fence. This information is part of the Present request and supplied by userspace.<br> </div> <div>- If the producer crashes, the kernel signals the submit fence, so that the consumer can make forward progress.</div> <div>- If the consumer crashes, the kernel signals the return fence, so that the producer can reclaim the buffer.</div> <div>- A GPU hang signals all fences. Other deadlocks will be handled like GPU hangs.</div> </div> </blockquote> <div><br> </div> <div>Another thought: with completely arbitrary userspace fencing, none of this is helpful either. If the compositor can't guarantee that a hostile client has not submitted a fence which will never be signaled, then it won't be waiting on it, so it already needs infrastructure to handle something like this. 
</div> </div> </div> </blockquote> <br> <blockquote type="cite"> <div dir="ltr"> <div class="gmail_quote"> <div>That already handles the crashed-client case, because if the client crashes, then its connection will be dropped, which will trigger the compositor to destroy all its resources anyway, including any pending waits.</div> </div> </div> </blockquote> <br> That's exactly the problem. A compositor isn't immediately informed that the client crashed; instead, it is still referencing the buffer and trying to use it for compositing.<br></div></blockquote><div><br></div><div>If the compositor no longer has a guarantee that the buffer will be ready for composition in a reasonable amount of time (which dma_fence gives us, and this proposal does not appear to give us), then the compositor isn't trying to use the buffer for compositing; it's waiting asynchronously on a notification that the fence has signaled before it attempts to use the buffer.</div><div><br></div><div>Marek's initial suggestion is that the kernel signal the fence, which would unblock composition (and presumably show garbage on screen, or at best jump back to old content).</div><div><br></div><div>My position is that the compositor will know the process has crashed anyway - because its socket has been closed - at which point we destroy all the client's resources including its windows and buffers regardless. Signaling the fence doesn't give us any value here, _unless_ the compositor is just blindly waiting for the fence to signal ... which it can't do because there's no guarantee the fence will ever signal.</div><div> </div><div>Cheers,</div><div>Daniel</div></div></div>