<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 15.01.24 18:54, Michel Dänzer wrote:<br>
</div>
<blockquote type="cite"
cite="mid:7194a09a-afe8-4eae-8288-c72e2ac7d0a6@daenzer.net">
<pre class="moz-quote-pre" wrap="">On 2024-01-15 18:26, Friedrich Vock wrote:
[snip]
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">The fundamental problem here is that not telling applications that
something went wrong when you just canceled their work midway is an
out-of-spec hack.
When there is a report of real-world apps breaking because of that hack,
reports of different apps working (even if it's convenient that they
work) don't justify keeping the broken code.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
If the breaking apps hit multiple soft resets in a row, I've laid out a pragmatic solution which covers both cases.
</pre>
</blockquote>
Hitting a soft reset every time is the lucky path. Once GPU work has
been interrupted out of nowhere, all bets are off, and the next
attempt might just as well trigger a full system hang. No hang
recovery should be able to cause that under any circumstances.<br>
<blockquote type="cite"
cite="mid:7194a09a-afe8-4eae-8288-c72e2ac7d0a6@daenzer.net">
<pre class="moz-quote-pre" wrap="">
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">If mutter needs to be robust against faults it caused itself, it should be robust
against GPU resets.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
It's unlikely that the hangs I've seen were caused by mutter itself, more likely Mesa or amdgpu.
Anyway, this will happen at some point; the reality is that it hasn't yet, though.
</pre>
</blockquote>
</body>
</html>