<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
Am 13.04.21 um 07:36 schrieb Andrey Grodzovsky:<br>
<blockquote type="cite" cite="mid:d7a44895-d6c8-7528-51be-ae08188ff1f6@amd.com">
[SNIP]</blockquote>
<br>
<blockquote type="cite" cite="mid:d7a44895-d6c8-7528-51be-ae08188ff1f6@amd.com">
emit_fence(fence);
<blockquote type="cite" cite="mid:2894bf97-8c39-6610-c479-b089c46513e7@amd.com">
<blockquote type="cite" cite="mid:ecf465a2-d4fc-1cbf-a9d5-39c3844f23bb@amd.com">
<blockquote type="cite" cite="mid:a970101f-89f1-8bdf-51d9-4a4e5e0f9e9a@amd.com">
<blockquote type="cite" cite="mid:aaa2b266-f091-dd9c-e49d-5e528decfbd7@amd.com">
<blockquote type="cite" cite="mid:cd94e02c-11c8-0198-ab70-0ceee54d437b@amd.com">
<blockquote type="cite" cite="mid:80713dbe-411c-d79b-34ba-b67bc3a50dc5@amd.com">
<p> <b>/* We can't wait forever as the HW
might be gone at any point*/</b><b><br>
dma_fence_wait_timeout(old_fence, 5S);</b><br>
</p>
</blockquote>
<br>
You can pretty much ignore this wait here. It is only there
as a last resort so that we never overwrite the ring
buffers.<br>
</blockquote>
<p><br>
</p>
<p>If the device is present, how can I ignore this?</p>
</blockquote>
</blockquote>
<p><br>
</p>
<p>I think you missed my question here <br>
</p>
</blockquote>
<br>
Sorry, I thought I answered that below.<br>
<br>
See, this is just a last resort so that we don't need to worry
about ring buffer overflows during testing.<br>
<br>
We should not get here in practice, and if we do, generating
a deadlock might actually be the best way to handle it.<br>
<br>
The alternative would be to call BUG().<br>
</blockquote>
<p><br>
</p>
<p>BTW, I am not sure it is so improbable to get here in the case
of a sudden device removal. If you are rapidly submitting
commands to the ring at that moment, you could easily overrun
the ring buffer: EOP interrupts are gone, so fences are no
longer removed, but new ones keep arriving from submissions
that have not stopped yet.</p>
</blockquote>
<br>
During normal operation, hardware fences are created by only two code
paths:<br>
1. The scheduler when it pushes jobs to the hardware.<br>
2. The KIQ when it does register access on SRIOV.<br>
<br>
Both are limited in how many submissions can be made.<br>
<br>
The only case where this wait becomes necessary is during GPU reset,
when we do direct submissions that bypass the scheduler for IB and
other tests.<br>
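<br>
To make the last-resort wait concrete, here is a minimal userspace sketch.
It is not the driver code: sim_ring, sim_hw_tick, sim_emit_fence, NUM_SLOTS
and the tick-based timeout are hypothetical names and simplifications, and
the ring is modelled only by two sequence counters. It shows that a new
fence reuses the slot of an older one, and that the emitter waits for the
old fence for a bounded time before giving up.<br>
<pre>
```c
/* Userspace simulation only; every name below is hypothetical. */
#define NUM_SLOTS 4u                  /* number of fence slots, power of two */
#define SLOT_MASK (NUM_SLOTS - 1u)

struct sim_ring {
    unsigned int emitted;             /* sequence number of the newest fence */
    unsigned int signaled;            /* newest sequence the "HW" completed */
    int hw_present;                   /* 0 once the device is unplugged */
};

/* Stand-in for the EOP interrupt: the HW completes one more fence. */
static void sim_hw_tick(struct sim_ring *r)
{
    if (r->hw_present && r->signaled != r->emitted)
        r->signaled++;
}

/*
 * Emit a new fence. Sequence number seq reuses slot (seq & SLOT_MASK),
 * which was last owned by fence seq - NUM_SLOTS. If that old fence has
 * not signaled yet, poll for it, but for at most max_ticks steps.
 * Returns the slot index on success, or -1 if the wait timed out, in
 * which case the slot is left untouched.
 */
static int sim_emit_fence(struct sim_ring *r, unsigned int max_ticks)
{
    unsigned int seq = r->emitted + 1u;
    unsigned int ticks = 0;

    /* The slot is still busy while NUM_SLOTS fences are in flight. */
    while (r->emitted - r->signaled >= NUM_SLOTS) {
        if (ticks++ >= max_ticks)
            return -1;                /* HW may be gone, stop waiting */
        sim_hw_tick(r);
    }

    r->emitted = seq;
    return (int)(seq & SLOT_MASK);
}
```
</pre>
With hw_present set, the wait ends as soon as the simulated HW completes
the old fence. With hw_present cleared, nothing signals anymore and the
emit fails after max_ticks instead of overwriting the slot. Callers that
cap the number of fences in flight below NUM_SLOTS never enter the wait
loop at all.<br>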
<br>
Christian.<br>
<br>
<blockquote type="cite" cite="mid:d7a44895-d6c8-7528-51be-ae08188ff1f6@amd.com">
<p>Andrey</p>
</blockquote>
<br>
</body>
</html>