<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
On 15.11.24 17:07, Philipp Stanner wrote:<br>
<blockquote type="cite" cite="mid:85a29addcc1c42a339292b536d46c397677d5729.camel@redhat.com">
<pre class="moz-quote-pre" wrap="">On Fri, 2024-11-15 at 15:39 +0100, Christian König wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap=""> Am 15.11.24 um 14:55 schrieb Philipp Stanner:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">[SNIP]
</pre>
<span style="white-space: pre-wrap">
</span></blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">But you wouldn't want to aim for getting rid of struct
drm_sched_job
completely, or would you?
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
No, the drm_sched_job structure was added to aid the single-producer
single-consumer queue and so make it easier to put the jobs into a
container.
In my mind the full solution for running jobs looks like this:
1. Driver enqueues with drm_sched_job_arm()
2. job ends up in pending_list
3. Sooner or later scheduler calls run_job()
4. In return scheduler gets a dma_fence representing the resulting
hardware operation.
5. This fence is put into a container to keep track of what the hw is
actually executing.
6. At some point the fence gets signaled informing the scheduler
that the next job can be pushed if enough credits are now available.
There is no free_job callback any more because after run_job is
called the scheduler is done with the job; the dma_fence which
represents the actual HW operation is then the only object the
scheduler works with.
We would still need something like a kill_job callback which is used
when an entity is released without flushing all jobs (see
drm_sched_entity_kill()), but that is then only used for a specific
corner case.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Can't we just limit the scheduler's responsibility to telling the
driver that it has to flush, and if it doesn't, it's a bug?</pre>
</blockquote>
<br>
We still need to remove the pending jobs from the scheduler if
flushing times out.<br>
<br>
Without a timeout on flushing and/or aborting when the process is
killed, we run into the same problem again: we don't want to
block in _fini().<br>
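<br>
Roughly, and only as a sketch, that could look like the following;
drm_sched_cancel_pending_jobs() is a made-up helper, not existing
API:<br>
<pre>static void example_sched_teardown(struct drm_gpu_scheduler *sched,
                                   struct drm_sched_entity *entity)
{
        /* Give the entity a bounded chance to drain its queue. */
        if (drm_sched_entity_flush(entity, msecs_to_jiffies(5000)) == 0) {
                /*
                 * Flushing timed out (0 here meaning the timeout
                 * elapsed): remove the remaining pending jobs instead
                 * of blocking _fini() forever.
                 */
                drm_sched_cancel_pending_jobs(sched); /* hypothetical */
        }

        drm_sched_entity_fini(entity);
        drm_sched_fini(sched);
}
</pre>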
<blockquote type="cite" cite="mid:85a29addcc1c42a339292b536d46c397677d5729.camel@redhat.com">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap=""> Blocking for the cleanup in drm_sched_fini() then becomes a trivial
dma_fence_wait() on the remaining unsignaled HW fences and eventually
on the latest killed fence.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
But that results in exactly the same situation as my waitqueue solution,
doesn't it?</pre>
</blockquote>
<br>
The key part is that dma_fences have a documented requirement to
signal in finite time, so waiting on them can never block infinitely.<br>
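<br>
As a sketch, assuming run_job()'s fences were kept in an xarray (which
the scheduler doesn't have today), _fini() would boil down to:<br>
<pre>static void example_sched_fini_wait(struct xarray *hw_fences)
{
        struct dma_fence *fence;
        unsigned long idx;

        xa_for_each(hw_fences, idx, fence) {
                /*
                 * An uninterruptible wait is fine here, the dma_fence
                 * contract guarantees the fence signals in finite time.
                 */
                dma_fence_wait(fence, false);
                dma_fence_put(fence);
        }
        xa_destroy(hw_fences);
}
</pre>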
<br>
<blockquote type="cite" cite="mid:85a29addcc1c42a339292b536d46c397677d5729.camel@redhat.com">
<pre class="moz-quote-pre" wrap="">Blocking infinitely in drm_sched_fini():
If the driver doesn't guarantee that all fences will get signaled, then
wait_event() in the waitqueue solution will block forever. The same with
dma_fence_wait().
Or are you aiming at an interruptible dma_fence_wait(), thereby not
blocking SIGKILL?</pre>
</blockquote>
<br>
That is basically what drm_sched_entity_flush() is already doing.<br>
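<br>
From memory, the pattern in there is roughly the following (simplified,
the real code differs in details; entity_is_idle() stands in for the
scheduler's internal idle check):<br>
<pre>static long example_entity_flush(struct drm_gpu_scheduler *sched,
                                 struct drm_sched_entity *entity,
                                 long timeout)
{
        long ret = timeout;

        if (current->flags &amp; PF_EXITING) {
                /* Process is dying anyway, only wait a bounded time. */
                if (timeout)
                        ret = wait_event_timeout(sched->job_scheduled,
                                                 entity_is_idle(entity),
                                                 timeout);
        } else {
                /* Normal case: wait, but let SIGKILL interrupt us. */
                ret = wait_event_killable(sched->job_scheduled,
                                          entity_is_idle(entity));
        }
        return ret;
}
</pre>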
<br>
<blockquote type="cite" cite="mid:85a29addcc1c42a339292b536d46c397677d5729.camel@redhat.com">
<pre class="moz-quote-pre" wrap="">That then would still result in leaks. So basically the same situation
as with an interruptible drm_sched_flush() function.
(Although I agree that would probably be more elegant)</pre>
</blockquote>
<br>
If the wait_event really would just wait for dma_fences, then yes.<br>
<br>
The problem with the waitqueue approach is that we need to wait for
the free_job callback and that callback is normally called from a
work item without holding any additional locks.<br>
<br>
When drm_sched_fini starts to block on that, we create a whole chain of
new dependencies, since drm_sched_fini is most likely called from file
descriptor destruction.<br>
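<br>
In other words, the waitqueue variant boils down to something like this
(job_freed_wq is an assumed field, nothing the scheduler has today), and
it is exactly this wait which ties the free_job work item and the fd
teardown path into one dependency chain:<br>
<pre>/*
 * Hypothetical sketch: _fini() blocks until free_job has run for
 * every pending job. free_job runs from a work item, while _fini()
 * is typically reached from file descriptor destruction.
 */
wait_event(sched->job_freed_wq,
           list_empty(&amp;sched->pending_list));
</pre>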
<br>
<blockquote type="cite" cite="mid:85a29addcc1c42a339292b536d46c397677d5729.camel@redhat.com">
<pre class="moz-quote-pre" wrap="">
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
The problem with this approach is that the idea of the scheduler
re-submitting jobs on a GPU reset is then basically dead. But to be
honest, that never worked correctly.
See the deprecated warning I already put on
drm_sched_resubmit_jobs(). The job lifetime is not really well
defined because of this, see the free_guilty hack as well.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
drm_sched_resubmit_jobs() is currently used by 5 drivers. So if
we go for this approach we have to port them first. That doesn't sound
trivial to me.</pre>
</blockquote>
<br>
Yeah, completely agree. I think the scheduler should provide
something like drm_sched_for_each_pending_fence() which helps to
iterate over all the pending HW fences.<br>
<br>
The individual drivers could then decide by themselves what to do,
e.g. upcast the HW fence into the job and push it again, or similar.<br>
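<br>
Just to illustrate the shape of it; nothing of this exists yet and the
driver helpers are made up:<br>
<pre>/* Hypothetical helper: walk all HW fences the scheduler still tracks
 * and hand each one to a driver callback. */
void drm_sched_for_each_pending_fence(struct drm_gpu_scheduler *sched,
                                      void (*cb)(struct dma_fence *fence,
                                                 void *data),
                                      void *data);

/* A driver could then upcast each fence back to its own job and push
 * it to the HW again; to_example_job() and example_hw_push() are
 * made-up driver helpers. */
static void example_resubmit_cb(struct dma_fence *fence, void *data)
{
        struct example_job *job = to_example_job(fence);

        example_hw_push(job);
}
</pre>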
<br>
But what we really need to get away from are those dma_fence
prerequisite violations Sima pointed out. For example, we can't
allocate memory in run_job.<br>
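<br>
In practice that means everything run_job needs must already be
allocated when the job is created; a rough sketch, with struct
example_job and the helpers made up for illustration:<br>
<pre>static int example_job_init(struct example_job *job, size_t payload_size)
{
        /* GFP_KERNEL is fine here, we are not on the signalling path yet. */
        job->ring_payload = kzalloc(payload_size, GFP_KERNEL);
        return job->ring_payload ? 0 : -ENOMEM;
}

static struct dma_fence *example_run_job(struct drm_sched_job *sched_job)
{
        struct example_job *job = container_of(sched_job,
                                               struct example_job, base);

        /* run_job is on the dma_fence signalling path: no allocations
         * allowed here, only use what was prepared above. */
        example_hw_submit(job->ring_payload);
        return job->hw_fence;
}
</pre>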
<br>
Regards,<br>
Christian.<br>
<br>
<blockquote type="cite" cite="mid:85a29addcc1c42a339292b536d46c397677d5729.camel@redhat.com">
<pre class="moz-quote-pre" wrap="">
P.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
Regards,
Christian.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
Regards,
P.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
</pre>
</blockquote>
<br>
</body>
</html>