[PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini

Mon Nov 18 11:48:10 UTC 2024

Am 15.11.24 um 17:07 schrieb Philipp Stanner:
> On Fri, 2024-11-15 at 15:39 +0100, Christian König wrote:
>>   Am 15.11.24 um 14:55 schrieb Philipp Stanner:
>>   
>>> [SNIP]
>>> But you wouldn't want to aim for getting rid of struct
>>> drm_sched_job
>>> completely, or would you?
>>>   
>>   
>>   No, the drm_sched_job structure was added to aid the single producer
>> single consumer queue and so made it easier to put the jobs into a
>> container.
>>   
>>   
>>   In my mind the full solution for running jobs looks like this:
>>   
>>   1. Driver enqueues with drm_sched_job_arm()
>>   2. job ends up in pending_list
>>   3. Sooner or later scheduler calls run_job()
>>   4. In return scheduler gets a dma_fence representing the resulting
>> hardware operation.
>>   5, This fence is put into a container to keep around what the hw is
>> actually executing.
>>   6. At some point the fence gets signaled informing the scheduler
>> that the next job can be pushed if enough credits are now available.
>>   
>>   There is no free_job callback any more because after run_job is
>> called the scheduler is done with the job and only the dma_fence
>> which represents the actually HW operation is the object the
>> scheduler now works with.
>>   
>>   We would still need something like a kill_job callback which is used
>> when an entity is released without flushing all jobs (see
>> drm_sched_entity_kill()), but that is then only used for a specific
>> corner case.
> Can't we just limit the scheduler's responsibility to telling the
> driver that it has to flush, and if it doesn't it's a bug?

We still need to remove the pending jobs from the scheduler if flushing 
times out.

Without timing out flushing and/or aborting when the process was killed 
we run into the same problem again that we don't want ti block on _fini().
>>   Blocking for the cleanup in drm_sched_fini() then becomes a trivial
>> dma_fence_wait() on the remaining unsignaled HW fences and eventually
>> on the latest killed fence.
> But that results in exactly the same situation as my waitque solution,
> doesn't it?

The key part is that dma_fence's have a documented requirement to never 
block infinitely.

> Blocking infinitely in drm_sched_fini():
>
> If the driver doesn't guarantee that all fences will get signaled, then
> wait_event() in the waitque solution will block forever. The same with
> dma_fence_wait().
>
> Or are you aiming at an interruptible dma_fence_wait(), thereby not
> blocking SIGKILL?

That is basically what drm_sched_entity_flush() is already doing.

> That then would still result in leaks. So basically the same situation
> as with an interruptible drm_sched_flush() function.
>
> (Although I agree that would probably be more elegant)

If the wait_event really would just waiting for dma_fences then yes.

The problem with the waitqueue approach is that we need to wait for the 
free_job callback and that callback is normally called from a work item 
without holding any additional locks.

When drm_sched_fini starts to block for that we create a rat-tail of new 
dependencies since that one is most likely called from a file descriptor 
destruction.

>
>>   
>>   The problem with this approach is that the idea of re-submitting
>> jobs in a GPU reset by the scheduler is then basically dead. But to
>> be honest that never worked correctly.
>>   
>>   See the deprecated warning I already put on
>> drm_sched_resubmit_jobs(). The job lifetime is not really well
>> defined because of this, see the free_guilty hack as well.
> drm_sched_resubmit_jobs() is being used by 5 drivers currently. So if
> we go for this approach we have to port them, first. That doesn't sound
> trivial to me

Yeah, completely agree. I think the scheduler should provide something 
like drm_sched_for_each_pending_fence() which helps to iterate over all 
the pending HW fences.

The individual drivers could then device by themself what to do, e.g. 
upcast the HW fence into the job and push it again or similar.

But what we really need to get away from are those fence pre-requisite 
violations Sima pointed out. For example we can't allocate memory for 
run_job.

Regards,
Christian.

>
>
> P.
>
>>   
>>   Regards,
>>   Christian.
>>   
>>   
>>>   
>>>
>>>
>>> Grüße,
>>> P.
>>>
>>>   
>>   
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20241118/5c581169/attachment-0001.htm>