[PATCH RFC 10/18] drm/scheduler: Add can_run_job callback

Christian König christian.koenig at amd.com
Thu Mar 9 08:05:57 UTC 2023


On 09.03.23 at 07:30, Asahi Lina wrote:
> On 09/03/2023 05.14, Christian König wrote:
>>> I think you mean wake_up_interruptible(). That would be
>>> drm_sched_job_done(), on the fence callback when a job completes, which
>>> as I keep saying is the same logic used for
>>> hw_rq_count/hw_submission_limit tracking.
>> As the documentation to wait_event says:
>>
>>    * wake_up() has to be called after changing any variable that could
>>    * change the result of the wait condition.
>>
>> So what you essentially try to do here is to skip that and say
>> drm_sched_job_done() would call that anyway, but when you read any
>> variable to determine that state then as far as I can see nothing is
>> guaranteeing that order.
> The driver needs to guarantee that any changes to that state precede a
> job completion fence signal of course, that's the entire idea of the
> API. It's supposed to represent a check for per-scheduler (or more
> specific, but not more global) resources that are released on job
> completion. Of course if you misuse the API you could cause a problem,
> but what I'm trying to say is that the API as designed and when used as
> intended does work properly.
>
> Put another way: job completions always need to cause the sched main
> loop to run an iteration anyway (otherwise we wouldn't make forward
> progress), and job completions are exactly the signal that the
> can_run_job() condition may have changed.
>
>> The only other possibility how you could use the callback correctly
>> would be to call drm_fence_is_signaled() to query the state of your hw
>> submission from the same fence which is then signaled. But then the
>> question is once more why you don't give that fence directly to the
>> scheduler?
> But the driver is supposed to guarantee that the ordering is always 1.
> resources freed, 2. fence signaled. So you don't need to check for the
> fence, you can just check for the resource state.
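
For illustration, the ordering the quoted argument relies on could look
roughly like this on the driver side. The structures and names here are
made up for the sketch and are not the actual Asahi driver code; only
dma_fence_signal() and the drm_sched_job_done() wake-up behind it are the
real kernel pieces:

#include <linux/atomic.h>
#include <linux/dma-fence.h>

/* Hypothetical per-queue resource tracking. */
struct my_queue {
        atomic_t slots_busy;
};

struct my_job {
        struct my_queue *queue;
        int slots_used;
        struct dma_fence done_fence;    /* the job's done fence */
};

/* Completion path: the resources are released (1) strictly before the
 * done fence signals (2).  The fence callback ends up in
 * drm_sched_job_done(), which wakes the scheduler thread, so a
 * can_run_job() check that still saw the resources as busy is always
 * followed by another wake-up and re-check. */
static void my_queue_complete_job(struct my_job *job)
{
        atomic_sub(job->slots_used, &job->queue->slots_busy);  /* 1. free   */
        dma_fence_signal(&job->done_fence);                    /* 2. signal */
}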

Yeah, but this is exactly what the dma_fence framework tried to prevent. 
We try very hard to avoid such side channel signaling :)

But putting that issue aside for a moment: when you have such intra-queue 
dependencies, what I don't get is why you can't check that at a much 
higher level?

In other words, even userspace should be able to predict that its 
submissions need X amount of resources and that running all of its 
submissions in parallel won't work.

Asking the firmware for status is usually orders of magnitude slower than 
just computing it before submission.
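
As a rough sketch of what such an up-front check could look like, reusing
the hypothetical structures from the sketch above (illustrative only, not
a concrete proposal):

/* Hypothetical submit-time check: compute whether the queue has room
 * before pushing the job to the scheduler, instead of asking the
 * firmware or blocking in the scheduler's run loop later. */
static bool my_queue_has_room(struct my_queue *queue, struct my_job *job)
{
        return atomic_read(&queue->slots_busy) + job->slots_used <=
               MY_QUEUE_SLOT_LIMIT;     /* hypothetical per-queue limit */
}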

Regards,
Christian.


> If the callback
> returns false then by definition the fence wasn't yet signaled at some
> point during its execution (because the resources weren't yet freed),
> and since it would be in the wait_event_interruptible() check path, by
> definition the fence signaling at any point during or after the check
> would cause the thread to wake up again and re-check.
>
> Thread 1                                          Thread 2
> 1. wait_event_interruptible() arms wq             1. Free resources
> 2. can_run_job() checks resources                 2. Signal fence
> 3. wait_event_interruptible() sleeps on wq        3. Fence wakes up wq
> 4. loop
>
> There is no possible interleaving of those sequences that leads to a
> lost event and the thread not waking up:
> - If T2.3 happens before T1.1, that means T2.1 happened earlier and T1.2
> must return true.
> - If T2.3 happens after T1.1 but before T1.3, the wq code will ensure
> the wq does not sleep (or immediately wakes up) at T1.3 since it was
> signaled during the condition check, after the wq was armed. At the next
> check loop, T1.2 will then return true, since T2.1 already happened
> before T2.3.
> - If T2.3 happens during T1.3, the wq wakes up normally and does another
> check, and at that point T1.2 returns true.
>
> QED.
>
> ~~ Lina
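
For reference, a heavily condensed sketch of the wait/wake interaction the
proof above is about, assuming the can_run_job() callback shape proposed
in this RFC; this is not the actual drm_sched_main() code:

/* Thread 1's side of the table above: wait_event_interruptible() arms the
 * wait queue, evaluates the condition (including can_run_job()), and only
 * sleeps if the check failed and no wake-up raced with it.  Thread 2's
 * side is drm_sched_job_done(), which wakes sched->wake_up_worker when
 * the job's fence signals. */
static void my_wait_until_job_can_run(struct drm_gpu_scheduler *sched,
                                      struct drm_sched_job *job)
{
        wait_event_interruptible(sched->wake_up_worker,
                                 !sched->ops->can_run_job ||
                                 sched->ops->can_run_job(job));
}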


