[PATCH RFC 10/18] drm/scheduler: Add can_run_job callback

Wed Mar 8 20:14:34 UTC 2023

Am 08.03.23 um 20:45 schrieb Asahi Lina:
> On 09/03/2023 04.12, Christian König wrote:
>> Am 08.03.23 um 20:05 schrieb Asahi Lina:
>>> [SNIP]
>>>> Well it's not the better way, it's the only way that works.
>>>>
>>>> I have to admit that my bet on your intentions was wrong, but even that
>>>> use case doesn't work correctly.
>>>>
>>>> See when your callback returns false it is perfectly possible that all
>>>> hw fences are signaled between returning that information and processing it.
>>>>
>>>> The result would be that the scheduler goes to sleep and never wakes up
>>>> again.
>>> That can't happen, because it will just go into another iteration of the
>>> drm_sched main loop since there is an entity available still.
>>>
>>> Rather there is probably the opposite bug in this patch: the can_run_job
>>> logic should be moved into the wait_event_interruptible() condition
>>> check, otherwise I think it can end up busy-looping since the condition
>>> itself can be true even when the can_run_job check blocks it.
>>>
>>> But there is no risk of it going to sleep and never waking up because
>>> job completions will wake up the waitqueue by definition, and that
>>> happens after the driver-side queues are popped. If this problem could
>>> happen, then the existing hw_submission_limit logic would be broken in
>>> the same way. It is logically equivalent in how it works.
>>>
>>> Basically, if properly done in wait_event_interruptible, it is exactly
>>> the logic of that macro that prevents this race condition and makes
>>> everything work at all. Without it, drm_sched would be completely broken.
>>>
>>>> As I said we exercised those ideas before and yes this approach here
>>>> came up before as well and no it doesn't work.
>>> It can never deadlock with this patch as it stands (though it could busy
>>> loop), and if properly moved into the wait_event_interruptible(), it
>>> would also never busy loop and work entirely as intended. The actual API
>>> change is sound.
>>>
>>> I don't know why you're trying so hard to convince everyone that this
>>> approach is fundamentally broken... It might be a bad idea for other
>>> reasons, it might encourage incorrect usage, it might not be the best
>>> option, there are plenty of arguments you can make... but you just keep
>>> trying to make an argument that it just can't work at all for some
>>> reason. Why? I already said I'm happy dropping it in favor of the fences...
>> Well because it is broken.
>>
>> When you move the check into the wait_event_interruptible condition then
>> who is going to call wait_event_interruptible when the condition changes?
> I think you mean wake_up_interruptible(). That would be
> drm_sched_job_done(), on the fence callback when a job completes, which
> as I keep saying is the same logic used for
> hw_rq_count/hw_submission_limit tracking.

As the documentation to wait_event says:

  * wake_up() has to be called after changing any variable that could
  * change the result of the wait condition.

So what you essentially try to do here is to skip that and say 
drm_sched_job_done() would call that anyway, but when you read any 
variable to determine that state then as far as I can see nothing is 
guarantying that order.

The only other possibility how you could use the callback correctly 
would be to call drm_fence_is_signaled() to query the state of your hw 
submission from the same fence which is then signaled. But then the 
question is once more why you don't give that fence directly to the 
scheduler?

> Please think about it for a second,

Yeah, I'm trying to really follow your intentions here. But that doesn't 
really makes sense.

Either you are trying to do something invalid or you are trying to 
circumvent the object model somehow and add a shortcut for the signaling 
API. Both would be more than fishy.

Regards,
Christian.

>   it's really not that complicated to
> see why it works:
>
> - Driver pops off completed commands <-- can_run_job condition satisfied
> - Driver signals fence
>   - drm_sched_job_done_cb()
>    - drm_sched_job_done()
>     - atomic_dec(&sched->hw_rq_count); <-- hw_submission_limit satisfied
>     - ...
>     - wake_up_interruptible(&sched->wake_up_worker);
>        ^- happens after both conditions are potentially satisfied
>
> It really is completely equivalent to just making the hw_rq_count logic
> customizable by the driver. The actual flow is the same. As long as the
> driver guarantees it satisfies the can_run_job() condition before
> signaling the completion fence that triggered that change, it works fine.
>
>> As I said this idea came up before and was rejected multiple times.
> Maybe it was a different idea, or maybe it was rejected for other
> reasons, or maybe it was wrongly rejected for being broken when it isn't ^^
>
> ~~ Lina