[RFC 16/18] drm/sched: Connect with dma-fence deadlines
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Thu Jan 9 16:03:27 UTC 2025
On 09/01/2025 13:51, Christian König wrote:
> Am 09.01.25 um 14:41 schrieb Tvrtko Ursulin:
>>
>> On 09/01/2025 13:07, Christian König wrote:
>>> Am 08.01.25 um 19:35 schrieb Tvrtko Ursulin:
>>>> Now that the scheduling policy is deadline based it feels completely
>>>> natural to allow propagating externaly set deadlines to the scheduler.
>>>>
>>>> Scheduler deadlines are not a guarantee but as the dma-fence
>>>> facility is
>>>> already in use by userspace lets wire it up.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>>> Cc: Christian König <christian.koenig at amd.com>
>>>> Cc: Danilo Krummrich <dakr at redhat.com>
>>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>>> Cc: Philipp Stanner <pstanner at redhat.com>
>>>> Cc: Rob Clark <robdclark at gmail.com>
>>>> ---
>>>> drivers/gpu/drm/scheduler/sched_entity.c | 24
>>>> ++++++++++++++++++++++--
>>>> 1 file changed, 22 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> index 98c78d1373d8..db5d34310b18 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> @@ -410,7 +410,16 @@ ktime_t
>>>> drm_sched_entity_get_job_deadline(struct drm_sched_entity *entity,
>>>> struct drm_sched_job *job)
>>>> {
>>>> - return __drm_sched_entity_get_job_deadline(entity,
>>>> job->submit_ts);
>>>> + struct drm_sched_fence *s_fence = job->s_fence;
>>>> + struct dma_fence *fence = &s_fence->finished;
>>>> + ktime_t deadline;
>>>> +
>>>> + deadline = __drm_sched_entity_get_job_deadline(entity,
>>>> job->submit_ts);
>>>> + if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT,
>>>> &fence->flags) &&
>>>> + ktime_before(s_fence->deadline, deadline))
>>>> + deadline = s_fence->deadline;
>>>> +
>>>> + return deadline;
>>>> }
>>>> /*
>>>> @@ -579,9 +588,12 @@ void drm_sched_entity_select_rq(struct
>>>> drm_sched_entity *entity)
>>>> */
>>>> void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>>>> {
>>>> + struct drm_sched_fence *s_fence = sched_job->s_fence;
>>>> struct drm_sched_entity *entity = sched_job->entity;
>>>> - bool first;
>>>> + struct dma_fence *fence = &s_fence->finished;
>>>> + ktime_t fence_deadline;
>>>> ktime_t submit_ts;
>>>> + bool first;
>>>> trace_drm_sched_job(sched_job, entity);
>>>> atomic_inc(entity->rq->sched->score);
>>>> @@ -593,6 +605,11 @@ void drm_sched_entity_push_job(struct
>>>> drm_sched_job *sched_job)
>>>> * Make sure to set the submit_ts first, to avoid a race.
>>>> */
>>>> sched_job->submit_ts = submit_ts = ktime_get();
>>>> + if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT,
>>>> &fence->flags))
>>>> + fence_deadline = s_fence->deadline;
>>>> + else
>>>> + fence_deadline = KTIME_MAX;
>>>> +
>>>
>>> That makes no sense. When the job is pushed the fence is not made
>>> public yet.
>>>
>>> So no deadline can be set on the fence.
>>
>> You are correct, the push side of things was a mistake a laziness that
>> I did not remove it from the RFC.
>>
>>>> first = spsc_queue_push(&entity->job_queue,
>>>> &sched_job->queue_node);
>>>> /* first job wakes up scheduler */
>>>> @@ -601,6 +618,9 @@ void drm_sched_entity_push_job(struct
>>>> drm_sched_job *sched_job)
>>>> submit_ts = __drm_sched_entity_get_job_deadline(entity,
>>>> submit_ts);
>>>> + if (ktime_before(fence_deadline, submit_ts))
>>>> + submit_ts = fence_deadline;
>>>> +
>>>
>>> Yeah, that won't work at all as far as I can see.
>>
>> It works from the pop side though.
>
> Yeah, but only partially.
>
>>
>> When job N is scheduled, deadline is taken from N+1 and tree
>> re-balanced. At the point of N scheduling N+1 can definitely have a
>> real deadline set.
>
> The fundamental design problem with the fence deadline approach is that
> it sets the deadline only on the last submission instead of the first one.
>
> E.g. unigine heaven for example uses 3 submissions on a typical system.
>
> We would somehow need to propagate a deadline to previous submissions
> for this to work halve way correctly.
I played with this in one of my branches but was lacking a good test
case. In any case clean solution(s) would not be easy with the current
scheduler design.
Regards,
Tvrtko
>> What does not work is for queue depth of one. No way at the moment to
>> "bump" the entity in the tree while N is waiting for submission
>> because we cannot dereference the entity from the job. (I had that in
>> v1 of the series and realized it was unsafe.)
>>
>> I (very) briefly though about reference counting entities but quickly
>> had a feeling it would be annoying. So for now this patch only offers
>> a partial solution.
>
> Nah, please not.
>
> Regards,
> Christian.
>
>>
>> Regards,
>>
>> Tvrtko
>>
>>>> sched = drm_sched_rq_add_entity(entity->rq, entity,
>>>> submit_ts);
>>>> if (sched)
>>>> drm_sched_wakeup(sched);
>>>
>
More information about the dri-devel
mailing list