[PATCH 3/8] drm/sched: Always increment correct scheduler score
Christian König
christian.koenig at amd.com
Mon Sep 30 13:07:11 UTC 2024
Am 30.09.24 um 15:01 schrieb Tvrtko Ursulin:
>
> On 13/09/2024 17:05, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>
>> Entities run queue can change during drm_sched_entity_push_job() so make
>> sure to update the score consistently.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>> Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with
>> multiple queues")
>> Cc: Nirmoy Das <nirmoy.das at amd.com>
>> Cc: Christian König <christian.koenig at amd.com>
>> Cc: Luben Tuikov <ltuikov89 at gmail.com>
>> Cc: Matthew Brost <matthew.brost at intel.com>
>> Cc: David Airlie <airlied at gmail.com>
>> Cc: Daniel Vetter <daniel at ffwll.ch>
>> Cc: dri-devel at lists.freedesktop.org
>> Cc: <stable at vger.kernel.org> # v5.9+
>> Reviewed-by: Christian König <christian.koenig at amd.com>
>> Reviewed-by: Nirmoy Das <nirmoy.das at intel.com>
>> ---
>> drivers/gpu/drm/scheduler/sched_entity.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
>> b/drivers/gpu/drm/scheduler/sched_entity.c
>> index 76e422548d40..6645a8524699 100644
>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>> @@ -586,7 +586,6 @@ void drm_sched_entity_push_job(struct
>> drm_sched_job *sched_job)
>> ktime_t submit_ts;
>> trace_drm_sched_job(sched_job, entity);
>> - atomic_inc(entity->rq->sched->score);
>> WRITE_ONCE(entity->last_user, current->group_leader);
>> /*
>> @@ -614,6 +613,7 @@ void drm_sched_entity_push_job(struct
>> drm_sched_job *sched_job)
>> rq = entity->rq;
>> sched = rq->sched;
>> + atomic_inc(sched->score);
>
> Ugh this is wrong. :(
>
> I was working on some further consolidation and realised this.
>
> It will create an imbalance in score since score is currently supposed
> to be accounted twice:
>
> 1. +/- 1 for each entity (de-)queued
> 2. +/- 1 for each job queued/completed
>
> By moving it into the "if (first) branch" it unbalances it.
>
> But it is still true the original placement is racy. It looks like
> what is required is an unconditional entity->lock section after
> spsc_queue_push. AFAICT that's the only way to be sure entity->rq is
> set for the submission at hand.
>
> Question also is, why +/- score in entity add/remove and not just for
> jobs?
>
> In the meantime patch will need to get reverted.
Ok going to revert that.
I also just realized that we don't need to change anything. The rq can't
change as soon as there is a job armed for it.
So having the increment right before pushing the armed job to the entity
was actually correct in the first place.
Regards,
Christian.
>
> Regards,
>
> Tvrtko
>
>> drm_sched_rq_add_entity(rq, entity);
>> spin_unlock(&entity->rq_lock);
More information about the amd-gfx
mailing list