[PATCH 3/8] drm/sched: Always increment correct scheduler score

Christian König christian.koenig at amd.com
Mon Sep 30 13:07:11 UTC 2024


On 30.09.24 at 15:01, Tvrtko Ursulin wrote:
>
> On 13/09/2024 17:05, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>
>> An entity's run queue can change during drm_sched_entity_push_job(), so
>> make sure to update the score consistently.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>> Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with multiple queues")
>> Cc: Nirmoy Das <nirmoy.das at amd.com>
>> Cc: Christian König <christian.koenig at amd.com>
>> Cc: Luben Tuikov <ltuikov89 at gmail.com>
>> Cc: Matthew Brost <matthew.brost at intel.com>
>> Cc: David Airlie <airlied at gmail.com>
>> Cc: Daniel Vetter <daniel at ffwll.ch>
>> Cc: dri-devel at lists.freedesktop.org
>> Cc: <stable at vger.kernel.org> # v5.9+
>> Reviewed-by: Christian König <christian.koenig at amd.com>
>> Reviewed-by: Nirmoy Das <nirmoy.das at intel.com>
>> ---
>>   drivers/gpu/drm/scheduler/sched_entity.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>> index 76e422548d40..6645a8524699 100644
>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>> @@ -586,7 +586,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>>       ktime_t submit_ts;
>>
>>       trace_drm_sched_job(sched_job, entity);
>> -    atomic_inc(entity->rq->sched->score);
>>       WRITE_ONCE(entity->last_user, current->group_leader);
>>
>>       /*
>> @@ -614,6 +613,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>>           rq = entity->rq;
>>           sched = rq->sched;
>>
>> +        atomic_inc(sched->score);
>
> Ugh this is wrong. :(
>
> I was working on some further consolidation and realised this.
>
> It will create an imbalance in score since score is currently supposed 
> to be accounted twice:
>
>  1. +/- 1 for each entity (de-)queued
>  2. +/- 1 for each job queued/completed
>
> By moving it into the "if (first)" branch, the patch unbalances it.
>
> But it is still true the original placement is racy. It looks like 
> what is required is an unconditional entity->lock section after 
> spsc_queue_push. AFAICT that's the only way to be sure entity->rq is 
> set for the submission at hand.
>
> Question also is, why +/- score in entity add/remove and not just for 
> jobs?
>
> In the meantime patch will need to get reverted.

Ok going to revert that.

I also just realized that we don't need to change anything. The rq can't 
change once there is a job armed for it.

So having the increment right before pushing the armed job to the entity 
was actually correct in the first place.

Regards,
Christian.

>
> Regards,
>
> Tvrtko
>
>>           drm_sched_rq_add_entity(rq, entity);
>>           spin_unlock(&entity->rq_lock);
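
The double accounting discussed above (+/- 1 per entity queued or dequeued, +/- 1 per job queued or completed) can be illustrated with a small userspace toy model. This is only a sketch: the toy_* names, the push/complete helpers and the simplified "entity is dequeued when its last job completes" rule are assumptions made for illustration and do not exist in the kernel. It compares the original placement of the per-job increment with the placement inside the "if (first)" branch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model only: these types are hypothetical, not kernel structures. */
struct toy_sched {
	atomic_int score;
};

struct toy_entity {
	struct toy_sched *sched;
	int queued_jobs;
};

/* Original placement: +1 for every job, +1 when the entity gets queued. */
static void push_job_original(struct toy_entity *e)
{
	bool first = (e->queued_jobs++ == 0);

	atomic_fetch_add(&e->sched->score, 1);		/* per job */
	if (first)
		atomic_fetch_add(&e->sched->score, 1);	/* per entity queued */
}

/* Patched placement: the per-job increment only runs in the first branch. */
static void push_job_patched(struct toy_entity *e)
{
	bool first = (e->queued_jobs++ == 0);

	if (first) {
		atomic_fetch_add(&e->sched->score, 1);	/* moved per-job increment */
		atomic_fetch_add(&e->sched->score, 1);	/* per entity queued */
	}
}

/* Completion side is the same in both variants. */
static void complete_job(struct toy_entity *e)
{
	assert(e->queued_jobs > 0);

	atomic_fetch_sub(&e->sched->score, 1);		/* per job completed */
	if (--e->queued_jobs == 0)
		atomic_fetch_sub(&e->sched->score, 1);	/* entity dequeued */
}

/* Push njobs jobs, complete them all, and return the leftover score. */
int toy_final_score(bool patched, int njobs)
{
	struct toy_sched sched;
	struct toy_entity entity = { .sched = &sched, .queued_jobs = 0 };

	atomic_init(&sched.score, 0);

	for (int i = 0; i < njobs; i++) {
		if (patched)
			push_job_patched(&entity);
		else
			push_job_original(&entity);
	}
	for (int i = 0; i < njobs; i++)
		complete_job(&entity);

	return atomic_load(&sched.score);
}
```

In this model toy_final_score(false, 3) returns 0, while toy_final_score(true, 3) returns -2, because only the first of the three pushed jobs was counted but all three completions decrement the score.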


