[PATCH 3/8] drm/sched: Always increment correct scheduler score

Tvrtko Ursulin tvrtko.ursulin at igalia.com
Mon Sep 30 13:01:27 UTC 2024


On 13/09/2024 17:05, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> 
> Entities run queue can change during drm_sched_entity_push_job() so make
> sure to update the score consistently.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with multiple queues")
> Cc: Nirmoy Das <nirmoy.das at amd.com>
> Cc: Christian König <christian.koenig at amd.com>
> Cc: Luben Tuikov <ltuikov89 at gmail.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: David Airlie <airlied at gmail.com>
> Cc: Daniel Vetter <daniel at ffwll.ch>
> Cc: dri-devel at lists.freedesktop.org
> Cc: <stable at vger.kernel.org> # v5.9+
> Reviewed-by: Christian König <christian.koenig at amd.com>
> Reviewed-by: Nirmoy Das <nirmoy.das at intel.com>
> ---
>   drivers/gpu/drm/scheduler/sched_entity.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 76e422548d40..6645a8524699 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -586,7 +586,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>   	ktime_t submit_ts;
>   
>   	trace_drm_sched_job(sched_job, entity);
> -	atomic_inc(entity->rq->sched->score);
>   	WRITE_ONCE(entity->last_user, current->group_leader);
>   
>   	/*
> @@ -614,6 +613,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>   		rq = entity->rq;
>   		sched = rq->sched;
>   
> +		atomic_inc(sched->score);

Ugh this is wrong. :(

I was working on some further consolidation and realised this.

It will create an imbalance in score since score is currently supposed 
to be accounted twice:

  1. +/- 1 for each entity (de-)queued
  2. +/- 1 for each job queued/completed

By moving it into the "if (first) branch" it unbalances it.

But it is still true the original placement is racy. It looks like what 
is required is an unconditional entity->lock section after 
spsc_queue_push. AFAICT that's the only way to be sure entity->rq is set 
for the submission at hand.

Question also is, why +/- score in entity add/remove and not just for jobs?

In the meantime patch will need to get reverted.

Regards,

Tvrtko

>   		drm_sched_rq_add_entity(rq, entity);
>   		spin_unlock(&entity->rq_lock);
>   


More information about the dri-devel mailing list