[Intel-xe] [PATCH] drm/sched: Eliminate drm_sched_run_job_queue_if_ready()

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Fri Nov 3 10:39:15 UTC 2023


On 02/11/2023 22:46, Luben Tuikov wrote:
> Eliminate drm_sched_run_job_queue_if_ready() and instead just call
> drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
> the former function uses drm_sched_select_entity() to determine if the
> scheduler had an entity ready in one of its run-queues, and in the case of the
> Round-Robin (RR) scheduling, the function drm_sched_rq_select_entity_rr() does
> just that, selects the _next_ entity which is ready, sets up the run-queue and
> completion and returns that entity. The FIFO scheduling algorithm is unaffected.
> 
> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
> in the case of RR scheduling, that would result in calling select_entity()
> twice, which may result in skipping a ready entity if more than one entity is
> ready. This commit fixes this by eliminating the if_ready() variant.

A Fixes: tag is missing, since the regression has already landed.
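(As a generic template only -- I have not dug out which commit introduced the regression, so the placeholders below are not filled in -- the tag would take the usual form:

	Fixes: <first 12 chars of the offending commit's sha1> ("<offending commit subject>")
)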

> 
> Signed-off-by: Luben Tuikov <ltuikov89 at gmail.com>
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 14 ++------------
>   1 file changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 98b2ad54fc7071..05816e7cae8c8b 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>   }
>   EXPORT_SYMBOL(drm_sched_pick_best);
>   
> -/**
> - * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
> - * @sched: scheduler instance
> - */
> -static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
> -{
> -	if (drm_sched_select_entity(sched))
> -		drm_sched_run_job_queue(sched);
> -}
> -
>   /**
>    * drm_sched_free_job_work - worker to call free_job
>    *
> @@ -1069,7 +1059,7 @@ static void drm_sched_free_job_work(struct work_struct *w)
>   		sched->ops->free_job(cleanup_job);
>   
>   		drm_sched_free_job_queue_if_done(sched);
> -		drm_sched_run_job_queue_if_ready(sched);
> +		drm_sched_run_job_queue(sched);

It works, but it is a bit wasteful, causing needless CPU wake-ups with a 
potentially empty queue, both here and in drm_sched_run_job_work() below.

What would be the problem with having a "peek" type helper? The readiness 
check could easily be done in a single spinlock section instead of dropping 
and re-acquiring the lock.
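Something along these lines is what I have in mind -- a rough, untested 
sketch only, assuming the sched_rq[]/num_rqs layout and 
drm_sched_entity_is_ready() as they exist around this base commit, and 
ignoring the hw submission credit check for brevity:

static bool drm_sched_has_ready_entity(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_entity *entity;
	int i;

	/* Walk the run-queues from highest to lowest priority. */
	for (i = sched->num_rqs - 1; i >= 0; i--) {
		struct drm_sched_rq *rq = sched->sched_rq[i];

		/*
		 * Only peek -- do not advance the RR state the way
		 * drm_sched_rq_select_entity_rr() does.
		 */
		spin_lock(&rq->lock);
		list_for_each_entry(entity, &rq->entities, list) {
			if (drm_sched_entity_is_ready(entity)) {
				spin_unlock(&rq->lock);
				return true;
			}
		}
		spin_unlock(&rq->lock);
	}

	return false;
}

(Name and exact locking are of course up for debate; the point is that it 
only looks, it does not select.)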

What is even the point of having the re-queue here _inside_ the if 
(cleanup_job) block? See 
https://lists.freedesktop.org/archives/dri-devel/2023-November/429037.html. 
Because of the lock drop and re-acquire, I don't see that it makes sense 
to make the potential re-queue depend on the existence of the current 
finished job.

Also, what is the point of re-queuing the run-job queue from the free 
worker in the first place?

(I suppose re-queuing the _free_ worker itself is needed in the current 
design, albeit inefficient.)
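To illustrate the above, a rough, untested sketch of what I mean -- 
assuming the work_free_job/pause_submit plumbing from the work-queue 
conversion and the hypothetical drm_sched_has_ready_entity() peek helper 
from earlier in this mail:

static void drm_sched_free_job_work(struct work_struct *w)
{
	struct drm_gpu_scheduler *sched =
		container_of(w, struct drm_gpu_scheduler, work_free_job);
	struct drm_sched_job *cleanup_job;

	if (READ_ONCE(sched->pause_submit))
		return;

	cleanup_job = drm_sched_get_cleanup_job(sched);
	if (cleanup_job) {
		sched->ops->free_job(cleanup_job);

		/*
		 * Re-queuing the free worker itself still depends on
		 * whether a finished job was actually reaped.
		 */
		drm_sched_free_job_queue_if_done(sched);
	}

	/*
	 * The run-job re-queue no longer depends on cleanup_job, and
	 * only fires when an entity is actually ready.
	 */
	if (drm_sched_has_ready_entity(sched))
		drm_sched_run_job_queue(sched);
}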

Regards,

Tvrtko

>   	}
>   }
>   
> @@ -1127,7 +1117,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
>   	}
>   
>   	wake_up(&sched->job_scheduled);
> -	drm_sched_run_job_queue_if_ready(sched);
> +	drm_sched_run_job_queue(sched);
>   }
>   
>   /**
> 
> base-commit: 6fd9487147c4f18ad77eea00bd8c9189eec74a3e

