[Intel-xe] [PATCH] drm/sched: Eliminate drm_sched_run_job_queue_if_ready()
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Nov 3 10:39:15 UTC 2023
On 02/11/2023 22:46, Luben Tuikov wrote:
> Eliminate drm_sched_run_job_queue_if_ready() and instead just call
> drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
> the former function uses drm_sched_select_entity() to determine whether the
> scheduler has an entity ready in one of its run-queues; in the case of
> Round-Robin (RR) scheduling, the function drm_sched_rq_select_entity_rr() does
> just that: it selects the _next_ ready entity, sets up the run-queue and
> completion, and returns that entity. The FIFO scheduling algorithm is unaffected.
>
> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), in
> the case of RR scheduling this would result in calling select_entity() twice,
> which may skip a ready entity if more than one entity is ready. This commit
> fixes this by eliminating the if_ready() variant.
A Fixes: tag is missing, since the regression has already landed.
>
> Signed-off-by: Luben Tuikov <ltuikov89 at gmail.com>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 14 ++------------
> 1 file changed, 2 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 98b2ad54fc7071..05816e7cae8c8b 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
> }
> EXPORT_SYMBOL(drm_sched_pick_best);
>
> -/**
> - * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
> - * @sched: scheduler instance
> - */
> -static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
> -{
> - if (drm_sched_select_entity(sched))
> - drm_sched_run_job_queue(sched);
> -}
> -
> /**
> * drm_sched_free_job_work - worker to call free_job
> *
> @@ -1069,7 +1059,7 @@ static void drm_sched_free_job_work(struct work_struct *w)
> sched->ops->free_job(cleanup_job);
>
> drm_sched_free_job_queue_if_done(sched);
> - drm_sched_run_job_queue_if_ready(sched);
> + drm_sched_run_job_queue(sched);
It works, but it is a bit wasteful, causing needless CPU wake-ups with a
potentially empty queue, both here and in drm_sched_run_job_work() below.
What would be the problem with having a "peek" type helper? It would be easy
to do the check in a single spin-lock section instead of dropping and
re-acquiring the lock.
What is even the point of having the re-queue here _inside_ the if
(cleanup_job) block? See
https://lists.freedesktop.org/archives/dri-devel/2023-November/429037.html.
Because of the lock drop and re-acquire, I don't see that it makes sense to
make the potential re-queue depend on the existence of the current finished job.
Also, what is the point of re-queuing the run-job queue from the free worker?
(I suppose re-queuing the _free_ worker itself is needed in the current
design, albeit inefficient.)
Regards,
Tvrtko
> }
> }
>
> @@ -1127,7 +1117,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
> }
>
> wake_up(&sched->job_scheduled);
> - drm_sched_run_job_queue_if_ready(sched);
> + drm_sched_run_job_queue(sched);
> }
>
> /**
>
> base-commit: 6fd9487147c4f18ad77eea00bd8c9189eec74a3e