[PATCH] drm/sched: Eliminate drm_sched_run_job_queue_if_ready()

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Mon Nov 6 12:54:34 UTC 2023


On 04/11/2023 00:25, Luben Tuikov wrote:
> Hi Tvrtko,
> 
> On 2023-11-03 06:39, Tvrtko Ursulin wrote:
>>
>> On 02/11/2023 22:46, Luben Tuikov wrote:
>>> Eliminate drm_sched_run_job_queue_if_ready() and instead just call
>>> drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
>>> the former function uses drm_sched_select_entity() to determine whether the
>>> scheduler has an entity ready in one of its run-queues, and in the case of
>>> Round-Robin (RR) scheduling, the function drm_sched_rq_select_entity_rr() does
>>> just that: it selects the _next_ entity which is ready, sets up the run-queue
>>> and completion, and returns that entity. The FIFO scheduling algorithm is
>>> unaffected.
>>>
>>> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), in
>>> the case of RR scheduling that would result in calling select_entity() twice,
>>> which may skip a ready entity if more than one entity is ready. This commit
>>> fixes this by eliminating the if_ready() variant.
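
As an aside for anyone following along, here is a tiny standalone toy model
of the issue (nothing to do with the actual sched_main.c code; rr_pick(),
ready[] and cursor are made-up names standing in for
drm_sched_select_entity()/drm_sched_rq_select_entity_rr()): gating on a
stateful round-robin pick and then picking again in the worker skips a
ready entity.

#include <stdio.h>

#define NUM_ENTITIES 3

static int ready[NUM_ENTITIES] = { 1, 1, 0 };
static int cursor; /* advances on every pick, like the RR run-queue cursor */

/* Return the index of the next ready entity, or -1; advances the cursor. */
static int rr_pick(void)
{
	int i;

	for (i = 0; i < NUM_ENTITIES; i++) {
		int idx = (cursor + i) % NUM_ENTITIES;

		if (ready[idx]) {
			cursor = (idx + 1) % NUM_ENTITIES;
			return idx;
		}
	}
	return -1;
}

int main(void)
{
	/* "if_ready"-style gate: pick once just to test readiness... */
	if (rr_pick() >= 0)
		/* ...then the worker picks again and runs entity 1;
		 * entity 0, which was also ready, is silently skipped. */
		printf("worker runs entity %d\n", rr_pick());

	return 0;
}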
>>
>> Fixes: is missing since the regression already landed.
> 
> Ah, yes, thank you for pointing that out. :-)
> I'll add one.
> 
>>
>>>
>>> Signed-off-by: Luben Tuikov <ltuikov89 at gmail.com>
>>> ---
>>>    drivers/gpu/drm/scheduler/sched_main.c | 14 ++------------
>>>    1 file changed, 2 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 98b2ad54fc7071..05816e7cae8c8b 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>>>    }
>>>    EXPORT_SYMBOL(drm_sched_pick_best);
>>>    
>>> -/**
>>> - * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
>>> - * @sched: scheduler instance
>>> - */
>>> -static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
>>> -{
>>> -	if (drm_sched_select_entity(sched))
>>> -		drm_sched_run_job_queue(sched);
>>> -}
>>> -
>>>    /**
>>>     * drm_sched_free_job_work - worker to call free_job
>>>     *
>>> @@ -1069,7 +1059,7 @@ static void drm_sched_free_job_work(struct work_struct *w)
>>>    		sched->ops->free_job(cleanup_job);
>>>    
>>>    		drm_sched_free_job_queue_if_done(sched);
>>> -		drm_sched_run_job_queue_if_ready(sched);
>>> +		drm_sched_run_job_queue(sched);
>>
>> It works but is a bit wasteful, causing needless CPU wake-ups with a
> 
> I'd not worry about "waking up the CPU", as the CPU scheduler would most likely
> place the wq worker on the same CPU due to instruction cache locality.
> 
>> potentially empty queue, both here and in drm_sched_run_job_work below.
> 
> That's true, but if you look at the typical execution of this code you'd see
> we get a string of function entries while the incoming queue is non-empty,
> followed by a single empty entry before the work is taken off the CPU. So it
> really isn't a deal-breaker.
> 
> So, there's a way to mitigate this in drm_sched_run_job_work(). I'll see that
> it makes it into the next version of the patch.

Okay, I will be keeping an eye on that.
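
For reference, the empty-queue path we are talking about roughly looks like
this (my paraphrase of the worker's shape, not a verbatim copy of
sched_main.c):

static void drm_sched_run_job_work(struct work_struct *w)
{
	struct drm_gpu_scheduler *sched =
		container_of(w, struct drm_gpu_scheduler, work_run_job);
	struct drm_sched_entity *entity;

	entity = drm_sched_select_entity(sched);
	if (!entity)
		return; /* the "needless wake-up": nothing was ready to run */

	/* ...otherwise pop a job from the entity, submit it, re-queue... */
}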

Separately, I might send a patch to do the "re-queue if more pending" check in 
one atomic section, instead of re-acquiring the lock.
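
To sketch what I mean (just a sketch under assumptions, not a tested patch;
the helper name is made up and the real drm_sched_get_cleanup_job() does
more, e.g. timeout re-arming), the idea is to peek at the next pending job
while still holding job_list_lock, so drm_sched_free_job_work() can decide
whether to re-queue itself without taking the lock a second time:

static struct drm_sched_job *
drm_sched_get_cleanup_job_and_peek(struct drm_gpu_scheduler *sched,
				   bool *more_done)
{
	struct drm_sched_job *job, *next;

	spin_lock(&sched->job_list_lock);

	job = list_first_entry_or_null(&sched->pending_list,
				       struct drm_sched_job, list);
	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
		list_del_init(&job->list);

		/* Peek at the new list head under the same lock hold. */
		next = list_first_entry_or_null(&sched->pending_list,
						struct drm_sched_job, list);
		*more_done = next &&
			dma_fence_is_signaled(&next->s_fence->finished);
	} else {
		job = NULL;
		*more_done = false;
	}

	spin_unlock(&sched->job_list_lock);

	return job;
}

The caller would then queue the free-job work again when *more_done is set,
instead of calling drm_sched_free_job_queue_if_done() and re-acquiring
job_list_lock there.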

And also as a heads up, at some point in the next few months I will 
start looking at the latency and power effects of the "do just one and 
re-queue" conversion. In ChromeOS milli-Watts matter, and some things 
like media playback do have a lot of inter-engine dependencies, so 
keeping CPU C-state residency high (letting the CPU stay in deep idle 
states between jobs) might matter. It might matter for server media 
transcode stream density workloads too, both for power and for stream 
capacity per socket.

Regards,

Tvrtko

