[PATCH] drm/sched: Eliminate drm_sched_run_job_queue_if_ready()

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Mon Nov 6 12:54:34 UTC 2023


On 04/11/2023 00:25, Luben Tuikov wrote:
> Hi Tvrtko,
> 
> On 2023-11-03 06:39, Tvrtko Ursulin wrote:
>>
>> On 02/11/2023 22:46, Luben Tuikov wrote:
>>> Eliminate drm_sched_run_job_queue_if_ready() and instead just call
>>> drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
>>> the former function uses drm_sched_select_entity() to determine whether the
>>> scheduler has an entity ready in one of its run-queues, and in the case of
>>> Round-Robin (RR) scheduling, the function drm_sched_rq_select_entity_rr() does
>>> just that: it selects the _next_ entity which is ready, sets up the run-queue
>>> and completion, and returns that entity. The FIFO scheduling algorithm is
>>> unaffected.
>>>
>>> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), in
>>> the case of RR scheduling that would result in calling select_entity() twice,
>>> which may skip a ready entity if more than one entity is ready. This commit
>>> fixes this by eliminating the if_ready() variant.
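
As an aside for anyone following along, here is a tiny standalone toy model
of the issue (nothing to do with the actual sched_main.c code; rr_pick(),
ready[] and cursor are made-up names standing in for
drm_sched_select_entity()/drm_sched_rq_select_entity_rr()): gating on a
stateful round-robin pick and then picking again in the worker skips a
ready entity.

#include <stdio.h>

#define NUM_ENTITIES 3

static int ready[NUM_ENTITIES] = { 1, 1, 0 };
static int cursor; /* advances on every pick, like the RR run-queue cursor */

/* Return the index of the next ready entity, or -1; advances the cursor. */
static int rr_pick(void)
{
	int i;

	for (i = 0; i < NUM_ENTITIES; i++) {
		int idx = (cursor + i) % NUM_ENTITIES;

		if (ready[idx]) {
			cursor = (idx + 1) % NUM_ENTITIES;
			return idx;
		}
	}
	return -1;
}

int main(void)
{
	/* "if_ready"-style gate: pick once just to test readiness... */
	if (rr_pick() >= 0)
		/* ...then the worker picks again and runs entity 1;
		 * entity 0, which was also ready, is silently skipped. */
		printf("worker runs entity %d\n", rr_pick());

	return 0;
}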
>>
>> Fixes: is missing since the regression already landed.
> 
> Ah, yes, thank you for pointing that out. :-)
> I'll add one.
> 
>>
>>>
>>> Signed-off-by: Luben Tuikov <ltuikov89 at gmail.com>
>>> ---
>>>    drivers/gpu/drm/scheduler/sched_main.c | 14 ++------------
>>>    1 file changed, 2 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 98b2ad54fc7071..05816e7cae8c8b 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>>>    }
>>>    EXPORT_SYMBOL(drm_sched_pick_best);
>>>    
>>> -/**
>>> - * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
>>> - * @sched: scheduler instance
>>> - */
>>> -static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
>>> -{
>>> -	if (drm_sched_select_entity(sched))
>>> -		drm_sched_run_job_queue(sched);
>>> -}
>>> -
>>>    /**
>>>     * drm_sched_free_job_work - worker to call free_job
>>>     *
>>> @@ -1069,7 +1059,7 @@ static void drm_sched_free_job_work(struct work_struct *w)
>>>    		sched->ops->free_job(cleanup_job);
>>>    
>>>    		drm_sched_free_job_queue_if_done(sched);
>>> -		drm_sched_run_job_queue_if_ready(sched);
>>> +		drm_sched_run_job_queue(sched);
>>
>> It works but is a bit wasteful, causing needless CPU wake-ups with a
> 
> I'd not worry about "waking up the CPU", as the CPU scheduler would most likely
> place the wq worker on the same CPU due to instruction cache locality.
> 
>> potentially empty queue, both here and in drm_sched_run_job_work below.
> 
> That's true, but if you look at the typical execution of this code you'd see
> we get a string of function entries while the incoming queue is non-empty,
> followed by a single empty entry before the work is taken off the CPU. So it
> really isn't a deal-breaker.
> 
> So, there's a way to mitigate this in drm_sched_run_job_work(). I'll see that
> it makes it into the next version of the patch.

Okay, I will be keeping an eye on that.
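
For reference, the empty-queue path we are talking about roughly looks like
this (my paraphrase of the worker's shape, not a verbatim copy of
sched_main.c):

static void drm_sched_run_job_work(struct work_struct *w)
{
	struct drm_gpu_scheduler *sched =
		container_of(w, struct drm_gpu_scheduler, work_run_job);
	struct drm_sched_entity *entity;

	entity = drm_sched_select_entity(sched);
	if (!entity)
		return; /* the "needless wake-up": nothing was ready to run */

	/* ...otherwise pop a job from the entity, submit it, re-queue... */
}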

Separately, I might send a patch to do the "re-queue if more pending" check in 
one atomic section, instead of re-acquiring the lock.
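
To sketch what I mean (just a sketch under assumptions, not a tested patch;
the helper name is made up and the real drm_sched_get_cleanup_job() does
more, e.g. timeout re-arming), the idea is to peek at the next pending job
while still holding job_list_lock, so drm_sched_free_job_work() can decide
whether to re-queue itself without taking the lock a second time:

static struct drm_sched_job *
drm_sched_get_cleanup_job_and_peek(struct drm_gpu_scheduler *sched,
				   bool *more_done)
{
	struct drm_sched_job *job, *next;

	spin_lock(&sched->job_list_lock);

	job = list_first_entry_or_null(&sched->pending_list,
				       struct drm_sched_job, list);
	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
		list_del_init(&job->list);

		/* Peek at the new list head under the same lock hold. */
		next = list_first_entry_or_null(&sched->pending_list,
						struct drm_sched_job, list);
		*more_done = next &&
			dma_fence_is_signaled(&next->s_fence->finished);
	} else {
		job = NULL;
		*more_done = false;
	}

	spin_unlock(&sched->job_list_lock);

	return job;
}

The caller would then queue the free-job work again when *more_done is set,
instead of calling drm_sched_free_job_queue_if_done() and re-acquiring
job_list_lock there.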

And also as a heads up, at some point in the next few months I will 
start looking at the latency and power effects of the "do just one and 
re-queue" conversion. In ChromeOS milli-Watts matter, and some things 
like media playback do have a lot of inter-engine dependencies, so 
keeping CPU C-state residency high (letting the CPU stay in deep idle 
states between jobs) might matter. It might matter for server media 
transcode stream density workloads too, both for power and for stream 
capacity per socket.

Regards,

Tvrtko

