[RFC PATCH] drm/scheduler: Set the FIFO schedulig policy as the default

Mon Oct 24 17:25:06 UTC 2022

Hi Christian,

Can you add an R-B to V1 of this patch, as the v1 is what I'd like pushed.
I sent it right after this one.

Regards,
Luben

On 2022-10-24 06:42, Christian König wrote:
> Am 22.10.22 um 03:09 schrieb Luben Tuikov:
>> The currently default Round-Robin GPU scheduling can result in starvation
>> of entities which have a large number of jobs, over entities which have
>> a very small number of jobs (single digit).
>>
>> This can be illustrated in the following diagram, where jobs are
>> alphabetized to show their chronological order of arrival, where job A is
>> the oldest, B is the second oldest, and so on, to J, the most recent job to
>> arrive.
>>
>>      ---> entities
>> j | H-F-----A--E--I--
>> o | --G-----B-----J--
>> b | --------C--------
>> s\/ --------D--------
>>
>> WLOG, asuming all jobs are "ready", then a R-R scheduling will execute them
>> in the following order (a slice off of the top of the entities' list),
>>
>> H, F, A, E, I, G, B, J, C, D.
>>
>> However, to mitigate job starvation, we'd rather execute C and D before E,
>> and so on, given, of course, that they're all ready to be executed.
>>
>> So, if all jobs are ready at this instant, the order of execution for this
>> and the next 9 instances of picking the next job to execute, should really
>> be,
>>
>> A, B, C, D, E, F, G, H, I, J,
>>
>> which is their chronological order. The only reason for this order to be
>> broken, is if an older job is not yet ready, but a younger job is ready, at
>> an instant of picking a new job to execute. For instance if job C wasn't
>> ready at time 2, but job D was ready, then we'd pick job D, like this:
>>
>> 0 +1 +2  ...
>> A, B, D, ...
>>
>> And from then on, C would be preferred before all other jobs, if it is ready
>> at the time when a new job for execution is picked. So, if C became ready
>> two steps later, the execution order would look like this:
>>
>> ......0 +1 +2  ...
>> A, B, D, E, C, F, G, H, I, J
>>
>> This is what the FIFO GPU scheduling algorithm achieves. It uses a
>> Red-Black tree to keep jobs sorted in chronological order, where picking
>> the oldest job is O(1) (we use the "cached" structure), and balancing the
>> tree is O(log n). IOW, it picks the *oldest ready* job to execute now.
>>
>> The implemntation is already in the kernel, and this commit only changes
>> the default GPU scheduling algorithm to use.
>>
>> This was tested and achieves about 1% faster performance over the Round
>> Robin algorithm.
>>
>> Cc: Christian König <christian.koenig at amd.com>
>> Cc: Alex Deucher <Alexander.Deucher at amd.com>
>> Cc: Direct Rendering Infrastructure - Development <dri-devel at lists.freedesktop.org>
>> Signed-off-by: Luben Tuikov <luben.tuikov at amd.com>
> 
> Reviewed-by: Christian König <christian.koenig at amd.com>
> 
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 2fab218d708279..d0ff9e11cb69fa 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -62,13 +62,13 @@
>>   #define to_drm_sched_job(sched_job)		\
>>   		container_of((sched_job), struct drm_sched_job, queue_node)
>>   
>> -int drm_sched_policy = DRM_SCHED_POLICY_RR;
>> +int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
>>   
>>   /**
>>    * DOC: sched_policy (int)
>>    * Used to override default entities scheduling policy in a run queue.
>>    */
>> -MODULE_PARM_DESC(sched_policy, "Specify schedule policy for entities on a runqueue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin (default), " __stringify(DRM_SCHED_POLICY_FIFO) " = use FIFO.");
>> +MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
>>   module_param_named(sched_policy, drm_sched_policy, int, 0444);
>>   
>>   static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a,
>>
>> base-commit: 16d2a3f2ad1d2b95bf9122c910c63b0efe74179d
>