[PATCH] drm/sced: Add FIFO policy for scheduler rq

Tue Aug 23 18:13:27 UTC 2022

On 2022-08-23 12:58, Luben Tuikov wrote:
> Inlined:
>
> On 2022-08-22 16:09, Andrey Grodzovsky wrote:
>> Poblem: Given many entities competing for same rq on
> ^Problem
>
>> same scheduler an uncceptabliy long wait time for some
> ^unacceptably
>
>> jobs waiting stuck in rq before being picked up are
>> observed (seen using  GPUVis).
>> The issue is due to Round Robin policy used by scheduler
>> to pick up the next entity for execution. Under stress
>> of many entities and long job queus within entity some
> ^queues
>
>> jobs could be stack for very long time in it's entity's
>> queue before being popped from the queue and executed
>> while for other entites with samller job queues a job
> ^entities; smaller
>
>> might execute ealier even though that job arrived later
> ^earlier
>
>> then the job in the long queue.
>>
>> Fix:
>> Add FIFO selection policy to entites in RQ, chose next enitity
>> on rq in such order that if job on one entity arrived
>> ealrier then job on another entity the first job will start
>> executing ealier regardless of the length of the entity's job
>> queue.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li at amd.com>
>> ---
>>   drivers/gpu/drm/scheduler/sched_entity.c |  2 +
>>   drivers/gpu/drm/scheduler/sched_main.c   | 65 ++++++++++++++++++++++--
>>   include/drm/gpu_scheduler.h              |  8 +++
>>   3 files changed, 71 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>> index 6b25b2f4f5a3..3bb7f69306ef 100644
>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>> @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>>   	atomic_inc(entity->rq->sched->score);
>>   	WRITE_ONCE(entity->last_user, current->group_leader);
>>   	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
>> +	sched_job->submit_ts = ktime_get();
>> +
>>   
>>   	/* first job wakes up scheduler */
>>   	if (first) {
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 68317d3a7a27..c123aa120d06 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -59,6 +59,19 @@
>>   #define CREATE_TRACE_POINTS
>>   #include "gpu_scheduler_trace.h"
>>   
>> +
>> +
>> +int drm_sched_policy = -1;
>> +
>> +/**
>> + * DOC: sched_policy (int)
>> + * Used to override default entites scheduling policy in a run queue.
>> + */
>> +MODULE_PARM_DESC(sched_policy,
>> +		"specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1  = use FIFO");
>> +module_param_named(sched_policy, drm_sched_policy, int, 0444);
> As per Christian's comments, you can drop the "auto" and perhaps leave one as the default,
> say the RR.
>
> I do think it is beneficial to have a module parameter control the scheduling policy, as shown above.

Christian is not against it, just against adding 'auto' here - like the 
default.

>
>> +
>> +
>>   #define to_drm_sched_job(sched_job)		\
>>   		container_of((sched_job), struct drm_sched_job, queue_node)
>>   
>> @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
>>   }
>>   
>>   /**
>> - * drm_sched_rq_select_entity - Select an entity which could provide a job to run
>> + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
>>    *
>>    * @rq: scheduler run queue to check.
>>    *
>> - * Try to find a ready entity, returns NULL if none found.
>> + * Try to find a ready entity, in round robin manner.
>> + *
>> + * Returns NULL if none found.
>>    */
>>   static struct drm_sched_entity *
>> -drm_sched_rq_select_entity(struct drm_sched_rq *rq)
>> +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
>>   {
>>   	struct drm_sched_entity *entity;
>>   
>> @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
>>   	return NULL;
>>   }
>>   
>> +/**
>> + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run
>> + *
>> + * @rq: scheduler run queue to check.
>> + *
>> + * Try to find a ready entity, based on FIFO order of jobs arrivals.
>> + *
>> + * Returns NULL if none found.
>> + */
>> +static struct drm_sched_entity *
>> +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
>> +{
>> +	struct drm_sched_entity *tmp, *entity = NULL;
>> +	ktime_t oldest_ts = KTIME_MAX;
>> +	struct drm_sched_job *sched_job;
>> +
>> +	spin_lock(&rq->lock);
>> +
>> +	list_for_each_entry(tmp, &rq->entities, list) {
>> +
>> +		if (drm_sched_entity_is_ready(tmp)) {
>> +			sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue));
>> +
>> +			if (ktime_before(sched_job->submit_ts, oldest_ts)) {
>> +				oldest_ts = sched_job->submit_ts;
>> +				entity = tmp;
>> +			}
>> +		}
>> +	}
> Here I think we need an O(1) lookup of the next job to pick out to run.
> I see a number of optimizations, for instance keeping the current/oldest
> timestamp in the rq struct itself,

This was my original design with rb tree based min heap per rq based on 
time stamp of
the oldest job waiting in each entity's job queue (head of entity's SPCP 
job queue). But how in this
case you record the timestamps of all the jobs waiting in entity's the 
SPCP queue behind the head job ?
If you record only the oldest job and more jobs come you have no place 
to store their timestamps and so
on next job select after current HEAD how you will know who came before 
or after between 2 job queues of
of 2 entities ?

> or better yet keeping the next job
> to pick out to run at a head of list (a la timer wheel implementation).
> For suck an optimization to work, you'd prep things up on job insertion, rather
> than on job removal/pick to run.

I looked at timer wheel and I don't see how this applies here - it 
assumes you know before
job submission your deadline time for job's execution to start - which 
we don't so I don't see
how we can use it. This seems more suitable for real time scheduler 
implementation where you
have a hard requirement to execute a job by some specific time T.

I also mentioned a list, obviously there is the brute force solution of 
just ordering all jobs in one giant list and get
naturally a FIFO ordering this way with O(1) insertions and extractions 
but I assume we don't want one giant jobs queue
for all the entities to avoid more dependeies between them (like locking 
the entire list when one specific entity is killed and
needs to remove it's jobs from SW queue).

>
> I'm also surprised that there is no job transition from one queue to another,
> as it is picked to run next--for the above optimizations to implemented, you'd
> want a state transition from (state) queue to queue.

Not sure what you have in mind here ? How this helps ?

Andrey

>
> Regards,
> Luben
>

In my origianl design

>> +
>> +	if (entity) {
>> +		rq->current_entity = entity;
>> +		reinit_completion(&entity->entity_idle);
>> +	}
>> +
>> +	spin_unlock(&rq->lock);
>> +	return entity;
>> +}
>> +
>>   /**
>>    * drm_sched_job_done - complete a job
>>    * @s_job: pointer to the job which is done
>> @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
>>   
>>   	/* Kernel run queue has higher priority than normal run queue*/
>>   	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
>> -		entity = drm_sched_rq_select_entity(&sched->sched_rq[i]);
>> +		entity = drm_sched_policy != 1 ?
>> +				drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) :
>> +				drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]);
>> +
>>   		if (entity)
>>   			break;
>>   	}
>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>> index addb135eeea6..95865881bfcf 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -314,6 +314,14 @@ struct drm_sched_job {
>>   
>>   	/** @last_dependency: tracks @dependencies as they signal */
>>   	unsigned long			last_dependency;
>> +
>> +       /**
>> +	* @submit_ts:
>> +	*
>> +	* Marks job submit time
>> +	*/
>> +       ktime_t				submit_ts;
>> +
>>   };
>>   
>>   static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,