[PATCH] drm/sced: Add FIFO policy for scheduler rq
Luben Tuikov
luben.tuikov at amd.com
Tue Aug 23 18:30:26 UTC 2022
On 2022-08-23 14:13, Andrey Grodzovsky wrote:
>
> On 2022-08-23 12:58, Luben Tuikov wrote:
>> Inlined:
>>
>> On 2022-08-22 16:09, Andrey Grodzovsky wrote:
>>> Poblem: Given many entities competing for same rq on
>> ^Problem
>>
>>> same scheduler an uncceptabliy long wait time for some
>> ^unacceptably
>>
>>> jobs waiting stuck in rq before being picked up are
>>> observed (seen using GPUVis).
>>> The issue is due to Round Robin policy used by scheduler
>>> to pick up the next entity for execution. Under stress
>>> of many entities and long job queus within entity some
>> ^queues
>>
>>> jobs could be stack for very long time in it's entity's
>>> queue before being popped from the queue and executed
>>> while for other entites with samller job queues a job
>> ^entities; smaller
>>
>>> might execute ealier even though that job arrived later
>> ^earlier
>>
>>> then the job in the long queue.
>>>
>>> Fix:
>>> Add FIFO selection policy to entites in RQ, chose next enitity
>>> on rq in such order that if job on one entity arrived
>>> ealrier then job on another entity the first job will start
>>> executing ealier regardless of the length of the entity's job
>>> queue.
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li at amd.com>
>>> ---
>>> drivers/gpu/drm/scheduler/sched_entity.c | 2 +
>>> drivers/gpu/drm/scheduler/sched_main.c | 65 ++++++++++++++++++++++--
>>> include/drm/gpu_scheduler.h | 8 +++
>>> 3 files changed, 71 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>> index 6b25b2f4f5a3..3bb7f69306ef 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>> @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>>> atomic_inc(entity->rq->sched->score);
>>> WRITE_ONCE(entity->last_user, current->group_leader);
>>> first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
>>> + sched_job->submit_ts = ktime_get();
>>> +
>>>
>>> /* first job wakes up scheduler */
>>> if (first) {
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 68317d3a7a27..c123aa120d06 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -59,6 +59,19 @@
>>> #define CREATE_TRACE_POINTS
>>> #include "gpu_scheduler_trace.h"
>>>
>>> +
>>> +
>>> +int drm_sched_policy = -1;
>>> +
>>> +/**
>>> + * DOC: sched_policy (int)
>>> + * Used to override default entites scheduling policy in a run queue.
>>> + */
>>> +MODULE_PARM_DESC(sched_policy,
>>> + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO");
>>> +module_param_named(sched_policy, drm_sched_policy, int, 0444);
>> As per Christian's comments, you can drop the "auto" and perhaps leave one as the default,
>> say the RR.
>>
>> I do think it is beneficial to have a module parameter control the scheduling policy, as shown above.
>
>
> Christian is not against it, just against adding 'auto' here - like the
> default.
Exactly what I said.
Also, I still think an O(1) scheduling (picking next to run) should be
what we strive for in such a FIFO patch implementation.
A FIFO mechanism is by it's nature an O(1) mechanism for picking the next
element.
Regards,
Luben
>
>
>>
>>> +
>>> +
>>> #define to_drm_sched_job(sched_job) \
>>> container_of((sched_job), struct drm_sched_job, queue_node)
>>>
>>> @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
>>> }
>>>
>>> /**
>>> - * drm_sched_rq_select_entity - Select an entity which could provide a job to run
>>> + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
>>> *
>>> * @rq: scheduler run queue to check.
>>> *
>>> - * Try to find a ready entity, returns NULL if none found.
>>> + * Try to find a ready entity, in round robin manner.
>>> + *
>>> + * Returns NULL if none found.
>>> */
>>> static struct drm_sched_entity *
>>> -drm_sched_rq_select_entity(struct drm_sched_rq *rq)
>>> +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
>>> {
>>> struct drm_sched_entity *entity;
>>>
>>> @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
>>> return NULL;
>>> }
>>>
>>> +/**
>>> + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run
>>> + *
>>> + * @rq: scheduler run queue to check.
>>> + *
>>> + * Try to find a ready entity, based on FIFO order of jobs arrivals.
>>> + *
>>> + * Returns NULL if none found.
>>> + */
>>> +static struct drm_sched_entity *
>>> +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
>>> +{
>>> + struct drm_sched_entity *tmp, *entity = NULL;
>>> + ktime_t oldest_ts = KTIME_MAX;
>>> + struct drm_sched_job *sched_job;
>>> +
>>> + spin_lock(&rq->lock);
>>> +
>>> + list_for_each_entry(tmp, &rq->entities, list) {
>>> +
>>> + if (drm_sched_entity_is_ready(tmp)) {
>>> + sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue));
>>> +
>>> + if (ktime_before(sched_job->submit_ts, oldest_ts)) {
>>> + oldest_ts = sched_job->submit_ts;
>>> + entity = tmp;
>>> + }
>>> + }
>>> + }
>> Here I think we need an O(1) lookup of the next job to pick out to run.
>> I see a number of optimizations, for instance keeping the current/oldest
>> timestamp in the rq struct itself,
>
>
> This was my original design with rb tree based min heap per rq based on
> time stamp of
> the oldest job waiting in each entity's job queue (head of entity's SPCP
> job queue). But how in this
> case you record the timestamps of all the jobs waiting in entity's the
> SPCP queue behind the head job ?
> If you record only the oldest job and more jobs come you have no place
> to store their timestamps and so
> on next job select after current HEAD how you will know who came before
> or after between 2 job queues of
> of 2 entities ?
>
>
>> or better yet keeping the next job
>> to pick out to run at a head of list (a la timer wheel implementation).
>> For suck an optimization to work, you'd prep things up on job insertion, rather
>> than on job removal/pick to run.
>
>
> I looked at timer wheel and I don't see how this applies here - it
> assumes you know before
> job submission your deadline time for job's execution to start - which
> we don't so I don't see
> how we can use it. This seems more suitable for real time scheduler
> implementation where you
> have a hard requirement to execute a job by some specific time T.
>
> I also mentioned a list, obviously there is the brute force solution of
> just ordering all jobs in one giant list and get
> naturally a FIFO ordering this way with O(1) insertions and extractions
> but I assume we don't want one giant jobs queue
> for all the entities to avoid more dependeies between them (like locking
> the entire list when one specific entity is killed and
> needs to remove it's jobs from SW queue).
>
>>
>> I'm also surprised that there is no job transition from one queue to another,
>> as it is picked to run next--for the above optimizations to implemented, you'd
>> want a state transition from (state) queue to queue.
>
>
> Not sure what you have in mind here ? How this helps ?
>
> Andrey
>
>
>>
>> Regards,
>> Luben
>>
>
> In my origianl design
>
>>> +
>>> + if (entity) {
>>> + rq->current_entity = entity;
>>> + reinit_completion(&entity->entity_idle);
>>> + }
>>> +
>>> + spin_unlock(&rq->lock);
>>> + return entity;
>>> +}
>>> +
>>> /**
>>> * drm_sched_job_done - complete a job
>>> * @s_job: pointer to the job which is done
>>> @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
>>>
>>> /* Kernel run queue has higher priority than normal run queue*/
>>> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
>>> - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]);
>>> + entity = drm_sched_policy != 1 ?
>>> + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) :
>>> + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]);
>>> +
>>> if (entity)
>>> break;
>>> }
>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>> index addb135eeea6..95865881bfcf 100644
>>> --- a/include/drm/gpu_scheduler.h
>>> +++ b/include/drm/gpu_scheduler.h
>>> @@ -314,6 +314,14 @@ struct drm_sched_job {
>>>
>>> /** @last_dependency: tracks @dependencies as they signal */
>>> unsigned long last_dependency;
>>> +
>>> + /**
>>> + * @submit_ts:
>>> + *
>>> + * Marks job submit time
>>> + */
>>> + ktime_t submit_ts;
>>> +
>>> };
>>>
>>> static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
Regards,
--
Luben
More information about the amd-gfx
mailing list