[PATCH] drm/sced: Add FIFO policy for scheduler rq
Luben Tuikov
luben.tuikov at amd.com
Tue Aug 23 21:37:48 UTC 2022
On 2022-08-23 14:57, Andrey Grodzovsky wrote:
> On 2022-08-23 14:30, Luben Tuikov wrote:
>
>>
>> On 2022-08-23 14:13, Andrey Grodzovsky wrote:
>>> On 2022-08-23 12:58, Luben Tuikov wrote:
>>>> Inlined:
>>>>
>>>> On 2022-08-22 16:09, Andrey Grodzovsky wrote:
>>>>> Poblem: Given many entities competing for same rq on
>>>> ^Problem
>>>>
>>>>> same scheduler an uncceptabliy long wait time for some
>>>> ^unacceptably
>>>>
>>>>> jobs waiting stuck in rq before being picked up are
>>>>> observed (seen using GPUVis).
>>>>> The issue is due to Round Robin policy used by scheduler
>>>>> to pick up the next entity for execution. Under stress
>>>>> of many entities and long job queus within entity some
>>>> ^queues
>>>>
>>>>> jobs could be stack for very long time in it's entity's
>>>>> queue before being popped from the queue and executed
>>>>> while for other entites with samller job queues a job
>>>> ^entities; smaller
>>>>
>>>>> might execute ealier even though that job arrived later
>>>> ^earlier
>>>>
>>>>> then the job in the long queue.
>>>>>
>>>>> Fix:
>>>>> Add FIFO selection policy to entites in RQ, chose next enitity
>>>>> on rq in such order that if job on one entity arrived
>>>>> ealrier then job on another entity the first job will start
>>>>> executing ealier regardless of the length of the entity's job
>>>>> queue.
>>>>>
>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>>> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li at amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/scheduler/sched_entity.c | 2 +
>>>>> drivers/gpu/drm/scheduler/sched_main.c | 65 ++++++++++++++++++++++--
>>>>> include/drm/gpu_scheduler.h | 8 +++
>>>>> 3 files changed, 71 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> index 6b25b2f4f5a3..3bb7f69306ef 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>>>>> atomic_inc(entity->rq->sched->score);
>>>>> WRITE_ONCE(entity->last_user, current->group_leader);
>>>>> first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
>>>>> + sched_job->submit_ts = ktime_get();
>>>>> +
>>>>>
>>>>> /* first job wakes up scheduler */
>>>>> if (first) {
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index 68317d3a7a27..c123aa120d06 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -59,6 +59,19 @@
>>>>> #define CREATE_TRACE_POINTS
>>>>> #include "gpu_scheduler_trace.h"
>>>>>
>>>>> +
>>>>> +
>>>>> +int drm_sched_policy = -1;
>>>>> +
>>>>> +/**
>>>>> + * DOC: sched_policy (int)
>>>>> + * Used to override default entites scheduling policy in a run queue.
>>>>> + */
>>>>> +MODULE_PARM_DESC(sched_policy,
>>>>> + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO");
>>>>> +module_param_named(sched_policy, drm_sched_policy, int, 0444);
>>>> As per Christian's comments, you can drop the "auto" and perhaps leave one as the default,
>>>> say the RR.
>>>>
>>>> I do think it is beneficial to have a module parameter control the scheduling policy, as shown above.
>>>
>>> Christian is not against it, just against adding 'auto' here - like the
>>> default.
>> Exactly what I said.
>>
>> Also, I still think an O(1) scheduling (picking next to run) should be
>> what we strive for in such a FIFO patch implementation.
>> A FIFO mechanism is by its nature an O(1) mechanism for picking the next
>> element.
>>
>> Regards,
>> Luben
>
>
> The only solution I see for this now is keeping a global per-rq jobs
> list parallel to the SPSC queue per entity - we use this list when we switch
> to FIFO scheduling; we can even start building it ONLY when we switch
> to FIFO, building it gradually as more jobs come. Do you have another
> solution in mind ?
The idea is to "sort" on insertion, not on picking the next one to run.
cont'd below:
>
> Andrey
>
>>
>>>
>>>>> +
>>>>> +
>>>>> #define to_drm_sched_job(sched_job) \
>>>>> container_of((sched_job), struct drm_sched_job, queue_node)
>>>>>
>>>>> @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
>>>>> }
>>>>>
>>>>> /**
>>>>> - * drm_sched_rq_select_entity - Select an entity which could provide a job to run
>>>>> + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
>>>>> *
>>>>> * @rq: scheduler run queue to check.
>>>>> *
>>>>> - * Try to find a ready entity, returns NULL if none found.
>>>>> + * Try to find a ready entity, in round robin manner.
>>>>> + *
>>>>> + * Returns NULL if none found.
>>>>> */
>>>>> static struct drm_sched_entity *
>>>>> -drm_sched_rq_select_entity(struct drm_sched_rq *rq)
>>>>> +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
>>>>> {
>>>>> struct drm_sched_entity *entity;
>>>>>
>>>>> @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
>>>>> return NULL;
>>>>> }
>>>>>
>>>>> +/**
>>>>> + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run
>>>>> + *
>>>>> + * @rq: scheduler run queue to check.
>>>>> + *
>>>>> + * Try to find a ready entity, based on FIFO order of jobs arrivals.
>>>>> + *
>>>>> + * Returns NULL if none found.
>>>>> + */
>>>>> +static struct drm_sched_entity *
>>>>> +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
>>>>> +{
>>>>> + struct drm_sched_entity *tmp, *entity = NULL;
>>>>> + ktime_t oldest_ts = KTIME_MAX;
>>>>> + struct drm_sched_job *sched_job;
>>>>> +
>>>>> + spin_lock(&rq->lock);
>>>>> +
>>>>> + list_for_each_entry(tmp, &rq->entities, list) {
>>>>> +
>>>>> + if (drm_sched_entity_is_ready(tmp)) {
>>>>> + sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue));
>>>>> +
>>>>> + if (ktime_before(sched_job->submit_ts, oldest_ts)) {
>>>>> + oldest_ts = sched_job->submit_ts;
>>>>> + entity = tmp;
>>>>> + }
>>>>> + }
>>>>> + }
>>>> Here I think we need an O(1) lookup of the next job to pick out to run.
>>>> I see a number of optimizations, for instance keeping the current/oldest
>>>> timestamp in the rq struct itself,
>>>
>>> This was my original design: an rb-tree-based min-heap per rq, keyed on
>>> the timestamp of the oldest job waiting in each entity's job queue (the
>>> head of the entity's SPSC job queue). But how in this case do you record
>>> the timestamps of all the jobs waiting in the entity's SPSC queue behind
>>> the head job? If you record only the oldest job and more jobs come, you
>>> have no place to store their timestamps, and so on the next job selection
>>> after the current HEAD how will you know who came before or after between
>>> the job queues of 2 entities?
>>>
>>>
>>>> or better yet keeping the next job
>>>> to pick out to run at a head of list (a la timer wheel implementation).
>>>> For such an optimization to work, you'd prep things up on job insertion, rather
>>>> than on job removal/pick to run.
>>>
>>> I looked at the timer wheel and I don't see how it applies here - it
>>> assumes you know, before job submission, the deadline time at which the
>>> job's execution must start - which we don't, so I don't see how we can
>>> use it. This seems more suitable for a real-time scheduler implementation
>>> where you have a hard requirement to execute a job by some specific time T.
In a timer wheel you instantly know the "soonest" job to run--it's naturally
your "next" job, regardless of the order in which the timers were added or
what their timeout times are.
>>>
>>> I also mentioned a list; obviously there is the brute-force solution of
>>> just ordering all jobs in one giant list, getting a natural FIFO ordering
>>> this way with O(1) insertions and extractions, but I assume we don't want
>>> one giant job queue for all the entities, to avoid more dependencies
>>> between them (like locking the entire list when one specific entity is
>>> killed and needs to remove its jobs from the SW queue).
You can also have a list of list pointers. It'd be trivial to remove a whole
list from the main list, by simply removing an element--akin to locking out an rq,
or should you need to edit the rq's entity list.
>>>
>>>> I'm also surprised that there is no job transition from one queue to another,
>>>> as it is picked to run next--for the above optimizations to be implemented, you'd
>>>> want a state transition from (state) queue to queue.
>>>
>>> Not sure what you have in mind here? How does this help?
I think I've explained this a few times now--each list represents a state and a job/entity
travels through lists as it travels through states, which states more or less represent
the state of execution in the hardware--it could be as simple as incoming --> pending --> done.
It allows a finer grain when resetting the hardware (should the hardware allow it).
Note that this isn't directly related to the O(1) mechanism I brought up here. As I said, I was surprised
to find out no such distinction existed--that's all. Don't fixate on this.
Regards,
Luben
>>>
>>> Andrey
>>>
>>>
>>>> Regards,
>>>> Luben
>>>>
>>> In my original design
>>>
>>>>> +
>>>>> + if (entity) {
>>>>> + rq->current_entity = entity;
>>>>> + reinit_completion(&entity->entity_idle);
>>>>> + }
>>>>> +
>>>>> + spin_unlock(&rq->lock);
>>>>> + return entity;
>>>>> +}
>>>>> +
>>>>> /**
>>>>> * drm_sched_job_done - complete a job
>>>>> * @s_job: pointer to the job which is done
>>>>> @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
>>>>>
>>>>> /* Kernel run queue has higher priority than normal run queue*/
>>>>> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
>>>>> - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]);
>>>>> + entity = drm_sched_policy != 1 ?
>>>>> + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) :
>>>>> + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]);
>>>>> +
>>>>> if (entity)
>>>>> break;
>>>>> }
>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>> index addb135eeea6..95865881bfcf 100644
>>>>> --- a/include/drm/gpu_scheduler.h
>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>> @@ -314,6 +314,14 @@ struct drm_sched_job {
>>>>>
>>>>> /** @last_dependency: tracks @dependencies as they signal */
>>>>> unsigned long last_dependency;
>>>>> +
>>>>> + /**
>>>>> + * @submit_ts:
>>>>> + *
>>>>> + * Marks job submit time
>>>>> + */
>>>>> + ktime_t submit_ts;
>>>>> +
>>>>> };
>>>>>
>>>>> static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
Regards,
--
Luben