[PATCH v2] drm/sched: Add FIFO sched policy to rq

Andrey Grodzovsky andrey.grodzovsky at amd.com
Wed Sep 7 15:58:07 UTC 2022


Luben, just a ping, whenever you have time.

Andrey

On 2022-09-05 01:57, Christian König wrote:
>
>
> Am 03.09.22 um 04:48 schrieb Andrey Grodzovsky:
>> Problem: Given many entities competing for the same rq on
>> the same scheduler, unacceptably long wait times are
>> observed (using GPUVis) for some jobs stuck in the rq
>> before being picked up.
>> The issue is due to the Round Robin policy used by the scheduler
>> to pick the next entity for execution. Under stress from
>> many entities with long job queues, some jobs can be
>> stuck for a very long time in their entity's queue
>> before being popped from the queue and executed,
>> while for other entities with smaller job queues a job
>> might execute earlier even though that job arrived later
>> than the job in the long queue.
>>
>> Fix:
>> Add a FIFO selection policy for entities in the rq: choose the next
>> entity on the rq in such an order that if a job on one entity arrived
>> earlier than a job on another entity, the first job will start
>> executing earlier, regardless of the length of the entity's job
>> queue.
>>
>> v2:
>> Switch to an rb tree structure for entities, keyed on the TS of
>> the oldest job waiting in the entity's job queue. Improves next
>> entity extraction to O(1). Entity TS update is
>> O(log(number of entities in rq)).
>>
>> Drop the default option in the module control parameter.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li at amd.com>
> [SNIP]
>>   /**
>> @@ -313,6 +330,14 @@ struct drm_sched_job {
>>         /** @last_dependency: tracks @dependencies as they signal */
>>       unsigned long            last_dependency;
>> +
>> +
>> +    /**
>> +    * @submit_ts:
>> +    *
>> +    * Marks job submit time
>
> Maybe write something like "When the job was pushed into the entity 
> queue."
>
> Apart from that I leave it to Luben and you to get this stuff upstream.
>
> Thanks,
> Christian.
>
>> +    */
>> +    ktime_t                         submit_ts;
>>   };
>>     static inline bool drm_sched_invalidate_job(struct drm_sched_job 
>> *s_job,
>> @@ -501,6 +526,10 @@ void drm_sched_rq_add_entity(struct drm_sched_rq 
>> *rq,
>>   void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
>>                   struct drm_sched_entity *entity);
>>   +void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, 
>> ktime_t ts,
>> +                  bool remove_only);
>> +
>> +
>>   int drm_sched_entity_init(struct drm_sched_entity *entity,
>>                 enum drm_sched_priority priority,
>>                 struct drm_gpu_scheduler **sched_list,
>
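
As a rough illustration of the FIFO selection described in the quoted commit message, here is a minimal userspace sketch in C. The toy_* names, the fixed-size ring buffer and the linear scan are assumptions made for illustration only; they are not the drm_sched structures, and the patch itself keeps entities in an rb tree rather than scanning them.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_JOBS 8

/* Hypothetical stand-ins for illustration; not the drm_sched structures. */
struct toy_job {
    uint64_t submit_ts;   /* when the job was pushed into the entity queue */
};

struct toy_entity {
    struct toy_job jobs[TOY_MAX_JOBS];  /* ring buffer of queued jobs */
    size_t head;                        /* index of the oldest queued job */
    size_t count;                       /* number of jobs still queued */
};

/* Append a job to the entity's queue. Returns 0 on success, -1 if full. */
static int toy_entity_push(struct toy_entity *e, uint64_t ts)
{
    if (e->count == TOY_MAX_JOBS)
        return -1;
    e->jobs[(e->head + e->count) % TOY_MAX_JOBS].submit_ts = ts;
    e->count++;
    return 0;
}

/* Remove the oldest job and return its timestamp. Caller ensures count > 0. */
static uint64_t toy_entity_pop(struct toy_entity *e)
{
    uint64_t ts = e->jobs[e->head].submit_ts;

    e->head = (e->head + 1) % TOY_MAX_JOBS;
    e->count--;
    return ts;
}

/*
 * FIFO policy: pick the entity whose oldest queued job has the smallest
 * submit_ts. Returns the entity index, or -1 if every queue is empty.
 * This is an O(n) scan chosen for brevity (an assumption of this sketch).
 */
static int toy_rq_select_fifo(const struct toy_entity *ents, size_t n)
{
    int best = -1;
    uint64_t best_ts = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t ts;

        if (ents[i].count == 0)
            continue;
        ts = ents[i].jobs[ents[i].head].submit_ts;
        if (best < 0 || ts < best_ts) {
            best = (int)i;
            best_ts = ts;
        }
    }
    return best;
}
```

With one entity holding jobs submitted at times 10, 20 and 30 and a second entity holding a single job submitted at 25, repeatedly calling toy_rq_select_fifo() and popping from the selected entity runs the jobs in submission order (10, 20, 25, 30). Strict alternation between the two entities would instead run the job from time 25 ahead of the one from time 20, which is the behaviour the commit message complains about.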

