[PATCH v2] drm/sched: Add FIFO sched policy to rq

Christian König christian.koenig at amd.com
Mon Sep 5 05:57:41 UTC 2022



On 03.09.22 at 04:48, Andrey Grodzovsky wrote:
> Problem: Given many entities competing for the same rq on
> the same scheduler, unacceptably long wait times are
> observed (using GPUVis) for some jobs stuck in the rq
> before being picked up.
> The issue is due to the Round Robin policy used by the scheduler
> to pick the next entity for execution. Under the stress
> of many entities and long job queues within an entity, some
> jobs could be stuck for a very long time in their entity's
> queue before being popped from the queue and executed,
> while for other entities with smaller job queues a job
> might execute earlier even though that job arrived later
> than the job in the long queue.
>
> Fix:
> Add a FIFO selection policy to entities in the rq: choose the next
> entity on the rq in such an order that if a job on one entity arrived
> earlier than a job on another entity, the first job will start
> executing earlier regardless of the length of the entity's job
> queue.
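A minimal userspace sketch of the ordering problem described above, assuming a simplified model that is not the drm_sched API: the hypothetical `struct model_entity` holds each entity's job submit timestamps, oldest first. The hypothetical `pick_rr()` serves ready entities in turn, `pick_fifo()` serves the entity whose oldest queued job was submitted first, and `run()` drains all queues and records the execution order.

```c
#include <assert.h>

/* Hypothetical model, not the drm_sched API: each entity is a queue of
 * job submit timestamps, oldest first. */
#define MAX_JOBS 8

struct model_entity {
	int ts[MAX_JOBS];	/* submit timestamps of queued jobs */
	int len;		/* number of jobs ever queued */
	int head;		/* index of the next job to pop */
};

static int model_ready(const struct model_entity *e)
{
	return e->head < e->len;
}

/* Round robin: start after the last served entity, take the first ready one. */
static int pick_rr(const struct model_entity *ents, int n, int *cursor)
{
	for (int i = 0; i < n; i++) {
		int idx = (*cursor + i) % n;

		if (model_ready(&ents[idx])) {
			*cursor = (idx + 1) % n;
			return idx;
		}
	}
	return -1;
}

/* FIFO: take the entity whose oldest queued job was submitted first. */
static int pick_fifo(const struct model_entity *ents, int n)
{
	int best = -1;

	for (int i = 0; i < n; i++) {
		if (!model_ready(&ents[i]))
			continue;
		if (best < 0 ||
		    ents[i].ts[ents[i].head] < ents[best].ts[ents[best].head])
			best = i;
	}
	return best;
}

/* Drain all queues, writing the submit timestamps in execution order. */
static int run(struct model_entity *ents, int n, int fifo, int *order)
{
	int cursor = 0, count = 0, idx;

	assert(n > 0);
	while ((idx = fifo ? pick_fifo(ents, n) : pick_rr(ents, n, &cursor)) >= 0)
		order[count++] = ents[idx].ts[ents[idx].head++];
	return count;
}
```

With entity A holding jobs submitted at t=1..4 and entities B and C each holding a single job at t=5 and t=6, round robin runs 1, 5, 6, 2, 3, 4, so the later jobs from the short queues overtake A's older jobs; FIFO runs 1 through 6 in submission order.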
>
> v2:
> Switch to an rb tree structure for entities, keyed by the TS of the
> oldest job waiting in the entity's job queue. This improves next
> entity extraction to O(1). Entity TS update is
> O(log(number of entities in rq)).
>
> Drop default option in module control parameter.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li at amd.com>
[SNIP]
>   /**
> @@ -313,6 +330,14 @@ struct drm_sched_job {
>   
>   	/** @last_dependency: tracks @dependencies as they signal */
>   	unsigned long			last_dependency;
> +
> +
> +	/**
> +	* @submit_ts:
> +	*
> +	* Marks job submit time

Maybe write something like "When the job was pushed into the entity queue."

Apart from that I leave it to Luben and you to get this stuff upstream.

Thanks,
Christian.

> +	*/
> +	ktime_t                         submit_ts;
>   };
>   
>   static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
> @@ -501,6 +526,10 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
>   void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
>   				struct drm_sched_entity *entity);
>   
> +void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts,
> +			      bool remove_only);
> +
> +
>   int drm_sched_entity_init(struct drm_sched_entity *entity,
>   			  enum drm_sched_priority priority,
>   			  struct drm_gpu_scheduler **sched_list,

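The v2 notes key entities by the timestamp of their oldest queued job, with O(1) selection of the next entity and O(log n) timestamp updates, and the patch declares `drm_sched_rq_update_fifo(entity, ts, remove_only)`. Below is a userspace sketch of those semantics under stated assumptions: it swaps the kernel's rb tree for an array-based binary min-heap (same bounds: O(1) peek at the oldest entry, O(log n) insert and remove), ignores rq locking, and reads `remove_only` as "take the entity out without re-adding it", which is an interpretation of the prototype rather than something shown in this excerpt. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; not the kernel's drm_sched types. */
#define RQ_MAX 16

struct fifo_entity {
	int64_t oldest_ts;	/* submit time of the oldest queued job */
	int heap_idx;		/* position in the rq heap, -1 when not queued */
};

struct fifo_rq {
	struct fifo_entity *heap[RQ_MAX];	/* min-heap on oldest_ts */
	int size;
};

static void rq_swap(struct fifo_rq *rq, int a, int b)
{
	struct fifo_entity *tmp = rq->heap[a];

	rq->heap[a] = rq->heap[b];
	rq->heap[b] = tmp;
	rq->heap[a]->heap_idx = a;
	rq->heap[b]->heap_idx = b;
}

/* Restore heap order around index i after its key or slot changed. */
static void rq_sift(struct fifo_rq *rq, int i)
{
	/* Move up while older than the parent. */
	while (i > 0 && rq->heap[i]->oldest_ts < rq->heap[(i - 1) / 2]->oldest_ts) {
		rq_swap(rq, i, (i - 1) / 2);
		i = (i - 1) / 2;
	}
	/* Move down while a child is older. */
	for (;;) {
		int l = 2 * i + 1, r = l + 1, min = i;

		if (l < rq->size && rq->heap[l]->oldest_ts < rq->heap[min]->oldest_ts)
			min = l;
		if (r < rq->size && rq->heap[r]->oldest_ts < rq->heap[min]->oldest_ts)
			min = r;
		if (min == i)
			break;
		rq_swap(rq, i, min);
		i = min;
	}
}

/* Modeled on the drm_sched_rq_update_fifo() prototype (assumed semantics):
 * drop the entity from the rq, then re-add it keyed by ts unless
 * remove_only is set. O(log n). */
static void rq_update_fifo(struct fifo_rq *rq, struct fifo_entity *e,
			   int64_t ts, bool remove_only)
{
	if (e->heap_idx >= 0) {
		int i = e->heap_idx;

		rq->size--;
		if (i != rq->size) {
			/* Move the last entry into the hole, then fix order. */
			rq_swap(rq, i, rq->size);
			rq_sift(rq, i);
		}
		e->heap_idx = -1;
	}
	if (remove_only)
		return;
	assert(rq->size < RQ_MAX);
	e->oldest_ts = ts;
	e->heap_idx = rq->size;
	rq->heap[rq->size++] = e;
	rq_sift(rq, e->heap_idx);
}

/* The entity with the oldest pending job is always at the root: O(1). */
static struct fifo_entity *rq_select_fifo(const struct fifo_rq *rq)
{
	return rq->size ? rq->heap[0] : NULL;
}
```

As with a cached leftmost node in an rb tree, selection only reads the root, so the O(log n) cost is paid when an entity's oldest timestamp changes rather than on every pick.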