[RFC PATCH 06/10] drm/sched: Submit job before starting TDR

Luben Tuikov luben.tuikov at amd.com
Thu May 4 05:23:05 UTC 2023


On 2023-04-03 20:22, Matthew Brost wrote:
> If the TDR is set to a value, it can fire before a job is submitted in
> drm_sched_main. The job should always be submitted before the TDR
> fires; fix this ordering.
> 
> Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 6ae710017024..4eac02d212c1 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1150,10 +1150,10 @@ static void drm_sched_main(struct work_struct *w)
>  		s_fence = sched_job->s_fence;
>  
>  		atomic_inc(&sched->hw_rq_count);
> -		drm_sched_job_begin(sched_job);
>  
>  		trace_drm_run_job(sched_job, entity);
>  		fence = sched->ops->run_job(sched_job);
> +		drm_sched_job_begin(sched_job);
>  		complete_all(&entity->entity_idle);
>  		drm_sched_fence_scheduled(s_fence);
>  

I'm not sure this is correct. drm_sched_job_begin() adds the job to the "pending_list"
(meaning it is pending execution in the hardware) and also starts the timeout timer. Both
of those should happen before the job is given to the hardware.
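
The ordering concern above can be illustrated with a toy user-space model. This is
only a sketch: every toy_* name is hypothetical and none of it is the real scheduler
code. It assumes only what is stated above, namely that drm_sched_job_begin() puts
the job on the pending list and starts the timeout, and it records whether the job
was already tracked at the moment it was handed to the "hardware".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; this is NOT struct drm_sched_job. */
struct toy_job {
	bool on_pending_list;   /* stands in for adding the job to pending_list */
	bool timeout_armed;     /* stands in for starting the timeout timer */
	bool tracked_at_submit; /* was the job tracked when the hardware got it? */
};

/* Stands in for drm_sched_job_begin(): track the job and arm the timeout. */
static void toy_job_begin(struct toy_job *job)
{
	job->on_pending_list = true;
	job->timeout_armed = true;
}

/* Stands in for sched->ops->run_job(): snapshot the tracking state. */
static void toy_run_job(struct toy_job *job)
{
	job->tracked_at_submit = job->on_pending_list && job->timeout_armed;
}

/* Ordering before the patch: begin first, then hand the job to the hardware. */
static bool toy_submit_begin_first(void)
{
	struct toy_job job = {0};

	toy_job_begin(&job);
	toy_run_job(&job);
	/* Either way, the job ends up tracked once both calls have run. */
	assert(job.on_pending_list && job.timeout_armed);
	return job.tracked_at_submit;
}

/* Ordering proposed by the patch: hand the job to the hardware, then begin. */
static bool toy_submit_run_first(void)
{
	struct toy_job job = {0};

	toy_run_job(&job);
	toy_job_begin(&job);
	assert(job.on_pending_list && job.timeout_armed);
	return job.tracked_at_submit;
}
```

In this model toy_submit_begin_first() reports that the job was tracked when it
reached the hardware, while toy_submit_run_first() reports that it was not. The model
says nothing about whether that window matters for a given driver.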

If the timeout is set to too small a value, then that should probably be fixed instead.

Regards,
Luben

