[PATCH] drm/sched: Remove job submit/free race when using unordered workqueues

Tvrtko Ursulin tursulin at ursulin.net
Thu Jan 16 16:01:06 UTC 2025


On 10/01/2025 11:14, Tvrtko Ursulin wrote:
> After commit f7fe64ad0f22 ("drm/sched: Split free_job into own work item")
> and with drivers which use an unordered workqueue, sched_jobs can be freed
> in parallel as soon as complete_all(&entity->entity_idle) is called.
> This makes all dereferencing in the lower part of the worker unsafe, so
> let's fix it by moving the complete_all() call to after the worker is
> done touching the job.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> Fixes: f7fe64ad0f22 ("drm/sched: Split free_job into own work item")

I went back to write a comment for a v2 of this patch and realised that 
the Fixes: target is wrong. And maybe there wasn't even a race to begin 
with.

I *think* that when looking into this I mistakenly assumed freeing of 
jobs happens in the job free worker - but for the 
drm_sched_entity_fini() case I was worried about, it actually happens 
from the system_wq.

And that path relies on entity->last_scheduled keeping a reference. So as 
long as the pop and run work are serialized in one worker, and the 
asynchronous drm_sched_entity_fini() waits for the entity to go idle, I 
think there actually isn't a race, regardless of where complete_all() is 
placed in the worker.
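
To spell out the argument, here is a simplified sketch of the two 
relevant pieces as I read the current code (not the verbatim source):

  /*
   * Job pop, serialized with job run in the same worker; it takes a
   * reference to the scheduler fence before the job is handed over:
   */
  dma_fence_put(rcu_dereference_check(entity->last_scheduled, true));
  rcu_assign_pointer(entity->last_scheduled,
                     dma_fence_get(&sched_job->s_fence->finished));

  /*
   * Asynchronous entity teardown (the drm_sched_entity_fini() path,
   * e.g. running off the system_wq):
   */
  wait_for_completion(&entity->entity_idle);  /* wait for the worker */
  ...
  /* Only after the wait is the last scheduled fence reference dropped: */
  dma_fence_put(rcu_dereference_check(entity->last_scheduled, true));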

So unless someone disagrees, I think this patch can go back to the 
initial version, which had no Fixes: tag and was just removing the 
s_fence local variable. Maybe that local was needed at some point, but I 
don't see why it would be with the current code base.
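
That is, keep complete_all() where it is today and just do something 
along these lines (a sketch of that earlier version):

  	fence = sched->ops->run_job(sched_job);
  	complete_all(&entity->entity_idle);
  -	drm_sched_fence_scheduled(s_fence, fence);
  +	drm_sched_fence_scheduled(sched_job->s_fence, fence);

together with dropping the s_fence declaration and assignment, as in the 
hunks below.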

Regards,

Tvrtko

P.S. What could potentially be confusing is how both the job free worker 
and entity fini work via the sched->ops->free_job() vfunc, kind of giving 
the impression it is the final "put" and does not involve fence reference 
counting. While in actuality most drivers call drm_sched_job_cleanup() 
from that vfunc - which uses dma_fence_put(). Perhaps a more intuitive 
design would be if the scheduler core called dma_fence_put() and the 
driver specific vfunc was instead called from the scheduler fence release 
vfunc. But I don't know.. I don't particularly want to go there at this 
time.
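
For illustration, the pattern most drivers follow looks roughly like this 
(hypothetical driver names, not taken from any real driver):

  static void foo_sched_free_job(struct drm_sched_job *sched_job)
  {
  	struct foo_job *job = to_foo_job(sched_job);	/* hypothetical */

  	/*
  	 * Drops the scheduler fence reference via dma_fence_put()
  	 * internally, despite free_job() sounding like the final free.
  	 */
  	drm_sched_job_cleanup(sched_job);

  	foo_job_put(job);	/* driver's own job refcount, hypothetical */
  }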

> Cc: Christian König <christian.koenig at amd.com>
> Cc: Danilo Krummrich <dakr at redhat.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: Philipp Stanner <pstanner at redhat.com>
> Cc: <stable at vger.kernel.org> # v6.8+
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 57da84908752..f0d02c061c23 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1188,7 +1188,6 @@ static void drm_sched_run_job_work(struct work_struct *w)
>   		container_of(w, struct drm_gpu_scheduler, work_run_job);
>   	struct drm_sched_entity *entity;
>   	struct dma_fence *fence;
> -	struct drm_sched_fence *s_fence;
>   	struct drm_sched_job *sched_job;
>   	int r;
>   
> @@ -1207,15 +1206,12 @@ static void drm_sched_run_job_work(struct work_struct *w)
>   		return;
>   	}
>   
> -	s_fence = sched_job->s_fence;
> -
>   	atomic_add(sched_job->credits, &sched->credit_count);
>   	drm_sched_job_begin(sched_job);
>   
>   	trace_drm_run_job(sched_job, entity);
>   	fence = sched->ops->run_job(sched_job);
> -	complete_all(&entity->entity_idle);
> -	drm_sched_fence_scheduled(s_fence, fence);
> +	drm_sched_fence_scheduled(sched_job->s_fence, fence);
>   
>   	if (!IS_ERR_OR_NULL(fence)) {
>   		/* Drop for original kref_init of the fence */
> @@ -1232,6 +1228,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
>   				   PTR_ERR(fence) : 0);
>   	}
>   
> +	complete_all(&entity->entity_idle);
>   	wake_up(&sched->job_scheduled);
>   	drm_sched_run_job_queue(sched);
>   }

