[PATCH 2/3] drm/amdgpu: Pop jobs from the queue more robustly

Christian König christian.koenig at amd.com
Tue Feb 11 08:22:45 UTC 2025


Am 06.02.25 um 17:40 schrieb Tvrtko Ursulin:
> Replace a copy of DRM scheduler's to_drm_sched_job with a copy of a newly
> added __drm_sched_entity_queue_pop.
>
> This allows breaking the hidden dependency that queue_node has to be the
> first element in struct drm_sched_job.
>
> A comment is also added with a reference to the mailing list discussion
> explaining the copied helper will be removed when the whole broken
> amdgpu_job_stop_all_jobs_on_sched is removed.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> Cc: Christian König <christian.koenig at amd.com>
> Cc: Danilo Krummrich <dakr at kernel.org>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: Philipp Stanner <phasta at kernel.org>
> Cc: "Zhang, Hawking" <Hawking.Zhang at amd.com>

Reviewed-by: Christian König <christian.koenig at amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 22 +++++++++++++++++++---
>   1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index 100f04475943..22cb48bab24d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -411,8 +411,24 @@ static struct dma_fence *amdgpu_job_run(struct drm_sched_job *sched_job)
>   	return fence;
>   }
>   
> -#define to_drm_sched_job(sched_job)		\
> -		container_of((sched_job), struct drm_sched_job, queue_node)
> +/*
> + * This is a duplicate function from DRM scheduler sched_internal.h.
> + * Plan is to remove it when amdgpu_job_stop_all_jobs_on_sched is removed, due
> + * latter being incorrect and racy.
> + *
> + * See https://lore.kernel.org/amd-gfx/44edde63-7181-44fb-a4f7-94e50514f539@amd.com/
> + */
> +static struct drm_sched_job *
> +__drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
> +{
> +	struct spsc_node *node;
> +
> +	node = spsc_queue_pop(&entity->job_queue);
> +	if (!node)
> +		return NULL;
> +
> +	return container_of(node, struct drm_sched_job, queue_node);
> +}
>   
>   void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched)
>   {
> @@ -425,7 +441,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched)
>   		struct drm_sched_rq *rq = sched->sched_rq[i];
>   		spin_lock(&rq->lock);
>   		list_for_each_entry(s_entity, &rq->entities, list) {
> -			while ((s_job = to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue)))) {
> +			while ((s_job = __drm_sched_entity_queue_pop(s_entity))) {
>   				struct drm_sched_fence *s_fence = s_job->s_fence;
>   
>   				dma_fence_signal(&s_fence->scheduled);



More information about the amd-gfx mailing list