[PATCH] drm/sched: Increment job count before swapping tail spsc queue

Cavitt, Jonathan jonathan.cavitt at intel.com
Fri Jun 13 22:03:27 UTC 2025


-----Original Message-----
From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of Matthew Brost
Sent: Friday, June 13, 2025 2:20 PM
To: intel-xe at lists.freedesktop.org; dri-devel at lists.freedesktop.org
Cc: dakr at kernel.org; christian.koenig at amd.com; pstanner at redhat.com
Subject: [PATCH] drm/sched: Increment job count before swapping tail spsc queue
> 
> A small race exists between spsc_queue_push and the run-job worker, in
> which spsc_queue_push may return not-first while the run-job worker has
> already idled due to the job count being zero. If this race occurs, job
> scheduling stops, leading to hangs while waiting on the job’s DMA
> fences.
> 
> Seal this race by incrementing the job count before appending to the
> SPSC queue.
> 
> This race was observed on a drm-tip 6.16-rc1 build with the Xe driver in
> an SVM test case.
> 
> Fixes: 1b1f42d8fde4 ("drm: move amd_gpu_scheduler into common location")
> Fixes: 27105db6c63a ("drm/amdgpu: Add SPSC queue to scheduler.")
> Cc: stable at vger.kernel.org
> Signed-off-by: Matthew Brost <matthew.brost at intel.com>

LGTM.

Maybe in the future we should consider protecting the job count and tail
with a mutex to prevent this kind of race condition, though that would
require a serious refactoring of the code, whereas this fix works with
minimal change.
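
For reference, the ordering the patch establishes can be modelled in user
space with C11 atomics. This is only a rough sketch, not the kernel code:
the model_* names and types are hypothetical stand-ins for struct spsc_node
and struct spsc_queue, and a sequentially consistent atomic_fetch_add()
is assumed to play the role of atomic_inc() + smp_mb__after_atomic().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; names do not exist in the kernel. */
struct model_node;
typedef _Atomic(struct model_node *) node_slot;

struct model_node {
	node_slot next;
};

struct model_queue {
	node_slot head;
	_Atomic(node_slot *) tail;	/* points at the last "next" slot */
	atomic_int job_count;
};

static void model_queue_init(struct model_queue *q)
{
	atomic_init(&q->head, (struct model_node *)NULL);
	atomic_init(&q->tail, &q->head);
	atomic_init(&q->job_count, 0);
}

static int model_queue_count(struct model_queue *q)
{
	return atomic_load(&q->job_count);
}

/* Returns true if the node was pushed onto an empty queue. */
static bool model_queue_push(struct model_queue *q, struct model_node *node)
{
	node_slot *tail;

	atomic_store(&node->next, (struct model_node *)NULL);

	/*
	 * Bump the count before the tail swap, so a consumer that can
	 * already reach the new node never reads a count of zero for it.
	 * The seq_cst read-modify-write also acts as a full barrier here.
	 */
	atomic_fetch_add(&q->job_count, 1);

	tail = atomic_exchange(&q->tail, &node->next);
	atomic_store_explicit(tail, node, memory_order_relaxed);

	return tail == &q->head;
}
```

With the increment after the tail swap instead, a second pusher could see
a non-empty tail (and so return not-first) while the count was still zero
from the worker's point of view, which is the window the patch closes.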

Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
-Jonathan Cavitt

> ---
>  include/drm/spsc_queue.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/drm/spsc_queue.h b/include/drm/spsc_queue.h
> index 125f096c88cb..ee9df8cc67b7 100644
> --- a/include/drm/spsc_queue.h
> +++ b/include/drm/spsc_queue.h
> @@ -70,9 +70,11 @@ static inline bool spsc_queue_push(struct spsc_queue *queue, struct spsc_node *n
>  
>  	preempt_disable();
>  
> +	atomic_inc(&queue->job_count);
> +	smp_mb__after_atomic();
> +
>  	tail = (struct spsc_node **)atomic_long_xchg(&queue->tail, (long)&node->next);
>  	WRITE_ONCE(*tail, node);
> -	atomic_inc(&queue->job_count);
>  
>  	/*
>  	 * In case of first element verify new node will be visible to the consumer
> -- 
> 2.34.1
> 
> 