[PATCH] drm/sched: Increment job count before swapping tail spsc queue
Matthew Brost
matthew.brost at intel.com
Tue Jul 1 23:19:00 UTC 2025
On Tue, Jul 01, 2025 at 09:40:05AM +0200, Christian König wrote:
> On 13.06.25 23:20, Matthew Brost wrote:
> > A small race exists between spsc_queue_push and the run-job worker, in
> > which spsc_queue_push may return not-first while the run-job worker has
> > already idled due to the job count being zero. If this race occurs, job
> > scheduling stops, leading to hangs while waiting on the job’s DMA
> > fences.
> >
> > Seal this race by incrementing the job count before appending to the
> > SPSC queue.
> >
> > This race was observed on a drm-tip 6.16-rc1 build with the Xe driver in
> > an SVM test case.
> >
> > Fixes: 1b1f42d8fde4 ("drm: move amd_gpu_scheduler into common location")
> > Fixes: 27105db6c63a ("drm/amdgpu: Add SPSC queue to scheduler.")
> > Cc: stable at vger.kernel.org
> > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
>
> Sorry for the late response. If it isn't already pushed to drm-misc-fixes, feel free to add Reviewed-by: Christian König <christian.koenig at amd.com>
>
Thanks. Just pushed to drm-misc-fixes.
Matt
> > ---
> > include/drm/spsc_queue.h | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/drm/spsc_queue.h b/include/drm/spsc_queue.h
> > index 125f096c88cb..ee9df8cc67b7 100644
> > --- a/include/drm/spsc_queue.h
> > +++ b/include/drm/spsc_queue.h
> > @@ -70,9 +70,11 @@ static inline bool spsc_queue_push(struct spsc_queue *queue, struct spsc_node *n
> >
> >  	preempt_disable();
> >  
> > +	atomic_inc(&queue->job_count);
> > +	smp_mb__after_atomic();
> > +
> >  	tail = (struct spsc_node **)atomic_long_xchg(&queue->tail, (long)&node->next);
> >  	WRITE_ONCE(*tail, node);
> > -	atomic_inc(&queue->job_count);
> >  
> >  	/*
> >  	 * In case of first element verify new node will be visible to the consumer
>
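For readers who want to experiment with the ordering outside the kernel, the reordered push can be sketched in user space with C11 atomics. This is only an illustrative model, not the kernel code: the model_* names are hypothetical, preemption control is omitted, and a sequentially consistent atomic_fetch_add is assumed to stand in for atomic_inc() followed by smp_mb__after_atomic().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for struct spsc_node / struct spsc_queue. */
struct model_node {
	_Atomic(struct model_node *) next;
};

struct model_queue {
	_Atomic(struct model_node *) head;
	atomic_uintptr_t tail;	/* address of the last node's next slot */
	atomic_int job_count;
};

static void model_queue_init(struct model_queue *q)
{
	atomic_init(&q->head, NULL);
	/* An empty queue's tail points at the head slot itself. */
	atomic_init(&q->tail, (uintptr_t)&q->head);
	atomic_init(&q->job_count, 0);
}

/* Returns true if the node was pushed onto an empty queue. */
static bool model_queue_push(struct model_queue *q, struct model_node *node)
{
	_Atomic(struct model_node *) *tail;

	assert(node != NULL);
	atomic_store_explicit(&node->next, NULL, memory_order_relaxed);

	/*
	 * Count the job before it becomes reachable through the tail.
	 * The default seq_cst ordering is an assumption standing in for
	 * atomic_inc() + smp_mb__after_atomic() in the patch.
	 */
	atomic_fetch_add(&q->job_count, 1);

	/* Swap the tail, then link the node into the previous tail slot. */
	tail = (_Atomic(struct model_node *) *)atomic_exchange(
		&q->tail, (uintptr_t)&node->next);
	atomic_store_explicit(tail, node, memory_order_release);

	return tail == &q->head;
}

static int model_queue_count(struct model_queue *q)
{
	return atomic_load(&q->job_count);
}
```

With this ordering, by the time a later producer can receive a not-first result from the tail swap, the earlier job has already been counted, so a consumer that decides whether to idle based on the count should not see zero at that point. That is the property the commit message describes; the model does not reproduce the scheduler's worker logic.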