DRM scheduler issue with spsc_queue
Christian König
christian.koenig at amd.com
Fri Jun 13 17:26:22 UTC 2025
On 6/13/25 19:01, Matthew Brost wrote:
> All,
>
> After about six hours of debugging, I found an issue in a fairly
> aggressive test case involving the DRM scheduler function
> drm_sched_entity_push_job. The problem is that spsc_queue_push does not
> correctly return "first" (i.e. that the queue was empty) on a job push,
> causing the queue to fail to run even though it is ready.
>
> I know this sounds a bit insane, but I assure you it’s happening and is
> quite reproducible. I'm working off a pull of drm-tip from a few days
> ago + some local changes to Xe's memory management, with a Kconfig that
> has no debug options enabled. I’m not sure if there’s a bug somewhere in
> the kernel related to barriers or atomics in the recent drm-tip. That
> seems unlikely—but just as unlikely is that this bug has existed for a
> while without being triggered until now.
>
> I've verified the hang in several ways: using printks, adding a debugfs
> entry to manually kick the DRM scheduler queue when it's stuck (which
> gets it unstuck), and replacing the SPSC queue with one guarded by a
> spinlock (which completely fixes the issue).
>
> That last point raises a big question: why are we using a convoluted
> lockless algorithm here instead of a simple spinlock? This isn't a
> critical path—and even if it were, how much performance benefit are we
> actually getting from the lockless design? Probably very little.
>
> Any objections to me rewriting this around a spinlock-based design? My
> head hurts from chasing this bug, and I feel like this is the best way
> forward rather than wasting more time here.

Well, the spsc queue is standard code I have used in previous projects, and we have never experienced any issues with it.

This is a massively performance-critical code path, and we need to make sure that we move as few cache lines as possible between the producer and the consumer side.

That was the reason we previously replaced the spinlock with the spsc queue.

Regards,
Christian.
>
> Matt
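
For readers following the thread, the "first" return value under discussion can be illustrated with a minimal userspace sketch using C11 atomics. This is not the kernel's spsc_queue code: the names sketch_node, sketch_queue, and the sketch_queue_* functions are hypothetical, and the memory orderings are assumptions chosen for illustration. The push reports whether the queue was empty beforehand, which is the signal a caller would presumably use to decide whether the consumer needs a wakeup.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_node;

/* An atomically accessed link to a node (hypothetical helper type). */
typedef _Atomic(struct sketch_node *) node_slot;

struct sketch_node {
    node_slot next;
};

struct sketch_queue {
    node_slot head;             /* read and advanced by the single consumer */
    _Atomic(node_slot *) tail;  /* address of the last link in the list */
};

static void sketch_queue_init(struct sketch_queue *q)
{
    atomic_init(&q->head, NULL);
    atomic_init(&q->tail, &q->head);  /* empty: tail points at head */
}

/* Returns true if the queue was empty before this push ("first"). */
static bool sketch_queue_push(struct sketch_queue *q, struct sketch_node *n)
{
    atomic_store_explicit(&n->next, NULL, memory_order_relaxed);

    /* Claim the tail slot, then publish the node through the old slot. */
    node_slot *prev = atomic_exchange_explicit(&q->tail, &n->next,
                                               memory_order_acq_rel);
    atomic_store_explicit(prev, n, memory_order_release);

    return prev == &q->head;
}

/* Single-consumer pop; returns NULL if no node is visible yet. */
static struct sketch_node *sketch_queue_pop(struct sketch_queue *q)
{
    struct sketch_node *n = atomic_load_explicit(&q->head,
                                                 memory_order_acquire);
    if (!n)
        return NULL;

    struct sketch_node *next = atomic_load_explicit(&n->next,
                                                    memory_order_acquire);
    atomic_store_explicit(&q->head, next, memory_order_relaxed);

    if (!next) {
        /* Last visible element: try to reset tail back to head. */
        node_slot *expected = &n->next;
        if (!atomic_compare_exchange_strong(&q->tail, &expected, &q->head)) {
            /* A producer already took the tail; wait for its link. */
            while (!(next = atomic_load_explicit(&n->next,
                                                 memory_order_acquire)))
                ;
            atomic_store_explicit(&q->head, next, memory_order_relaxed);
        }
    }
    return n;
}
```

In this sketch the producer touches only the tail and the previous node's link, while the consumer mostly touches the head, which is the kind of cache-line separation the reply refers to. A spinlock-guarded queue would instead have both sides contend on the lock's cache line. The sketch only illustrates the return-value contract; it makes no claim about where the reported hang comes from.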