[PATCH] drm/sched: Discourage usage of separate workqueues

Wed Jun 4 17:10:21 UTC 2025

On Wed, Jun 04, 2025 at 06:53:44PM +0200, Danilo Krummrich wrote:
> On Wed, Jun 04, 2025 at 09:45:00AM -0700, Matthew Brost wrote:
> > On Wed, Jun 04, 2025 at 05:07:15PM +0200, Simona Vetter wrote:
> > > We should definitely document this trick better though, I didn't find any
> > > place where that was documented.
> > 
> > This is a good idea.
> 
> I think - and I also mentioned this a few times in the patch series that added
> the workqueue support - we should also really document the pitfalls of this.
> 
> If the scheduler shares a workqueue with the driver, the driver needs to take
> special care when submitting work that it's not possible to prevent run_job and
> free_job work from running by doing this.
> 
> For instance, if it's a single threaded workqueue and the driver submits work
> that allocates with GFP_KERNEL, this is a deadlock condition.
> 
> More generally, if the driver submits N work that, for instance allocates with
> GFP_KERNEL, it's also a deadlock condition if N == max_active.
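
To make the N == max_active condition concrete, here is a toy userspace
model, not kernel code. The struct and function names are made up for
illustration. It assumes the usual reason this deadlocks: a GFP_KERNEL
allocation can enter reclaim, reclaim can wait on fences, and those
fences only signal once run_job/free_job work gets a slot on the same
workqueue.

```c
#include <stdbool.h>

/* Hypothetical model of a workqueue shared by driver and scheduler. */
struct wq_model {
	int max_active;         /* concurrency limit of the shared workqueue */
	int blocked_in_reclaim; /* driver work items stalled in a GFP_KERNEL
				 * allocation that ended up in reclaim */
};

/* Scheduler work (run_job/free_job) only runs if a slot is still free. */
bool sched_work_can_run(const struct wq_model *wq)
{
	return wq->blocked_in_reclaim < wq->max_active;
}

/*
 * Deadlock in this model: at least one work item waits on reclaim, and
 * the scheduler work that reclaim (indirectly) depends on cannot run
 * because every slot is occupied by such blocked items.
 */
bool is_deadlocked(const struct wq_model *wq)
{
	return wq->blocked_in_reclaim > 0 && !sched_work_can_run(wq);
}
```

With max_active = 1, a single blocked item already makes
is_deadlocked() return true, which is the single threaded case above.
With max_active = 4, three blocked items still leave a slot for the
scheduler work, and the fourth one closes it.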

Can we prime lockdep on scheduler init? e.g.

fs_reclaim_acquire(GFP_KERNEL);
workqueue_lockdep_acquire();
workqueue_lockdep_release();
fs_reclaim_release(GFP_KERNEL);

In addition to documentation, this would make lockdep complain about
any work item queued on the scheduler's workqueue that allocates with
GFP_KERNEL.

Maybe we could use the dma_fence signalling annotations instead of
fs_reclaim_acquire, but at one point those gave Xe false lockdep
positives, so we use fs_reclaim_acquire in similar cases. Maybe that
has been fixed though.

Matt