[PATCH 0/3] Rework work queue usage

Matthew Brost matthew.brost at intel.com
Fri Mar 29 16:52:33 UTC 2024


On Thu, Mar 28, 2024 at 09:40:18AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Thu, Mar 28, 2024 at 07:30:41PM +0000, Matthew Brost wrote:
> > The test creates 100s of exec queues that all can be preempted in
> > parallel. In the current code this results in each exec queue kicking a
> > worker which is scheduled on the system_unbound_wq; these workers wait
> > and sleep (using a waitqueue) on signaling from another worker. The
> > other worker, which is also scheduled on system_unbound_wq, is processing
> > a queue which interacts with the GPU. I'm thinking the worker which
> > interacts with the hardware gets starved by the waiters, resulting in a
> > deadlock.
> > 
> > This patch changes the waiters to use a device-private ordered work
> > queue so that at most one waiter runs at a time. Regardless of the new
> > work queue behavior, this is a better design.
> > 
> > It is beyond my knowledge if the old behavior, albeit poorly designed,
> > should still work with the work queue changes in 6.9.
> 
> Ah, okay, I think you're hitting the max_active limit which regulates the
> maximum number of work items which can be in flight at any given time. Is
> the test machine a NUMA setup by any chance?
> 

Not a NUMA setup in this case.

> We went through a couple changes in terms of how max_active is enforced on
> NUMA machines. Originally, we applied it per-node, ie. if you have
> max_active of 16, each node would be able to have 16 work items in flight at
> any given time. While introducing the affinity stuff, the enforcement became
> per-CPU - ie. each CPU would get 16 work items, which didn't turn out well
> for some workloads. v6.9 changes it so that it's always applied to the whole
> system for unbound workqueues whether NUMA or not.
> 
> system_unbound_wq is created with max_active set at WQ_MAX_ACTIVE which
> happens to be 512. If you stuff more concurrent work items into it which
> have inter-dependency - ie. completion of one work item depends on another,
> it can deadlock, which isn't too unlikely given a lot of basic kernel infra
> depends on system_unbound_wq.
>

That appears to be what is happening before my series.
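
For reference, a minimal sketch of the pattern that bites us (illustrative
only, not the actual Xe code; I'm using a completion here instead of our
waitqueue for brevity):

#include <linux/workqueue.h>
#include <linux/completion.h>

static DECLARE_COMPLETION(done);

/* Hundreds of these are queued, one per exec queue being preempted. */
static void waiter_fn(struct work_struct *work)
{
	/* Sleeps until completer_fn() runs; pins one active slot. */
	wait_for_completion(&done);
}

/* Queued on the same unbound workqueue; talks to the GPU, then signals. */
static void completer_fn(struct work_struct *work)
{
	complete_all(&done);
}

If enough waiter items are in flight to consume max_active (512 on
system_unbound_wq) before the completer gets a slot, nothing can make
forward progress.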

> > > > I think we need some of this information in the commit message in patch
> > > > 1. Because patch 1 simply says it's moving to a device private wq to
> > > > avoid hogging the system one, but the issue is much more serious.
> > > > 
> > > > Also, is the "Fixes:"  really correct? It seems more like a regression
> > > > from the wq changes and there could be other drivers showing similar
> > > > issues now. But it could also be my lack of understanding of the real
> > > > issue.
> > > 
> > > I don't have enough context to tell whether this is a workqueue problem but
> > > if so we should definitely fix workqueue.
> > 
> > It is beyond my knowledge if the old behavior, albeit poorly designed,
> > should still work with the work queue changes in 6.9.
> 
> So, yeah, in this case, it makes sense to separate it out to a separate
> workqueue.
>

Agree. Thanks for your time.
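
For the record, what the series moves to is roughly the following (a
sketch only; the field and symbol names are illustrative and not
necessarily what the final patches use):

/* Device-private ordered workqueue: at most one preempt-fence waiter
 * runs at a time, and system_unbound_wq is no longer involved.
 */
xe->preempt_fence_wq = alloc_ordered_workqueue("xe-preempt-fence", 0);
if (!xe->preempt_fence_wq)
	return -ENOMEM;

/* Waiters are then queued here instead of on system_unbound_wq: */
queue_work(xe->preempt_fence_wq, &pfence->preempt_work);

Being ordered, the queue also caps us at one waiter in flight, which is
the behavior we want regardless of the 6.9 max_active changes.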

Matt
 
> Thanks.
> 
> -- 
> tejun

