[PATCH 0/3] Rework work queue usage

Tejun Heo tj at kernel.org
Thu Mar 28 19:40:18 UTC 2024


Hello,

On Thu, Mar 28, 2024 at 07:30:41PM +0000, Matthew Brost wrote:
> The test creates 100s of exec queues that all can be preempted in
> parallel. In the current code this results in each exec queue kicking a
> worker which is scheduled on the system_unbound_wq, these workers wait
> and sleep (using a waitqueue) on signaling from another worker. The
> other worker, which is also scheduled system_unbound_wq, is processing a
> queue which interacts with the GPU. I'm thinking the worker which
> interacts with hardware gets starved by the waiters, resulting in a
> deadlock.
> 
> This patch changes the waiters to use a device private ordered work
> queue so at most one waiter is running at a time. Regardless of the new
> work queue behavior, this is a better design.
> 
> It is beyond my knowledge if the old behavior, albeit poorly designed,
> should still work with the work queue changes in 6.9.

Ah, okay, I think you're hitting the max_active limit, which caps the number
of work items that can be in flight at any given time. Is the test machine a
NUMA setup, by any chance?

We went through a couple of changes in how max_active is enforced on NUMA
machines. Originally, it was applied per-node, i.e. with a max_active of 16,
each node could have 16 work items in flight at any given time. When the
affinity stuff was introduced, the enforcement became per-CPU - i.e. each CPU
could get 16 work items - which didn't turn out well for some workloads. v6.9
changes it so that the limit is always applied to the whole system for
unbound workqueues, NUMA or not.
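
To make that concrete, here's a minimal sketch - the workqueue name and the
value 16 are illustrative only, not taken from the xe driver:

	#include <linux/errno.h>
	#include <linux/workqueue.h>

	static struct workqueue_struct *example_wq;

	static int example_init(void)
	{
		/*
		 * An unbound workqueue with max_active = 16: on v6.9 at
		 * most 16 work items from this queue run concurrently
		 * system-wide; on some earlier kernels the same 16 was
		 * applied per-node or per-CPU.
		 */
		example_wq = alloc_workqueue("example-unbound", WQ_UNBOUND, 16);
		return example_wq ? 0 : -ENOMEM;
	}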

system_unbound_wq is created with max_active set to WQ_MAX_ACTIVE, which
happens to be 512. If you stuff more concurrent work items than that into it
and they have inter-dependencies - i.e. completion of one work item depends
on another - it can deadlock, which isn't too unlikely given that a lot of
basic kernel infra depends on system_unbound_wq.
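
As a hypothetical illustration of that failure mode (the function and
variable names below are made up, not taken from the xe patch):

	#include <linux/completion.h>
	#include <linux/kernel.h>
	#include <linux/workqueue.h>

	static DECLARE_COMPLETION(hw_done);

	static void waiter_fn(struct work_struct *work)
	{
		/* Sleeps until the hardware-facing worker signals. */
		wait_for_completion(&hw_done);
	}

	static void hw_fn(struct work_struct *work)
	{
		/* Talks to the GPU, then wakes every waiter. */
		complete_all(&hw_done);
	}

	static struct work_struct waiters[600];	/* > WQ_MAX_ACTIVE (512) */
	static DECLARE_WORK(hw_work, hw_fn);

	static void trigger_stall(void)
	{
		int i;

		for (i = 0; i < ARRAY_SIZE(waiters); i++) {
			INIT_WORK(&waiters[i], waiter_fn);
			queue_work(system_unbound_wq, &waiters[i]);
		}
		/*
		 * Once 512 sleeping waiters occupy every active slot,
		 * hw_work stays queued forever and nothing ever calls
		 * complete_all(): deadlock.
		 */
		queue_work(system_unbound_wq, &hw_work);
	}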

> > > I think we need some of this information in the commit message in patch
> > > 1. Because patch 1 simply says it's moving to a device private wq to
> > > avoid hogging the system one, but the issue is much more serious.
> > > 
> > > Also, is the "Fixes:" really correct? It seems more like a regression
> > > from the wq changes and there could be other drivers showing similar
> > > issues now. But it could also be my lack of understanding of the real
> > > issue.
> > 
> > I don't have enough context to tell whether this is a workqueue problem but
> > if so we should definitely fix workqueue.
> 
> It is beyond my knowledge if the old behavior, albeit poorly designed,
> should still work with the work queue changes in 6.9.

So, yeah, in this case it makes sense to split this out into its own
workqueue.
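
A minimal sketch of that direction, assuming a driver-private ordered
workqueue (the struct and function names are illustrative, not the actual
xe code):

	#include <linux/errno.h>
	#include <linux/workqueue.h>

	struct my_device {
		struct workqueue_struct *preempt_wq;
		struct work_struct preempt_work;
	};

	static void preempt_work_fn(struct work_struct *work)
	{
		/* Wait for the signal from the hardware-facing worker here. */
	}

	static int my_device_wq_init(struct my_device *dev)
	{
		/*
		 * Ordered workqueue: max_active is implicitly 1, so at
		 * most one waiter executes at a time.
		 */
		dev->preempt_wq = alloc_ordered_workqueue("my-preempt-wq", 0);
		if (!dev->preempt_wq)
			return -ENOMEM;
		INIT_WORK(&dev->preempt_work, preempt_work_fn);
		return 0;
	}

	static void my_device_kick_waiter(struct my_device *dev)
	{
		/*
		 * Waiters are serialized on the private queue instead of
		 * competing for system_unbound_wq's shared slots.
		 */
		queue_work(dev->preempt_wq, &dev->preempt_work);
	}

	static void my_device_wq_fini(struct my_device *dev)
	{
		destroy_workqueue(dev->preempt_wq);
	}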

Thanks.

-- 
tejun

