[PATCH] drm/atomic: Perform blocking commits on workqueue

Tue Oct 10 11:41:28 UTC 2023

On Mon, Oct 09, 2023 at 02:36:17PM +0200, Christian König wrote:
> Am 09.10.23 um 14:19 schrieb Ville Syrjälä:
> > On Mon, Oct 09, 2023 at 08:42:24AM +0200, Christian König wrote:
> > > Am 06.10.23 um 20:48 schrieb Ray Strode:
> > > > Hi,
> > > > 
> > > > On Fri, Oct 6, 2023 at 3:12 AM Christian König <christian.koenig at amd.com> wrote:
> > > > > When the operation busy waits then that *should* get accounted to the
> > > > > CPU time of the current process. When the operation sleeps and waits for
> > > > > some interrupt for example it should not get accounted.
> > > > > What you suggest is to put the parts of the operation which busy wait
> > > > > into a background task and then sleep for that task to complete. This is
> > > > > not a solution to the problem, but just hides it.
> > > > Actually, I think we both probably agree that there shouldn't be long
> > > > busy waits in the context of the current process. After all, we both
> > > > agree what the AMD DC driver code is doing is wrong.
> > > > 
> > > > To be clear, my take is, if driver code is running in process context
> > > > and needs to wait for periods of time on the order of or in excess of
> > > > a typical process time slice it should be sleeping during the waiting.
> > > > If the operation is at a point where it can be cancelled without side
> > > > effects, the sleeping should be INTERRUPTIBLE. If it's past the point
> > > > of no return, sleeping should be UNINTERRUPTIBLE. At no point, in my
> > > > opinion, should kernel code busy block a typical process for dozens of
> > > > milliseconds while keeping the process RUNNING. I don't think this is
> > > > a controversial take.
> > > Exactly that's what I completely disagree on.
> > > 
> > > When the driver is burning CPU cycles on behalves of a process then
> > > those CPU cycles should be accounted to the process causing this.
> > > 
> > > That the driver should probably improve it's behavior is a different issue.
> > > 
> > > > Actually, I think (maybe?) you might even agree with that, but you're
> > > > also saying: user space processes aren't special here. While it's not
> > > > okay to busy block them, it's also not okay to busy block on the
> > > > system unbound workqueue either. If that's your sentiment, I don't
> > > > disagree with it.
> > > No, it's absolutely ok to busy block them it's just not nice to do so.
> > > 
> > > As Daniel pointed out this behavior is not incorrect at all. The DRM
> > > subsystem doesn't make any guarantee that drmModeAtomicCommit() will not
> > > burn CPU cycles.
> > > 
> > > > So I think we both agree the busy waiting is a problem, but maybe we
> > > > disagree on the best place for the problem to manifest when it
> > > > happens.
> > > > 
> > > > One thought re the DC code is regardless of where the code is running,
> > > > the scheduler is going to forcefully preempt it at some point right?
> > > > Any writereg/udelay(1)/readreg loop is going to get disrupted by a
> > > > much bigger than 1us delay by the kernel if the loop goes on long
> > > > enough. I'm not wrong about that? if that's true, the code might as
> > > > well switch out the udelay(1) for a usleep(1) and call it a day (well
> > > > modulo the fact I think it can be called from an interrupt handler; at
> > > > least "git grep" says there's a link_set_dpms_off in
> > > > link_dp_irq_handler.c)
> > > > 
> > > > > Stuff like that is not a valid justification for the change. Ville
> > > > > changes on the other hand tried to prevent lock contention which is a
> > > > > valid goal here.
> > > > Okay so let's land his patchset! (assuming it's ready to go in).
> > > > Ville, is that something you'd want to resend for review?
> > > Well, while Ville patch has at least some justification I would still
> > > strongly object to move the work into a background thread to prevent
> > > userspace from being accounted for the work it causes.
> > Aren't most wayland compositors using nonblocking commits anyway?
> > If so they would already be bypassing proper CPU time accounting.
> > Not saying we shouldn't try to fix that, but just pointing out that
> > it already is an issue with nonblocking commits.
> 
> That's a rather good argument, but for async operations background work is
> simply a necessity because you otherwise can't implement them.

Yeah I don't think we can use "we need to properly account" stuff to
reject this, because we don't. Also we already do fail with inheriting the
priority properly, so another nail in the "kms is best effort".

> The key point here is that the patch puts the work into the background just
> to avoid that it is accounted to the thread issuing it, and that in turn is
> not valid as far as I can see.

Yeah it's that aspect I'm really worried about, because we essentially
start to support some gurantees that a) most drivers can't uphold without
a huge amount of work, some of the DC state recomputations are _really_
expensive b) without actually making the semantics clear, it's just
duct-tape.

Yes compositors want to run kms in real-time, and yes that results in fun
if you try to strictly account for cpu time spent. Especially if your
policy is to just nuke the real time thread instead of demoting it to
SCHED_NORMAL for a time.

I think if we want more than hacks here we need to answer two questions:
- which parts of the kms api are real time
- what exactly do we guarantee with that

And that answer probably needs to include things like the real time thread
workers for cursor and other nonblocking commits.
-Sima
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch