[PATCH] drm/atomic: Perform blocking commits on workqueue

Fri Oct 6 18:48:56 UTC 2023

Hi,

On Fri, Oct 6, 2023 at 3:12 AM Christian König <christian.koenig at amd.com> wrote:
> When the operation busy waits then that *should* get accounted to the
> CPU time of the current process. When the operation sleeps and waits for
> some interrupt for example it should not get accounted.
> What you suggest is to put the parts of the operation which busy wait
> into a background task and then sleep for that task to complete. This is
> not a solution to the problem, but just hides it.

Actually, I think we both probably agree that there shouldn't be long
busy waits in the context of the current process. After all, we both
agree what the AMD DC driver code is doing is wrong.

To be clear, my take is, if driver code is running in process context
and needs to wait for periods of time on the order of or in excess of
a typical process time slice it should be sleeping during the waiting.
If the operation is at a point where it can be cancelled without side
effects, the sleeping should be INTERRUPTIBLE. If it's past the point
of no return, sleeping should be UNINTERRUPTIBLE. At no point, in my
opinion, should kernel code busy block a typical process for dozens of
milliseconds while keeping the process RUNNING. I don't think this is
a controversial take.

Actually, I think (maybe?) you might even agree with that, but you're
also saying: user space processes aren't special here. While it's not
okay to busy block them, it's also not okay to busy block on the
system unbound workqueue either. If that's your sentiment, I don't
disagree with it.

So I think we both agree the busy waiting is a problem, but maybe we
disagree on the best place for the problem to manifest when it
happens.

One thought re the DC code is regardless of where the code is running,
the scheduler is going to forcefully preempt it at some point right?
Any writereg/udelay(1)/readreg loop is going to get disrupted by a
much bigger than 1us delay by the kernel if the loop goes on long
enough. I'm not wrong about that? if that's true, the code might as
well switch out the udelay(1) for a usleep(1) and call it a day (well
modulo the fact I think it can be called from an interrupt handler; at
least "git grep" says there's a link_set_dpms_off in
link_dp_irq_handler.c)

> Stuff like that is not a valid justification for the change. Ville
> changes on the other hand tried to prevent lock contention which is a
> valid goal here.

Okay so let's land his patchset! (assuming it's ready to go in).
Ville, is that something you'd want to resend for review?

Note, a point that I don't think has been brought up yet, too, is the
system unbound workqueue doesn't run with real time priority. Given
the lion's share of mutter's drmModeAtomicCommit calls are nonblock,
and so are using the system unbound workqueue for handling the
commits, even without this patch, that somewhat diminishes the utility
of using a realtime thread anyway. I believe the original point of the
realtime thread was to make sure mouse cursor motion updates are as
responsive as possible, because any latency there is something the
user really feels. Maybe there needs to be a different mechanism in
place to make sure user perceived interactivity is given priority.

--Ray