[PATCH] drm/atomic: Perform blocking commits on workqueue

Tue Oct 17 07:32:31 UTC 2023

On Fri, Oct 13, 2023 at 12:22:52PM +0200, Michel Dänzer wrote:
> On 10/13/23 11:41, Daniel Vetter wrote:
> > On Thu, Oct 12, 2023 at 02:19:41PM -0400, Ray Strode wrote:
> >> On Mon, Oct 09, 2023 at 02:36:17PM +0200, Christian König wrote:
> >>>>>> To be clear, my take is, if driver code is running in process context
> >>>>>> and needs to wait for periods of time on the order of or in excess of
> >>>>>> a typical process time slice it should be sleeping during the waiting.
> >>>>>> If the operation is at a point where it can be cancelled without side
> >>>>>> effects, the sleeping should be INTERRUPTIBLE. If it's past the point
> >>>>>> of no return, sleeping should be UNINTERRUPTIBLE. At no point, in my
> >>>>>> opinion, should kernel code busy block a typical process for dozens of
> >>>>>> milliseconds while keeping the process RUNNING. I don't think this is
> >>>>>> a controversial take.
> >>>>> Exactly that's what I completely disagree on.
> >>
> >> Okay if we can't agree that it's not okay for user space (or the
> >> kernel running in the context of user space) to busy loop a cpu core
> >> at 100% utilization throughout and beyond the process's entire
> >> scheduled time slice then we really are at an impasse. I gotta say i'm
> >> astonished that this seemingly indefensible behavior is somehow a
> >> point of contention, but I'm not going to keep arguing about it beyond
> >> this email.
> >>
> >> I mean we're not talking about scientific computing, or code
> >> compilation, or seti at home. We're talking about nearly the equivalent
> >> of `while (1) __asm__ ("nop");`
> > 
> > I don't think anyone said this shouldn't be fixed or improved.
> > 
> > What I'm saying is that the atomic ioctl is not going to make guarantees
> > that it will not take up to much cpu time (for some extremely vague value
> > of "too much") to the point that userspace can configure it's compositor
> > in a way that it _will_ get killed if we _ever_ violate this rule.
> > 
> > We should of course try to do as good as job as possible, but that's not
> > what you're asking for. You're asking for a hard real-time guarantee with
> > the implication if we ever break it, it's a regression, and the kernel has
> > to bend over backwards with tricks like in your patch to make it work.
> 
> I don't think mutter really needs or wants such a hard real-time
> guarantee. What it needs is a fighting chance to react before the kernel
> kills its process.
> 
> The intended mechanism for this is SIGXCPU, but that can't work if the
> kernel is stuck in a busy-loop. Ray's patch seems like one way to avoid
> that.

I don't think signals will get us out of this either, or at least still
has risk. We are trying to make everything interruptible and bail out
asap, but those checks are when we're blocking, not when the cpu is busy.

So this also wont guarantee that you expire your timeslice when a driver
is doing a silly expensive atomic_check computation. Much less likely, but
definitely not a zero chance.

> That said, as long as SIGXCPU can work as intended with the non-blocking
> commits mutter uses for everything except modesets, mutter's workaround
> of dropping RT priority for the blocking commits seems acceptable for
> the time being.

Is there no rt setup where the kernel just auto-demotes when you've used
up your slice? That's the only setup I see that guarantees you're not
getting killed here.

I think dropping rt priority around full modesets is still good since
modests really shouldn't ever run as rt, that makes no sense to me.
-Sima
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch