[PATCH] drm/atomic: Perform blocking commits on workqueue

Ville Syrjälä ville.syrjala at linux.intel.com
Thu Oct 5 10:16:27 UTC 2023


On Thu, Oct 05, 2023 at 11:57:41AM +0200, Daniel Vetter wrote:
> On Tue, Sep 26, 2023 at 01:05:49PM -0400, Ray Strode wrote:
> > From: Ray Strode <rstrode at redhat.com>
> > 
> > A drm atomic commit can be quite slow on some hardware. It can lead
> > to a lengthy queue of commands that need to get processed and waited
> > on before control can go back to user space.
> > 
> > If user space is a real-time thread, that delay can have severe
> > consequences, leading to the process getting killed for exceeding
> > rlimits.
> > 
> > This commit addresses the problem by always running the slow part of
> > a commit on a workqueue, separated from the task initiating the
> > commit.
> > 
> > This change makes the nonblocking and blocking paths work in the same way,
> > and as a result allows the task to sleep and not use up its
> > RLIMIT_RTTIME allocation.
> > 
> > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2861
> > Signed-off-by: Ray Strode <rstrode at redhat.com>
> 
> So imo the trouble with this is that we suddenly start to make
> realtime/cpu usage guarantees in the atomic ioctl. That's a _huge_ uapi
> change, because even limited to the case of !ALLOW_MODESET we do best
> effort guarantees at best. And some drivers (again amd's dc) spend a ton
> of cpu time recomputing state even for pure plane changes without any crtc
> changes like dpms on/off (at least I remember some bug reports about
> that). And that state recomputation has to happen synchronously, because
> it always influences the ioctl errno return value.
> 
> My take is that you're papering over a performance problem here of the
> "the driver is too slow/wastes too much cpu time". We should fix the
> driver, if that's possible.
> 
> Another option would be if userspace drops realtime priorities for these
> known-slow operations. And right now _all_ kms operations are potentially
> cpu and real-time wasters, the entire uapi is best effort.
> 
> We can also try to change the atomic uapi to give some hard real-time
> guarantees so that running compositors as SCHED_RT is possible, but that
> - means a very serious stream of bugs to fix all over
> - therefore needs some very wide buy-in from drivers that they're willing
>   to make this guarantee
> - probably needs some really carefully carved out limitations, because
>   there's imo flat-out no way we'll make all atomic ioctl hard time limit
>   bound
> 
> Also, as König has pointed out, you can roll this duct-tape out in
> userspace by making the commit non-blocking and immediately waiting for
> the fences.
> 
> One thing I didn't see mentioned is that there's a very subtle uapi
> difference between non-blocking and blocking:
> - non-blocking is not allowed to get ahead of the previous commit, and
>   will return EBUSY in that case. See the comment in
>   drm_atomic_helper_commit()
> - blocking otoh will just block until any previous pending commit has
>   finished
> 
> Not taking that into account in your patch here breaks uapi because
> userspace will suddenly get EBUSY when they don't expect that.

The -EBUSY logic already checks whether the current commit is
non-blocking or blocking, so I don't see how there would
be any change in behaviour from simply stuffing the commit_tail
onto a workqueue, especially as the locks will still be held across
the flush.
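
To illustrate, here is a minimal standalone sketch of that decision. It is
a simplified model, not the actual kernel code: the function name and its
two boolean parameters are made up for illustration, and the real check
looks at the pending commits on each crtc.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Simplified model of the -EBUSY decision (hypothetical helper, not a
 * kernel function).
 *
 * prev_flip_done: the previous commit on the crtc has completed its flip.
 * nonblock:       the new commit was requested as non-blocking.
 */
static int model_commit_busy_check(bool prev_flip_done, bool nonblock)
{
	/* Only a non-blocking commit is refused for getting ahead. */
	if (!prev_flip_done && nonblock)
		return -EBUSY;

	/* A blocking commit is accepted and waits for the previous one. */
	return 0;
}
```

In this model the result depends only on what userspace asked for, not on
where commit_tail ends up running, so a blocking commit would get 0 both
before and after the change.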

In my earlier series [1] where I move the flush to happen after dropping
the locks there is a far more subtle issue because currently even
non-blocking commits can actually block due to the mutex. Changing
that might break something, so I preserved that behaviour explicitly.
Full explanation in the first patch there.

[1] https://patchwork.freedesktop.org/series/108668/

-- 
Ville Syrjälä
Intel

