[Intel-gfx] Shared atomic state causing Weston repaint failure

Wed Jul 4 19:58:15 UTC 2018

On Wed, Jul 4, 2018 at 5:44 PM, Daniel Stone <daniel at fooishbar.org> wrote:
> Hi,
> The atomic API being super-explicit about how userspace sequences its
> calls is great and all, but having shared global state implicitly
> dragged in is kind of ruining my day.
>
> Currently on Intel, Weston sometimes fails on hotplug, because a
> commit which only enables CRTC B (not touching CRTC A or any other
> CRTC!), causes all commits to CRTC A to fail until CRTC B's modeset
> commit has fully retired:
>     https://gitlab.freedesktop.org/wayland/weston/issues/24
>
> The reason is that committing CRTC B resizes the DDB allocation for
> CRTC A as well, pulling CRTC A's CRTC state into the commit. This
> makes sense, but on the other hand it's totally opaque to userspace,
> and impossible for us to reason about when making commits.
>
> I suggested some options in that GitLab commit, none of which I like:
>   * if any other CRTCs are pulled into a commit state, always execute
> a blocking commit in the kernel
>   * if we're passing ALLOW_MODESET in userspace, only ever do blocking commits
>   * whenever we get -EBUSY in userspace, assume we've been screwed by
> the kernel and defer until other outputs have completed
>   * whenever we want to reconfigure any output in userspace, wait
> until all outputs are completely quiescent and do a single atomic
> commit covering all outputs
>
> The first one seems completely non-obvious from the kernel, but on the
> other hand the current -EBUSY failing behaviour is also non-obvious.
>
> The second is maybe the most reasonable, but on the other hand just
> working around a painful leaky abstraction: we also can't know upfront
> from userspace if this is actually going to be required, or if we're
> just killing responsiveness blocking for no reason.
>
> The third is the thing I least want to do, because it might well paper
> over legitimate bugs in userspace, and complicates our state tracking
> for no reason.
>
> The fourth is probably the most legitimate, but, well ... someone has
> to type up all the code to make our output-configuration API
> completely asynchronous.
>
> I suspect we're the first ones to be hitting this, because Weston has
> a truly independent per-CRTC repaint loop, we're one of the few atomic
> users, and also because Pekka did some seriously brutal hotplug
> testing whilst reworking Weston's output configuration API. Also
> because our approach to failed output repaints is to just freeze the
> output until it next cycles off and on, which is much more apparent
> than just silently dropping a frame here and there. ;)
>
> Any bright ideas on what could practically be done here?
>
> Cheers,
> Daniel

We had an entirely inconclusive chat on irc about this topic. Random notes:

- This can fail both on the commit doing a modeset (when it adds some
random other CRTC which still has a pending flip), and for normal
flips (if they run into a CRTC which has been affected by a modeset).

- Just waiting for all nonblocking modesets to complete is not enough,
because you don't get an event for these affected CRTC. And they might
take longer. So you can still get an EBUSY even when it looks like
everything is done.

- If you don't combine nonblocking with ALLOW_MODESET there's no
issue, because everyone just blocks (and maybe misses a few frames)
until the modeset is done.

- We want to tell apart an EBUSY due to the redraw loop having lost
sync from an EBUSY due to these modesets that affect more than just
the passed-in CRTCs. It's a really nice debug feature for compositor
developers.

-> One way or another we need new/clarified uapi, atomic as
implemented doesn't really cut it.

A few of the options we've discussed:

- Eating all the EBUSY for ALLOW_MODESET. A bit a bigger hammer.
Slightly smaller hammer is just eating all the EBUSY for crtc not in
the original request. It would still be nice to tell userspace about
the additional CRTCs its atomic comit affects.

- Some new flags to send out events (what about fence_ptr?) for these
affected crtcs, plus reporting them out in a new affected_crtcs_mask
in the atomic ioctl struct.

- One tricky issue is that drivers might pull in CRTCs which are
disabled and stay disabled. Events aren't well-defined on these at
all, but they might still provoke an EBUSY!

- Compositors would like to know how many frames they've lost. One
idea was to split up the retry error codes: EINTR would mean retrying
immedetialy due to a signal. EAGAIN would mean the kernel had to
insert something to make a modeset possible, retry only after you
waited for one vblank. With current userspace this would only result
in cpu wasted, but updated userspace could be more intelligent about
things. We could also do a special flag or whatever.

We didn't zero in on anything specific since it's not really clear
what compositors want here at all.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch