[Intel-gfx] Shared atomic state causing Weston repaint failure
Jakob Bornecrantz
jakob at collabora.com
Thu Jul 5 13:32:42 UTC 2018
Hello DanielĀ² et al,
I apologies in advance if the things I bring up are a bit orthogonal
to the current discussion. And I'm writing this from the view of VR,
but I have not written a VR compositor so take my comments with a bit
of salt. And also changing modes while using a VR headset is probably
going to be ultra rare so not sure how much effort it is worth.
So from a VR compositor getting blocked like this is a no-go as the
user would quickly throw EPUKE. The situation is compounded by the
fact that the VR compositor has no idea what the display compositor is
doing with regards to setting modes so can not do any mitigating on
its side (like displaying a black screen).
Some solutions that springs to mind (some I admit are probably not possible).
- Make sure we don't get into this situation by locking the resources
of the VR crtc group or allocating enough bandwidth for the display
compositor crtcs up front.
- Add priority and preemption to atomic so that VR compositor can
never be blocked.
- Add X/Wayland protocol for the compositor to tell the VR compositor
that a modeset might effect it, so it can display a black-screen
during that time.
- Make it possible for the VR compositor to tell the kernel what it
should do this case, like show black if I happen to get block before I
can queue a new pageflip.
Cheers, Jakob.
On Wed, Jul 4, 2018 at 8:58 PM Daniel Vetter <daniel at ffwll.ch> wrote:
>
> On Wed, Jul 4, 2018 at 5:44 PM, Daniel Stone <daniel at fooishbar.org> wrote:
> > Hi,
> > The atomic API being super-explicit about how userspace sequences its
> > calls is great and all, but having shared global state implicitly
> > dragged in is kind of ruining my day.
> >
> > Currently on Intel, Weston sometimes fails on hotplug, because a
> > commit which only enables CRTC B (not touching CRTC A or any other
> > CRTC!), causes all commits to CRTC A to fail until CRTC B's modeset
> > commit has fully retired:
> > https://gitlab.freedesktop.org/wayland/weston/issues/24
> >
> > The reason is that committing CRTC B resizes the DDB allocation for
> > CRTC A as well, pulling CRTC A's CRTC state into the commit. This
> > makes sense, but on the other hand it's totally opaque to userspace,
> > and impossible for us to reason about when making commits.
> >
> > I suggested some options in that GitLab commit, none of which I like:
> > * if any other CRTCs are pulled into a commit state, always execute
> > a blocking commit in the kernel
> > * if we're passing ALLOW_MODESET in userspace, only ever do blocking commits
> > * whenever we get -EBUSY in userspace, assume we've been screwed by
> > the kernel and defer until other outputs have completed
> > * whenever we want to reconfigure any output in userspace, wait
> > until all outputs are completely quiescent and do a single atomic
> > commit covering all outputs
> >
> > The first one seems completely non-obvious from the kernel, but on the
> > other hand the current -EBUSY failing behaviour is also non-obvious.
> >
> > The second is maybe the most reasonable, but on the other hand just
> > working around a painful leaky abstraction: we also can't know upfront
> > from userspace if this is actually going to be required, or if we're
> > just killing responsiveness blocking for no reason.
> >
> > The third is the thing I least want to do, because it might well paper
> > over legitimate bugs in userspace, and complicates our state tracking
> > for no reason.
> >
> > The fourth is probably the most legitimate, but, well ... someone has
> > to type up all the code to make our output-configuration API
> > completely asynchronous.
> >
> > I suspect we're the first ones to be hitting this, because Weston has
> > a truly independent per-CRTC repaint loop, we're one of the few atomic
> > users, and also because Pekka did some seriously brutal hotplug
> > testing whilst reworking Weston's output configuration API. Also
> > because our approach to failed output repaints is to just freeze the
> > output until it next cycles off and on, which is much more apparent
> > than just silently dropping a frame here and there. ;)
> >
> > Any bright ideas on what could practically be done here?
> >
> > Cheers,
> > Daniel
>
> We had an entirely inconclusive chat on irc about this topic. Random notes:
>
> - This can fail both on the commit doing a modeset (when it adds some
> random other CRTC which still has a pending flip), and for normal
> flips (if they run into a CRTC which has been affected by a modeset).
>
> - Just waiting for all nonblocking modesets to complete is not enough,
> because you don't get an event for these affected CRTC. And they might
> take longer. So you can still get an EBUSY even when it looks like
> everything is done.
>
> - If you don't combine nonblocking with ALLOW_MODESET there's no
> issue, because everyone just blocks (and maybe misses a few frames)
> until the modeset is done.
>
> - We want to tell apart an EBUSY due to the redraw loop having lost
> sync from an EBUSY due to these modesets that affect more than just
> the passed-in CRTCs. It's a really nice debug feature for compositor
> developers.
>
> -> One way or another we need new/clarified uapi, atomic as
> implemented doesn't really cut it.
>
> A few of the options we've discussed:
>
> - Eating all the EBUSY for ALLOW_MODESET. A bit a bigger hammer.
> Slightly smaller hammer is just eating all the EBUSY for crtc not in
> the original request. It would still be nice to tell userspace about
> the additional CRTCs its atomic comit affects.
>
> - Some new flags to send out events (what about fence_ptr?) for these
> affected crtcs, plus reporting them out in a new affected_crtcs_mask
> in the atomic ioctl struct.
>
> - One tricky issue is that drivers might pull in CRTCs which are
> disabled and stay disabled. Events aren't well-defined on these at
> all, but they might still provoke an EBUSY!
>
> - Compositors would like to know how many frames they've lost. One
> idea was to split up the retry error codes: EINTR would mean retrying
> immedetialy due to a signal. EAGAIN would mean the kernel had to
> insert something to make a modeset possible, retry only after you
> waited for one vblank. With current userspace this would only result
> in cpu wasted, but updated userspace could be more intelligent about
> things. We could also do a special flag or whatever.
>
> We didn't zero in on anything specific since it's not really clear
> what compositors want here at all.
>
> Cheers, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
More information about the Intel-gfx
mailing list