[RFC v1 0/4] drm: Add support for DRM_CAP_DEFERRED_OUT_FENCE capability

Kasireddy, Vivek vivek.kasireddy at intel.com
Thu Aug 5 04:15:27 UTC 2021


Hi Daniel,

> > >>> The solution:
> > >>> - To ensure full framerate, the Guest compositor has to start its repaint cycle
> > >>> (including the 9 ms wait) when the Host compositor sends the frame callback event
> > >>> to its clients.
> > >>> In order for this to happen, the dma-fence that the Guest KMS waits on -- before
> > >>> sending pageflip completion -- cannot be tied to a wl_buffer.release event. This
> > >>> means that the Guest compositor has to be forced to use a new buffer for its next
> > >>> repaint cycle when it gets a pageflip completion.
> > >>
> > >> Is that really the only solution?
> > > [Kasireddy, Vivek] There are a few others I mentioned here:
> > > https://gitlab.freedesktop.org/wayland/weston/-/issues/514#note_986572
> > > But I think none of them are as compelling as this one.
> > >
> > >>
> > >> If we fix the event timestamps so that both guest and host use the same
> > >> timestamp, but then the guest starts 5ms (or something like that) earlier,
> > >> then things should work too? I.e.
> > >> - host compositor starts at (previous_frametime + 9ms)
> > >> - guest compositor starts at (previous_frametime + 4ms)
> > >>
> > >> Ofc this only works if the frametimes we hand out to both match _exactly_
> > >> and are as high-precision as the ones on the host side. Which for many gpu
> > >> drivers at least is the case, and all the ones you care about for sure :-)
> > >>
> > >> But if the frametimes the guest receives are the no_vblank fake ones, then
> > >> they'll be all over the place and this carefully tuned low-latency redraw
> > >> loop falls apart. Aside from the fact that without tuning the guests to
> > >> be earlier than the hosts, you're guaranteed to miss every frame (except
> > >> when the timing wobbliness in the guest is big enough by chance to make
> > >> the deadline on the oddball frame).
> > > [Kasireddy, Vivek] The Guest and Host use different event timestamps as we don't
> > > share these between the Guest and the Host. It does not seem to be causing any other
> > > problems so far but we did try the experiment you mentioned (i.e., adjusting the
> > > delays) and it works. However, this patch series is meant to fix the issue without
> > > having to tweak anything (delays) because we can't do this for every compositor
> > > out there.
> >
> > Maybe there could be a mechanism which allows the compositor in the guest to
> > automatically adjust its repaint cycle as needed.
> >
> > This might even be possible without requiring changes in each compositor, by adjusting
> > the vertical blank periods in the guest to be aligned with the host compositor repaint
> > cycles. Not sure about that though.
> >
> > Even if not, both this series and making it possible to queue multiple flips require
> > corresponding changes in each compositor as well to have any effect.
> 
> Yeah from all the discussions and tests done it sounds like even with a
> deeper queue we have big coordination issues between the guest and
> host compositor (like the example that the guest is now rendering at
> 90fps instead of 60fps like the host).
[Kasireddy, Vivek] Oh, I think you are referring to my reply to Gerd. That 90 FPS vs 
60 FPS problem is a completely different issue that is associated with Qemu GTK UI
backend. With the GTK backend -- and also with SDL backend -- we Blit the Guest
scanout FB onto one of the backbuffers managed by EGL. 

I am trying to add a new Qemu Wayland UI backend so that we can eliminate that Blit
and thereby have a truly zero-copy solution. And, this is where I am running into the
halved frame-rate issue -- the current problem.

> 
> Hence my gut feeling reaction that first we need to get these two
> compositors aligned in their timings, which probably needs
> consistent vblank periods/timestamps across them (plus/minus
> guest/host clocksource fun ofc). Without this any of the next steps
> will simply not work because there's too much jitter by the time the
> guest compositor gets the flip completion events.
[Kasireddy, Vivek] Timings are not a problem and do not significantly
affect the repaint cycles from what I have seen so far.

> 
> Once we have solid events I think we should look into statically
> tuning guest/host compositor deadlines (like you've suggested in a
> bunch of places) to consistently make that deadline and hit 60 fps.
> With that we can then look into tuning this automatically and what to
> do when e.g. switching between copying and zero-copy on the host side
> (which might be needed in some cases) and how to handle all that.
[Kasireddy, Vivek] As I confirm here: https://gitlab.freedesktop.org/wayland/weston/-/issues/514#note_984065
tweaking the deadlines works (i.e., we get 60 FPS) as we expect. However,
I feel that this zero-copy solution I am trying to create should be independent
of compositors' deadlines, delays or other scheduling parameters.
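
To make the deadline tweak concrete, here is a rough sketch of the arithmetic
behind it. It is a toy model only: the 9 ms and 4 ms start offsets come from
the suggestion quoted earlier in this thread, while the 60 Hz refresh, the
guest render time and both function names are assumptions made up for
illustration. It also assumes that a guest frame which misses the Host
repaint simply waits one more refresh.

```python
# Toy model of the repaint deadlines; not measured data.
REFRESH_MS = 1000.0 / 60.0  # assumed 60 Hz Host display


def guest_makes_host_deadline(guest_start_ms, host_start_ms, guest_render_ms):
    """Return True if a Guest frame started guest_start_ms after the previous
    frame time is submitted before the Host begins its own repaint."""
    return guest_start_ms + guest_render_ms <= host_start_ms


def effective_fps(guest_start_ms, host_start_ms, guest_render_ms):
    """In this simplified model a Guest frame that misses the Host repaint is
    picked up one refresh later, so the Guest is shown every second frame."""
    if guest_makes_host_deadline(guest_start_ms, host_start_ms, guest_render_ms):
        return 1000.0 / REFRESH_MS
    return 1000.0 / (2 * REFRESH_MS)
```

With an assumed 3 ms Guest render time, starting the Guest at +4 ms makes the
Host's +9 ms repaint start, whereas starting both at +9 ms does not, which is
the halved rate in this model.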

> Only when that all shows that we just can't hit 60fps consistently and
> really need 3 buffers in flight should we look at deeper kms queues.
> And then we really need to implement them properly and not with a
> mismatch between drm_event and out-fence signalling. These quick hacks
> are good for experiments, but there's a pile of other things we need
> to do first. At least that's how I understand the problem here right
> now.
[Kasireddy, Vivek] Experiments done so far indicate that we can hit 59 FPS consistently
-- in a zero-copy way independent of compositors' delays/deadlines -- with this
patch series + the Weston MR I linked in the cover letter. The main reason why this
works is because we relax the assumption that when the Guest compositor gets a
pageflip completion event that it could reuse the old FB it submitted in the previous
atomic flip and instead force it to use a new one. And, we send the pageflip completion
event to the Guest when the Host compositor sends a frame callback event. Lastly,
we use the (deferred) out_fence as just a mechanism to tell the Guest compositor when
it can release references on old FBs so that they can be reused again.
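
The FB handling described above can be sketched roughly as follows. This is
a toy Python model, not code from the patch series or from Weston: the class
and method names are hypothetical, and the choice to keep the most recently
submitted FB held until a newer one replaces it is an assumption.

```python
class GuestCompositor:
    """Toy model of Guest FB lifetimes with a deferred out_fence."""

    def __init__(self, num_fbs=3):
        self.free = list(range(num_fbs))  # FBs the Guest may render into
        self.in_use = []                  # submitted FBs, oldest first

    def on_pageflip_completion(self):
        """Sent when the Host compositor delivers its frame callback.
        Start a repaint, but never into a previously submitted FB."""
        if not self.free:
            return None  # no spare FB; skip this repaint cycle
        fb = self.free.pop(0)
        self.in_use.append(fb)
        return fb

    def on_out_fence_signaled(self):
        """The deferred out_fence says the oldest submitted FB is no longer
        in use, so the Guest can drop its reference and reuse it later."""
        if len(self.in_use) > 1:  # assumption: newest FB stays held
            self.free.append(self.in_use.pop(0))
```

The point of the split is that the repaint trigger and the FB release arrive
at different times, so the model needs at least one spare FB to keep
repainting while the Host still holds the previous one.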

With that being said, the only question is how we can accomplish the above in an
upstream-acceptable way without regressing anything, particularly on bare-metal. It's not
clear whether just increasing the queue depth would work, but I think the Guest compositor
has to be told when it can start its repaint cycle and when it can assume the old FB is no
longer in use. On bare-metal -- and also with VKMS as of today -- a pageflip completion
indicates both. In other words, Vblank event is the same as Flip done, which makes sense on
bare-metal. But if we were to have two events, at least for VKMS: vblank to tell the Guest
to start its repaint and flip_done to tell it to drop references on old FBs, I think this
problem can be solved even without increasing the queue depth. Can this be acceptable?

Thanks,
Vivek
> 
> Cheers, Daniel
> 
> >
> >
> > --
> > Earthling Michel Dänzer               |               https://redhat.com
> > Libre software enthusiast             |             Mesa and X developer
> 
> 
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch