[RFC v1 0/4] drm: Add support for DRM_CAP_DEFERRED_OUT_FENCE capability
Michel Dänzer
michel at daenzer.net
Tue Aug 3 07:33:53 UTC 2021
On 2021-08-03 8:11 a.m., Kasireddy, Vivek wrote:
>
>>> The goal:
>>> - Maintain full framerate even when the Guest scanout FB is flipped onto a hardware plane
>>> on the Host -- regardless of either compositor's scheduling policy -- without making any
>>> copies and ensuring that both Host and Guest are not accessing the buffer at the same time.
>>>
>>> The problem:
>>> - If the Host compositor flips the client's buffer (in this case Guest compositor's buffer)
>>> onto a hardware plane, then it can send a wl_buffer.release event for the previous buffer
>>> only after it gets a pageflip completion. And, if the Guest compositor takes 10-12 ms to
>>> submit a new buffer and given the fact that the Host compositor waits only for 9 ms, the
>>> Guest compositor will miss the Host's repaint cycle resulting in halved frame-rate.
>>>
>>> The solution:
>>> - To ensure full framerate, the Guest compositor has to start its repaint cycle (including
>>> the 9 ms wait) when the Host compositor sends the frame callback event to its clients.
>>> In order for this to happen, the dma-fence that the Guest KMS waits on -- before sending
>>> pageflip completion -- cannot be tied to a wl_buffer.release event. This means that, the
>>> Guest compositor has to be forced to use a new buffer for its next repaint cycle when it
>>> gets a pageflip completion.
>>
>> Is that really the only solution?
> [Kasireddy, Vivek] There are a few others I mentioned here:
> https://gitlab.freedesktop.org/wayland/weston/-/issues/514#note_986572
> But I think none of them are as compelling as this one.
>
>>
>> If we fix the event timestamps so that both guest and host use the same
>> timestamp, but then the guest starts 5ms (or something like that) earlier,
>> then things should work too? I.e.
>> - host compositor starts at (previous_frametime + 9ms)
>> - guest compositor starts at (previous_frametime + 4ms)
>>
>> Ofc this only works if the frametimes we hand out to both match _exactly_
>> and are as high-precision as the ones on the host side. Which for many gpu
>> drivers at least is the case, and all the ones you care about for sure :-)
>>
>> But if the frametimes the guest receives are the no_vblank fake ones, then
>> they'll be all over the place and this carefully tuned low-latency redraw
>> loop falls apart. Aside from the fact that without tuning the guests to
>> be earlier than the hosts, you're guaranteed to miss every frame (except
>> when the timing wobbliness in the guest is big enough by chance to make
>> the deadline on the oddball frame).
> [Kasireddy, Vivek] The Guest and Host use different event timestamps as we don't
> share these between the Guest and the Host. It does not seem to be causing any other
> problems so far but we did try the experiment you mentioned (i.e., adjusting the delays)
> and it works. However, this patch series is meant to fix the issue without having to tweak
> anything (delays) because we can't do this for every compositor out there.
Maybe there could be a mechanism which allows the compositor in the guest to automatically adjust its repaint cycle as needed.
This might even be possible without requiring changes in each compositor, by adjusting the vertical blank periods in the guest to be aligned with the host compositor repaint cycles. Not sure about that though.
Even if not, both this series and making it possible to queue multiple flips require corresponding changes in each compositor as well to have any effect.
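
To make the timing argument quoted above concrete, here is a rough sketch of the arithmetic, assuming both sides see the same previous_frametime. The function names and the guest_work_ms parameter are made up for illustration; they are not part of any compositor or DRM API.

```python
NSEC_PER_MSEC = 1_000_000

def repaint_start_ns(prev_frametime_ns, delay_ms):
    """Time at which a compositor starts its repaint, as an offset
    from the shared previous frame timestamp."""
    return prev_frametime_ns + delay_ms * NSEC_PER_MSEC

def guest_makes_host_repaint(prev_frametime_ns, guest_delay_ms,
                             guest_work_ms, host_delay_ms):
    """True if the guest has submitted its buffer by the time the host
    starts repainting. guest_work_ms is a hypothetical figure for how
    long the guest needs between starting its repaint and submitting."""
    guest_done = (repaint_start_ns(prev_frametime_ns, guest_delay_ms)
                  + guest_work_ms * NSEC_PER_MSEC)
    host_start = repaint_start_ns(prev_frametime_ns, host_delay_ms)
    return guest_done <= host_start

# Guest starts at +4ms and needs 5ms; host starts at +9ms.
print(guest_makes_host_repaint(0, 4, 5, 9))
# Guest starts at +9ms like the host, so any nonzero work is too late.
print(guest_makes_host_repaint(0, 9, 1, 9))
```

This only holds if the frame timestamps in guest and host really match; with the no_vblank fake timestamps the comparison is meaningless.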
--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and X developer