[Bug 768079] New: waylandsink: add support wayland presentation time interface (non-live case)

GStreamer (GNOME Bugzilla) bugzilla at gnome.org
Mon Jun 27 03:18:40 UTC 2016


https://bugzilla.gnome.org/show_bug.cgi?id=768079

            Bug ID: 768079
           Summary: waylandsink: add support wayland presentation time
                    interface (non-live case)
    Classification: Platform
           Product: GStreamer
           Version: git master
                OS: All
            Status: NEW
          Severity: major
          Priority: Normal
         Component: gst-plugins-bad
          Assignee: gstreamer-bugs at lists.freedesktop.org
          Reporter: chul0812 at gmail.com
        QA Contact: gstreamer-bugs at lists.freedesktop.org
     GNOME version: ---

I'm carrying over the comments about this task and wrapping each writer's
name in angle brackets; sorry for the poor readability.

Waylandsink was previously handled by George Kiagiadakis, who had written
presentation time interface code for a demo, but the interface has since
changed and settled down as a stable protocol.

I started from George's work
(http://cgit.collabora.com/git/user/gkiagia/gst-plugins-bad.git/log/?h=demo),
removing the presentation queue and taking the display stack delay into
account. That approach predicted the display stack latency from wl_surface
attach/damage/commit to frame presentation, and Pekka Paalanen (pq) advised
that it would not estimate the delay from wl_surface_commit() to display
accurately.

(excerpt from the IRC comments)
<pq> wonchul, if you are trying to estimate the delay from wl_surface_commit()
to display, and you don't sync the time you call commit() to the incoming
events, that's going to be a lot less accurate.
<pq> 11:11:07> no, I literally meant replacing the queueing protocol calls with
a queue implementation in the sink, so you don't use the queueing protocol
anymore, but rely only on the feedback protocol to trigger attach+commits from
the queue.
<pq> 11:12:27> the queue being a timestamp-ordered list of frames, just like
in the weston implementation.

So, estimating the delay this way from Wayland is not very accurate.
Instead, I turned to adding a queue that holds buffers before render() is
called in the waylandsink, as sketched below.
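
For illustration, a minimal sketch of that queue (hypothetical names, not
the actual patch): buffers are held sorted by timestamp, and the
presentation feedback handler pops the next due frame for
attach/damage/commit.

#include <gst/gst.h>

typedef struct {
  GQueue queue;                 /* GstBuffer *, ordered by PTS */
  GMutex lock;                  /* g_queue_init()/g_mutex_init() at setup */
} WlFrameQueue;                 /* hypothetical type */

static gint
compare_pts (gconstpointer a, gconstpointer b, gpointer user_data)
{
  GstClockTime pa = GST_BUFFER_PTS ((GstBuffer *) a);
  GstClockTime pb = GST_BUFFER_PTS ((GstBuffer *) b);

  return (pa > pb) - (pa < pb);
}

static void
wl_frame_queue_push (WlFrameQueue * fq, GstBuffer * buf)
{
  g_mutex_lock (&fq->lock);
  g_queue_insert_sorted (&fq->queue, gst_buffer_ref (buf), compare_pts, NULL);
  g_mutex_unlock (&fq->lock);
}

/* Called from the presentation feedback handler: take the latest frame that
 * is due for the predicted next vblank, dropping any older ones. */
static GstBuffer *
wl_frame_queue_pop_due (WlFrameQueue * fq, GstClockTime next_vblank)
{
  GstBuffer *buf = NULL, *head;

  g_mutex_lock (&fq->lock);
  while ((head = g_queue_peek_head (&fq->queue)) != NULL &&
      GST_BUFFER_PTS (head) <= next_vblank) {
    if (buf)
      gst_buffer_unref (buf);   /* superseded, i.e. late: drop it */
    buf = g_queue_pop_head (&fq->queue);
  }
  g_mutex_unlock (&fq->lock);

  return buf;
}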


<Olivier Crête>
I'm a bit concerned about adding a queue in the sink that would increase the
latency unnecessarily. I wonder if this could be done while queueing only
around 1 buffer in normal streaming. Are we talking about queueing the
actual frames or just information about the frames?


<Wonchul Lee>
I queued references to the frames and tried to render based on the Wayland
presentation clock.
Adding a queue in the sink could introduce some delay depending on the
content. It's not clear to me yet which specific factor causes the delay,
but yes, it would increase the latency at the moment.

The idea was to disable clock synchronization in gstbasesink and to render
(wayland attach/damage/commit) frames based on the calibrated Wayland clock.
I pushed references to the gstbuffers into the queue and set an async clock
callback to request a render at the right time, and then rendered or dropped
each frame depending on the adjusted timestamp.
This change has the issue that the adjusted timestamp at which the render is
requested arrives later than expected, which in some cases caused most of
the frames to be dropped, since the adjusted timestamp was always late.
So I'm now looking at audiobasesink as a reference for adjusting clock
synchronization of the frames against the Wayland clock.
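
A rough sketch of that async clock callback setup (the clock calls are real
GStreamer API; the render logic and the names around it are hypothetical):

#include <gst/gst.h>

static gboolean
render_cb (GstClock * clock, GstClockTime time, GstClockID id,
    gpointer user_data)
{
  /* Here the sink would pop the due frame from its queue and do
   * wl_surface attach/damage/commit, or drop the frame if the adjusted
   * timestamp has already passed. */
  return TRUE;
}

static void
schedule_render (GstClock * clock, gpointer sink, GstClockTime render_time)
{
  GstClockID id = gst_clock_new_single_shot_id (clock, render_time);

  gst_clock_id_wait_async (id, render_cb, sink, NULL);
  gst_clock_id_unref (id);      /* the clock holds its own ref while pending */
}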


<Olivier Crête>
This work has two separate goals:

1. When the video has a different framerate than the display, it should drop
frames more or less evenly: if you need to display 4 out of 5 frames, it
should be something like 1,2,3,4,6,7,8,9,11,..., and if you need to display
30 out of 60 frames it should display 1,3,5,7,9, etc. Currently, GstBaseSink
is not very clever about that (see the sketch below).
We also have to be careful, as this can be caused by the compositor not
being able to keep up. Just because the display can do 60fps does not mean
the compositor is actually able to produce 60 new frames; it could be
limited to a lower number, so we'll also have to make sure we're protected
against that.

2. We want to estimate the latency added by the display stack. The current
GStreamer video sinks more or less assume that a buffer is rendered
immediately when the render() vmethod returns, but that is not really how
current display hardware works, especially with double or triple buffering.
So we want to know how far in advance to submit the buffer, without
submitting so early that it is displayed one interval too soon.
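
As a sketch of what "even" dropping could mean here (illustrative only, not
existing GstBaseSink behaviour), a Bresenham-style accumulator keeps exactly
'keep' out of every 'total' frames, evenly spaced:

#include <stdbool.h>

/* Returns true if frame number 'n' (0-based) should be displayed when we
 * can only show 'keep' out of every 'total' frames, e.g. keep=4, total=5
 * or keep=30, total=60. */
static bool
keep_frame (unsigned n, unsigned keep, unsigned total)
{
  return (n * keep) / total != ((n + 1) * keep) / total;
}
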
I just asked @nicolas a quick question about how he thought we should do
this, then we spent two hours whiteboarding ideas about it, and we've barely
been able to define the problem.

Here are some ideas we bounced around:

- After submitting one frame (the first frame? the preroll frame?), we can
get an idea of the upper bound of the latency for the live pipeline case: it
should be the time between the moment a frame was submitted and when it was
finally rendered, plus the "refresh". We can probably delay sending
async-done until the presented event of the first frame has arrived (see the
sketch after this list).
- For the non-live case, we can probably find a way to submit each frame as
early as possible before the next one. Finding that time is the tricky part,
I think.

@wonchul: could you summarize the different things you tried, what the
hypotheses were and what the results were? It's important to keep these
kinds of records for the R&D tax filings (and so we can keep up with your
work).
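
A back-of-the-envelope sketch of that upper bound (names are made up;
'submitted_ns' would be recorded at wl_surface_commit() time and
'presented_ns' taken from the first presented event):

#include <stdint.h>

/* Pessimistic display latency for the live case: time from submission to
 * actual presentation, plus one refresh cycle. async-done would only be
 * posted once the first presented event has delivered 'presented_ns'. */
static uint64_t
estimate_display_latency_ns (uint64_t submitted_ns, uint64_t presented_ns,
    uint32_t refresh_ns)
{
  return (presented_ns - submitted_ns) + refresh_ns;
}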

@pq or @daniels:

What is the logic behind the seq field, and how do you expect it to be used?
Do you know of any example where it is used?
I'm also not sure how we can detect the case where the compositor cannot
keep up, or where the compositor is gnome-shell and has a GC that makes it
miss a couple of frames for no good reason.
From the info in the presented event (or any other way), is there a way we
can evaluate the latest time at which we can submit a buffer and still have
it arrive in time for a specific refresh? Or do we have to try, and then do
some kind of search to find what those deadlines are in practice?


<Pekka Paalanen>
seq field of wp_presentation_feedback.presented event:

No examples of use, I don't think. I didn't originally consider it
necessary, but it was added to allow implementing GLX_OML_sync_control on
top of it. I do not think we should generally depend on seq unless you
specifically care about the refresh count instead of timings. My intention
with the design was that new code can work better with timestamps, while old
code you don't want to port to timestamps can use seq as it always has.
Timestamps are "accurate", while seq may have been estimated from a clock in
the kernel and may change its rate, or may not have a constant rate at all.

seq comes from a time when display refresh was a known, guaranteed constant
frequency, and you could use it as a clock by simply counting cycles. I believe
all timing-sensitive X11 apps have been written with this assumption. But it is
no longer exactly true, it has caveats (hard to maintain across video mode
switches or display suspends, lacking hardware support, etc.), and with new
display tech it will become even less true (variable refresh rate, self-refresh
panels, ...).

seq is not guaranteed to be provided; it may be zero depending on the
graphics stack used by the compositor. I'm also not sure what it means if
you don't have both VSYNC and HW_COMPLETION in flags.

The timestamp OTOH is always provided, but it may have some caveats which
should be indicated by unset bits in flags.
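
For reference, a minimal sketch of consuming the presented event with the
stable protocol (assuming the generated
presentation-time-client-protocol.h header); it reconstructs the timestamp
and checks the flags mentioned above before trusting seq:

#include <inttypes.h>
#include <stdio.h>
#include "presentation-time-client-protocol.h"

static void
feedback_sync_output (void *data, struct wp_presentation_feedback *feedback,
    struct wl_output *output)
{
}

static void
feedback_presented (void *data, struct wp_presentation_feedback *feedback,
    uint32_t tv_sec_hi, uint32_t tv_sec_lo, uint32_t tv_nsec,
    uint32_t refresh, uint32_t seq_hi, uint32_t seq_lo, uint32_t flags)
{
  uint64_t sec = ((uint64_t) tv_sec_hi << 32) | tv_sec_lo;
  uint64_t presented_ns = sec * 1000000000ULL + tv_nsec;
  uint64_t seq = ((uint64_t) seq_hi << 32) | seq_lo;

  /* seq may be zero; without both of these flags its meaning is unclear. */
  int seq_usable = (flags & WP_PRESENTATION_FEEDBACK_KIND_VSYNC) &&
      (flags & WP_PRESENTATION_FEEDBACK_KIND_HW_COMPLETION);

  printf ("presented at %" PRIu64 " ns, refresh %u ns, seq %" PRIu64 " (%s)\n",
      presented_ns, refresh, seq, seq_usable ? "usable" : "unreliable");

  wp_presentation_feedback_destroy (feedback);
}

static void
feedback_discarded (void *data, struct wp_presentation_feedback *feedback)
{
  wp_presentation_feedback_destroy (feedback);
}

static const struct wp_presentation_feedback_listener feedback_listener = {
  feedback_sync_output,
  feedback_presented,
  feedback_discarded,
};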

Compositor not keeping up:

Maybe you could use the tv + refresh from presented event to guess when the
compositor should be presenting your frame, and compare afterwards with what
actually happened?

I can't really think of a good way to know whether the compositor cannot
keep up, or why it cannot keep up. Hiccups can happen, and the compositor
probably won't know why either. All I can say is: collect statistics and
analyze them over time. This might be a topic for further investigation; to
get more information about which steps take too much time we need some
kernel support (explicit fencing) that is being developed, and the
compositor needs to use that information.

Only hand-waving, sorry.

Finding the deadline:

I don't think there is a way to know really, and also the compositor might be
adjusting its own schedules, so it might be variable.

The way I imagined it is that from the presented event you compute the time
of the next possible presentation, and if you want to hit that, you submit a
frame ASAP. This should get you just below one display-frame-cycle of
latency in any case, if your rendering is already complete.
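
That computation could look like this (a sketch, assuming the nanosecond
timestamp and refresh interval from the presented event):

#include <stdint.h>

/* Predict the earliest upcoming presentation time from the last presented
 * event; refresh_ns may be 0 when the compositor cannot predict it. */
static uint64_t
next_presentation_ns (uint64_t presented_ns, uint32_t refresh_ns,
    uint64_t now_ns)
{
  uint64_t next = presented_ns + refresh_ns;

  /* Skip any whole refresh cycles that have already passed. */
  if (refresh_ns > 0 && now_ns > next)
    next += ((now_ns - next) / refresh_ns + 1) * refresh_ns;

  return next;
}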

If we really need the deadline, that would call for extending the protocol, so
that the compositor could tell you when the deadline is. The compositor chooses
the deadline based on how fast it thinks it can do a composition and hit the
right vblank.


<Wonchul Lee>
About the latency: I tried to get the latency added by the display stack,
from the wl attach/damage/commit to the presented frame. It's a variable
delay depending on the situation, as pq mentioned before, and it could
disturb targeting the next presentation. We could assume an optimal latency
by accumulating it and observing the gap via the presentation feedback, but
that may not always be reliable.

I tried to synchronize the GStreamer clock with the presentation feedback to
render frames on time, and added a queue in GstWaylandSink to request a
render on each presentation feedback if there is a frame due, similar to
what George did. It doesn't fit well with GstBaseSink though; GstWaylandSink
needs to disable the BaseSink time synchronization and do the computation
itself. I hit an unexpected underflow (a consistently increasing delay) when
playing an mpegts stream, so it also needs proper QoS handling to prevent
underflow.

It would be good to get a reliable latency figure from the display stack to
use when synchronizing the presentation time, whether GstWaylandSink
computes it itself or not; there is a latency we're missing either way,
though I'm not sure that is feasible.


<Pekka Paalanen>
@wonchul btw. what do you mean when you say "synchronize GStreamer clock time
with presentation feedback"?

Does it mean something else than looking at what clock is advertised by
wp_presentation.clock_id and then synchronizing GStreamer clock with
clock_gettime() using the given clock id? Or does synchronizing mean something
else than being able to convert a timestamp from one clock domain to the other
domain?
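
For concreteness, here is one reading of that question as code (a sketch,
not anyone's actual implementation): take the clockid_t advertised in
wp_presentation.clock_id and make the GStreamer side read the same clock, so
timestamps convert between the two domains trivially.

#include <stdint.h>
#include <time.h>
#include <gst/gst.h>

/* Build a GstClock backed by the clock the compositor advertised in
 * wp_presentation.clock_id (typically CLOCK_MONOTONIC). */
static GstClock *
presentation_clock_new (uint32_t clk_id)
{
  GstClock *clock = g_object_new (GST_TYPE_SYSTEM_CLOCK, NULL);

  g_object_set (clock, "clock-type",
      clk_id == CLOCK_REALTIME ? GST_CLOCK_TYPE_REALTIME :
      GST_CLOCK_TYPE_MONOTONIC, NULL);

  return clock;
}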


<Nicolas Dufresne>
@pq I need some clarification about submitting frames ASAP. If we blindly do
that, frames will get displayed too soon on screen (in playback, decoders
are much faster than the expected render speed). In GStreamer, we have
infrastructure to wait until the moment is right. The logic (simplified) is
to wait for the right moment minus the "currently expected" render latency,
and then submit (sketched below). This is for the playback case of course,
and is there to ensure the best possible A/V sync. In that case we expect
the presentation information to be helpful in constantly correcting that
moment. What we miss is some semantics, as blindly obeying the computed
render delay of the last frames does not seem like the best idea. We
expected to be able to calculate, or estimate, a submission window that will
(most of the time) hit the screen at an estimated time.
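
(Simplified sketch of that logic, with an assumed running estimate of the
display latency:)

#include <gst/gst.h>

/* Wait until this time on the pipeline clock before submitting, so the
 * frame hits the screen close to 'render_time'. 'display_latency' is the
 * current estimate, continuously corrected from presentation feedback. */
static GstClockTime
submit_time (GstClockTime render_time, GstClockTime display_latency)
{
  if (display_latency >= render_time)
    return 0;                   /* already late: submit immediately */

  return render_time - display_latency;
}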

For the live case, we're still quite screwed. Nothing seems to improve our
situation. We need to pick a latency at start, and if we later find that the
latency was too small (the latency is the window in which we are able to
adapt), we end up screwing up the audio (a glitch) to increase that latency
window. So again, some semantics we could use to calculate a pessimistic
latency from the first presentation report would be nice.


<Olivier Crête>
I think that in the live case you can probably keep a 1-frame queue at the
sink, so when a new frame arrives, you can decide whether to present the
queued one at the next refresh or replace it with the new one. Then the
thread that talks to the compositor (and gets the events, etc.) can pick
buffers from the "queue" to send to the compositor (sketched below).


<Nicolas Dufresne>
Ok, that makes sense for non-live. It would be nice to document the intended
use; that was far from obvious. We kept thinking we needed to look at the
numbers, but didn't understand at first that the moment we get called back
is important. You seem to assume that we can "pick" a frame, as if the sink
were pulling whatever it wants at random; that's unfortunately not how
things work. We can, though, introduce a small queue (some late queue) so we
only start blocking upstream when that queue is full, and it would help with
making decisions.

For live it's much more complex. The entire story about declared latency
exists because if we don't declare any latency, that queue will always be
empty. Worst case, the report will always tell us that we displayed the
frame late. I'm quite sure you told me that the render pipeline can have
multiple steps, where submitting frames 1, 2, 3 one vblank apart will render
on vblanks 3, 4, 5, with effectively 3 vblanks of latency. That latency is
what we need to report for proper A/V sync in a live pipeline, and changing
it has to be done with care as it breaks the audio. There we need some
ideas, because right now we have no clue.
