[RFC] wl_surface video protocol extension
james.dutton at gmail.com
Fri Oct 18 12:52:22 CEST 2013
On 18 October 2013 08:30, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> On Thu, 17 Oct 2013 21:34:02 +0100
> James Courtier-Dutton <james.dutton at gmail.com> wrote:
> > The key point I was trying to make is that the media player needs to be
> > able to predict when frames will be displayed.
> Yes, the *player* needs to be able to predict, and we aim to give it the
> historical information to do exactly that.
> > All its prediction calculations will be based on a fixed scanout rate.
> Why only fixed scanout rate? Why can't you dynamically adapt to the
> latency between requested and realized presentation time, over time?
Conceptually, the player is not requesting a presentation time. It can't.
The scanout does not change to match the player.
The player predicts that a particular frame scanout will happen at time X,
and the player then decides what it wants displayed for that scanout.
So what we need is:
1) Predictable scanout times. <- We can get this from the historical
callback timestamps as you mentioned.
2) A deterministic way to ensure that the frame the player wants to appear
at a particular scanout actually does appear then.
It is this second point that I am not sure we have yet.
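To make point 1 concrete, here is a minimal sketch of predicting upcoming scanout times from historical callback timestamps. It assumes a fixed scanout rate and a history with exactly one timestamp per scanout and none missing; the function names and the nanosecond unit are illustrative assumptions, not part of any Wayland API.

```python
def estimate_period(timestamps_ns):
    """Average interval between consecutive presentation timestamps.

    Assumes one timestamp per scanout, with no scanouts missing.
    """
    if len(timestamps_ns) < 2:
        raise ValueError("need at least two timestamps")
    span = timestamps_ns[-1] - timestamps_ns[0]
    return span / (len(timestamps_ns) - 1)


def predict_scanouts(timestamps_ns, count):
    """Extrapolate the next `count` scanout times from the history."""
    period = estimate_period(timestamps_ns)
    last = timestamps_ns[-1]
    return [last + period * (i + 1) for i in range(count)]


# Hypothetical history from a display refreshing every 10 time units.
history = [0, 10, 20, 30]
upcoming = predict_scanouts(history, 2)
```

This only covers prediction. It does nothing for point 2, which needs support from the compositor and drivers.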
> There are lots of ways to predict based on past measurements and
> taking uncertainty into account; a Kalman filter is the first one to come
> to my mind, but I guess there are better methods for meeting a
> deadline/cycle kind of scenarios.
> I don't think drivers predict anything, do they? Right now, drivers
> only tell you what happened (the presentation timestamp), if even that.
> AFAIK, most popular drivers do not currently even have queues. You can
> schedule *one* page flip at a time, and you cannot even cancel that
> later if you wanted to, from what I've heard.
The current situation is not ideal, so players can only achieve best effort.
But when you can get deterministic frame display, the player can do a much
better job. The difference is noticeable to the person watching. The artifact that is
most noticeable is jitter, caused by missing a scanout or the wrong frame
appearing on the wrong scanout.
> In the future I'm hoping we would have drivers where you could actually
> queue buffers for hardware overlays, so the buffer queue does not need
> to live completely in the userspace at the mercy of scheduling hiccups
> and system load. But even then I can't imagine drivers doing any
> predictions, they would just try to meet requested presentation times.
> If drivers predict, they will do it internally only to make sure the
> requested presentation time is realized. *How* it is realized is a good
> question (never show before the requested time? or make sure it is on
> screen at the requested time? or...)
For low power applications, like smart phones with a separate APU and GPU,
there is a possibility to send say 10 frames from the APU to the GPU. The
APU can then sleep for 10 frames, while the GPU outputs the frames at all
the correct scanouts. The APU then wakes up to gather the next 10 frames.
So, in this scenario, the API between the APU and GPU would need to be able
to request the scanout timestamps of the previous 10 frames, because an
individual frame callback would not work while the APU was sleeping.
> The compositor is similar to drivers in my opinion. If it tries to
> predict something, it will only do that to be able to meet the
> requested presentation times. Who knows, maybe predictions in the
> compositor and drivers would actually make prediction in the clients
> worse by introducing more variables.
> In any case, I do not see the compositor being responsible for
> predicting times for clients' buffers. In my view, it is clients'
> responsibility to predict in any way they want, based on hard facts of
> the past given by the compositor and drivers. The client (player) is the
> only one knowing everything relevant, like the original video frame rate
> and audio position, and whether it wants to prefer showing frames early,
> late, when to skip, etc.
If you are mentioning the compositor, a nice feature would be to be able to
send the video stream to the compositor, then also an overlay to be mixed
with the video stream just before display. Both the video stream and the
overlay need a way to make sure they are displayed at the correct scanout.
There is also a need for another overlay that does not have to be deterministic.
E.g.:
1) One video stream.
2) One overlay stream for subtitles. <- These two need to be in sync, e.g. for
karaoke.
3) A second overlay for non-deterministic items such as menus, a program guide
overlaid on the video, etc.
> If you are concerned about video start, where you do not yet have
> measurements available to predict from, then yeah, all you have is the
> current monitor framerate. But after the first frame has been
> presented, you get the first presentation timestamp, which should allow
> you to adapt to the phase of the scanout cycle. Right?
So long as there has not been a frame rate change recently, historical data
should be available.
If you are changing the frame rate at the very start of the video, you can
normally wait a few frames before starting the video, allowing the display
> And of course repeat that "coarse calibration" every time scanout rate
> (monitor refresh rate) changes.
> While the presentation loop is running, you could theoretically adjust
> the gap between requested and realized presentation timestamps. If the
> gap gets too small and the frames start to miss their intended scanout
> cycle, you see it in the realized timestamps, and can back off. It is a
> trial and error, but I'm hoping it will take only a few frames in the
> beginning to reach a steady state and lock the gap length, after which
> the predictor in the player should be in sync.
In ALSA (Linux sound) there is a concept of "delay".
This is a measure of "If I submit the sound sample now, when will it be
played?" E.g. it will be played in 20 ms. The returned value depends on the
number of sound samples already queued but not yet played out.
For Video, this would be "If I submit a frame now, which scanout will it be
displayed on?". Can we get an answer for this question through some API?
> > Also, If buffer2 has T=200. What is the 200 being compared to in order to
> > decide to display the buffer or not?
> > This will not be the timestamp on the presentation callback will it?
> Yes, it will, if you mean clock domains. There needs to be a way to have
> all timestamps related to frames in the same clock domain, of course. An
> idea we also already discussed but didn't mention yet, is the
> compositor to tell clients which clock domain it uses for all frame
> timing purposes. Then clients can ask the current time directly from
> the kernel if they need it, and relate it to the audio clock.
Will the graphics card have its own clock, used for scanout clocking?
Can this be read by the user? Can it use that clock to timestamp the
One of the problems with PCs is that the graphics card has its own clock,
the sound card has its own clock, and the kernel has its own clock.
The software then has to somehow match them all up.
Embedded systems are generally better and can use the same hardware based
clocking for graphics card and sound card and kernel clock.
> It is a good question, what does the requested presentation time
> actually mean. Should the system guarantee, that the frame is already
> on screen at that time, or that it is not shown before that time?
> Luckily, the realized presentation time can be defined exactly, e.g.
> the time when the hardware starts to scan out the new image. The
> realized time is the one that matters, and the specific meaning of
> requested time is not so important, since a client can adapt. No?
The client can adapt so long as it knows which clock domain each of the
timestamps is in.
> > There is also the matter of clock sync. Is T=100 referenced to the system
> > monotonic clock or some other clock.
> > Video and Audio sync is not achieved by comparing Video and Audio time
> > stamps.
> > You have a global system monotonic clock, and sync Video to the system
> > clock, and sync Audio to the system clock.
> > A good side effect of this is that Audio and Video are then in sync.
> > The advantage of this syncing to the system monotonic clock is that you
> > can then run Video and Audio in separate threads, on different CPUs,
> > whenever, and they will always be in sync.
> Some GStreamer people have told me that they actually prefer to use the
> audio clock as the master clock, and infer everything else from that.
> Anyway, yes, clock domains are important.
This method is not always best. Some sound cards are very bad at providing
an audio clock.
xine provides several different sync/clock methods so the user can select
the best one for their hardware.
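The monotonic-clock approach quoted earlier (sync Video to the system clock, sync Audio to the system clock) can be sketched roughly as follows. This is a minimal illustration and not how xine or GStreamer implement it: `MediaClock` and `frame_for_scanout` are made-up names, all times are assumed to be nanoseconds in the same monotonic clock domain, and clock drift is ignored.

```python
import bisect


class MediaClock:
    """Maps the system monotonic clock onto media time (hypothetical helper)."""

    def __init__(self, start_monotonic_ns, start_media_ns=0):
        self.start_monotonic_ns = start_monotonic_ns
        self.start_media_ns = start_media_ns

    def media_time_at(self, monotonic_ns):
        """Media position that should be playing at the given monotonic time."""
        return self.start_media_ns + (monotonic_ns - self.start_monotonic_ns)


def frame_for_scanout(frame_pts_ns, clock, scanout_ns):
    """Index of the latest frame that is due at the predicted scanout.

    frame_pts_ns must be sorted in ascending order.
    """
    media_ns = clock.media_time_at(scanout_ns)
    index = bisect.bisect_right(frame_pts_ns, media_ns) - 1
    # Before the first frame is due, keep showing the first frame.
    return max(index, 0)


# Playback started at monotonic time 1000; frames are due every 40 units.
clock = MediaClock(start_monotonic_ns=1000)
pts = [0, 40, 80, 120]
chosen = frame_for_scanout(pts, clock, scanout_ns=1050)
```

An audio thread could consult the same `MediaClock` independently, so the two stay in sync without comparing video and audio timestamps directly.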
> I would like to know if I have understood something wrong in this whole
> video thing.
How Wayland can help provide the "deterministic" angle is not yet
clear to me.