[Spice-devel] [client v2 07/12] channel-display: Minimize the stream lag by ignoring the server one
Francois Gouget
fgouget at codeweavers.com
Tue Jun 25 23:19:26 UTC 2019
On Thu, 20 Jun 2019, Frediano Ziglio wrote:
[...]
> > - It performs its own frame mmtime conversion to the local monotonic
> > clock spice_session_get_client_time() since the server-controlled
> > mmtime clock cannot be relied on. This conversion uses data from all
>
> The "since the server-controlled mmtime clock cannot be relied on" could
> be confusing. In a relative, not absolute way is reliable.
> But I suppose what you are meaning here is that you rely on the
> mmtimes from the frames, not the "server_time".
The client's mmtime clock depends on mm_time_offset which the server may
yank one way or the other at any time. That makes it unreliable for the
purpose of letting the _client_ be in control of when it's going to
display each frame.
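To illustrate the idea, here is a condensed sketch of the conversion
(the real code is in spice_session_get_client_time(), quoted further
down; the function and variable names below are made up for the
illustration):

    #include <glib.h>

    static gint64 client_time_offset;

    static guint32 mmtime_to_client_time(guint32 mmtime)
    {
        guint32 now = g_get_monotonic_time() / 1000; /* monotonic ms */
        gint64 new_offset = (gint64)now - mmtime;
        /* Reset on big jumps (e.g. a server migration) and otherwise
         * settle on the smallest offset seen, so delayed messages do
         * not inflate it. */
        if (new_offset < client_time_offset ||
            new_offset > client_time_offset + 1000)
            client_time_offset = new_offset;
        return mmtime + client_time_offset;
    }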
> > streams but is global so all streams ihave a common time reference.
>
> typo: "ihave"
Fixed.
> > - This is also where mmtime clock changes caused by server migration
> > are handled.
>
> "To do so this is also where mmtime clock changes caused by server migration
> are handled." ?
Reworded.
> > - It tracks the margin between the converted time-to-display frame
> > timestamp and the curren time and adds a delay to ensure this margin
>
> typo: "curren"
Fixed.
> > record(frames_stats,
> > - "frame mm_time %u size %u creation time %" PRId64
> > + "frame time %u size %u creation time %" PRId64
> > " decoded time %" PRId64 " queue %u before %u",
> > - frame->mm_time, frame->size, frame->creation_time,
> > duration,
> > + frame->time, frame->size, frame->creation_time, duration,
>
> Here we really need the original server mmtime. This is not hard to fix:
> just add mm_time back with the initial server value, but use the
> "time" field for everything else.
Fixed.
> > @@ -124,8 +124,22 @@ struct display_stream {
> >
> > SpiceChannel *channel;
> >
> > - /* stats */
> > uint32_t first_frame_mm_time;
>
> should this field be moved to "stats" section ?
Done.
> > + uint32_t last_frame_mm_time;
> > + uint32_t last_frame_time;
> > +
> > + /* Lag management (see display_handle_stream_data()) */
> > + uint32_t delay;
> > + uint32_t margin_eval_start;
>
> "eval" meaning "evaluation" ?
Yes. It's the start time of the evaluation period used to determine how
low the margin can get, among various other parameters.
> > + uint32_t margin_eval_count;
> > + int32_t min_margin;
> > + int32_t min_margin_next;
> > + float avg_margin;
> > + int32_t min_avg_margin;
> > + int32_t max_avg_margin;
> > + uint32_t margin_spread;
>
> I think these fields deserve some more documentation; many seem
> so similar and are not clear. Are you sure you need them all?
> For instance, what is the difference between "min_margin" and
> "min_margin_next", and why do you need both?
The goal is to get the minimum margin over a long enough period of time
for it to be meaningful, but to also take into account changes.
So min_margin_next is computed over a period of at least 1 second and 20
frames and only then is it moved to min_margin and reset for the next
evaluation period.
min_avg_margin and max_avg_margin are evaluated during the same period
to figure out how much the average margin fluctuates; at the end of the
period they are used to compute margin_spread.
> Why "avg_margin" is float but "min_avg_margin/max_avg_margin" are not?
> From comment in the code is a rounding issue but than you could change
> avg_margin formulae (see below).
avg_margin is the rolling margin average and it needs to be a float
because, as mentioned below, integer computations don't converge to the
true average. There is no such calculation with {min,max}_avg_margin so
they don't need to be floats. At the end of the margin evaluation period
{min,max}_avg_margin are used to compute the margin_spread over that
evaluation period.
The margin_spread is used as a measure of how much the margin average
can be expected to fluctuate which tells us when we can consider that
the margin matches the target_lag.
> > + frame_interval = 0;
> > + mmtime_margin = 400; /* server's default mm_time offset */
>
> This value, which is used just for the report, was simply:
>
> "margin_report = op->multi_media_time - mmtime;"
>
> The computation got quite complicated even though the purpose of this
> patch is not really to change the streaming reports.
I'm not sure I follow. Do you mean the calculations below?
> > if (op->multi_media_time == 0) {
> > - g_critical("Received frame with invalid 0 timestamp! perhaps wrong
> > graphic driver?");
> > - op->multi_media_time = mmtime + 100; /* workaround... */
> > - }
> > + g_critical("Received frame with invalid 0 timestamp! Perhaps wrong
> > graphic driver?");
> > + op->multi_media_time = current_mmtime; /* workaround... */
> > + frame_interval = spice_mmtime_diff(op->multi_media_time,
> > st->last_frame_mm_time);
> > + } else if (st->last_frame_mm_time == 0) {
> > + /* First frame so frame_interval is unknown */
> > + mmtime_margin = spice_mmtime_diff(op->multi_media_time,
> > current_mmtime);
[...]
These deal with server migrations, which cause the mmtime clock to
shift abruptly.
> > + } else if (op->multi_media_time < st->last_frame_mm_time) {
>
> Here spice_mmtime_diff should be used
[...]
> > + } else if (op->multi_media_time > st->last_frame_mm_time + 1000) {
>
> Here too.
Done.
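For the record, spice_mmtime_diff() boils down to a signed modulo-2**32
difference, which is what makes these comparisons safe across an mmtime
wraparound:

    /* spice_mmtime_diff(t1, t2) is essentially (gint32)(t1 - t2):
     * negative if t1 is before t2, even across a 32 bit wraparound. */
    if (spice_mmtime_diff(op->multi_media_time, st->last_frame_mm_time) < 0) {
        /* the frame is older than the previous one */
    } else if (spice_mmtime_diff(op->multi_media_time,
                                 st->last_frame_mm_time) > 1000) {
        /* the frame is more than a second ahead of the previous one */
    }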
[...]
> > + st->margin_eval_start = stream_time;
> > + st->margin_eval_count = 0;
> > + st->min_margin = 0; /* Force wait before reducing the delay */
>
> I suppose the margin is never negative, otherwise you would have a minimum
> that is bigger than the real minimum.
That would be ok. What matters is that min_margin <= 0 which means there
is no margin for reducing delay. Eventually we'll get a new minimum
margin value and if that's positive that's when we'll reduce the delay
if necessary.
[...]
> > + /* Note that when using integers the moving average can stabilize up
> > to
> > + * weight/2-1 away from the true average (where weight=16 here) due
> > + * to the rounding errors (e.g. consider 1 16 16...).
> > + * So use a float for avg_margin.
>
> Why not "(st->avg_margin * 15 + margin + 8) / 16" ?
As mentioned in the comment it does not always converge to the right
value and I'm annoyed by it being 7-8 ms off given that we can often do
better (particularly on LANs). Consider:
Current   Average
            16
   1        15
   1        14
   1        13
   1        12
   1        11
   1        10
   1         9
   1         9
  ...
You get the same issue whether you round down, up or to the nearest.
What would work is rounding down whenever the value is lower than the
current average and up otherwise. So:
(st->avg_margin * 15 + margin + (st->avg_margin < margin ? 15 : 0)) / 16
So if floats are a problem I could do that.
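For what it's worth, here is a small standalone demo (not part of the
patch) showing where each variant settles when fed a constant 1 ms
margin starting from a 16 ms average:

    #include <stdio.h>

    int main(void)
    {
        int nearest = 16, asym = 16;  /* integer variants */
        float flt = 16;               /* what the patch uses */
        const int margin = 1;
        int i;
        for (i = 0; i < 200; i++) {
            /* round to nearest: stabilizes 8 ms above the true average */
            nearest = (nearest * 15 + margin + 8) / 16;
            /* round towards the new value: converges to the true average */
            asym = (asym * 15 + margin + (asym < margin ? 15 : 0)) / 16;
            flt = (flt * 15 + margin) / 16;
        }
        printf("nearest=%d asym=%d float=%.2f\n", nearest, asym, flt);
        /* prints: nearest=9 asym=1 float=1.00 */
        return 0;
    }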
[...]
> > + /* Only replace the current min_margin and margin_spread estimates once
> > + * enough data has been collected for the *_next values, both in term
> > + * of elapsed time and number of frames.
> > + */
> > + st->margin_eval_count++;
> > + if (stream_time - st->margin_eval_start > MARGIN_EVAL_TIME &&
> > + st->margin_eval_count >= MARGIN_EVAL_COUNT) {
> > + st->min_margin = st->min_margin_next;
> > +
> > + st->margin_eval_start = stream_time;
> > + st->margin_eval_count = 1;
> > + st->min_margin_next = margin;
> > +
> > + st->margin_spread = (st->max_avg_margin - st->min_avg_margin + 1) / 2;
> > + st->min_avg_margin = st->avg_margin;
> > + st->max_avg_margin = ceilf(st->avg_margin);
>
> Why do you need ceilf here? You are just restarting the min/max computation.
To be truly correct, if avg_margin is 7.9 ms then max_avg_margin should
be 8 ms, not 7 ms. Granted, it won't make a big difference since the min
and max are estimates anyway. If ceilf() is an issue it can be replaced
with "st->avg_margin + 1" or even a basic round down.
[...]
> > + gint32 nudge = 0;
> > + if (st->avg_margin + st->margin_spread < target_lag) {
> > + nudge = MIN((frame_interval + 3) / 4,
> > + target_lag - st->avg_margin);
> > + } else if (st->min_margin > 0 &&
> > + st->avg_margin > target_lag + st->margin_spread) {
> > + nudge = -MIN((frame_interval + 3) / 4,
> > + MIN(st->avg_margin - target_lag,
> > + st->min_margin));
> > + }
> > + if (nudge) {
> > + st->delay += nudge;
> > + frame_time += nudge;
> > +
> > + /* The delay nudge also impacts the margin history */
> > + st->min_margin_next += nudge;
> > + st->min_margin += nudge;
> > + st->avg_margin += nudge;
> > + st->min_avg_margin += nudge;
> > + st->max_avg_margin += nudge;
> > + }
> > }
> > }
>
> I will have to read all these formulas again, but it looks like part of
> the code makes smooth changes while at some intervals you make bigger
> adjustments.
> It looks quite complicated and it seems hard to tell whether it's good
> or not. I'm not sure if it would be possible to encapsulate all that
> complexity and have some tests.
> What are the cases you have in mind to resolve? Just ignoring peaks?
The goal is to have a solid minimum margin value, not one that's based
on the past couple of milliseconds because we just reset to a new
period.
So min_margin is the value we got from the last period. And
min_margin_next is the one we compute for use next. But when we adjust
the delay this impacts all margin values, current and next, hence the
adjustments above.
The same goes for min_avg_margin, etc.
> > @@ -2385,6 +2391,46 @@ int spice_session_get_connection_id(SpiceSession
> > *session)
> > return s->connection_id;
> > }
> >
> > +G_GNUC_INTERNAL
> > +guint32 spice_session_get_client_time(SpiceSession *session, guint32 mmtime)
> > +{
>
> I agree with Snir that a "get" function doing so much stuff is confusing,
> maybe a "spice_session_adjust_to_client_time" or a "spice_session_mmtime2client_time".
I can go with spice_session_mmtime2client_time().
> > + g_return_val_if_fail(SPICE_IS_SESSION(session), g_get_monotonic_time());
> > +
>
> I suppose this should be "g_get_monotonic_time() / 1000", that is the
> next "now" value.
Right.
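That is, the fallback value becomes:

    g_return_val_if_fail(SPICE_IS_SESSION(session),
                         g_get_monotonic_time() / 1000);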
>
> > + SpiceSessionPrivate *s = session->priv;
> > +
> > + guint32 now = g_get_monotonic_time() / 1000;
> > + gint64 new_offset = now - mmtime;
>
> This should be an unsigned 32 bit value; all the arithmetic is modulo 2**32.
Right, that should work. Done.
> > + if (new_offset < s->client_time_offset - 1000 ||
> > + new_offset > s->client_time_offset + 1000) {
> > + /* The change in offset was likely caused by a server migration.
> > + * Reset the time offset.
> > + */
>
> Was this tested? Is there no other event to specifically detect migration?
I added code to simulate a migration based on the
display_session_mm_time_reset_cb() comment in channel-display.c. But I
have no way of actually testing it.
> We (the client) need to reconnect during migration, so surely there's
> another event during this change.
The problem as I understand it is that the playback channel may be
migrated and start receiving new mmtime timestamps before the old video
stream has been shut down. This can result in a discrepancy in the
time base, and if there is a clean way to detect and deal with it,
that comment does not say how.
> > + s->client_time_offset = new_offset;
> > + return now;
> > + }
> > + if (new_offset < s->client_time_offset) {
> > + /* The previous message we used to compute the offset was probably
> > + * delayed resulting in a too large offset. Eventually the offset
> > + * should settle to the true clock offset plus the network latency,
> > + * excluding the network jitter.
> > + */
>
> A bit OT: maybe it would be better to take the "client time" when we start
> receiving the message from the server instead of after the full message
> arrives. It should be more consistent with server mmtime changes (the
> server timestamps messages before sending).
I don't think that's possible without having lower layers know way too
much for their own good.
--
Francois Gouget <fgouget at codeweavers.com>