[Spice-devel] [client v2 07/12] channel-display: Minimize the stream lag by ignoring the server one
Francois Gouget
fgouget at codeweavers.com
Tue Jun 25 23:19:26 UTC 2019
On Thu, 20 Jun 2019, Frediano Ziglio wrote:
[...]
> > - It performs its own frame mmtime conversion to the local monotonic
> > clock spice_session_get_client_time() since the server-controlled
> > mmtime clock cannot be relied on. This conversion uses data from all
>
> The "since the server-controlled mmtime clock cannot be relied on" could
> be confusing. In a relative, not absolute way is reliable.
> But I suppose what you are meaning here is that you rely on the
> mmtimes from the frames, not the "server_time".
The client's mmtime clock depends on mm_time_offset which the server may
yank one way or the other at any time. That makes it unreliable for the
purpose of letting the _client_ be in control of when it's going to
display each frame.
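To illustrate the idea, here is a condensed sketch of the conversion
(the real code is in spice_session_get_client_time(), quoted further
down; the function and variable names below are made up for the
illustration):

    #include <glib.h>

    static gint64 client_time_offset;

    static guint32 mmtime_to_client_time(guint32 mmtime)
    {
        guint32 now = g_get_monotonic_time() / 1000; /* monotonic ms */
        gint64 new_offset = (gint64)now - mmtime;
        /* Reset on big jumps (e.g. a server migration) and otherwise
         * settle on the smallest offset seen, so delayed messages do
         * not inflate it. */
        if (new_offset < client_time_offset ||
            new_offset > client_time_offset + 1000)
            client_time_offset = new_offset;
        return mmtime + client_time_offset;
    }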
> > streams but is global so all streams ihave a common time reference.
>
> typo: "ihave"
Fixed.
> > - This is also where mmtime clock changes caused by server migration
> > are handled.
>
> "To do so this is also where mmtime clock changes caused by server migration
> are handled." ?
Reworded.
> > - It tracks the margin between the converted time-to-display frame
> > timestamp and the curren time and adds a delay to ensure this margin
>
> typo: "curren"
Fixed.
> > record(frames_stats,
> > - "frame mm_time %u size %u creation time %" PRId64
> > + "frame time %u size %u creation time %" PRId64
> > " decoded time %" PRId64 " queue %u before %u",
> > - frame->mm_time, frame->size, frame->creation_time,
> > duration,
> > + frame->time, frame->size, frame->creation_time, duration,
>
> Here we really need the original server mmtime. This is not hard to fix:
> just add mm_time back with the initial server value, but use the
> "time" field for everything else.
Fixed.
> > @@ -124,8 +124,22 @@ struct display_stream {
> >
> > SpiceChannel *channel;
> >
> > - /* stats */
> > uint32_t first_frame_mm_time;
>
> should this field be moved to "stats" section ?
Done.
> > + uint32_t last_frame_mm_time;
> > + uint32_t last_frame_time;
> > +
> > + /* Lag management (see display_handle_stream_data()) */
> > + uint32_t delay;
> > + uint32_t margin_eval_start;
>
> "eval" meaning "evaluation" ?
Yes. It's the start time of the evaluation period used to determine how
low the margin can get, among various other parameters.
> > + uint32_t margin_eval_count;
> > + int32_t min_margin;
> > + int32_t min_margin_next;
> > + float avg_margin;
> > + int32_t min_avg_margin;
> > + int32_t max_avg_margin;
> > + uint32_t margin_spread;
>
> I think these fields deserve some more documentation; many seem
> so similar and are not clear. Are you sure you need them all?
> For instance, what is the difference between "min_margin" and
> "min_margin_next", and why do you need both?
The goal is to get the minimum margin over a long enough period of time
for it to be meaningful, but to also take into account changes.
So min_margin_next is computed over a period of at least 1 second and 20
frames and only then is it moved to min_margin and reset for the next
evaluation period.
min_avg_margin and max_avg_margin are evaluated during the same period
to figure out how much the average margin fluctuates; at the end of the
period they are used to compute margin_spread.
> Why "avg_margin" is float but "min_avg_margin/max_avg_margin" are not?
> From comment in the code is a rounding issue but than you could change
> avg_margin formulae (see below).
avg_margin is the rolling margin average and it needs to be a float
because, as mentioned below, integer computations don't converge to the
true average. There is no such calculation with {min,max}_avg_margin so
they don't need to be floats. At the end of the margin evaluation period
{min,max}_avg_margin are used to compute the margin_spread over that
evaluation period.
The margin_spread is used as a measure of how much the margin average
can be expected to fluctuate which tells us when we can consider that
the margin matches the target_lag.
> > + frame_interval = 0;
> > + mmtime_margin = 400; /* server's default mm_time offset */
>
> This value, which is used just for the report, was simply:
>
> "margin_report = op->multi_media_time - mmtime;"
>
> The computation got quite complicated even though the purpose of this
> patch is not really to change the streaming reports.
I'm not sure I follow. Do you mean the calculations below?
> > if (op->multi_media_time == 0) {
> > - g_critical("Received frame with invalid 0 timestamp! perhaps wrong
> > graphic driver?");
> > - op->multi_media_time = mmtime + 100; /* workaround... */
> > - }
> > + g_critical("Received frame with invalid 0 timestamp! Perhaps wrong
> > graphic driver?");
> > + op->multi_media_time = current_mmtime; /* workaround... */
> > + frame_interval = spice_mmtime_diff(op->multi_media_time,
> > st->last_frame_mm_time);
> > + } else if (st->last_frame_mm_time == 0) {
> > + /* First frame so frame_interval is unknown */
> > + mmtime_margin = spice_mmtime_diff(op->multi_media_time,
> > current_mmtime);
[...]
These deal with server migrations, which cause the mmtime clock to
shift abruptly.
> > + } else if (op->multi_media_time < st->last_frame_mm_time) {
>
> Here spice_mmtime_diff should be used
[...]
> > + } else if (op->multi_media_time > st->last_frame_mm_time + 1000) {
>
> Here too.
Done.
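For the record, spice_mmtime_diff() boils down to a signed modulo-2**32
difference, which is what makes these comparisons safe across an mmtime
wraparound:

    /* spice_mmtime_diff(t1, t2) is essentially (gint32)(t1 - t2):
     * negative if t1 is before t2, even across a 32 bit wraparound. */
    if (spice_mmtime_diff(op->multi_media_time, st->last_frame_mm_time) < 0) {
        /* the frame is older than the previous one */
    } else if (spice_mmtime_diff(op->multi_media_time,
                                 st->last_frame_mm_time) > 1000) {
        /* the frame is more than a second ahead of the previous one */
    }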
[...]
> > + st->margin_eval_start = stream_time;
> > + st->margin_eval_count = 0;
> > + st->min_margin = 0; /* Force wait before reducing the delay */
>
> I suppose the margin is never negative, otherwise you would have a minimum
> that is bigger than the real minimum.
That would be ok. What matters is that min_margin <= 0 which means there
is no margin for reducing delay. Eventually we'll get a new minimum
margin value and if that's positive that's when we'll reduce the delay
if necessary.
[...]
> > + /* Note that when using integers the moving average can stabilize up
> > to
> > + * weight/2-1 away from the true average (where weight=16 here) due
> > + * to the rounding errors (e.g. consider 1 16 16...).
> > + * So use a float for avg_margin.
>
> Why not "(st->avg_margin * 15 + margin + 8) / 16" ?
As mentioned in the comment it does not always converge to the right
value and I'm annoyed by it being 7-8 ms off given that we can often do
better (particularly on LANs). Consider:
Current   Average
            16
   1        15
   1        14
   1        13
   1        12
   1        11
   1        10
   1         9
   1         9
  ...
You get the same issue whether you round down, up or to the nearest.
What would work is rounding down whenever the value is lower than the
current average and up otherwise. So:
(st->avg_margin * 15 + margin + (st->avg_margin < margin ? 15 : 0)) / 16
So if floats are a problem I could do that.
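For what it's worth, here is a small standalone demo (not part of the
patch) showing where each variant settles when fed a constant 1 ms
margin starting from a 16 ms average:

    #include <stdio.h>

    int main(void)
    {
        int nearest = 16, asym = 16;  /* integer variants */
        float flt = 16;               /* what the patch uses */
        const int margin = 1;
        int i;
        for (i = 0; i < 200; i++) {
            /* round to nearest: stabilizes 8 ms above the true average */
            nearest = (nearest * 15 + margin + 8) / 16;
            /* round towards the new value: converges to the true average */
            asym = (asym * 15 + margin + (asym < margin ? 15 : 0)) / 16;
            flt = (flt * 15 + margin) / 16;
        }
        printf("nearest=%d asym=%d float=%.2f\n", nearest, asym, flt);
        /* prints: nearest=9 asym=1 float=1.00 */
        return 0;
    }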
[...]
> > + /* Only replace the current min_margin and margin_spread estimates once
> > + * enough data has been collected for the *_next values, both in term
> > + * of elapsed time and number of frames.
> > + */
> > + st->margin_eval_count++;
> > + if (stream_time - st->margin_eval_start > MARGIN_EVAL_TIME &&
> > + st->margin_eval_count >= MARGIN_EVAL_COUNT) {
> > + st->min_margin = st->min_margin_next;
> > +
> > + st->margin_eval_start = stream_time;
> > + st->margin_eval_count = 1;
> > + st->min_margin_next = margin;
> > +
> > + st->margin_spread = (st->max_avg_margin - st->min_avg_margin + 1) / 2;
> > + st->min_avg_margin = st->avg_margin;
> > + st->max_avg_margin = ceilf(st->avg_margin);
>
> Why do you need ceilf here? You are just restarting the min/max computation.
To be truly correct, if avg_margin is 7.9 ms then max_avg_margin should
be 8 ms, not 7 ms. Granted, it won't make a big difference since the min
and max are estimates anyway. If ceilf() is an issue it can be replaced
with "st->avg_margin + 1" or even a basic round down.
[...]
> > + gint32 nudge = 0;
> > + if (st->avg_margin + st->margin_spread < target_lag) {
> > + nudge = MIN((frame_interval + 3) / 4,
> > + target_lag - st->avg_margin);
> > + } else if (st->min_margin > 0 &&
> > + st->avg_margin > target_lag + st->margin_spread) {
> > + nudge = -MIN((frame_interval + 3) / 4,
> > + MIN(st->avg_margin - target_lag,
> > + st->min_margin));
> > + }
> > + if (nudge) {
> > + st->delay += nudge;
> > + frame_time += nudge;
> > +
> > + /* The delay nudge also impacts the margin history */
> > + st->min_margin_next += nudge;
> > + st->min_margin += nudge;
> > + st->avg_margin += nudge;
> > + st->min_avg_margin += nudge;
> > + st->max_avg_margin += nudge;
> > + }
> > }
> > }
>
> I will have to read all these formulas again, but it looks like part of
> the code makes smooth changes while at some intervals you make bigger
> adjustments.
> It looks quite complicated and it seems hard to tell whether it's good
> or not. I'm not sure if it would be possible to encapsulate all that
> complexity and have some tests.
> What are the cases you have in mind to resolve? Just ignoring peaks?
The goal is to have a solid minimum margin value, not one that's based
on the past couple of milliseconds because we just reset to a new
period.
So min_margin is the value we got from the last period. And
min_margin_next is the one we compute for use next. But when we adjust
the delay this impacts all margin values, current and next, hence the
adjustments above.
The same goes for min_avg_margin, etc.
> > @@ -2385,6 +2391,46 @@ int spice_session_get_connection_id(SpiceSession
> > *session)
> > return s->connection_id;
> > }
> >
> > +G_GNUC_INTERNAL
> > +guint32 spice_session_get_client_time(SpiceSession *session, guint32 mmtime)
> > +{
>
> I agree with Snir that a "get" function doing so much stuff is confusing,
> maybe a "spice_session_adjust_to_client_time" or a "spice_session_mmtime2client_time".
I can go with spice_session_mmtime2client_time().
> > + g_return_val_if_fail(SPICE_IS_SESSION(session), g_get_monotonic_time());
> > +
>
> I suppose this should be "g_get_monotonic_time() / 1000", that is the
> next "now" value.
Right.
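That is, the fallback value becomes:

    g_return_val_if_fail(SPICE_IS_SESSION(session),
                         g_get_monotonic_time() / 1000);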
>
> > + SpiceSessionPrivate *s = session->priv;
> > +
> > + guint32 now = g_get_monotonic_time() / 1000;
> > + gint64 new_offset = now - mmtime;
>
> This should be an unsigned 32 bit value; all the arithmetic is modulo 2**32.
Right, that should work. Done.
> > + if (new_offset < s->client_time_offset - 1000 ||
> > + new_offset > s->client_time_offset + 1000) {
> > + /* The change in offset was likely caused by a server migration.
> > + * Reset the time offset.
> > + */
>
> Was this tested? Is there no other event to specifically detect migration?
I added code to simulate a migration based on the
display_session_mm_time_reset_cb() comment in channel-display.c. But I
have no way of actually testing it.
> We (the client) need to reconnect during migration, so surely there's
> another event during this change.
The problem as I understand it is that the playback channel may be
migrated and start receiving new mmtime timestamps before the old video
stream has been shut down. This can result in a discrepancy in the
time base, and if there is a clean way to detect and deal with it,
that comment does not say how.
> > + s->client_time_offset = new_offset;
> > + return now;
> > + }
> > + if (new_offset < s->client_time_offset) {
> > + /* The previous message we used to compute the offset was probably
> > + * delayed resulting in a too large offset. Eventually the offset
> > + * should settle to the true clock offset plus the network latency,
> > + * excluding the network jitter.
> > + */
>
> A bit OT: maybe it would be better to take the "client time" when we start
> receiving the message from the server instead of after the full message
> arrives. It should be more consistent with server mmtime changes (the
> server timestamps messages before sending).
I don't think that's possible without having lower layers know way too
much for their own good.
--
Francois Gouget <fgouget at codeweavers.com>