[Spice-devel] Remote 3d support - streaming and lag

Frediano Ziglio fziglio at redhat.com
Thu Feb 9 14:30:15 UTC 2017


It seems weird to reply to my own mail after more than 6 months, but
apparently the content is still worth discussing.

There is a concept about streaming that people seem to ignore, forget
or not understand: why streaming was added and why it lags!

If you are at home watching Netflix/Amazon Prime or whatever service
you have, you want good quality and no network issues like the movie
stuttering or pausing to buffer.
The solution is simple: you increase and monitor the buffer. Basically
you try to keep, say, 10 seconds in the buffer, so even if the network
completely stops working for 8-9 seconds it is not an issue, and family
and children are happy!
Good and simple! However... they know the future! Yes, basically they
are sending you 10 seconds of movie before knowing that you are going
to watch those 10 seconds. Easy to predict: usually you continue to
watch the movie!

We introduced the streaming code, besides for compression, for streaming
purposes like the above. But... how do we send the future? Simple! We can't!
So how do we "create" this future, this buffering? We lie! We just tell
the client to wait a bit before displaying, creating a sort of buffer but
basically showing a recent past!

Back to the code. How is this implemented in spice-server/client?
Basically spice-server sends a multimedia time which is a bit less than
the frame one. Say we start with mm_time (multimedia time) == 0 and we
send the client a time of -100 and the current frame as having 0 as its
time. The client will think that the frame is in the future (as -100 < 0;
by the way, the multimedia time is expressed in milliseconds), wait for
the difference (in this case 100 ms) and then display it, as in the
sketch below.
In the code the 100 is mm_time_latency, and its minimal value is
MM_TIME_DELTA, currently defined as 400. In practice every stream will
have a minimum of 400 ms of delay. Compared to the 10 seconds of
buffering in the case above that is really small, but if I'm drawing
something with the mouse and streaming is detected, the lag will make my
drawing attempt really bad (OT: by the way, I'm really bad at drawing,
so the result won't be much worse).
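
To make the mechanism concrete, here is a minimal sketch of the
client-side logic. The names (VideoFrame, client_mm_time,
schedule_display, display_now) are hypothetical, not the actual
spice-gtk code:

#include <stdint.h>

/* Minimal sketch with hypothetical names: the server reports an
 * mm_time that lags the frame timestamps by mm_time_latency (at
 * least MM_TIME_DELTA, i.e. 400 ms), so every frame looks like it
 * belongs to the near future and the client delays its display by
 * the difference. */
static void queue_frame(VideoFrame *frame)
{
    uint32_t now = client_mm_time();   /* clock synced to the server's mm_time */
    int32_t diff = (int32_t)(frame->mm_time - now);

    if (diff > 0)
        schedule_display(frame, diff); /* show the frame diff ms from now */
    else
        display_now(frame);            /* already late, show immediately */
}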

Is it a bad solution? Besides some implementation problems, it's mostly
a matter of use cases. We are probably going to use streaming for
really different use cases.

How is this mm_time_latency updated? That is another complicated
subject!

Frediano

> 
> Some updates.
> 
> Yesterday I found a big cause of part of the lag: client and
> multimedia synchronization. After playing a video or running a game,
> if you press Ctrl-Z to suspend Qemu you can see the client still
> playing for a while. I checked my bandwidth-reducing software and it
> was working correctly, not sending any more data after the set latency.
> But the client continued to play for a couple of seconds! This could be
> fine if we are just watching a movie, but as soon as we get more
> interactive and want some feedback, 2 seconds of lag makes working
> impossible.
> So I changed the code of the client to remove any delay used to sync,
> and I got this: https://www.youtube.com/watch?v=D_DCs2sriu0. Quite good
> (unfortunately there is no audio; it was quite out of sync).
> It seems that the latency/bandwidth computation is not able to handle
> the currently queued data well, causing the detected bandwidth to be
> reduced a lot (so video quality decreases a lot), while the computed
> latency is so high that the client uses this big delay (in some
> experiments the lag was much more than 2 seconds!).
> To make the video look that good I had to force the bitrate in our
> gstreamer code. Also, the compressed frame sizes of this game are
> quite low.
> 
> About VAAPI, gstreamer and our code. It looks like our code is not able
> to reduce the bitrate used by the encoder (I'm currently using H264 and
> the Intel implementation of VAAPI). The result is that in some cases
> the frame rate drops to 3-4 fps.
> I tried lots of parameters (like cabac and dct8x8) but had no luck.
> Sometimes our code seems to deadlock (I had a chat with Francois some
> days ago; it could be due to the way buffers are produced by the
> encoder). Setting a different rate-control for vaapih264enc seems to
> cause our code to fail (the other rate-control settings should behave
> much better for limiting the bit rate); a sketch of how that would be
> set is below.
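> 
> For reference, selecting the mode would look roughly like this. This is
> an untested sketch; the property names are the ones documented for
> gstreamer-vaapi and the values here are only examples:
> 
> #include <gst/gst.h>
> 
> /* Untested sketch: pick a rate-control mode and a target bitrate on
>  * vaapih264enc (call after gst_init()); property names as documented
>  * for gstreamer-vaapi, values only illustrative. */
> static GstElement *make_encoder(void)
> {
>     GstElement *enc = gst_element_factory_make("vaapih264enc", "encoder");
> 
>     gst_util_set_object_arg(G_OBJECT(enc), "rate-control", "cbr");
>     g_object_set(enc, "bitrate", 3000, NULL);  /* target, in kbit/s */
>     return enc;
> }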
> 
> Frediano
> 
> > 
> > Hi,
> >   some news on the patch and tests.
> > 
> > The patch is still more or less as I sent it last time
> > (https://lists.freedesktop.org/archives/spice-devel/2016-July/030662.html).
> > 
> > So, a bit of history.
> > Some time ago I started a branch with the idea of feeding frames from
> > Virgl into the old drawing path to see what would happen. There were
> > many reasons to do this; one was to exercise the streaming path, and
> > also to see whether, with the refactoring work, this could be done
> > more easily.
> > The intention wasn't a final patch (extracting the texture is surely
> > not a good idea if it can be avoided, and it is not clear whether this
> > long trip is the right way or whether there are shorter paths, for
> > instance injecting directly into the streaming code).
> > The branch got stuck for a while (a month or two) as just extracting
> > the raw frame was not that easy (and I got lost in different stuff).
> > Anyway, when I got back to it some time later I found a way using DRM
> > directly and it was easy to insert the frames. Besides some memory
> > issues (fixed) and some frame flipping (worked around) it was working!
> > Locally it works very well; surprisingly everything is smooth and fast
> > (I run everything on a laptop with an Intel card).
> > Obviously once it is more or less working you try a somewhat harder
> > and more real-world setup, so... playing games, even with some network
> > restrictions (after some thinking I believe this is one of the worst
> > cases you can imagine, that is, if this works fine you are not far
> > from a release!).
> > 
> > Here, of course, the problems started.
> > 
> > Simulation
> > To simulate a more realistic network I used a program which "slows
> > down" sockets by forwarding their data (I also tried Linux traffic
> > shaping, but it caused some problems). I knew this was not optimal
> > (for instance, queues and RTT detection from the program are quite
> > impossible), so I decided to use tun/tap (I had tried to avoid
> > needing root for such tests) and the final version
> > (https://cgit.freedesktop.org/~fziglio/latency)
> > works really well (I did some more tuning on CPU scheduling and the
> > program uses just 2-3% of the CPU, so it should not affect the tests
> > that much). The core of the idea is sketched below.
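> > 
> > This is a naive sketch of that core idea, not the actual tool at the
> > URL above: it handles one direction only, uses a fixed delay, skips
> > error handling and interface setup (addresses/routes), and serializes
> > packets, so it also limits throughput:
> > 
> > #include <fcntl.h>
> > #include <string.h>
> > #include <unistd.h>
> > #include <sys/ioctl.h>
> > #include <linux/if.h>
> > #include <linux/if_tun.h>
> > 
> > /* Open a tun device; packets written to the fd show up as
> >  * received on the corresponding interface. */
> > static int tun_open(const char *name)
> > {
> >     struct ifreq ifr;
> >     int fd = open("/dev/net/tun", O_RDWR);
> > 
> >     memset(&ifr, 0, sizeof(ifr));
> >     ifr.ifr_flags = IFF_TUN | IFF_NO_PI;
> >     strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
> >     ioctl(fd, TUNSETIFF, &ifr);
> >     return fd;
> > }
> > 
> > int main(void)
> > {
> >     int in = tun_open("lat0"), out = tun_open("lat1");
> >     char buf[2048];
> >     ssize_t n;
> > 
> >     /* Hold each packet for a fixed time before forwarding it. */
> >     while ((n = read(in, buf, sizeof(buf))) > 0) {
> >         usleep(50 * 1000);  /* simulated one-way delay: 50 ms */
> >         write(out, buf, n);
> >     }
> >     return 0;
> > }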
> > 
> > Latency
> > One of the first issues introduced by having a real network in the
> > path was latency. Especially when playing you can feel a very long lag
> > (on the order of seconds, even if the stream is quite fast). In the
> > end I'm using xterm and wireshark to measure the delay. The reason is
> > that the xterm cursor does not blink and does very few screen
> > operations, so in wireshark you can see a single DRAW_COPY operation,
> > and as this change is quite small you can also feel the delay without
> > using wireshark. This test is quite reliable and the simulator behaves
> > very well (as does a real network).
> > I usually use h264 for encoding. Using the normal stream configuration
> > the lag is much lower (so is the video quality), but even if the video
> > is fluid the delay is higher than xterm's. I put some debugging on the
> > frames, trying to measure the delays for encoding and extraction, and
> > usually a frame is processed in 5 ms (from the Qemu call), so I don't
> > understand where the lag comes from. It could be some option of the
> > encoder, the encoding buffer being too large (the network one isn't),
> > or some problem with the gstreamer interaction (the
> > server/gstreamer-encoder.c file).
> > Using vaapi the lag gets much worse, even combined with very large
> > bandwidth; however, the behaviour of gstreamer vaapi is quite
> > different and the options are also very different. Maybe there are
> > options to improve compression/delay; maybe some detail in the plugin
> > introduces other delays. For sure the vaapi h264 encoder has a bitrate
> > which cannot be changed dynamically, so this could be an issue. The
> > result is that the quality is much better but the frame rate and delay
> > are terrible. Also, while using the x264 encoder (the software one)
> > the network queue (which you can see using netstat) stays quite small
> > (around 20-80 kB) even with low bandwidth, while with vaapi it is
> > always too high (around 1-3 MB), which obviously does not help with
> > latency.
> > 
> > Bandwidth
> > Obviously high bandwidth helps. I can say that the x264 encoder does
> > quite a good job when the bandwidth is not enough. On the other hand,
> > it takes quite some time (around 10-20 minutes) to notice that the
> > bandwidth got better. vaapi was mostly not working.
> > Sometimes, using a real wifi connection (with a cheap and old router),
> > you can see the bandwidth drop for a while (probably some packets get
> > lost and retransmission kicks in).
> > 
> > CPU usage
> > Running everything on a single machine, with no hardware help for
> > encoding/decoding, made this problem quite difficult: you end up using
> > all the CPU power and more, bringing the kernel scheduler into the
> > equation. Sometimes I use another machine as the client, so I can see
> > more clearly where the CPU is used to support a virtual machine.
> > 
> > Qemu
> > There is still a hack to support listening on TCP instead of Unix
> > sockets; it will be changed along with the spice-server changes.
> > It turns out that for every frame a monitor_config message is sent.
> > Due to the implementation of spice-server this does not help the
> > latency. I merged my cork branch and made some changes in
> > spice-server, and you can get some good improvement (the corking idea
> > is sketched below).
> > I got a patch from Marc-Andre to remove a timer which is causing a
> > lot of CPU usage in RedWorker; still to be tried.
> > The VM with Virgl does not power off; I didn't investigate.
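> > 
> > The cork sketch, using plain TCP_CORK (the actual branch may work
> > differently, and send_monitor_config()/send_frame() are hypothetical
> > helpers): hold back the small per-frame messages so they leave in the
> > same TCP segment as the frame data instead of one tiny packet each.
> > 
> > #include <netinet/in.h>
> > #include <netinet/tcp.h>
> > #include <sys/socket.h>
> > 
> > /* Sketch: batch the small per-frame messages (e.g. monitor_config)
> >  * with the frame data; TCP_CORK keeps partial segments queued until
> >  * it is cleared (or 200 ms pass). */
> > static void send_frame_batch(int fd)
> > {
> >     int on = 1, off = 0;
> > 
> >     setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
> >     send_monitor_config(fd);  /* hypothetical: stays buffered */
> >     send_frame(fd);           /* follows in the same segment(s) */
> >     setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off)); /* flush */
> > }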
> > 
> > 
> > In the end, lots of small issues and things to investigate; I don't
> > have a clear idea of how to progress. My latest thought is to avoid
> > vaapi for a while and fix some of the small issues (like
> > monitor_config, and trying to understand the additional lag when
> > streaming is used). The vaapi situation, and getting gstreamer to
> > fully offload the encoding, have too many variables (our gstreamer
> > code, options, which pipeline to use, code stability, card support).
> > gstreamer and texture data extraction (a fallback we should have)
> > seem to work better with GL stuff, so possibly having Qemu communicate
> > some EGL setup will be required (that is, an ABI change between Qemu
> > and spice-server).
> > Maybe EGL extraction and lazy data extraction (to avoid an expensive
> > data copy when frames are dropped) could be a possible step stable
> > enough to have some code merged.
> > 

