[Spice-devel] Remote 3d support
Frediano Ziglio
fziglio at redhat.com
Fri Jul 8 15:42:18 UTC 2016
Hi,
some news on the patch and tests.
The patch is still more or less as I sent it last time
(https://lists.freedesktop.org/archives/spice-devel/2016-July/030662.html).
So, a bit of history.
Some time ago I started a branch with the idea of feeding frames from Virgl
into the old drawing path to see what would happen. There were many reasons
to do this: one was to exercise the streaming path, and another was to see
whether the refactoring work would make this easier.
The intention wasn't a final patch (extracting the texture is
surely not a good idea if it can be avoided, and it is not clear whether
this long trip is the right way or whether there are shorter paths, for
instance injecting frames directly into the streaming code).
The branch got stuck for a while (a month or two) as just
extracting the raw frame was not that easy (and I got lost in other
stuff). However, when I got back to it later I found a way using DRM
directly, and inserting the frames was easy. Besides some memory issues
(fixed) and some frame flipping (worked around), it was working!
Locally it works very well; surprisingly everything is smooth and fast
(I run everything on a laptop with an Intel card).
Obviously, once it is more or less working you try a harder
and more real-world setup, so... playing games, with some network
restrictions on top (after some thinking, I believe this is one of the
worst cases you can imagine, that is, if this works fine you are not far
from a release!).
Here, of course, the problems started.
Simulation
To simulate a more realistic network I used a program which
"slows down sockets" by forwarding data (I tried Linux traffic shaping,
but it caused some problems). I knew this was not optimal (for instance,
queues and RTT detection from the program are quite impossible), so I
decided to use tun/tap instead (I tried to avoid needing root to run such
tests), and the final version (https://cgit.freedesktop.org/~fziglio/latency)
works really well (I did some more tuning on CPU scheduling,
and the program uses just 2-3% of CPU, so it should not affect the tests
that much).
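The socket-forwarding idea can be sketched roughly as below. This is a
minimal, hypothetical Python sketch of the first approach (delaying data
at the socket level), not the actual latency tool, which works at the
tun/tap level; the function names and the fixed delay are mine.

```python
import socket
import threading
import time

def pipe(src, dst, delay):
    """Forward data from src to dst, adding a fixed delay per chunk."""
    while True:
        data = src.recv(4096)
        if not data:
            dst.shutdown(socket.SHUT_WR)
            return
        time.sleep(delay)  # crude one-way latency, added per chunk
        dst.sendall(data)

def latency_proxy(listen_port, target, delay=0.05):
    """Accept one client on listen_port and relay its traffic to `target`
    (a (host, port) tuple), adding `delay` seconds in each direction."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", listen_port))
    srv.listen(1)
    client, _ = srv.accept()
    upstream = socket.create_connection(target)
    # one thread per direction, so both flows are delayed independently
    threading.Thread(target=pipe, args=(client, upstream, delay),
                     daemon=True).start()
    pipe(upstream, client, delay)
```

As noted above, a user-space forwarder like this cannot model queues or
let the application measure RTT realistically, which is why the real tool
moved to tun/tap.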
Latency
One of the first issues introduced by putting a real network in the path
was latency. Especially while playing you can feel a very long lag (on
the order of seconds, even if the stream is quite fast). In the end I'm
using xterm and wireshark to measure the delay. The reason is that the
xterm cursor does not blink and does very few screen operations, so in
wireshark you can see a single DRAW_COPY operation, and as this change is
quite small you can also feel the delay without using wireshark. This
test is quite reliable, and the simulator behaves very well (as does a
real network).
I usually use h264 for encoding. With the normal stream configuration
the lag is much lower (as is the video quality), but even though the
video is fluid, the delay is higher than with xterm. I added some
debugging on the frames, trying to measure delays for encoding and
extraction, and usually a frame is processed in 5 ms (from the Qemu
call), so I don't understand where the lag comes from. It could be some
option of the encoders, an encoding buffer that is too large (the network
one isn't), or some problem with the gstreamer interaction
(server/gstreamer-encoder.c file).
Trying vaapi, the lag gets much worse, even combined with very
large bandwidth; however, the behaviour of gstreamer vaapi is quite
different and the options are also very different. Maybe there are
options to improve the compression/delay trade-off, or maybe some detail
in the plugin introduces other delays. For sure the vaapi h264 encoder
has a bitrate which cannot be changed dynamically, so this could be an
issue. The result is that the quality is much better but the frame rate
and delay are terrible. Also, while using the x264 encoder (the software
one) the network queue (which you can see using netstat) stays quite low
(around 20-80 KB) with low bandwidth, while with vaapi it is always too
high (around 1-3 MB), which obviously does not help with latency.
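The send queue netstat shows (the Send-Q column) can also be read
per-socket from inside the sending process; on Linux this is the SIOCOUTQ
ioctl, which Python exposes under the equivalent constant
termios.TIOCOUTQ. A small hypothetical helper, useful for logging how
much encoded data is piling up behind a slow link:

```python
import fcntl
import socket
import struct
import termios

def send_queue_bytes(sock):
    """Return the number of unsent bytes in this TCP socket's send queue
    (Linux-only: SIOCOUTQ, the same counter netstat reports as Send-Q)."""
    buf = fcntl.ioctl(sock.fileno(), termios.TIOCOUTQ, b"\0" * 4)
    return struct.unpack("i", buf)[0]
```

A queue that stays in the megabyte range, as observed with vaapi, means
the encoder keeps producing faster than the network drains, and every
frame inherits the queueing delay.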
Bandwidth
Obviously high bandwidth helps. I can say that the x264 encoder
does quite a good job when the bandwidth is not enough. On the other
hand, it takes quite some time (around 10-20 minutes) to realize that
the bandwidth got better. vaapi was mostly not working.
Sometimes, using a real wifi connection (with a cheap and old router),
you can see the bandwidth drop for a while, probably when some packet
loss and retransmissions kick in.
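A slow recovery like this is what you would expect from a conservative
additive-increase bitrate controller: if the rate only climbs by a small
fixed step per healthy interval, getting back from a throttled bitrate to
a high one takes many minutes. This is a toy model of that behaviour, not
spice-server's or x264's actual logic; the step and interval values are
made up for illustration.

```python
def ramp_time(current_kbps, target_kbps, step_kbps=10.0, interval_s=1.0):
    """Seconds for an additive-increase controller to climb from
    current_kbps to target_kbps, raising the rate by step_kbps every
    interval_s while the network looks healthy (toy model)."""
    steps = max(0.0, (target_kbps - current_kbps) / step_kbps)
    return steps * interval_s
```

With these made-up numbers, recovering from 500 kbps to 5 Mbps takes
450 s, i.e. 7.5 minutes, the same order of magnitude as the 10-20 minutes
observed.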
CPU usage
Running everything on a single machine, with no hardware help for
encoding/decoding, made this problem quite difficult: you end up using
all the CPU power and more, bringing the kernel scheduler into the
equation. Sometimes I use another machine as the client, so I can see
more clearly where the CPU is used to support the virtual machine.
Qemu
There is still a hack to support listening on TCP instead of Unix
sockets; it will be changed along with the spice-server changes.
It turns out that a monitor_config is sent for every frame. Due to the
implementation of spice-server, this does not help latency.
I merged my cork branch and made some changes in spice-server, and with
them you can get some good improvement.
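The corking mentioned above is, as I understand it, the Linux TCP_CORK
socket option: hold back partial segments (like a tiny monitor_config
message) until the rest of the data (the frame) has been written, then
flush everything in full-sized packets. A hypothetical minimal sketch,
assuming a Linux TCP socket:

```python
import socket

def send_corked(sock, *chunks):
    """Write several small messages as one burst: cork, write, uncork.
    TCP_CORK is Linux-specific; it holds partial segments for up to
    200 ms, so uncorking flushes whatever is still buffered."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    for chunk in chunks:
        sock.sendall(chunk)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)  # flush now
```

Without corking, a small header followed by a large payload can go out as
a tiny packet plus a delayed rest; corked, both travel together.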
I got a patch from Marc-André to remove a timer which is causing a lot
of CPU usage in RedWorker; still to try.
The VM with Virgl is not powering off; I didn't investigate.
In the end there are lots of small issues and things to investigate, and
I don't have a clear idea on how to progress. My latest thought is to
avoid vaapi for a while and fix some small issues (like monitor_config,
and trying to understand the additional lag when streaming is in use).
The vaapi state, and using gstreamer to fully offload the encoding, have
too many variables (our gstreamer code, options, which pipeline to use,
code stability, card support).
gstreamer and texture data extraction (a fallback we should have)
seem to work better with GL stuff, so possibly having Qemu communicate
some EGL setup will be required (that is, an ABI change between Qemu and
spice-server).
Maybe EGL extraction plus lazy data extraction (to avoid an expensive
data copy if frames are dropped) could be a step stable enough to get
some code merged.
Frediano