High delay of video-streams
Frediano Ziglio
freddy77 at gmail.com
Sat Jun 1 16:14:14 UTC 2024
On Mon, 27 May 2024 at 16:19, Victor Toso
<victortoso at redhat.com> wrote:
>
> Hi,
>
> On Tue, Apr 16, 2024 at 12:59:50PM GMT, Michael Scherle wrote:
> > Hello,
> >
> > Thanks for your changesets, they definitely reduce the delay significantly
> > (to a similar level as our provisional fixes, but yours are much cleaner).
> >
> > On the client side (spice-gtk) I looked at the problem with the high
> > decoding time (2 frames buffering) and was able to find a simple fix with
> > the help of the gstreamer community:
> >
> > ---
> > src/channel-display-priv.h | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/channel-display-priv.h b/src/channel-display-priv.h
> > index 1a7590a..a2af1a7 100644
> > --- a/src/channel-display-priv.h
> > +++ b/src/channel-display-priv.h
> > @@ -177,7 +177,7 @@ static const struct {
> > * (hardcoded in spice-server), let's add it here to avoid the warning.
> > */
> > { SPICE_DISPLAY_CAP_CODEC_H264, "h264",
> > - "h264parse ! avdec_h264", "video/x-h264,stream-format=byte-stream" },
> > + "h264parse ! avdec_h264",
> > "video/x-h264,stream-format=byte-stream,alignment=au" },
> >
> > /* SPICE_VIDEO_CODEC_TYPE_VP9 */
> > { SPICE_DISPLAY_CAP_CODEC_VP9, "vp9",
> > @@ -185,7 +185,7 @@ static const struct {
> >
> > /* SPICE_DISPLAY_CAP_CODEC_H265 */
> > { SPICE_DISPLAY_CAP_CODEC_H265, "h265",
> > - "h265parse ! avdec_h265", "video/x-h265,stream-format=byte-stream" },
> > + "h265parse ! avdec_h265",
> > "video/x-h265,stream-format=byte-stream,alignment=au" },
>
> jfyi, this was discussed in the past. It depends on how the spice
> server was configured too, no? I'm not sure, it has been a while.
> What I mean is: what/who is doing the h264 encoding? We had a
> spice-streaming-agent that wrapped the guest's GPU h264 encoding and
> sent it to the client, with the same protocol.... depending on how it
> was configured, the stream-format was important, I think. Again,
> not 100% sure.
>
I think we used the same format. I also remember that we sent an
additional NAL unit to force the "flush", so I think it's very
similar: the stream renderer waits for the next SPICE packet because
it does not recognize that the frame has ended.
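From memory it was something like the following (just a sketch; the
helper name and the exact hook into the agent are invented): appending
an Access Unit Delimiter NAL after each encoded frame, so the parser
can close the current access unit at once instead of waiting for the
next packet.

#include <stdint.h>
#include <stddef.h>
#include <unistd.h>

/* Sketch: a dummy AUD NAL appended after every frame tells h264parse
 * that the access unit is complete. */
static const uint8_t aud_nal[] = {
    0x00, 0x00, 0x00, 0x01,   /* Annex B start code */
    0x09,                     /* nal_ref_idc=0, nal_unit_type=9 (AUD) */
    0xf0                      /* primary_pic_type=7 (any), stop bit */
};

static void send_frame_with_flush(int fd, const uint8_t *frame, size_t len)
{
    write(fd, frame, len);               /* the encoded access unit */
    write(fd, aud_nal, sizeof(aud_nal)); /* dummy NAL forcing the flush */
}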
> >
> > };
> >
> > --
> > 2.40.1
> >
> > However, this change should probably still be tested on different setups.
> > Since I don't know whether the streams are always AU-aligned, I should
> > probably find out about that.
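If it helps, my reading of why the extra caps field works (a sketch,
assuming the appsrc-based pipeline spice-gtk uses): alignment=au tells
h264parse that every buffer it receives is one complete access unit,
so it can push the frame downstream immediately instead of buffering
until the next NAL reveals the frame boundary.

#include <gst/gst.h>

/* Sketch: declare AU alignment on the source caps so the parser does
 * not wait for the start of the following frame to emit the current one. */
static GstElement *make_src_with_au_caps(void)
{
    GstElement *src = gst_element_factory_make("appsrc", "src");
    GstCaps *caps = gst_caps_from_string(
        "video/x-h264,stream-format=byte-stream,alignment=au");

    g_object_set(src, "caps", caps, NULL);
    gst_caps_unref(caps);
    return src;
}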
> >
> > Also, I have made other experiments, such as removing the decoding_queue
> > in channel-display-gst.c and attaching the SpiceGstFrame to the GstBuffer
> > as metadata instead, as well as completely ignoring the display time of a
> > frame and displaying it immediately. With that I got down to a
> > 60-80ms delay.
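This is roughly how I'd picture that experiment (a sketch only;
whether qdata survives the decoder depends on the elements involved, a
custom GstMeta would be the more robust variant):

#include <gst/gst.h>

typedef struct SpiceGstFrame SpiceGstFrame; /* opaque, as in spice-gtk */

/* Sketch: carry the SpiceGstFrame with the buffer itself instead of
 * keeping a separate decoding_queue. The destroy notify is assumed
 * to free the frame if the buffer is dropped. */
static GQuark frame_quark(void)
{
    return g_quark_from_static_string("spice-gst-frame");
}

static void attach_frame(GstBuffer *buffer, SpiceGstFrame *frame,
                         GDestroyNotify free_frame)
{
    gst_mini_object_set_qdata(GST_MINI_OBJECT(buffer), frame_quark(),
                              frame, free_frame);
}

/* Recover it in the appsink callback: */
static SpiceGstFrame *lookup_frame(GstBuffer *buffer)
{
    return gst_mini_object_get_qdata(GST_MINI_OBJECT(buffer), frame_quark());
}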
>
> If you send patches about this one, feel free to tag me. This
> looks cool.
>
> > Do you know if your changes or similar ones that reduce the
> > delay will go upstream soon?
> >
> > While looking through the source code, I found
> > SPICE_KEYPRESS_DELAY, which is not mentioned anywhere. Is its
> > only use to save some network traffic? Is there any reason
> > not to always set it to 0 in today's network environments?
> > (And maybe set the default to 0?)
>
> Introduced in c03e002152dc0c; the commit log says:
>
> > widget: add keypress-delay property
> >
> > The delay before the press event is sent to the server if the key is
> > kept pressed. If the key is released within that time, that delay is
> > ignored and a single key-press-release event will be sent.
>
> Introduced in 2012. I'm pretty sure there were reasons for it.
> Not sure if it's worth removing.
>
There's not much indication of why it was introduced. Besides reducing
the number of network packets (though not the traffic much; display
traffic is way bigger), I would suppose wonky networks. Suppose the
network has some weird latency patterns and you type (so press and
release) the "A" key. You send a press request and a release request,
but the server receives the release only after a while (say 1 second
or more). This could trigger key repetition in the guest, causing
"AAA" (for instance) to be typed. When typing normally, 100ms is
enough to release the key, so even on wonky networks you won't hit key
repetition due to network delays. But that's a theory. Surely if you
want to play a game this delay is not helping :-)
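If you want to experiment without it, the property from that commit
should be settable on the widget; a sketch (I'm assuming the property
is a plain unsigned integer of milliseconds):

#include <spice-client-gtk.h>

/* Sketch: disable the keypress delay so press events go out at once.
 * Tradeoff: on a very laggy link a late release may cause key repeat. */
static void disable_keypress_delay(SpiceDisplay *display)
{
    g_object_set(display, "keypress-delay", 0u, NULL);
}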
> Cheers,
> Victor
>
> > Michael
> >
Frediano
> > On 03.04.24 21:22, Frediano Ziglio wrote:
> > > Frediano
> > >
> > > On Tue, 2 Apr 2024 at 15:27, Michael Scherle
> > > <michael.scherle at rz.uni-freiburg.de> wrote:
> > > >
> > > > Hi Frediano,
> > > >
> > > > thank you very much for your detailed answer.
> > > >
> > > >
> > > > On 02.04.24 14:13, Frediano Ziglio wrote:
> > > >
> > > > > Really short explanation: Lipsync.
> > > > >
> > > > > Less cryptic explanation: video streaming was added a long time ago,
> > > > > when desktops used 2D graphic drawing operations, like lines, fills,
> > > > > strings and so on. At that time networks were more unreliable and
> > > > > latency was higher, and with high probability a continuous bitblt on
> > > > > the same big area was a video playing. So the idea of detecting video
> > > > > playback and optimizing to sync audio and video was a good one.
> > > >
> > > > ok this explains a lot.
> > > >
> > > > > Now come my opinionated ideas. The assumption that only a video
> > > > > stream produces continuous bitblts is wrong; nowadays desktops use
> > > > > large bitblts for everything, or rather they use 3D cards a lot and
> > > > > compose the various windows on the screen, which appears to us as
> > > > > just bitblts, often contiguous. So the delay should simply be
> > > > > removed, optimizing for real-time video streaming. As you realized,
> > > > > the algorithm also keeps increasing the delay for every glitch it
> > > > > finds, which does not improve the user experience. I have different
> > > > > changesets removing all these delays entirely (it's possible to get
> > > > > this just by changing the server part); the result is much less
> > > > > delay, and the audio/video sync (watching a movie) is, with today's
> > > > > networks, acceptable.
> > > >
> > > >
> > > > Would it be possible to get your changesets, so that I could try them
> > > > out? I would be interested to know how this can be implemented with only
> > > > server-side changes. A dirty idea I had (and tried) would be to set the
> > > > mm_time to the past so that the client displays the image immediately,
> > > > but that would not be a good fix in my opinion.
> > > >
> > >
> > > That's the commit
> > > https://cgit.freedesktop.org/~fziglio/spice-server/commit/?h=nvidia&id=eaaec7be80a9d402f425f7571bb27a082ebf739a.
> > >
> > > > I would rather consider it reasonable for the server to timestamp the
> > > > frames (and perhaps the sound) with the encoding time, and for the
> > > > client itself to calculate when it wants to display them (from the
> > > > diffs). The client could then decide whether it wants to display the
> > > > images directly or add some delay to compensate for network jitter
> > > > (or lipsync), or maybe even implement something like v-sync. These
> > > > would of course be breaking changes that require changes to both
> > > > client and server and would make them incompatible with older
> > > > versions. If this could not be done directly, for compatibility
> > > > reasons, maybe it could be implemented in a separate low-latency mode
> > > > or something like that (which both server and client need to support).
> > > >
> > >
> > > I suppose the negative time you thought of is something like
> > > https://cgit.freedesktop.org/~fziglio/spice-server/commit/?h=nvidia&id=4a1a2a20505bc453f30573a0d453a9dfa1d97e7c
> > > (which improves on the previous one).
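To make the scheme described above concrete, something like this is
what I'd imagine (pure sketch, nothing like this exists in the
protocol today; all the names are invented):

#include <glib.h>

/* Hypothetical client-side scheduling: the server stamps each frame
 * with its encode time, the client maps it onto its own clock and
 * adds a configurable jitter margin. A margin of 0 means "display
 * immediately"; a bigger one absorbs jitter or helps lipsync. */
static guint64 client_display_time_ms(guint64 encode_ts_ms,    /* from server */
                                      gint64  clock_offset_ms, /* estimated */
                                      guint   jitter_margin_ms)/* client policy */
{
    return encode_ts_ms + clock_offset_ms + jitter_margin_ms;
}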
> > >
> > > > Even with the above ideas applied, I have noticed a high decode delay
> > > > in spice-gtk. The GStreamer pipeline always seems to keep at least 2
> > > > frames in flight (regardless of the frame rate), which increases the
> > > > delay further. Have you also noticed this? I'm currently looking
> > > > into the reason for this.
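One way to check where those frames are held (diagnostic sketch; the
latency query only reflects what the elements report, so internal
parser buffering may not show up in it):

#include <gst/gst.h>

/* Sketch: ask the pipeline how much latency its elements declare. If
 * the decoder accounts for the ~2 buffered frames, it should appear
 * in the minimum latency. */
static void print_pipeline_latency(GstElement *pipeline)
{
    GstQuery *q = gst_query_new_latency();

    if (gst_element_query(pipeline, q)) {
        gboolean live;
        GstClockTime min_lat, max_lat;

        gst_query_parse_latency(q, &live, &min_lat, &max_lat);
        g_print("live=%d min=%" GST_TIME_FORMAT " max=%" GST_TIME_FORMAT "\n",
                live, GST_TIME_ARGS(min_lat), GST_TIME_ARGS(max_lat));
    }
    gst_query_unref(q);
}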
> > > >
> > > > When testing things out we saw that Sunshine/Moonlight performed very
> > > > well, with low delay and high QoE. That is kind of the benchmark we
> > > > strive for in remote access :)
> > > >
> > > > Greetings
> > > > Michael
> > > >
> > >
> > > Frediano
> > >
> > > > > >
> > > > > > On 15.03.24 14:08, Michael Scherle wrote:
> > > > > > > Hello spice developers,
> > > > > > >
> > > > > > > we are trying to develop an open-source virtual desktop
> > > > > > > infrastructure to be deployed at multiple German universities, as
> > > > > > > described by my colleagues in the paper attached. The solution is
> > > > > > > based on OpenStack, QEMU, spice... Our plan is also to have VM
> > > > > > > instances with virtual GPUs (SR-IOV). Due to the resulting
> > > > > > > requirements, it is necessary to transmit the image data as a
> > > > > > > video stream.
> > > > > > > We have seen Vivek Kasireddy's recent work on spice, which solves
> > > > > > > exactly this problem. However, when we tested it, we noticed a
> > > > > > > very high input-to-display delay (400ms+, but only if the image
> > > > > > > data is transferred as a video stream). Is this a more general
> > > > > > > spice problem, or is there something wrong with our setup, or are
> > > > > > > there special parameters that we are missing?
> > > > > > >
> > > > > > > Our setup:
> > > > > > >
> > > > > > > QEMU: https://gitlab.freedesktop.org/Vivek/qemu/-/commits/spice_gl_on_v2
> > > > > > > Spice:
> > > > > > > https://gitlab.freedesktop.org/Vivek/spice/-/commits/encode_dmabuf_v6
> > > > > > > virt-viewer
> > > > > > > Intel HW decoder/encoder (but same with sw)
> > > > > > >
> > > > > > > I have looked into what is causing the delay and have noticed
> > > > > > > that encoding only takes about 3-4ms. In general, the image seems
> > > > > > > to reach the client in less than 15ms.
> > > > > > > The main problem seems to be that GStreamer gets a very high
> > > > > > > margin
> > > > > > > (https://gitlab.freedesktop.org/spice/spice-gtk/-/blob/master/src/channel-display.c?ref_type=heads#L1773)
> > > > > > > and therefore waits a long time before starting to decode. The
> > > > > > > reason for the high margin seems to be the bad mm_time_offset
> > > > > > > (https://gitlab.freedesktop.org/spice/spice-gtk/-/blob/master/src/spice-session.c?ref_type=heads#L2418),
> > > > > > > which is used to offset the server time to the client time (with
> > > > > > > some margin). This variable is initially set by the spice server
> > > > > > > to 400ms
> > > > > > > (https://gitlab.freedesktop.org/spice/spice/-/blob/master/server/reds.cpp?ref_type=heads#L3062)
> > > > > > > and gets updated with the latency
> > > > > > > (https://gitlab.freedesktop.org/spice/spice/-/blob/master/server/reds.cpp?ref_type=heads#L2614),
> > > > > > > but only ever increased. I still need to see how this latency is
> > > > > > > calculated.
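For reference, the margin boils down to something like this
(conceptual sketch, simplified from the linked channel-display.c; the
function and variable names are mine):

#include <glib.h>

/* Conceptual sketch: the client reconstructs the server mm clock from
 * its monotonic clock minus mm_time_offset; a frame is decoded only
 * when its timestamp comes due, so an inflated offset (initially
 * 400ms) directly delays every frame. */
static gint32 frame_margin_ms(guint32 frame_mm_time,
                              guint64 client_monotonic_ms,
                              guint64 mm_time_offset_ms)
{
    guint32 server_mm_now = (guint32)(client_monotonic_ms - mm_time_offset_ms);

    return (gint32)(frame_mm_time - server_mm_now);
}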
> > > > > > >
> > > > > > > Am I missing something, or is this design not intended for
> > > > > > > transmitting interactive content via video stream?
> > > > > > > Temporarily overwriting the margin and tweaking parameter
> > > > > > > settings on the msdkh264dec brought the delay down to about
> > > > > > > 80-100ms, which is not yet optimal but usable. To see what is
> > > > > > > technically possible on my setup, I made a comparison using
> > > > > > > Moonlight/Sunshine, which resulted in a delay of 20-40ms.
> > > > > > >
> > > > > > > Our goal is a round-trip time similar to the Moonlight/Sunshine
> > > > > > > scenario, to get a properly usable desktop experience.
> > > > > > >
> > > > > > > Greetings
> > > > > > > Michael
> > > > > >
> > > > > > Greetings
> > > > > > Michael
> >