[Spice-devel] [RFC PATCH spice-protocol v2 03/20] Add the StreamInfo message

Fri Aug 17 16:16:51 UTC 2018

On Fri, 2018-08-17 at 06:36 -0400, Frediano Ziglio wrote:
> > On Thu, 2018-08-16 at 16:08 -0500, Jonathon Jongsma wrote:
> > > On Thu, 2018-08-16 at 18:26 +0200, Lukáš Hrázký wrote:
> > > > The message is meant to contain information related to the
> > > > stream,
> > > > e.g.
> > > > now the guest (xrandr) output ID of the streamed monitor.
> > > > 
> > > > Signed-off-by: Lukáš Hrázký <lhrazky at redhat.com>
> > > > ---
> > > >  spice/stream-device.h | 11 +++++++++++
> > > >  1 file changed, 11 insertions(+)
> > > > 
> > > > diff --git a/spice/stream-device.h b/spice/stream-device.h
> > > > index 6add42b..be22e06 100644
> > > > --- a/spice/stream-device.h
> > > > +++ b/spice/stream-device.h
> > > > @@ -90,6 +90,8 @@ typedef enum StreamMsgType {
> > > >      STREAM_TYPE_CURSOR_SET,
> > > >      /* guest cursor position */
> > > >      STREAM_TYPE_CURSOR_MOVE,
> > > > +    /* the stream information message */
> > > > +    STREAM_TYPE_INFO,
> > > >  } StreamMsgType;
> > > >  
> > > >  typedef enum StreamCapabilities {
> > > > @@ -140,6 +142,15 @@ typedef struct StreamMsgData {
> > > >      uint8_t data[0];
> > > >  } StreamMsgData;
> > > >  
> > > > +/* Message containing information about the stream, sent when
> > > > a new
> > > > stream is created.
> > > > + */
> > > > +typedef struct StreamMsgInfo {
> > > > +    /* The xrandr output ID (index of the output in the list)
> > > > that
> > > > is being
> > > > +     * streamed, 0-based.
> > > > +     */
> > > > +    uint32_t guest_output_id;
> > > > +} StreamMsgInfo;
> > > > +
> > > >  /* Tell to stop current stream and possibly start a new one.
> > > >   * This message is sent by the host to the guest.
> > > >   * Allows to communicate the codecs supported by the clients.
> > > 
> > > 
> > > So, here's where our work starts to overlap a bit (and why it's a
> > > bit
> > > unfortunate that I haven't gotten around to sending out a
> > > proposal for
> > > multi-monitor support so far, perhaps).
> > 
> > It's not much of a problem, I just did it the simple way not
> > knowing
> > what the plans were for multi-monitor, but it's not hard to change.
> > 
> > > The change above assumes that there will be a single output for a
> > > single stream device. This is currently true because we only
> > > support a
> > > single output. For a variety of reasons, I've chosen to implement
> > > multimonitor support by multiplexing multiple outputs over a
> > > single
> > > stream device. I'll share my current working patch for the stream
> > > device protocol below.
> > > 
> > > But first I'll try to explain why I chose to multiplex displays
> > > over a
> > > single stream device. In general, I think it makes things easier.
> > > Some
> > > examples:
> > > 
> > >  * if you use one stream device per output, you need to decide
> > > before
> > >    you launch the vm how many outputs it will support and
> > > allocate
> > >    those devices by passing the correct options to qemu. To add
> > > an
> > >    additional output, you need to re-start qemu with different
> > >    parameters.
> > 
> > This alone is reason enough to do it the way you did it...
> > 
> > >  * We currently use a hard-coded spice port name (org.spice-
> > >    space.stream.0). If we have a separate stream device for each
> > >    display, we will have multiple port names. How do those port
> > > names
> > >    get communicated from the server to the guest?
> > >  * If you have multiple stream devices, do we launch a separate
> > > spice-
> > >    streaming-agent process for each one? Or a single streaming
> > > agent
> > >    that handles all of them?
> > >  * You also have to decide how to handle odd issues such as the
> > > fact
> > >    that the driver may be perfectly capable of enabling
> > > additional
> > >    monitors, but we don't have sufficient stream devices for
> > > them. So
> > >    what do you do when the vm was configured with only a single
> > > spice
> > >    device, but the user enables an additional display from within
> > > the
> > >    guest? Now perhaps the guest has two displays enabled, but we
> > > have
> > >    no way of sending that second display to the client.
> > > 
> 
> Another solution would be to allocate a maximum of devices we want to
> support and use up to this number. Kind of 8 devices. Obviously this
> will be a limitation.

Well, the allocation of devices is not something we can do anything
about. It up to the sysadmin to configure it properly. If they
configure fewer devices, then we need to be able to handle it somehow.
In the case where all displays share a single device, there are only
two scenarios to handle: 1) device exists (streaming works), or 2)
device doesn't exist (streaming doesn't work).

> 
> > > So there were a lot of little things like this that led me to
> > > decide
> > > that using a single stream device was simpler and more flexible.
> > > There
> > > may have been other reasons that I've forgotten already as well.
> > > 
> > > The only real downside I could think of was that the stream
> > > device
> > > protocol would have to change in a non-backward-compatible way
> > > (see
> > > below). But since the streaming agent is still in early stages,
> > > that
> > > doesn't seem like a huge deal.
> > 
> > I'm not sure, we made a release and no compatibility disclaimer was
> > made about the protocol compatibility (only about API/ABI
> > instability),
> > but at this stage I'd say we shouldn't worry about compatibility
> > yet,
> > probability of changes is still high.
> > 
> 
> "have to change in a non-backward-compatible way" is pretty false.
> I didn't see in your patch below the needs to do it. Just add
> capabilities
> and messages.

Sure, my wording was not technically correct. But since the streaming
agent is not actually used yet, I didn't feel that it was worth adding
capabilities to the protocol yet since there's no existing
installations that need to be supported yet (right?).

> With these reasoning we'll probably have HTTP 187 and not HTTP 2 and
> TCP
> 934 instead of 6.
> Personally I would use the reserved "padding" field in
> StreamDevHeader
> for the output ID.
> 
> > > Of course, it's possible that I missed
> > > other disadvantages, so let me know any problems you see with
> > > this
> > > approach.
> > 
> > I can only think of possible performace issues of a single port
> > with
> > multiplexed 4 4k displays for example? Not sure of the behaviour
> > and if
> > something would be different if there were 4 ports? Maybe a test
> > would
> > be in place before we commit to a solution?
> > 
> 
> One possible issue of this is serialization. If I need to send 4
> encoded
> frames, each frame has to be contained in a single message (as is
> today)
> and each message is contiguous is possible that before sending the
> forth
> frame we need to send all other complete frames which can cause
> potential
> delays. 

I guess we'd have to do some tests to see if there really was any
significant performance / delay caused by using a single device. I
suppose it depends a bit on how qemu handles the spice port reads from
multiple devices. Does qemu have a thread of execution for each spice
port, or is there a single thread handling all spice ports. If it's the
latter, then it seems that there wouldn't be much performance advantage
to using multiple devices, right? (I briefly looked at qemu but didn't
find the answer right away)

> Different protocol when they implement multiplexing like these
> allow the messages/requests to be split (examples are HTTP2, TDS and
> potentially
> there are plans for WebSockets). This would be potentially a change
> requiring
> breaking compatibility.
> 
> I suppose the server will have multiple connection with the client
> using multiple DisplayChannel (as protocol) and each frame will be
> sent
> to the appropriate channel. Or are we going to implement multiple
> surfaces for DisplayChannel (as protocol implementation)?
> Maybe asking too much details about multi monitor.

Well, that's a bit of an open question yet, to be honest. I was
thinking so far that it'd be better to use the linux approach (multiple
outputs on a single display channel) for a similar reason that I
decided to use a single stream device above.

> But if we need multiple DisplayChannel and potentially we have the
> issue
> that we need to open all at the beginning and not later (due to the
> login token expiring) won't we have to allocate these channel
> statically
> (like 8) to support them?

Which you demonstrate here: separate display channels adds a little bit
more complexity in some ways.

> 
> I won't be again trying a library like cap'n proto and break
> compatibility,
> seems to me that the team has problems dealing with protocol and
> compatibility,
> but even choosing a library without understanding much protocol
> issues
> is not really good either.

I'm afraid I don't understand what this paragraph means.

> 
> > > Anyway, here's the current proof-of-concept changes to the stream
> > > device protocol that I'm working with:
> > 
> > Looks fine to me in general, hopefully Frediano takes a look as
> > well :)
> > 
> > > diff --git a/spice/stream-device.h b/spice/stream-device.h
> > > index 6add42b..57c768f 100644
> > > --- a/spice/stream-device.h
> > > +++ b/spice/stream-device.h
> > > @@ -58,7 +58,7 @@
> > >   */
> > >  
> > >  /* version of the protocol */
> > > -#define STREAM_DEVICE_PROTOCOL 1
> > > +#define STREAM_DEVICE_PROTOCOL 2
> > >  
> > >  typedef struct StreamDevHeader {
> > >      /* should be STREAM_DEVICE_PROTOCOL */
> > > @@ -78,8 +78,8 @@ typedef enum StreamMsgType {
> > >      STREAM_TYPE_INVALID = 0,
> > >      /* allows to send version information */
> > >      STREAM_TYPE_CAPABILITIES,
> > > -    /* send screen resolution */
> > > -    STREAM_TYPE_FORMAT,
> > > +    /* send guest output configuration */
> > > +    STREAM_TYPE_OUTPUT_CONFIG,
> > 
> > So the naming is "output_config" here. I don't mind, if we want to
> > call
> > each "monitor" by the name "output" here? At this point I don't
> > have
> > much of an opinion, except... (see below)
> > 
> 
> I don't see why simply a new message cannot be added instead.
> Or use capability and extend current message.

It can be. But as I mentioned above, I assumed we didn't need to
support any existing installations so why carry along the old baggage
if we can discard it?

> 
> > >      /* stream data */
> > >      STREAM_TYPE_DATA,
> > >      /* server ask to start a new stream */
> > > @@ -115,6 +115,16 @@ typedef struct StreamMsgCapabilities {
> > >  
> > >  #define STREAM_MSG_CAPABILITIES_MAX_BYTES 1024
> > >  
> > > +typedef struct StreamMsgOutput {
> > > +    uint8_t output_id;
> > 
> > ... "output_id" is very similar to the "guest_output_id" I used
> > throughout the monitor ID series. (though you've got uint8_t here
> > and I
> > use uint32_t)
> > 
> > Most important role of the "output_id" here is being the unique
> > identifier of the "Outputs" as you define them here. A major
> > question
> > we should answer is whether we want to use this "output_id" also as
> > the
> > guest_output_id. At this moment, "guest_output_id" is a sequence
> > starting from 0, so "output_id" can easily be equal to that, but
> > they
> > are not in essence not the same numbers.
> > 
> > On a side note, a question was already raised what will the
> > "guest_output_id"s be on Wayland. The answer is I have no idea. On
> > Wayland it seems each compositor is doing the multi-monitor setup
> > itself, which seems like a nightmare for us.
> > 
> 
> Yes, I think Wayland is quite obfuscating lot of stuff in the name of
> "security". I really don't understand why a computer user, which
> should
> represent a human user in the system has to have much less rights
> than
> the real one. I can see all the screens with my eyes and all the
> windows
> so why my user on the computer is so limited? Why I can launch a
> lspci
> or similar but Wayland filter lot of hardware information?
> 
> OT: due to additional delay for the compositor is recommended to use
> Xorg for gaming!
> 
> > > +    /* screen resolution/stream size */
> > > +    uint32_t width;
> > > +    uint32_t height;
> > > +    /* coordinates of the top-left corner of the screen */
> > > +    int32_t x;
> > > +    int32_t y;
> 
> You need to align the structure.
> 
> Should the server compute a kind of fake surface? I was thinking
> about the protocol server <-> client for multi monitor.
> 
> > > +} StreamMsgOutput;
> > > +
> > >  /* Define the format of the stream, start a new stream.
> > >   * This message is sent by the guest to the host to
> > >   * tell the host the new stream format.
> > > @@ -122,21 +132,21 @@ typedef struct StreamMsgCapabilities {
> > >   * States allowed: Ready
> > >   *   state will change to Streaming.
> > >   */
> > > -typedef struct StreamMsgFormat {
> > > -    /* screen resolution/stream size */
> > > -    uint32_t width;
> > > -    uint32_t height;
> > > +typedef struct StreamMsgOutputConfig {
> > >      /* as defined in SpiceVideoCodecType enumeration */
> > >      uint8_t codec;
> > > -    uint8_t padding1[3];
> > > -} StreamMsgFormat;
> > > +    uint8_t n_outputs;
> > > +    StreamMsgOutput outputs[0];
> 
> alignment.
> 
> From the protocol prospective why not sending multiple formats?
> Well, I suppose would need to add ID field.

What do you mean by formats? different codecs?

> 
> > > +} StreamMsgOutputConfig;
> > > +
> > 
> > So the "codec" is common to all the outputs and each output has
> > it's
> > own dimensions. Makes sense...
> > 
> > Cheers,
> > Lukas
> > 
> > > -/* This message contains just raw data stream.
> > > +/* This message contains just raw data stream for the specified
> > > output.
> > >   * This message is sent by the guest to the host.
> > >   *
> > >   * States allowed: Streaming
> > >   */
> > >  typedef struct StreamMsgData {
> > > +    uint8_t output_id;
> 
> This won't be necessary using the "padding" in the header.
> 
> > >      uint8_t data[0];
> > >  } StreamMsgData;
> > >  
> > > 
> > > 
> > > Jonathon
> 
> Frediano