[Spice-devel] [RFC/POC PATCH 00/16] add output_id to monitors_config

Fri Jun 15 14:44:27 UTC 2018

On Fri, 2018-06-15 at 15:12 +0200, Marc-André Lureau wrote:
> Hi
> 
> On Fri, Jun 15, 2018 at 1:01 PM, Lukáš Hrázký <lhrazky at redhat.com> wrote:
> > On Fri, 2018-06-15 at 12:24 +0200, Marc-André Lureau wrote:
> > > Hi
> > > 
> > > On Fri, Jun 15, 2018 at 12:16 PM, Lukáš Hrázký <lhrazky at redhat.com> wrote:
> > > > On Thu, 2018-06-14 at 21:12 +0200, Marc-André Lureau wrote:
> > > > > Hi
> > > > > 
> > > > > On Tue, Jun 5, 2018 at 5:30 PM, Lukáš Hrázký <lhrazky at redhat.com> wrote:
> > > > > > Hi everyone,
> > > > > > 
> > > > > > following is my attempt at solving the ID issues with monitors_config
> > > > > > and streaming. The concept is as follows:
> > > > > > 
> > > > > 
> > > > > Before introducing a new solution, could you describe the "issues with
> > > > > monitors_config and streaming"? I am not following closely enough the
> > > > > mailing list probably, so a recap would be welcome. I'd like to
> > > > > understand the shortcomings of the current messages, and see if we are
> > > > > on the same page about it. Thanks!
> > > > 
> > > > Right, sorry about that!
> > > > 
> > > > The issue is when having a plugin a for the streaming agent that is
> > > > using a HW-accelerated encoder. The typical case is that you have a QXL
> > > > monitor in the guest which shows the boot and then you have a physical
> > > > HW device (either directly assigned to the VM or a vGPU) which is
> > > > configured in the X server.
> > > > 
> > > > Once the X server starts, the QXL monitor goes blank and you get a
> > > > second monitor on the client with the streamed content from the
> > > > streaming agent plugin.
> > > > 
> > > > The problem is the code and the protocol assumes (amongst other
> > > > assumptions) that all monitors are configured in X. The monitors on the
> > > > client are put into an array and the index is used as the ID throughout
> > > > SPICE. So for the case above, the QXL monitor is ID 0 and the streamed
> > > > monitor is ID 1.
> > > 
> > > 
> > > Since the "streamed monitor" replaces the QXL monitor, wouldn't it
> > > make sense for them to share the same ID? This would be handled by the
> > > server/guest transparently.
> > 
> > Well... for one, it does not entirely replace it. From a user
> > experience PoV, yes, it makes sense to "replace" one monitor for the
> > other. We've even considered switching the content of the display
> > channel from QXL to the streamed one. But in the system, they are
> > different monitors on different channels, so it's a bad idea to give
> > them the same ID. And the ID we're talking about is actually (for the
> > monitors config messages):
> 
> So if it's the same from use PoV, the client API shouldn't need to change.

I didn't say it's the same from the user PoV, not entirely. We can
assume he want's to either be looking at the QXL monitor or the
streamed one, i.e.:

1. VM is booting, we are showing QXL monitor
2. X started, we switch to showing the streamed monitor
3. The user switches to a different TTY, we switch back to QXL
etc...

This is for the simple case of one QXL monitor and one streamed
monitor. But we are still "hiding" from the user that his VM actually
has two graphics devices, the QXL one and a physical (or virtual) GPU.

I once advocated the approach to switch the monitors for the users, but
now I think it would actually complicate things more and could
potentially cause problems.

> The client only cares about about having a specific set of monitors
> configuration. The way the server/guest handled them is not its
> business, it expects the best configuration.

That is true, but the monitors configuration needs to reflect the
actual configuration of the guest. If we mix it up for the clients
convenience, problems... :)

> So far, we have the current simplified model (ie other more complex
> combinations are not supported):
> 
> qxl device, display channel 0,
>   monitor 0:
>   monitor 1:
>   monitor x..
> 
> 
> or
> 
> qxl device, display channel 0,
>   monitor 0:
> qxl device, display channel 1,
>   monitor 0:
> ..
> 
> Feel free to correct me, I haven't been looking into this for many years now.

Correct.

> > 
> > server -> client: a pair (channel_id, monitor_id)
> > 
> > client -> server: channel_id + monitor_id, converted to an array index
> > under the assumption I mentioned
> > 
> > So we can't really make it the same even if we wanted.
> 
> With multi-head/xrandr qxl:
> 
> qxl device, display channel 0,
>   monitor 0:
>   monitor 1:
>   monitor x..
> gpu device, stream channel 0,
>   maps to display channel 0, monitor 0
> gpu device, stream channel 1,
>   maps to display channel 0, monitor 1
> ...
>
> With multihead/xrandr qxl & gpu:
> qxl device, display channel 0,
>   monitor 0:
>   monitor 1:
>   monitor x..
> gpu device, stream channel 0,
>   maps to display channel 0, monitor 0
>   maps to display channel 0, monitor 1
> ...
> 
> With multiple QXL devices:
> 
> qxl device, display channel 0,
>   monitor 0:
> qxl device, display channel 1,
>   monitor 0:
> gpu device, stream channel 0,
>   maps to display channel 0, monitor 0
> gpu device, stream channel 1,
>   maps to display channel 1, monitor 0

But, regardless of multihead/multiple devices, there is no guarantee
that the number of QXL monitors is going to match the number of GPU
device monitors. In fact, the most probable scenario is one QXL monitor
and multiple GPU device monitors.

I don't think it makes sense to have more than one QXL monitor anyway.
And if you did, it makes no sense to map them to GPU device monitors
besides the first one, I guess...

Also, it is unclear what you mean by the "mapping", i.e. how exactly
would you implement it?

> If you want to support QXL devices that should not be associated with
> a gpu/stream channel, then you should be able to flag them.
> 
> If you need a manual association of stream channel with qxl/monitor,
> this could be done with a manual configuration.
> 
> Imho, we should avoid making things more complex. Testing is hard
> enough. We should actually take the simplest approach possible (the
> same decision we took before, it's already a mess to deal with, as you
> noticed)

I fully agree with the notion :) However, we are already making things
much more complex by adding the GPU device streaming feature. It breaks
previous assumptions and adds more stuff on top of it. And I think your
suggestion adds even more complexity than necessary. Perhaps not in the
client, but the "mapping" code, which sounds rather non-trivial, needs
to go somewhere, presumably the server?

It is actually my intention to add as little complexity as possible
with my patches. The fact is, both the current code and the protocol is
not robust enough and need to be fixed. And even if they were robust
enough, the protocol would need to be extended either way. On top of
that, the backwards compatibility turns the solution into a very
complex one. But if we ever could deprecate and drop the old code
paths...

> > > > The more pressing issue with this is that a mouse motion event is sent
> > > > with a display_id 1 from the client, but when it reaches the guest, the
> > > > correct monitor is ID 0, as it is the only one.
> > > > 
> > > > The other thing this breaks is monitors_config messages that
> > > > enable/disable displays and change the resolution. Besides the same
> > > > "shift" happening here as well, another problem is the information that
> > > > the monitors belong to different devices (or channels) is erased when
> > > > they're all put into the single array (under the assumption there is
> > > > either one channel with multiple monitors or multiple channels with one
> > > > monitor each).
> > > 
> > > Correct and so far was a fine solution.
> > 
> > I respectfully disagree. Doing the channel_id + monitor_id I mentioned
> > above is asking for trouble, which is exactly what we have now.
> 
> With the limitation we decided, we avoid the biggest troubles though.

I take it you were involved with the original implementation then...
I'm sorry for being so critical, it is not my intention to be mean :)

However, I am not criticising the limitations you chose to impose, but
the implementation. I find two problems here:

1. Not keeping the IDs of the monitors_config messages as a
(channel_id, monitor_id) pair and instead abusing the limitation you
introduced to change it to a singe ID. I sort of consider it a hard
rule you always keep your unique IDs intact and always keep them with
the data. That way they're there when you need them and they work
(because you didn't mess with them :)).

2. The change in the IDs and the actual creation of the monitors_config
list leaks over the spice-gtk API to the client application (via the
spice_main_update_display{,_enabled}() function). It is unnecessary,
gives more opportunity to the client app to break it, and any extension
of the monitors_config message requires a spice-gtk API change.

You could have imposed the limitation you did (I'm not entirely sure
what was the reason) and still make the implementation more robust by
not doing the above...

And for these reasons this patch series is more complicated that it
could have been (though as I said, the protocol extension, i.e. adding
the output_id) would still be necessary.

I'm not saying this to pick on you, I'm not even sure how you were
involved (didn't go digging who wrote the code), but more as an
explanation and so that we all can learn from it for the future :)

> > > Either you do multiple monitor over a single channel, or you use multiple
> > > channels, each with one monitor.
> > 
> > That is a limitation you can introduce, perhaps there was a reason for
> > it. But the code is taking shortcuts and borderline hacks because of
> > this, while doing it right wouldn't be much harder. I'm not looking to
> > blame or judge anyone, I don't know how this came into place. But what
> > it is now is (IMO) a bad design and I feel the need to say it after
> > working on it for quite some time ;)
> 
> Doing it differently would lead to push the complexity higher up the
> stack, perhaps even to the user. We wanted to avoid that.

Yeah, as I said, not sure what the reason was, but IMO you could have
had that and still make it much more robust.

Oh and BTW, not sure how old this is, but I'm sure I'd have much worse
things to say about the code _I_ wrote say like 8 years ago ;)

> > > > Hope this explains it well enough. It's all very complex and goes over
> > > > multiple interfaces (the network protocols as well as the spice-gtk
> > > > API), so the more brilliant ideas you'll have, the more welcome they
> > > > will be :)
> > > 
> > > If we can avoid modifying all the layers and exposing the inner
> > > details, I would encourage the exploration of other solution.
> > 
> > I don't see how we can do that... And the inner details are already
> > quite exposed, unfortunately :(
> > 
> > As I said, ideas welcome.
> > 
> > > > > > 1. The streaming-agent sends a new StreamInfo message when it starts
> > > > > > streaming. The message contains the output_id of the streamed monitor.
> > > > > > The actual value is the index in the list of xrandr outputs. Basically,
> > > > > > the number should uniquely identify a monitor in the guest context under
> > > > > > X.
> > > > > > 
> > > > > > 2. The server copies the number into the SpiceMsgDisplayMonitorsConfig
> > > > > > message and sends it to the client as part of the monitors_config
> > > > > > exchange.
> > > > > > 
> > > > > > 3. The client stores the output_id in it's list of monitor_configs. Here
> > > > > > I had to make significant changes, because the monitors_config code in
> > > > > > spice-gtk is very loosely written and also exposes quite a few
> > > > > > unnecessary details to the client app. The client sends the output_id
> > > > > > back to the server as part of the monitors_config messages
> > > > > > (VDAgentMonitorsConfigV2) and also uses it for the mouse motion event,
> > > > > > which also contains a display ID interpreted by the vd_agent. In the
> > > > > > end, the API/ABI towards the client app should remain unchanged.
> > > > > > 
> > > > > > 4. The server passes the output_id in above-mentioned messages to the
> > > > > > vd_agent. The output_id is meaningless in the server context.
> > > > > > (Currently, it doesn't pass the monitors_config messages if there is a
> > > > > > QXL device that supports it, though. Needs more work.)
> > > > > > 
> > > > > > 5. vd_agent:
> > > > > >   a) For mouse input, the output_id was passed in the original message,
> > > > > >   so no change needed here, it works.
> > > > > > 
> > > > > >   b) If the server sends monitors_config to the guest, the vdagent will
> > > > > >   prefer to use monitors_configs with the output_ids set, if such are
> > > > > >   present. In that case, it ignores monitors_configs with the output_id
> > > > > >   unset. If no output_ids are present, it should behave as it used to.
> > > > > > 
> > > > > > A couple of things to note:
> > > > > > - While I did copy the VDAgentMonitorsConfig to a different message for
> > > > > > backwards compatibility, I didn't do the same for
> > > > > > SpiceMsgDisplayMonitorsConfig, it remains to be done.
> > > > > > 
> > > > > > - I didn't introduce any capabilities to handle the compatibility, also
> > > > > > remains to be done. Hopefully it is also clear it will be quite a
> > > > > > non-trivial job to do that :( Ain't gonna make the code prettier either.
> > > > > > 
> > > > > > For your convenience, you can also pull the branches below:
> > > > > > https://gitlab.freedesktop.org/lukash/spice-protocol/tree/monitors-config-poc
> > > > > > https://gitlab.freedesktop.org/lukash/spice-common/tree/monitors-config-poc
> > > > > > https://gitlab.com/lhrazky/spice-streaming-agent/tree/monitors-config-poc
> > > > > > https://gitlab.freedesktop.org/lukash/spice/tree/monitors-config-poc
> > > > > > https://gitlab.freedesktop.org/lukash/spice-gtk/tree/monitors-config-poc
> > > > > > https://gitlab.freedesktop.org/lukash/vd_agent/tree/monitors-config-poc
> > > > > > 
> > > > > > All in all, it's big, complex and invasive. Note I did review the
> > > > > > emergency instructional video [1] and am therefore ready for any
> > > > > > bombardment you'll be sending my way :D (Don't hesitate to contact me
> > > > > > with questions either)
> > > > > > 
> > > > > > Last minute note: I've noticed some of the patches are missing
> > > > > > Signed-off-by line, since they are not for merging, should not be an
> > > > > > issue...
> > > > > > 
> > > > > > 
> > > > > > Lukáš Hrázký (16):
> > > > > > spice-protocol
> > > > > >   Add the StreamInfo message
> > > > > >   Create a version 2 of the VDAgentMonitorsConfig message
> > > > > > spice-common
> > > > > >   add output_id to SpiceMsgDisplayMonitorsConfig
> > > > > > spice-streaming-agent
> > > > > >   Send a StreamInfo message when starting to stream
> > > > > > spice-server
> > > > > >   Handle the StreamInfo message from the streaming agent
> > > > > >   Use VDAgentMonitorsConfigV2 that contains the output_id field
> > > > > > spice-gtk
> > > > > >   Rework the handling of monitors_config
> > > > > >   Remove the n_display_channels check when sending monitors_config
> > > > > >   Use an 'enabled' flag instead of status enum in monitors_config
> > > > > >   Use VDAgentMonitorsConfigV2 as the message for monitors_config
> > > > > >   Add output_id to the monitors_config
> > > > > >   Use the new output_id as display ID for the mouse motion event
> > > > > > vd_agent
> > > > > >   vdagent: Log the diddly doo X11 error
> > > > > >   Improve/add some logging messages
> > > > > >   Use VDAgentMonitorsConfigV2 instead of VDAgentMonitorsConfig
> > > > > >   vdagent: Use output_id from VDAgentMonitorsConfigV2
> > > > > > 
> > > > > > [1] https://www.youtube.com/watch?v=IKqXu-5jw60
> > > > > > 
> > > > > > --
> > > > > > 2.17.1
> > > > > > 
> > > > > > _______________________________________________
> > > > > > Spice-devel mailing list
> > > > > > Spice-devel at lists.freedesktop.org
> > > > > > https://lists.freedesktop.org/mailman/listinfo/spice-devel
> > > > > 
> > > > > 
> > > > > 
> > > 
> > > 
> > > 
> 
> 
>