[Spice-devel] [PATCH v2 0/2] RHEL7 improvements

Thu Mar 2 18:16:08 UTC 2017

On Thu, 2017-03-02 at 11:34 +0000, Frediano Ziglio wrote:
> These 2 patches attempt to join images split by RHEL7 graphic
> stack (Mesa) decreasing commands handled by spice-server.
> 
> You can see the difference between the 2 video:
> - https://www.youtube.com/watch?v=OarV6zUmUdg (before)
> - https://www.youtube.com/watch?v=5fTdCCbFeCg (after)
> These video are realized with some additional changes:
> - the statistics from the terminal have some additional
>   "out_messages" counters. These counters show the messages
>   sent to the client(s). These changes are part of my "stat"
>   branch (partially sent couple of days ago);
> - the replay utility, as you can see, was changed to replay
>   using the real time to allow the video code (which is dependent
>   on timing) to work correctly. The patch is currently not in
>   a good shape (enough to be sent);
> - the client utility was changed to remove the delay due to
>   the lip sync. The glitches (present mostly before these patches)
>   are much reduced.
> 
> Note the number of commands sent to the client reduced from 6097
> to 2016 (current year, just a coincidence).
> The network traffic reduced from 72M to 56M. This is due to the fact
> that having a single stream (as you can see VP8 codec was used) the
> compression on the stream is better.
> 
> These patches fix:
> - https://bugzilla.redhat.com/show_bug.cgi?id=1158029;
> - (probably) https://bugzilla.redhat.com/show_bug.cgi?id=1294564.
> 
> Changes since RFC:
> - moved code to DisplayChannel;
> - define a constant for timeout.
> 
> Future possible optimization: not waiting to join if the joined
> command end at the screen bottom or if the last command contain
> a small image (Mesa split at about 64K pixel, 256Kb).
> 
> In some experiments with the modified replay utility I got
> some additional artifacts respect to the RFC version. This is mainly
> due to the way RedWorker handle commands from graphic driver and the
> way the timeout was handled. In the previous version before checking
> for joining timeout the graphic command queue were always checked
> while with this last series is possible that the timeout trigger
> before checking for new command. This however seems to happen mainly
> to me as the replay utility introduce some delay.

To expand on this a little bit for posterity:

Without these patches, The driver on RHEL7 splits updates up into
separate drawable commands that are thin horizontal strips instead of
updating the entire area at once. Our stream-detection heuristics treat
these as multiple separate streams instead of one single larger stream.
For example:

+-------------------------------------+
|                                     |
|  +-------------------------+        |
|  | Stream A                |        |
|  +-------------------------+        |
|  | Stream B                |        |
|  +-------------------------+        |
|  | Stream C                |        |
|  +-------------------------+        |
|                                     |
|                                     |
+-------------------------------------+

There are several problems with this. As mentioned above, it generates
extra commands to transfer to the client. And since these strips are
not perfectly synchronized, it can create visual glitches or "tearing"
of the video.

These patches attempt to delay processing of certain drawable commands
for a short time so that we can compare one command to the next command
to see if they are directly on top of eachother. If they are, then we
join them and wait to see if the next command can also be joined. We
keep joining commands this until we recieve a command that cannot be
joined, at which point we process the joined command. Then we start the
process again.

When these commands are joined successfully, our stream-detection
heuristics work as they did in previous versions of linux and we
generate a single stream for the entire updated area:

+-------------------------------------+
|                                     |
|  +-------------------------+        |
|  |                         |        |
|  |                         |        |
|  | Stream A                |        |
|  |                         |        |
|  |                         |        |
|  +-------------------------+        |
|                                     |
|                                     |
+-------------------------------------+

The second patch attempts to solve the situation where there are no
more drawable updates after we recieve the final joinable drawable
command? We could be waiting for the next non-joinable drawable forever
and never process the delayed commands. So it adds a timeout to process
these delayed commands if this situation occurs. But what value do we
use for the timeout? If we choose a too-long timeout, it increases
latency. If we choose a too-short timeout, it increases the chance that
we may process the delayed commands when there are still joinable
commands that have not been delivered yet. When this happens, we may
get striped streams again (or no stream will be detected). In his
testing with the modified replay utility, Frediano said that he saw
this happening with some frequency. In my testing with live vms (on a
local network), I never observed the issue. 

In a previous version of this patch, the timeout and joining of the
commands happened in the RedWorker rather than in the DisplayChannel.
The disadvantage of that approach is that the responsibility for
handling display commands really belongs to the DisplayChannel rather
than the RedWorker. However, the advantage of that approach was that
the timeout to process delayed commands was handled by the RedWorker
rather than the DisplayChannel. The RedWorker could handle all pending
qxl commands before handling the timeout. 

In this new version of the patch, the display channel handles and joins
the display commands (which is a better design -- this should be the
DisplayChannel's responsibility), but it adds the possibility of a race
since the DisplayChannel can't fetch all pending qxl commands before
handling the timeout. So we may end up processing the delayed commands
even though there are additional pending drawables that could be
joined.

As I said above, I never noticed the race on my testing of a live vm.
And if it does happen, I suspect it would be fairly rare. And the
impact of it happening isn't horrible. So I'm pretty comfortable with
this second approach, I think.

I suppose an alternate approach would be to change our stream
heuristics to be able to detect streams based on more than a single
drawable command. But that seems significantly more difficult, and I'm
not sure it would provide any advantages.

> 
> Frediano Ziglio (2):
>   display-channel: Join drawables to improve rhel7 behaviour
>   display-channel: Handle timeout for joining drawables
> 
>  server/display-channel-private.h |   6 +-
>  server/display-channel.c         | 191
> +++++++++++++++++++++++++++++++-
>  server/red-parse-qxl.h           |   1 +-
>  server/red-worker.c              |  14 +-
>  4 files changed, 205 insertions(+), 7 deletions(-)
> 
> base-commit: cdd1e69b2881eec45327e27b297b0cbab44b033e