[Bug 90819] New: SPICE connection to VM no longer seamless with 12th login

Tue Jun 2 06:30:22 PDT 2015

https://bugs.freedesktop.org/show_bug.cgi?id=90819

            Bug ID: 90819
           Summary: SPICE connection to VM no longer seamless with 12th
                    login
           Product: Spice
           Version: unspecified
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: win32 agent
          Assignee: spice-bugs at lists.freedesktop.org
          Reporter: teuf at gnome.org

Moving here the content of https://bugzilla.redhat.com/show_bug.cgi?id=1104697

 Markus Stockhausen 2014-06-04 10:04:06 EDT

Description of problem:

After Live migration of a Windows VM several times between two hypervisor nodes
SPICE and RDP connection suddenly stops working.

Version-Release number of selected component (if applicable):

Ovirt 3.4.1
Hypervisor nodes: Fedora 20
QEMU: 1.6.2
SPICE: 0.12.5
VM: Windows XP
VM drivers: guest tools 0.74

How reproducible:

100% 

Steps to Reproduce:
1. Create a spice based windows VM 
2. Install XP into that VM
3. Migrate VM several times between two hypervisor hosts

Actual results:

4. After 10-20 migrations the VM console cannot be opened any more. Neither
through RDP nor through SPICE

Expected results:

Connection should work.

Additional info:

SPICE log (level 4) attaced

screenshot of spice connection try attached

[reply] [−]
Private
Comment 1 Markus Stockhausen 2014-06-04 10:07:36 EDT

Created attachment 902192 [details]
qemu/spice log

[reply] [−]
Private
Comment 2 Markus Stockhausen 2014-06-04 10:08:23 EDT

Created attachment 902193 [details]
spice console

[reply] [−]
Private
Comment 3 Markus Stockhausen 2014-06-04 10:24:53 EDT

As not clearly explained: I have spice-server 0.12.5 installed from the Fedora
testing repositories. In 0.12.4 we had several endless loops because 64 bit
time stamps were truncated to 32 bit.

[reply] [−]
Private
Comment 4 Markus Stockhausen 2014-06-04 10:46:26 EDT

A working VM shows the following SPICE log lines around the error:

((null):19914): SpiceWorker-Info **:
red_worker.c:9635:display_channel_client_wait_for_init: creating encoder with
id == 0
<----- Here the connection hangs               ---->
<----- Below the lines of a noraml working VM  ---->
((null):19914): Spice-Info **: reds.c:2014:reds_handle_auth_mechanism: Auth
method: 1
((null):19914): Spice-Info **: reds.c:1421:reds_info_new_channel: channel 3:0,
connected successfully, over Secure link
inputs_connect: inputs channel client create

[reply] [−]
Private
Comment 5 Markus Stockhausen 2014-06-04 15:30:01 EDT

Compiled spice-server 0.12.5 from scratch and added further logging
instructions:

Ouput:

((null):10731): SpiceWorker-Info **:
red_worker.c:9635:display_channel_client_wait_for_init: MST creating encoder
with id == 0
((null):10731): SpiceWorker-Info **:
red_worker.c:9637:display_channel_client_wait_for_init: MST glz_encoder_created
((null):10731): SpiceWorker-Info **:
red_worker.c:9668:on_new_display_channel_client: MST
red_channel_client_ack_zero_messages_window
((null):10731): SpiceWorker-Info **:
red_worker.c:9678:on_new_display_channel_client: MST end
((null):10731): SpiceWorker-Info **:
red_worker.c:10652:handle_new_display_channel: MST end
((null):10731): SpiceWorker-Info **:
red_worker.c:11449:handle_dev_display_connect: MST connected
((null):10731): SpiceWorker-Info **:
red_worker.c:10119:display_channel_handle_message: MST before_handle_message
((null):10731): Spice-Info **:
red_channel.c:1531:red_channel_client_handle_message: MST start type=1
<---- The last tracepoint that is called in error case
      The next tracepoint that is called in ok case --->
((null):10731): Spice-Info **: reds.c:2266:reds_accept_ssl_connection: MST
start
((null):10731): Spice-Info **: reds.c:2230:reds_init_client_ssl_connection: MST
start

Conclusion: The spice connection to the bad VM dos not jump into function
reds_accept_ssl_connection() after the display_channel_client_wait_for_init()
call. 

I'm willing to enhance the tracepoints further. Any ideas where to search next.
I'm having some problems to understand the watcher concept and to find a good
point to debug the sent/received packets.

Oved Ourfali 2014-06-05 01:06:52 EDT
Whiteboard: virt
[reply] [−]
Private
Comment 6 Omer Frenkel 2014-06-05 03:06:45 EDT

please attach vdsm.log as well

Flags: needinfo?(mst at collogia.de)
[reply] [−]
Private
Comment 7 Markus Stockhausen 2014-06-05 03:28:24 EDT

Created attachment 902418 [details]
vdsm

Flags: needinfo?(mst at collogia.de)
[reply] [−]
Private
Comment 8 Markus Stockhausen 2014-06-05 03:30:19 EDT

VDSM log from migration target host attached. The log starts before the
migration is initiated and ends after I tried to logon to SPICE console via
webadmin.

[reply] [−]
Private
Comment 9 Markus Stockhausen 2014-06-05 04:42:14 EDT

Remark: The following behaviour can be reproduced even without doing a SPICE
console connection. Just an online migration of the VM to the special spice
debug enabled Fedora 20 hypervisor.

I narrowed the error down a bit in function red_worker_main(). In a normal
working VM the endless loop in this function is permanently running. In the
"damaged" VM this loop is suddenly halted. I placed two debugging messages
before and after the call:

spice_info("before_poll");
num_events = poll(worker->poll_fds, MAX_EVENT_SOURCES, worker->event_timeout);
spice_info("after_poll");

On the bad VM that call suddenly does not return any more data and stalls. the
last log line shows "before_poll". In the good VM that call will always return
and the log is written on and on.

[reply] [−]
Private
Comment 10 Markus Stockhausen 2014-06-05 10:10:13 EDT

Please ignore my last comments. 

I made some further analysis and the outcome is as follows:

When doing live migrations in the above described setup the SPICE performance
might deteriorate on the target host. This can lead to a stall of the console
display. Several problems might occur:

1) SPICE console lags. If you draw windows over the WinXP desktop very fast the
react slowly and are running "behind" the mouse pointer.

2) SPICE console is no longer responsive. You cannot select any item on the
WinXP desktop. Nevertheless the console display can be opened from webadmin.

3) SPICE console stays blank with the text connecting ...

This has been verified with the updated 0.12.5 spice-server and the original
FC20 spice-server 0.12.4

Videos of case 2 are attached.

[reply] [−]
Private
Comment 11 Markus Stockhausen 2014-06-05 10:10:49 EDT

Created attachment 902559 [details]
normal performance before migration

[reply] [−]
Private
Comment 12 Markus Stockhausen 2014-06-05 10:11:18 EDT

Created attachment 902561 [details]
bad performance after migration

Itamar Heim 2014-06-08 02:30:00 EDT
Target Release: --- → 3.5.0
[reply] [−]
Private
Comment 13 Christophe Fergeau 2014-06-10 09:46:43 EDT

(In reply to Markus Stockhausen from comment #5)

> Conclusion: The spice connection to the bad VM dos not jump into function
> reds_accept_ssl_connection() after the
> display_channel_client_wait_for_init() call. 
> 
> I'm willing to enhance the tracepoints further. Any ideas where to search
> next. I'm having some problems to understand the watcher concept and to find
> a good point to debug the sent/received packets.

Have you tried getting a gdb backtrace when the server is hanging?
('debuginfo-install qemu-kvm'  'gdb --pid $qemu_pid' followed by 'thread apply
all bt')

(In reply to Markus Stockhausen from comment #10)
> Please ignore my last comments. 

All of them? There is no actual hang, but 'only' some very slow behaviour?

> 
> This has been verified with the updated 0.12.5 spice-server and the original
> FC20 spice-server 0.12.4
> 

Just to make sure I got you right, 0.12.4 was working ok (save for this 32 bit
timer overflow), and this issue appeared in 0.12.5 ?

[reply] [−]
Private
Comment 14 Markus Stockhausen 2014-06-10 10:10:47 EDT

Some further tests showed that this problem occurs only in Windows XP VMs. To
be precise we see the system performance degrading after each migration.  So we
can divide the error into several stages.

1) After 1-5 migration the SPICE display will get slower. The system seems to
work normally. See video above.

2) Another 5-10 migrations later the system halts completely.

Exchanging SPICE 0.12.5 with 0.12.4 and vice versa does not change the
situation

So to get closer to the error I will follow your advice and analyze what is
going on, when the system stalls completely. Depending on that result I will
open a new BZ or will continue this one.

[reply] [−]
Private
Comment 15 Markus Stockhausen 2014-06-10 10:28:19 EDT

Created attachment 907289 [details]
gdb output

[reply] [−]
Private
Comment 16 Markus Stockhausen 2014-06-10 10:30:26 EDT

GDB Output attached. The current version state in that situation is:

- qemu 1.6.2
- libspice-server 0.12.4

[reply] [−]
Private
Comment 17 Markus Stockhausen 2014-06-10 14:30:03 EDT

BZ1107835 has been created to analyze the major problem -> VM will stall after
some migrations.

Markus Stockhausen 2014-06-10 14:30:33 EDT
Depends On: 1107835
Omer Frenkel 2014-06-11 01:53:52 EDT
Target Release: 3.5.0 → ---
Component: ovirt-engine-webadmin → spice
Version: 3.4 → 20
Assignee: bugs at ovirt.org → cfergeau at redhat.com
Product: oVirt → Fedora
QA Contact: pstehlik at redhat.com → extras-qa at fedoraproject.org
Markus Stockhausen 2014-06-13 15:11:16 EDT
Summary: Spice & RDP connection broken after live migration → SPICE connection
to VM no longer seamless with 12th login
[reply] [−]
Private
Comment 18 Markus Stockhausen 2014-06-13 15:22:36 EDT

During long tests we isolated several bugs we encountered in the Ovirt
environment. Therefore I changed this bug topic to better reflect a really
reproducible testcase. Hope this is ok for the assignees.

Currently it boils down to a very simple and unwanted behaviour.

Server side:

Ovirt 3.4.2
Fedora 20 hyper visor nodes
qemu 1.6.2
libspice-server 0.12.4

Client side: 

Windows 7
virt-viewer 0.6.0

Steps to reproduce:

1) Start a Windows XP SPICE VM from Ovirt webadmin
2) Connect to the console of the VM
3) virt-viewer will start and open the console window
4) Admin can control the VM "seamless" That means the mouse is not trapped in
the conolse window. Simply dragging the mouse over the console window you can
interact with the VM.
5) Logout from console by closing virt-viwer

6) Repeat steps 2-5 exactly 11 times

7) Logon to the VM for the 12th time.

Result:

When admin wants to interact with the console the mouse gets trapped in the
console window. Admin must press SHIFT+F12 to get out of the window.

Expected result: 

Seamless integration. No trapped mouse.

Markus Stockhausen 2014-06-13 15:45:53 EDT
No longer depends On: 1107835
[reply] [−]
Private
Comment 19 Marc-Andre Lureau 2014-06-13 17:21:33 EDT

When the mouse is trapped, can you verify if the vdagent is still running? most
likely, not, could you get the vdagent & vdservice logs when the bug occurs?
thanks

Flags: needinfo?(mst at collogia.de)
[reply] [−]
Private
Comment 20 Markus Stockhausen 2014-06-14 10:35:52 EDT

Created attachment 908784 [details]
vdservice

Flags: needinfo?(mst at collogia.de)
[reply] [−]
Private
Comment 21 Markus Stockhausen 2014-06-14 10:36:22 EDT

Created attachment 908785 [details]
vdagent

[reply] [−]
Private
Comment 22 Markus Stockhausen 2014-06-14 10:37:36 EDT

Logs attached - please don't tell me vdagent/vdservice are stopped if a user
logs into a server too often :(

[reply] [−]
Private
Comment 23 Markus Stockhausen 2014-06-14 10:49:16 EDT

...
    if (!normal_restart && ++_agent_restarts > VD_AGENT_MAX_RESTARTS) {
        vd_printf("Agent restarted too many times");
        ret = false;
        stop();
    }
    if (ret && kill_agent() && launch_agent()) {
        if (time - _last_agent_restart_time >
VD_AGENT_RESTART_COUNT_RESET_INTERVAL) {
            _agent_restarts = 0;
        }
...

That math looks too anticipatory. In that setup vdservice will kill the agent
if a user logs into the console with an interval of < 60 seconds. Is there no
better way to handle that?

[reply] [−]
Private
Comment 24 Markus Stockhausen 2014-06-14 10:55:01 EDT

Windows XP ist not so graceful with agent restarts as newer versions:

  case WAIT_OBJECT_0 + VD_STATIC_EVENTS_COUNT:
    vd_printf("Agent killed");
    if (_system_version == SYS_VER_WIN_XP_CLASS) {
      restart_agent(false);
    } else if (_system_version == SYS_VER_WIN_7_CLASS) {
      kill_agent();
      // Assume agent was killed due to console disconnect, and wait for agent
      // normal restart due to console connect. If the agent is not alive yet,
      // it was killed manually (or crashed), so let's restart it.
      if (WaitForSingleObject(_control_event, VD_AGENT_RESTART_INTERVAL) ==
                       WAIT_OBJECT_0) {
        handle_control_event();
      }
      if (_running && !_agent_alive) {
        restart_agent(false);
      }

Any reason for that?

[reply] [−]
Private
Comment 25 Christophe Fergeau 2014-06-16 04:50:34 EDT

The bit of code you quote was added as part of
https://bugzilla.redhat.com/show_bug.cgi?id=845222
Thanks for the very detailed investigation!
I'm not sure if we have a spice-vdagent-win upstream bugzilla component though
:((

[reply] [−]
Private
Comment 26 Uri Lublin 2014-06-16 05:32:41 EDT

(In reply to Markus Stockhausen from comment #23)

The logic of this code is:
When spice-vdagent exits, spice-vdservice runs a new instance of spice-vdagent.
If something prevents the spice-vdagent from starting, for example a problem
with the virtio-serial device/driver, spice-vdservice tries
VD_AGENT_MAX_RESTARTS consecutive times and stops trying.
The code assumes that if there is no problem, at least one instance out of the
VD_AGENT_MAX_RESTARTS vdagents will live for more than
VD_AGENT_RESTART_COUNT_RESET_INTERVAL.

The problem is partially due to making the windows spice vdagent exit when the
spice client disconnects.

[reply] [−]
Private
Comment 27 Markus Stockhausen 2014-06-26 06:08:25 EDT

(In reply to Christophe Fergeau from comment #25)
> The bit of code you quote was added as part of
> https://bugzilla.redhat.com/show_bug.cgi?id=845222
> Thanks for the very detailed investigation!
> I'm not sure if we have a spice-vdagent-win upstream bugzilla component
> though :((

Is there some better place to ask for a fix (outside RH bugzilla)?

Flags: needinfo?(cfergeau at redhat.com)
[reply] [−]
Private
Comment 28 Cole Robinson 2015-03-31 15:41:02 EDT

Markus, there is an upstream spice w32 agent bug tracker, but no guarantees a
fix will be generated. at least the bug has a better chance of surviving since
this one will be autoclosed when F20 is end-of-life

https://bugs.freedesktop.org/buglist.cgi?component=win32%20agent&product=Spice

-- 
You are receiving this mail because:
You are the assignee for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/spice-bugs/attachments/20150602/fc57835f/attachment-0001.html>