[Spice-devel] chance of infinite-loop in red_worker

邵鄭叡 vampirictutor at gmail.com
Thu Dec 12 04:06:53 PST 2013


Hi all,
I have hit an infinite loop in red_worker twice.
Both times it happened after I shut down a Windows guest in 'spicy',
watched "windows is shutting down... ",
then closed the 'spicy' window.

Usually the qemu process quits after a few seconds,
but there is a small chance it spins forever.

Here are the details (long, but interesting, I think :)

=====================  process status ==============================

top - 17:54:37 up 65 days,  8:02,  3 users,  load average: 1.93, 1.69, 1.68
Tasks: 321 total,   2 running, 316 sleeping,   3 stopped,   0 zombie
Cpu(s):  3.6%us,  1.8%sy,  0.0%ni, 94.6%id,  0.0%wa,  0.0%hi,  0.0%si,
 0.0%st
Mem:  49401784k total, 37162056k used, 12239728k free,   600832k buffers
Swap: 39535612k total,    15148k used, 39520464k free, 28640352k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7409 root      20   0 1620m 1.1g 7660 S  100  2.4  53:03.95 qemu-system-x86   <== our qemu process (pid 7409) keeps consuming 1 cpu on the host

$ ps -p 7409 -L -l
F S   UID   PID  PPID   LWP  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
6 S     0  7409     1  7409  0  80   0 - 414790 unix_s pts/1   00:00:15 qemu-system-x86
3 S     0  7409     1  7431  0  80   0 - 414790 kvm_vc pts/1   00:00:22 qemu-system-x86
3 S     0  7409     1  7432  0  80   0 - 414790 kvm_vc pts/1   00:00:21 qemu-system-x86
3 R     0  7409     1  7433 97  80   0 - 414790 ?      pts/1   02:07:27 qemu-system-x86   <== one thread goes mad :)

$ sudo netstat --tcp -np | grep 7409
tcp        1      0 172.168.12.23:10088     172.168.12.168:33085    CLOSE_WAIT  7409/qemu-system-x8   <== one of the spice channels failed to clean up

$ sudo netstat  -nlp | grep 7409
tcp   0      0       0.0.0.0:10088           0.0.0.0:*               LISTEN     7409/qemu-system-x8
unix  2      [ ACC ] STREAM     LISTENING     6310958  7409/qemu-system-x8
/home/grizzlybears/q131/dig_into/qmon_win7qq

================  full qemu command line and version  =========================

/home/grizzlybears/q131/qemu/x86_64-softmmu/qemu-system-x86_64
  -enable-kvm -L /home/grizzlybears/q131/qemu-kvm-build/usr/share/qemu -nodefconfig -nodefaults
  -smp 2,sockets=2,cores=8,threads=1 -m 1024
  -drive if=virtio,aio=native,file=/home/grizzlybears/images/win7_qq.qcow2
  -spice port=10088,addr=0.0.0.0,disable-ticketing,image-compression=auto_glz,streaming-video=filter,disable-copy-paste
  -vga qxl
  -soundhw hda
  -readconfig /home/grizzlybears/q131/qemu/docs/ich9-ehci-uhci.cfg
  -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0
  -device usb-tablet,id=input0,bus=ehci.0
  -netdev tap,id=nic1,script=/home/grizzlybears/q131/qemu-ifup,downscript=/home/grizzlybears/q131/qemu-ifdown -device virtio-net-pci,netdev=nic1,mac=52:54:00:fb:d5:88
  -balloon none -no-hpet -rtc base=localtime,driftfix=none -global kvm-pit.lost_tick_policy=discard
  -chardev socket,id=charmonitor,path=/home/grizzlybears/q131/dig_into/qmon_win7qq,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline

BTW, my source versions:
qemu 1.3.1
spice 0.12.3

==============  then  profiling ======================

I used 'perf' to inspect qemu and found something interesting.

# Events: 78K cycles
#
# Overhead          Command         Shared Object                 Symbol
# ........  ...............  ....................  ..............................................
#
     9.73%  qemu-system-x86  [kernel.kallsyms]     [k] fget_light
            |
            --- fget_light
               |
               |--97.60%-- sys_ioctl
               |          system_call_fastpath
               |          __ioctl
               |          |
               |           --100.00%-- spice_timer_queue_cb
               |                     red_worker_main
               |                     start_thread
               |
                --2.40%-- system_call_fastpath
                          __ioctl
                          |
                           --100.00%-- spice_timer_queue_cb
                                     red_worker_main
                                     start_thread

     6.47%  qemu-system-x86  qemu-system-x86_64    [.] ring_add
            |
            --- ring_add
               |
               |--96.68%-- ring_add_before
               |          _spice_timer_set
               |          |
               |          |--52.97%-- spice_timer_set
               |          |          red_channel_client_start_ping_timer
               |          |          red_channel_client_ping_timer
               |          |          spice_timer_queue_cb
               |          |          red_worker_main
               |          |          start_thread
               |          |
               |           --47.03%-- spice_timer_queue_cb
...

The whole profiling graph is very long, so I attached it to this message.
From the graph we can see that almost every path goes through
'spice_timer_queue_cb()' and 'red_worker_main()'.


============= So , let's glimpse at red_worker_main() =============

SPICE_GNUC_NORETURN void *red_worker_main(void *arg)
{
    RedWorker *worker = spice_malloc(sizeof(RedWorker));

    /* ... some initialization ... */

    worker->event_timeout = INF_EVENT_WAIT;
    for (;;) {    <== big loop
        int i, num_events;
        unsigned int timers_queue_timeout;

        timers_queue_timeout = spice_timer_queue_get_timeout_ms();
        worker->event_timeout = MIN(red_get_streams_timout(worker), worker->event_timeout);
        worker->event_timeout = MIN(timers_queue_timeout, worker->event_timeout);
        num_events = poll(worker->poll_fds, MAX_EVENT_SOURCES, worker->event_timeout);
        red_handle_streams_timout(worker);

        spice_timer_queue_cb();   <== yes, this is where the cpu is actually consumed

        if (worker->display_channel) {
            /* during migration, in the dest, the display channel can be initialized
               while the global lz data not since migrate data msg hasn't been received yet */
            red_channel_apply_clients(&worker->display_channel->common.base,
                                      red_display_cc_free_glz_drawables);
        }

        worker->event_timeout = INF_EVENT_WAIT;
        if (num_events == -1) {
            if (errno != EINTR) {
                spice_error("poll failed, %s", strerror(errno));
            }
        }

        for (i = 0; i < MAX_EVENT_SOURCES; i++) {
            /* ... process any 'ready' io watchers ... */
        }

        /* ... clear the poll_fd for any removed watches ... */

        if (worker->running) {
            int ring_is_empty;
            red_process_cursor(worker, MAX_PIPE_SIZE, &ring_is_empty);
            red_process_commands(worker, MAX_PIPE_SIZE, &ring_is_empty);
        }
        red_push(worker);
    }
    abort();
}

=============  gdb failed to attach to the live process (I don't know why), so I took a coredump and ran gdb on it  ==========

(gdb) thread 2
[Switching to thread 2 (Thread 7433)]#0  0x00007fa93df59c87 in ioctl () at
../sysdeps/unix/syscall-template.S:82
82 ../sysdeps/unix/syscall-template.S: No such file or directory.
in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0  0x00007fa93df59c87 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fa9410000d3 in red_channel_client_ping_timer
(opaque=0x7fa8e4070070) at red_channel.c:752
#2  0x00007fa940f45370 in spice_timer_queue_cb () at spice_timer_queue.c:262
#3  0x00007fa940f289e6 in red_worker_main (arg=0x7fffed47bd00) at
red_worker.c:12265
#4  0x00007fa93ec1a9ca in start_thread (arg=<value optimized out>) at
pthread_create.c:300
#5  0x00007fa93df6221d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()
(gdb) frame 3
#3  0x00007fa940f289e6 in red_worker_main (arg=0x7fffed47bd00) at
red_worker.c:12265
12265        spice_timer_queue_cb();
(gdb) l
12260        timers_queue_timeout = spice_timer_queue_get_timeout_ms();
12261        worker->event_timeout = MIN(red_get_streams_timout(worker),
worker->event_timeout);
12262        worker->event_timeout = MIN(timers_queue_timeout,
worker->event_timeout);
12263        num_events = poll(worker->poll_fds, MAX_EVENT_SOURCES,
worker->event_timeout);
12264        red_handle_streams_timout(worker);
12265        spice_timer_queue_cb();
12266
12267        if (worker->display_channel) {
12268            /* during migration, in the dest, the display channel can
be initialized
12269               while the global lz data not since migrate data msg
hasn't been
(gdb) print num_events
$1 = 0     <== no fd is ready, poll() returned for timeout

(gdb) print worker->poll_fds
$4 = {{fd = 20, events = 1, revents = 0}, {fd = 29, events = 5, revents =
0}, {fd = 31, events = 1, revents = 0}, {fd = -1, events = 0, revents = 0}
<repeats 17 times>}
$ ll /proc/7409/fd   (before I took the coredump:)
lrwx------ 1 root root 64 2013-12-12 18:08 20 -> socket:[6309964]
lrwx------ 1 root root 64 2013-12-12 18:08 29 -> socket:[6299877]
lrwx------ 1 root root 64 2013-12-12 18:08 31 -> socket:[6261651]

(gdb) print worker->event_timeout
$5 = 10  <== curious number

(gdb) print worker->running
$7 = 1

=========== Finally, my questions are:  ==================

1. The big loop has no exit; even 'thread_exit' appears nowhere in the spice
source.
How does the red_worker thread quit in the normal case, or does it simply run
until the qemu main thread exits?

2. In my understanding, spice communicates with spice-gtk over TCP,
but here we don't see TCP fds, only 3 unix sockets. What are those unix
sockets for?

3. Although the poll() timeout is non-zero, the big loop still consumes
100% cpu.
Does this mean something is wrong in the timer function,
red_channel_client_ping_timer()?


Thanks for reading,
and even more thanks for your hints and ideas :)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: qemu_hung_on_vm_shutdown
Type: application/octet-stream
Size: 115231 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/spice-devel/attachments/20131212/a22db43d/attachment-0001.obj>

