[Spice-devel] Postcopy+spice crash
Dr. David Alan Gilbert
dgilbert at redhat.com
Tue Dec 6 10:53:45 UTC 2016
* Gerd Hoffmann (kraxel at redhat.com) wrote:
> Hi,
>
> > >> On a quick glance I'd blame the guest for sending corrupted commands.
> > >> Strange though that it happens on migration only, so there could be
> > >> a host issue too. Or a timing issue triggered by migration.
> > >>
> > >> Which migration phase?
> > >
> > > This is the point at which it switches over in postcopy.
> >
> > It looks like it's the vmstate (post) load phase of the qxl device on
> > destination host.
>
> Dave, can you try "thread apply all bt" so we see the other threads too?
> That should show whenever it happens in post_load
Yes, I already have the full set of threads; you can see the qxl_post_load in
thread 1.
red_dispatcher_loadvm_commands:
id 0, group 0, virt start 0, virt end ffffffffffffffff, generation 0, delta 0
id 1, group 1, virt start 7fbe83c00000, virt end 7fbe87bfe000, generation 0, delta 7fbe83c00000
id 2, group 1, virt start 7fbe7fa00000, virt end 7fbe83a00000, generation 0, delta 7fbe7fa00000
(./x86_64-softmmu/qemu-system-x86_64:22376): Spice-CRITICAL **: red_memslots.c:123:get_virt: slot_id 128 too big, addr=8000000000000000
Thread 12 (Thread 0x7fc0a0df2700 (LWP 22377)):
#0 0x00007fc0aa42f1bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fc0aa42ad02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007fc0aa42ac08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000556465736839 in qemu_mutex_lock (mutex=mutex at entry=0x556465d76120 <qemu_global_mutex>) at /root/git/qemu/util/qemu-thread-posix.c:64
#4 0x00005564653e69d6 in qemu_mutex_lock_iothread () at /root/git/qemu/cpus.c:1296
#5 0x000055646574596e in call_rcu_thread (opaque=<optimized out>) at /root/git/qemu/util/rcu.c:257
#6 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007fc0a61786ed in clone () from /lib64/libc.so.6
Thread 11 (Thread 0x7fc09f304700 (LWP 22379)):
#0 0x00007fc0aa42c6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000556465736999 in qemu_cond_wait (cond=<optimized out>, mutex=mutex at entry=0x556465d76120 <qemu_global_mutex>) at /root/git/qemu/util/qemu-thread-posix.c:137
#2 0x00005564653e6fe3 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /root/git/qemu/cpus.c:964
#3 qemu_kvm_cpu_thread_fn (arg=0x556466688740) at /root/git/qemu/cpus.c:1003
#4 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fc0a61786ed in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x7fc09eb03700 (LWP 22380)):
#0 0x00007fc0aa42c6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000556465736999 in qemu_cond_wait (cond=<optimized out>, mutex=mutex at entry=0x556465d76120 <qemu_global_mutex>) at /root/git/qemu/util/qemu-thread-posix.c:137
#2 0x00005564653e6fe3 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /root/git/qemu/cpus.c:964
#3 qemu_kvm_cpu_thread_fn (arg=0x5564666ea960) at /root/git/qemu/cpus.c:1003
#4 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fc0a61786ed in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x7fc09e302700 (LWP 22381)):
#0 0x00007fc0aa42c6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000556465736999 in qemu_cond_wait (cond=<optimized out>, mutex=mutex at entry=0x556465d76120 <qemu_global_mutex>) at /root/git/qemu/util/qemu-thread-posix.c:137
#2 0x00005564653e6fe3 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /root/git/qemu/cpus.c:964
#3 qemu_kvm_cpu_thread_fn (arg=0x55646670a120) at /root/git/qemu/cpus.c:1003
#4 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fc0a61786ed in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7fc09db01700 (LWP 22382)):
#0 0x00007fc0aa42c6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000556465736999 in qemu_cond_wait (cond=<optimized out>, mutex=mutex at entry=0x556465d76120 <qemu_global_mutex>) at /root/git/qemu/util/qemu-thread-posix.c:137
#2 0x00005564653e6fe3 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /root/git/qemu/cpus.c:964
#3 qemu_kvm_cpu_thread_fn (arg=0x5564667298d0) at /root/git/qemu/cpus.c:1003
#4 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fc0a61786ed in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7fbe7f9ff700 (LWP 22383)):
#0 0x00007fc0aa42f49d in read () from /lib64/libpthread.so.0
#1 0x00007fc0a8c36c01 in spice_backtrace_gstack () from /lib64/libspice-server.so.1
#2 0x00007fc0a8c3e4f7 in spice_logv () from /lib64/libspice-server.so.1
#3 0x00007fc0a8c3e655 in spice_log () from /lib64/libspice-server.so.1
#4 0x00007fc0a8bfc6de in get_virt () from /lib64/libspice-server.so.1
#5 0x00007fc0a8bfcb73 in red_get_data_chunks_ptr () from /lib64/libspice-server.so.1
#6 0x00007fc0a8bff3fa in red_get_cursor_cmd () from /lib64/libspice-server.so.1
#7 0x00007fc0a8c0fd79 in handle_dev_loadvm_commands () from /lib64/libspice-server.so.1
#8 0x00007fc0a8bf9523 in dispatcher_handle_recv_read () from /lib64/libspice-server.so.1
#9 0x00007fc0a8c1d5a5 in red_worker_main () from /lib64/libspice-server.so.1
#10 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007fc0a61786ed in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7fbe7efff700 (LWP 22385)):
#0 0x00007fc0aa42c6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000556465736999 in qemu_cond_wait (cond=cond at entry=0x556466ecde00, mutex=mutex at entry=0x556466ecde30) at /root/git/qemu/util/qemu-thread-posix.c:137
#2 0x0000556465680c5b in vnc_worker_thread_loop (queue=queue at entry=0x556466ecde00) at /root/git/qemu/ui/vnc-jobs.c:228
#3 0x0000556465681198 in vnc_worker_thread (arg=0x556466ecde00) at /root/git/qemu/ui/vnc-jobs.c:335
#4 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fc0a61786ed in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7fc0a05f1700 (LWP 22958)):
#0 0x00007fc0aa42c6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000556465736999 in qemu_cond_wait (cond=cond at entry=0x556467b89730, mutex=mutex at entry=0x556467b89708) at /root/git/qemu/util/qemu-thread-posix.c:137
#2 0x000055646540a643 in do_data_decompress (opaque=0x556467b89700) at /root/git/qemu/migration/ram.c:2284
#3 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fc0a61786ed in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7fbe7dfff700 (LWP 22959)):
#0 0x00007fc0aa42c6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000556465736999 in qemu_cond_wait (cond=cond at entry=0x556467b897a8, mutex=mutex at entry=0x556467b89780) at /root/git/qemu/util/qemu-thread-posix.c:137
#2 0x000055646540a643 in do_data_decompress (opaque=0x556467b89778) at /root/git/qemu/migration/ram.c:2284
#3 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fc0a61786ed in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7fbe7d7fe700 (LWP 22967)):
#0 0x00007fc0a616ddad in poll () from /lib64/libc.so.6
#1 0x0000556465631868 in poll (__timeout=-1, __nfds=2, __fds=0x7fbe7d7fd990) at /usr/include/bits/poll2.h:46
#2 postcopy_ram_fault_thread (opaque=0x556466c93f10) at /root/git/qemu/migration/postcopy-ram.c:405
#3 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fc0a61786ed in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fbe7cffd700 (LWP 22968)):
#0 0x00007fc0a6172ba9 in syscall () from /lib64/libc.so.6
#1 0x0000556465736ca6 in futex_wait (val=<optimized out>, ev=<optimized out>) at /root/git/qemu/util/qemu-thread-posix.c:306
#2 qemu_event_wait (ev=ev at entry=0x556466c93f18) at /root/git/qemu/util/qemu-thread-posix.c:422
#3 0x000055646541044d in postcopy_ram_listen_thread (opaque=0x556467e45740) at /root/git/qemu/migration/savevm.c:1485
#4 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fc0a61786ed in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fc0aead5c40 (LWP 22376)):
#0 0x00007fc0aa42f49d in read () from /lib64/libpthread.so.0
#1 0x00007fc0a8bf9264 in read_safe () from /lib64/libspice-server.so.1
#2 0x00007fc0a8bf9717 in dispatcher_send_message () from /lib64/libspice-server.so.1
#3 0x00007fc0a8bfa0c2 in red_dispatcher_loadvm_commands () from /lib64/libspice-server.so.1
#4 0x000055646556c03d in qxl_spice_loadvm_commands (qxl=qxl at entry=0x55646755b8c0, ext=ext at entry=0x556467a895a0, count=2) at /root/git/qemu/hw/display/qxl.c:219
#5 0x000055646556d15f in qxl_post_load (opaque=0x55646755b8c0, version=<optimized out>) at /root/git/qemu/hw/display/qxl.c:2212
#6 0x000055646562f1b8 in vmstate_load_state (f=f at entry=0x5564666347d0, vmsd=<optimized out>, opaque=0x55646755b8c0, version_id=version_id at entry=21) at /root/git/qemu/migration/vmstate.c:151
#7 0x000055646540f4a1 in vmstate_load (f=0x5564666347d0, se=0x5564676f90a0, version_id=21) at /root/git/qemu/migration/savevm.c:690
#8 0x000055646540f6db in qemu_loadvm_section_start_full (f=f at entry=0x5564666347d0, mis=mis at entry=0x556466c93f10) at /root/git/qemu/migration/savevm.c:1843
#9 0x000055646540f9ac in qemu_loadvm_state_main (f=f at entry=0x5564666347d0, mis=mis at entry=0x556466c93f10) at /root/git/qemu/migration/savevm.c:1900
#10 0x000055646540fd8f in loadvm_handle_cmd_packaged (mis=0x556466c93f10) at /root/git/qemu/migration/savevm.c:1660
#11 loadvm_process_command (f=0x556467e45740) at /root/git/qemu/migration/savevm.c:1723
#12 qemu_loadvm_state_main (f=f at entry=0x556467e45740, mis=mis at entry=0x556466c93f10) at /root/git/qemu/migration/savevm.c:1913
#13 0x0000556465412546 in qemu_loadvm_state (f=f at entry=0x556467e45740) at /root/git/qemu/migration/savevm.c:1973
#14 0x000055646562b4e8 in process_incoming_migration_co (opaque=0x556467e45740) at /root/git/qemu/migration/migration.c:394
#15 0x0000556465746ada in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at /root/git/qemu/util/coroutine-ucontext.c:79
#16 0x00007fc0a60c7cf0 in ?? () from /lib64/libc.so.6
#17 0x00007ffe14885180 in ?? ()
#18 0x0000000000000000 in ?? ()
> > Maybe if you trace qxl device save/load related functions
> > on both src and dst hosts you'll see a difference.
>
> qxl keeps references to certain commands (create surface for example) in
> qxl device memory, so it can replay them in post_load. That possibly
> doesn't work correctly with postcopy.
It should; the device memory is just a RAMBlock that's migrated, so if it's
not arrived yet from the source the qxl code will block until postcopy
drags it across; assuming that is that the qxl code on the source isn't
still trying to write to it's copy at the same time, which at this
point it shouldn't.
Dave
> cheers,
> Gerd
>
--
Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK
More information about the Spice-devel
mailing list