[Spice-devel] Regression: qemu crash of hvm domUs with spice (backtrace included)

Stefano Stabellini stefano.stabellini at eu.citrix.com
Fri May 15 03:26:29 PDT 2015


On Wed, 13 May 2015, Fabio Fantoni wrote:
> Il 12/05/2015 16:44, Stefano Stabellini ha scritto:
> > On Tue, 12 May 2015, Stefano Stabellini wrote:
> > > On Tue, 12 May 2015, Fabio Fantoni wrote:
> > > > Il 12/05/2015 12:26, Fabio Fantoni ha scritto:
> > > > > Il 12/05/2015 11:23, Fabio Fantoni ha scritto:
> > > > > > Il 11/05/2015 17:04, Fabio Fantoni ha scritto:
> > > > > > > Il 21/04/2015 14:53, Stefano Stabellini ha scritto:
> > > > > > > > On Tue, 21 Apr 2015, Fabio Fantoni wrote:
> > > > > > > > > Il 21/04/2015 12:49, Stefano Stabellini ha scritto:
> > > > > > > > > > On Mon, 20 Apr 2015, Fabio Fantoni wrote:
> > > > > > > > > > > I updated Xen and qemu from Xen 4.5.0 (with its bundled
> > > > > > > > > > > upstream qemu) to Xen 4.5.1-pre with upstream qemu from
> > > > > > > > > > > stable-4.5 (I changed Config.mk to use revision "master").
> > > > > > > > > > > A few minutes after I booted a Windows 7 64-bit domU, qemu
> > > > > > > > > > > crashed. I tried twice with the same result.
> > > > > > > > > > > 
> > > > > > > > > > > In the domU's qemu log:
> > > > > > > > > > > > qemu-system-i386: malloc.c:3096: sYSMALLOc: Assertion
> > > > > > > > > > > > `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) -
> > > > > > > > > > > > __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) ||
> > > > > > > > > > > > ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof
> > > > > > > > > > > > (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) &
> > > > > > > > > > > > ~((2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) &&
> > > > > > > > > > > > ((unsigned long)old_end & pagemask) == 0)' failed.
> > > > > > > > > > > > Killing all inferiors
> > > > > > > > > > > The full backtrace of the qemu crash is attached.
> > > > > > > > > > >
> > > > > > > > > > > After a quick search based on the backtrace, I found a
> > > > > > > > > > > probable cause of the regression (I'm not sure):
> > > > > > > > > > > http://xenbits.xen.org/gitweb/?p=staging/qemu-upstream-4.5-testing.git;a=commit;h=5c3402816aaddb15156c69df73c54abe4e1c76aa
> > > > > > > > > > > spice: make sure we don't overflow ssd->buf
> > > > > > > > > > > 
> > > > > > > > > > > Added also qemu-devel and spice-devel as cc.
> > > > > > > > > > > 
> > > > > > > > > > > If you need more information or tests, tell me and I'll
> > > > > > > > > > > post them.
> > > > > > > > > > Maybe you could try to revert the offending commit
> > > > > > > > > > (5c3402816aaddb15156c69df73c54abe4e1c76aa)? Or, even better,
> > > > > > > > > > bisect the crash?
> > > > > > > > > Thanks for your reply.
> > > > > > > > > 
> > > > > > > > > I reverted dom0 to 4.5.0 on that system for now, because I'm
> > > > > > > > > busy trying to track down another problem that causes very
> > > > > > > > > bad performance without errors or anything else in the logs :(
> > > > > > > > > For now I don't know whether it is Xen related, kernel related
> > > > > > > > > or something else.
> > > > > > > > > 
> > > > > > > > > I'll do further tests on this spice regression in the next
> > > > > > > > > days (probably starting by reverting the spice patch in qemu),
> > > > > > > > > but any help is appreciated.
> > > > > > > > > Based on the data I have so far, is it possible that the
> > > > > > > > > problem is that qemu tries to allocate more RAM or video RAM
> > > > > > > > > after the domU is created, which is not possible with Xen?
> > > > > > > > > In the spice-related patch I saw something about dynamic
> > > > > > > > > allocation, for example.
> > > > > > > > It is probably caused by a commit in the range:
> > > > > > > > 
> > > > > > > > 1ebb75b1fee779621b63e84fefa7b07354c43a99..0b8fb1ec3d666d1eb8bbff56c76c5e6daa2789e4
> > > > > > > > 
> > > > > > > > There are only 10 commits in that range. By using git bisect
> > > > > > > > you should be able to narrow it down in about 3 or 4 tests.
> > > > > > > Sorry for the delay, I was busy with many things. Today I retried
> > > > > > > with an updated stable-4.5, and in a second test also with "spice:
> > > > > > > make sure we don't overflow ssd->buf" reverted, but in both cases
> > > > > > > the regression remains :(
> > > > > > > I'll probably do more tests tomorrow.
> > > > > > I did another test, reverting this instead:
> > > > > > http://xenbits.xen.org/gitweb/?p=qemu-upstream-4.5-testing.git;a=commit;h=c9ac5f816bf3a8b56f836b078711dcef6e5c90b8
> > > > > > Now I seem unable to reproduce the regression: before, it happened
> > > > > > after a few seconds up to 1-2 minutes; now I have used the same domU
> > > > > > for 15-20 minutes without problems.
> > > > > > This is probably the cause of the regression, although it seems
> > > > > > strange that it didn't happen on unstable, with the same patch, in
> > > > > > my tests some days ago.
> > > > > > 
> > > > > > Any ideas?
> > > > > > 
> > > > > > Thanks for any reply and sorry for my bad English.
> > > > > Bad news: the qemu crash still happens, although this time the qemu
> > > > > log shows a different output, see attachment.
> > > > > After taking a look at the other patches I saw:
> > > > > http://xenbits.xen.org/gitweb/?p=qemu-upstream-4.5-testing.git;a=commitdiff;h=7154fba0e51ec985ef621965d1b7120ad424fcbf
> > > > > Since its description has "Conflicts: hw/display/vga.c", I'll try to
> > > > > revert it instead.
> > > > >
> > > > > Or can someone suggest another likely test I can try?
> > > > I also tried reverting the patch above, with the same result, so I
> > > > retried with qemu from 4.5.0 and the crash seems to happen in that
> > > > case too... I'm going crazy :(
> > Sorry, I missed this bit before. The only thing I could suggest at this
> > point would be to make sure that you have a clean test environment.
> > Usually this happens when you have some "leftovers" from previous broken
> > tests.
> 
> I use make debball to be sure that all files are tracked and removed on
> package update.
> Now I retried with the latest xen-unstable and the qemu crash didn't happen;
> more exactly I used this:
> https://github.com/Fantu/Xen/commits/rebase/m2r-staging
> The latest test showing the regression was based on the latest stable-4.5,
> more exactly:
> https://github.com/Fantu/Xen/commits/rebase/m2r-testing
> Some days ago, on the same dom0 and domU, I tried the latest stable version
> (which I use on only 2 production servers for now, where I haven't seen the
> regression), more exactly:
> https://github.com/Fantu/Xen/commits/rebase/m2r-stable-4.5
> Dom0 is Debian 7 with kernel 3.16 from backports and seabios 1.8.1-2 from
> unstable, with this xen configure:
> ./configure --prefix=/usr --disable-blktap1 --disable-qemu-traditional
> --disable-rombios --with-system-seabios=/usr/share/seabios/bios-256k.bin
> --with-extra-qemuu-configure-args="--enable-spice --enable-usb-redir"
> --disable-blktap2
> 
> I suppose there is an unexpected case caused by a backport, or by one or more
> patches missing from the backports from unstable.
> With a quick look I found no relevant patch to try reverting. Can anyone
> suggest the most probable point(s) for a bisect and/or a patch to revert, or
> must I do a full bisect 4.5.0->stable-4.5?

It is possible that this is an intermittent bug: it only shows up once
every so many tests. I think you need to go over the full range
4.5.0->stable-4.5. Also, it is important that you start from a working
baseline: you need to be sure that 4.5.0 works as expected.
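
With an intermittent crash, a single passing run does not prove that a
revision is good, so each bisect step should repeat the test several times.
What follows is only a minimal sketch of a helper for "git bisect run". The
names repeat_test, repeat_test.sh and ./check-domu.sh are hypothetical (they
are not from this thread), BAD_REV and GOOD_REV are placeholders, and the
repeat count of 5 is an assumption to tune against how often the crash
actually shows up. It relies on git bisect run treating exit status 0 as
"good" and 1 as "bad".

```shell
#!/bin/sh
# Hypothetical helper for bisecting an intermittent crash.
#
# repeat_test N CMD [ARGS...]
#   Runs CMD up to N times. Returns 1 ("bad" for git bisect run) as soon as
#   one run fails, and 0 ("good") only if all N runs pass.
repeat_test() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        if ! "$@"; then
            echo "run $i of $runs failed: marking revision bad"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $runs runs passed: marking revision good"
    return 0
}

# Possible usage, assuming this file is saved as repeat_test.sh and
# ./check-domu.sh (hypothetical) rebuilds qemu, boots the domU and exits
# non-zero if qemu crashed:
#
#   git bisect start BAD_REV GOOD_REV
#   git bisect run sh -c '. ./repeat_test.sh; repeat_test 5 ./check-domu.sh'
```

The same helper can be used to validate the baseline first: if
"repeat_test 5 ./check-domu.sh" does not pass on 4.5.0 itself, the bisect
has no reliable good end to start from.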

