[Spice-devel] Regression: qemu crash of hvm domUs with spice (backtrace included)
Stefano Stabellini
stefano.stabellini at eu.citrix.com
Tue May 12 07:44:07 PDT 2015
On Tue, 12 May 2015, Stefano Stabellini wrote:
> On Tue, 12 May 2015, Fabio Fantoni wrote:
> > On 12/05/2015 12:26, Fabio Fantoni wrote:
> > > On 12/05/2015 11:23, Fabio Fantoni wrote:
> > > > On 11/05/2015 17:04, Fabio Fantoni wrote:
> > > > > On 21/04/2015 14:53, Stefano Stabellini wrote:
> > > > > > On Tue, 21 Apr 2015, Fabio Fantoni wrote:
> > > > > > > On 21/04/2015 12:49, Stefano Stabellini wrote:
> > > > > > > > On Mon, 20 Apr 2015, Fabio Fantoni wrote:
> > > > > > > > > > I updated xen and qemu from xen 4.5.0 (with its bundled
> > > > > > > > > > upstream qemu) to xen 4.5.1-pre with upstream qemu from
> > > > > > > > > > stable-4.5 (I changed Config.mk to use revision "master").
> > > > > > > > > > A few minutes after I booted a Windows 7 64-bit domU, qemu
> > > > > > > > > > crashed. I tried twice with the same result.
> > > > > > > > >
> > > > > > > > > In the domU's qemu log:
> > > > > > > > > > qemu-system-i386: malloc.c:3096: sYSMALLOc: Assertion
> > > > > > > > > > `(old_top ==
> > > > > > > > > > (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) -
> > > > > > > > > > __builtin_offsetof
> > > > > > > > > > (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned
> > > > > > > > > > long)
> > > > > > > > > > (old_size) >= (unsigned long)((((__builtin_offsetof (struct
> > > > > > > > > > malloc_chunk,
> > > > > > > > > > fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 *
> > > > > > > > > > (sizeof(size_t))) -
> > > > > > > > > > 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end &
> > > > > > > > > > pagemask)
> > > > > > > > > > ==
> > > > > > > > > > 0)' failed.
> > > > > > > > > > Killing all inferiors
> > > > > > > > > > The full backtrace of the qemu crash is attached.
> > > > > > > > >
> > > > > > > > > > After a quick search based on the backtrace, I found a
> > > > > > > > > > possible cause of the regression (I'm not sure):
> > > > > > > > > http://xenbits.xen.org/gitweb/?p=staging/qemu-upstream-4.5-testing.git;a=commit;h=5c3402816aaddb15156c69df73c54abe4e1c76aa
> > > > > > > > > spice: make sure we don't overflow ssd->buf
> > > > > > > > >
> > > > > > > > > > I also added qemu-devel and spice-devel in Cc.
> > > > > > > > > >
> > > > > > > > > > If you need more information or tests, tell me and I'll post
> > > > > > > > > > them.
> > > > > > > > Maybe you could try to revert the offending commit
> > > > > > > > (5c3402816aaddb15156c69df73c54abe4e1c76aa)? Or even better bisect
> > > > > > > > the
> > > > > > > > crash?
> > > > > > > Thanks for your reply.
> > > > > > >
> > > > > > > > I reverted dom0 to 4.5.0 on that system for now, because I'm
> > > > > > > > busy trying to find another problem that causes very bad
> > > > > > > > performance without any errors or anything else in the logs :(
> > > > > > > > For now I don't know whether it is xen related, kernel related
> > > > > > > > or something else.
> > > > > > >
> > > > > > > > About this regression with spice, I'll do further tests in the
> > > > > > > > next few days (probably starting by reverting the spice patch in
> > > > > > > > qemu), but any help is appreciated.
> > > > > > > > Based on the data I have so far, is it possible that the problem
> > > > > > > > is that qemu tries to allocate more RAM or video RAM after the
> > > > > > > > domU is created, which is not possible with xen?
> > > > > > > > In the spice-related patch I saw something about dynamic
> > > > > > > > allocation, for example.
> > > > > > It is probably caused by a commit in the range:
> > > > > >
> > > > > > 1ebb75b1fee779621b63e84fefa7b07354c43a99..0b8fb1ec3d666d1eb8bbff56c76c5e6daa2789e4
> > > > > >
> > > > > > there are only 10 commits in that range. By using git bisect you
> > > > > > should
> > > > > > be able to narrow it down in just 3 tests.
> > > > >
> > > > > Sorry for the delay, I was busy with many things. Today I retried with
> > > > > an updated stable-4.5, and in a second test also with "spice: make sure
> > > > > we don't overflow ssd->buf" reverted, but in both cases the regression
> > > > > remains :(
> > > > > I'll probably do more tests tomorrow.
> > > >
> > > > I did another test, reverting this instead:
> > > > http://xenbits.xen.org/gitweb/?p=qemu-upstream-4.5-testing.git;a=commit;h=c9ac5f816bf3a8b56f836b078711dcef6e5c90b8
> > > > Now I seem unable to reproduce the regression: before, it happened after
> > > > a few seconds up to 1-2 minutes; now I have used the same domU for 15-20
> > > > minutes without problems.
> > > > This is probably the cause of the regression, even if it seems strange
> > > > that it didn't happen on unstable with the same patch in my tests some
> > > > days ago.
> > > >
> > > > Any ideas?
> > > >
> > > > Thanks for any reply and sorry for my bad English.
> > >
> > > Bad news: the qemu crash still happens, although this time the qemu log
> > > contains a different output, see attachment.
> > > After taking a look at the other patches I saw:
> > > http://xenbits.xen.org/gitweb/?p=qemu-upstream-4.5-testing.git;a=commitdiff;h=7154fba0e51ec985ef621965d1b7120ad424fcbf
> > > It has "Conflicts: hw/display/vga.c" in its description, so I'll try
> > > reverting it instead.
> > >
> > > Or can someone suggest another test I could try?
> >
> > I also tried reverting the patch above, with the same result, so I retried
> > with qemu from 4.5.0 and the crash seems to happen in that case too... I'm
> > going crazy :(
Sorry, I missed this bit before. The only thing I could suggest at this
point would be to make sure that you have a clean test environment.
Usually this happens when you have some "leftovers" from previous broken
tests.
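
On the bisect suggested earlier in the thread: with 10 candidate commits, a
binary search needs 3 or 4 build-and-test rounds, depending on where the
culprit sits. The Python sketch below only simulates that search on a list of
placeholder commits to show where the count comes from. It assumes a linear
history and a test that reliably says good or bad; the names bisect_first_bad
and is_bad are made up for the illustration, and this is not how git bisect is
implemented internally.

```python
def bisect_first_bad(commits, is_bad):
    """Return (first_bad_commit, number_of_tests) for a linear history.

    `commits` is ordered oldest to newest and excludes the known-good
    starting point; the newest entry is assumed to be bad.
    `is_bad` stands in for "build this commit, boot the domU, see if
    qemu crashes" (a hypothetical callback, not a git API).
    """
    lo, hi = 0, len(commits) - 1  # the first bad commit is in [lo, hi]
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid        # culprit is at mid or older
        else:
            lo = mid + 1    # culprit is newer than mid
    return commits[lo], tests


if __name__ == "__main__":
    commits = list(range(10))  # placeholders for the 10 commits in the range
    for culprit in commits:
        found, tests = bisect_first_bad(commits, lambda c: c >= culprit)
        print(f"culprit {culprit}: found {found} after {tests} tests")
```

Each round halves the remaining candidates, which is why 10 commits take only
a handful of builds. An unreliable test (for example a crash that sometimes
takes longer than the test window to show up) would break that assumption and
could point the search at the wrong commit.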