[Spice-devel] Regression: qemu crash of hvm domUs with spice (backtrace included)

Fabio Fantoni fabio.fantoni at m2r.biz
Wed May 13 06:29:30 PDT 2015


On 12/05/2015 16:44, Stefano Stabellini wrote:
> On Tue, 12 May 2015, Stefano Stabellini wrote:
>> On Tue, 12 May 2015, Fabio Fantoni wrote:
>>> On 12/05/2015 12:26, Fabio Fantoni wrote:
>>>> On 12/05/2015 11:23, Fabio Fantoni wrote:
>>>>> On 11/05/2015 17:04, Fabio Fantoni wrote:
>>>>>> On 21/04/2015 14:53, Stefano Stabellini wrote:
>>>>>>> On Tue, 21 Apr 2015, Fabio Fantoni wrote:
>>>>>>>> On 21/04/2015 12:49, Stefano Stabellini wrote:
>>>>>>>>> On Mon, 20 Apr 2015, Fabio Fantoni wrote:
>>>>>>>>>> I updated xen and qemu from xen 4.5.0 with its included upstream qemu
>>>>>>>>>> to xen 4.5.1-pre with qemu upstream from stable-4.5 (I changed
>>>>>>>>>> Config.mk to use revision "master").
>>>>>>>>>> A few minutes after I booted a Windows 7 64-bit domU, qemu crashed;
>>>>>>>>>> I tried twice with the same result.
>>>>>>>>>>
>>>>>>>>>> In the domU's qemu log:
>>>>>>>>>>> qemu-system-i386: malloc.c:3096: sYSMALLOc: Assertion
>>>>>>>>>>> `(old_top ==
>>>>>>>>>>> (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) -
>>>>>>>>>>> __builtin_offsetof
>>>>>>>>>>> (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned
>>>>>>>>>>> long)
>>>>>>>>>>> (old_size) >= (unsigned long)((((__builtin_offsetof (struct
>>>>>>>>>>> malloc_chunk,
>>>>>>>>>>> fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 *
>>>>>>>>>>> (sizeof(size_t))) -
>>>>>>>>>>> 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end &
>>>>>>>>>>> pagemask)
>>>>>>>>>>> ==
>>>>>>>>>>> 0)' failed.
>>>>>>>>>>> Killing all inferiors
>>>>>>>>>> The full backtrace of the qemu crash is attached.
>>>>>>>>>>
>>>>>>>>>> After a quick search based on the backtrace, I found a possible cause
>>>>>>>>>> of the regression (I'm not sure):
>>>>>>>>>> http://xenbits.xen.org/gitweb/?p=staging/qemu-upstream-4.5-testing.git;a=commit;h=5c3402816aaddb15156c69df73c54abe4e1c76aa
>>>>>>>>>> spice: make sure we don't overflow ssd->buf
>>>>>>>>>>
>>>>>>>>>> I also added qemu-devel and spice-devel in cc.
>>>>>>>>>>
>>>>>>>>>> If you need more information or tests, tell me and I'll post them.
>>>>>>>>>     Maybe you could try to revert the offending commit
>>>>>>>>> (5c3402816aaddb15156c69df73c54abe4e1c76aa)? Or even better bisect
>>>>>>>>> the
>>>>>>>>> crash?
>>>>>>>> Thanks for your reply.
>>>>>>>>
>>>>>>>> I reverted to 4.5.0 on dom0 for now on that system because I'm busy
>>>>>>>> trying to find another problem that causes very bad performance
>>>>>>>> without errors or anything else in the logs :( I don't know yet if it
>>>>>>>> is xen related, kernel related or something else.
>>>>>>>>
>>>>>>>> About this regression with spice, I'll do further tests in the next
>>>>>>>> days (probably starting by reverting the spice patch in qemu) but any
>>>>>>>> help is appreciated.
>>>>>>>> Based on the data I have for now, is it possible that the problem is
>>>>>>>> that qemu tries to allocate more ram or videoram after domU creation,
>>>>>>>> which is not possible with xen?
>>>>>>>> In the spice related patch I saw something about dynamic allocation,
>>>>>>>> for example.
>>>>>>> It is probably caused by a commit in the range:
>>>>>>>
>>>>>>> 1ebb75b1fee779621b63e84fefa7b07354c43a99..0b8fb1ec3d666d1eb8bbff56c76c5e6daa2789e4
>>>>>>>
>>>>>>> there are only 10 commits in that range. By using git bisect you
>>>>>>> should
>>>>>>> be able to narrow it down in just 3 tests.
>>>>>> Sorry for the delay, I was busy with many things. Today I retried with
>>>>>> updated stable-4.5 and also with "spice: make sure we don't overflow
>>>>>> ssd->buf" reverted (in a second test), but in both cases the regression
>>>>>> remains :(
>>>>>> Tomorrow I'll probably do other tests.
>>>>> I did another test, reverting this instead:
>>>>> http://xenbits.xen.org/gitweb/?p=qemu-upstream-4.5-testing.git;a=commit;h=c9ac5f816bf3a8b56f836b078711dcef6e5c90b8
>>>>> Now I seem unable to reproduce the regression: before, it happened after
>>>>> a few seconds up to 1-2 minutes; now I have used the same domU for 15-20
>>>>> minutes without problems.
>>>>> This is probably the cause of the regression, even if it seems strange
>>>>> that on unstable, with the same patch, it didn't happen in tests some
>>>>> days ago.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Thanks for any reply and sorry for my bad english.
>>>> Bad news, the qemu crash still happens, even if this time the qemu log
>>>> shows different output, see attachment.
>>>> After taking a look at the other patches I saw:
>>>> http://xenbits.xen.org/gitweb/?p=qemu-upstream-4.5-testing.git;a=commitdiff;h=7154fba0e51ec985ef621965d1b7120ad424fcbf
>>>> Since it has "Conflicts: hw/display/vga.c" in its description, I'll try
>>>> to revert it instead.
>>>>
>>>> Or can someone suggest another test I can try?
>>> I also tried to revert the patch above, with the same result, so I retried
>>> with qemu from 4.5.0 and it seems the crash happens in this case too...
>>> I'm going crazy :(
> Sorry, I missed this bit before. The only thing I could suggest at this
> point, would be to make sure that you have a clean test environment.
> Usually this happens when you have some "leftovers" from previous broken
> tests.

I use make debball to be sure that all files are tracked and removed on
package update.
Now I retried with the latest xen-unstable and the qemu crash didn't happen;
more exactly, I used this:
https://github.com/Fantu/Xen/commits/rebase/m2r-staging
The latest test showing the regression was based on the latest stable-4.5,
more exactly:
https://github.com/Fantu/Xen/commits/rebase/m2r-testing
Some days ago, on the same dom0 and domU, I tried the latest stable version
(which I use on only 2 production servers for now, where I have not seen the
regression), more exactly:
https://github.com/Fantu/Xen/commits/rebase/m2r-stable-4.5
Dom0 is debian 7 with kernel 3.16 from backports, seabios 1.8.1-2 from
unstable and this xen configure:
./configure --prefix=/usr --disable-blktap1 --disable-qemu-traditional 
--disable-rombios --with-system-seabios=/usr/share/seabios/bios-256k.bin 
--with-extra-qemuu-configure-args="--enable-spice --enable-usb-redir" 
--disable-blktap2

I suppose there is an unexpected case caused by a backport, or by one or more
patches that were not backported from unstable.
With a quick look I did not find a relevant patch to try reverting. Can anyone
suggest the most likely point(s) for a bisect and/or a patch to revert,
or must I try a full bisect 4.5.0->stable-4.5?
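
For reference, a bisect like the one discussed in this thread could be driven
roughly as in the sketch below. It is only an outline under assumptions:
bisect_steps and start_bisect are hypothetical helper names, the good and bad
revisions are passed as arguments because the exact ones are not pinned down
here, and every step still needs a manual rebuild and a domU boot to decide
good or bad.

```shell
#!/bin/sh
# Hypothetical helper: worst-case number of test builds needed to bisect
# N candidate commits (smallest k such that 2^k >= N).
bisect_steps() {
    n=$1
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# Hypothetical helper: start a bisect in the current qemu checkout.
#   $1 = last revision known to be good
#   $2 = first revision known to be bad
start_bisect() {
    git bisect start &&
    git bisect bad "$2" &&
    git bisect good "$1"
}

# After each rebuild and domU test, mark the checked-out revision by hand:
#   git bisect good    # domU ran long enough without the qemu crash
#   git bisect bad     # qemu crashed
# When git reports the first bad commit, return to the original checkout:
#   git bisect reset
```

For a range of 10 commits, bisect_steps 10 prints 4, so up to four builds may
be needed. Since the crash can take a couple of minutes to appear, a revision
should be marked good only after the domU has run well past that time.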

Thanks for any reply and sorry for my bad english.

