[Spice-devel] Fedora 19 and 20 spice guest Xorg crashes

Nahum Shalman nshalman at elys.com
Tue Nov 19 06:15:34 PST 2013


On 11/18/2013 05:32 PM, Dave Airlie wrote:
> On Tue, Nov 19, 2013 at 7:04 AM, Nahum Shalman <nshalman at elys.com> wrote:
>> Context:
>> Host is running qemu-kvm 1.1.2 and spice 0.12.2.
>> Fedora 16 VMs ran rock solid on these same virtualization hosts.
>> The Fedora 19 and 20(testing) VMs are running xf86-video-qxl compiled from
>> the master branch of the git repo.
>>
>> We've been seeing a lot of X server crashes in Fedora 19 and 20, generally
>> after the VM has been running for at least 2-3 days.
>> The last gasp in the Xorg logs from these crashes generally looks something
>> like:
>>
>> [1024592.839] Out of memory allocating 261140 bytes
>> [1024592.839] Out of mem - stats
>>
>> [1024592.850] max system bytes =  243257344
>> [1024592.850] system bytes     =  243257344
>> [1024592.850] in use bytes     =  133245384
>>
>> Someone here managed to get a stack trace out of one such crash:
>>
>> (EE) [mi] EQ overflowing. Additional events will be discarded until existing
>> events are processed.
>> (EE)
>> (EE) Backtrace:
>> (EE) 0: /usr/bin/X (mieqEnqueue+0x22b) [0x57691b]
>> (EE) 1: /usr/bin/X (QueuePointerEvents+0x52) [0x44d862]
>> (EE) 2: /usr/lib64/xorg/modules/input/evdev_drv.so (_init+0x2913)
>> [0x7ff0faeb17e3]
>> (EE) 3: /usr/bin/X (DPMSSupported+0xe8) [0x4861f8]
>> (EE) 4: /usr/bin/X (xf86SerialModemClearBits+0x230) [0x4ae7b0]
>> (EE) 5: /lib64/libpthread.so.0 (__restore_rt+0x0) [0x3b7de0ef9f]
>> (EE) 6: /lib64/libpthread.so.0 (__nanosleep_nocancel+0x24) [0x3b7de0e804]
>> (EE) 7: /usr/lib64/xorg/modules/drivers/qxl_drv.so (qxl_handle_oom+0x69)
>> [0x7ff10fceccb9]
>> (EE) 8: /usr/lib64/xorg/modules/drivers/qxl_drv.so (qxl_allocnf+0x48)
>> [0x7ff10fcecd08]
>> (EE) 9: /usr/lib64/xorg/modules/drivers/qxl_drv.so
>> (qxl_bo_alloc_internal+0x76) [0x7ff10fcece06]
>> (EE) 10: /usr/lib64/xorg/modules/drivers/qxl_drv.so (qxl_image_create+0xf2)
>> [0x7ff10fce9782]
>> (EE) 11: /usr/lib64/xorg/modules/drivers/qxl_drv.so
>> (qxl_surface_put_image+0xf5) [0x7ff10fceb045]
>> (EE) 12: /usr/lib64/xorg/modules/drivers/qxl_drv.so (uxa_copy_n_to_n+0x5e7)
>> [0x7ff10fcf7127]
>> (EE) 13: /usr/bin/X (miCopyRegion+0x1ad) [0x574d2d]
>> (EE) 14: /usr/bin/X (miDoCopy+0x456) [0x5752b6]
>> (EE) 15: /usr/lib64/xorg/modules/drivers/qxl_drv.so (uxa_copy_area+0xae)
>> [0x7ff10fcf5efe]
>> (EE) 16: /usr/bin/X (dixDestroyPixmap+0x711) [0x433a31]
>> (EE) 17: /usr/bin/X (SendErrorToClient+0x3f7) [0x436fa7]
>> (EE) 18: /usr/bin/X (_init+0x3aaa) [0x429b4a]
>> (EE) 19: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x3b7d221b75]
>> (EE) 20: /usr/bin/X (_start+0x29) [0x4267b1]
>> (EE) 21: ? (?+0x29) [0x29]
>> (EE)
>> (EE) [mi] These backtraces from mieqEnqueue may point to a culprit higher up
>> the stack.
>> (EE) [mi] mieq is NOT the cause. It is a victim.
>> (EE) [mi] EQ overflow continuing. 100 events have been dropped.
>>
>> His comment was:
>>
>> Examining the stack trace more closely, the functions identified are
>> misleading. The offsets are sometimes larger than the named functions, and
>> point to different functions not listed in the stripped symbol table.
>> Looking at the source, it seems that:
>>
>> (EE) 16: /usr/bin/X (dixDestroyPixmap+0x711) [0x433a31]
>>
>> This is probably ProcCreatePixmap()
>>
>> (EE) 17: /usr/bin/X (SendErrorToClient+0x3f7) [0x436fa7]
>>
>> This is possibly init_screen() or AddScreen()
>>
>> So, it appears the memory allocation fails while setting up a new screen
>> structure. This makes more sense, but still leaves open the question why
>> it's trying to create new screens long after startup.
>>
>> It's hard to recreate the crashes other than by simply booting and using a
>> VM for a few days. One theory we're tossing around is that the memory buffer
>> xf86-video-qxl has to work with is getting fragmented and when the
>> fragmentation gets bad enough an allocation can fail.
>> Our best guess is that this is a bug in the xf86-video-qxl driver. Has
>> anyone else seen similar Xorg crashes?
>>
>> Guidance on how to fix or at least troubleshoot this further would be
>> greatly appreciated.
>>
> Why aren't you running the Fedora packages?

When we were using the Fedora packages under F19, we were able to trigger
a crash much more easily.
Specifically, we had a script that would repeatedly launch Firefox,
Thunderbird, Google Chrome, and Gedit, wait 5 seconds, then kill all 4
applications.
That script could trigger an X server crash within a couple of hours.
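
For anyone who wants to try reproducing this, a loop of that shape could be
sketched roughly as below. This is not the original script, which was not
posted. The command names in APPS and the helper names run_cycle and stress
are assumptions made for illustration, and the binaries may be named
differently on a given system.

```python
import subprocess
import time

# Hypothetical command names (assumption): adjust to what is installed.
APPS = [["firefox"], ["thunderbird"], ["google-chrome"], ["gedit"]]


def run_cycle(commands, wait_seconds=5.0):
    """Launch every command, wait, then kill whatever was started.

    Returns the number of processes that were actually launched.
    """
    procs = []
    for cmd in commands:
        try:
            procs.append(subprocess.Popen(cmd))
        except FileNotFoundError:
            # Skip applications that are not installed on this machine.
            pass
    time.sleep(wait_seconds)
    for proc in procs:
        proc.kill()
    for proc in procs:
        proc.wait()  # reap the process so no zombies accumulate
    return len(procs)


def stress(commands, cycles, wait_seconds=5.0):
    """Repeat the launch/wait/kill cycle a fixed number of times."""
    for _ in range(cycles):
        run_cycle(commands, wait_seconds)


# Example (not executed here): stress(APPS, cycles=1000)
```

Note that kill() only signals the process that was started directly. Browsers
that spawn helper processes may leave children behind, so a real test script
might need to kill by process name or process group instead.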

When we switched to compiling the master branch, that script didn't 
crash the X server at all, but normal use still causes the crashes 
described in my previous email.

Thanks!
-Nahum

