[Spice-devel] investigation into spice kms crash

Mike ipso at snappymail.ca
Wed Oct 15 14:34:39 PDT 2014


Hi David,

	Just to add another data point to this: two of my colleagues are seeing 
the exact same issue, but it's happening on a very underutilized 
gigabit LAN connected to a very underutilized bare-metal server. It 
occurs about once or twice per day for one of them (he swears it mostly 
happens when he opens Firefox by clicking a link in Thunderbird), and 
about once per week for the other. They are both running the exact same 
guest OS hosted on the exact same server.

This started happening only after upgrading both the guest OS and the 
server OS from Ubuntu 13.04 to 14.04.

Is there anything I can do to help track this down?

Here is a sample trace:
> Oct 15 13:48:43 timetrex1 kernel: [108682.060088] WARNING: CPU: 0 PID: 1369 at /build/buildd/linux-3.13.0/drivers/gpu/drm/qxl/qxl_ttm.c:420 qxl_sync_obj_wait+0x172/0x1f0 [qxl]()
> Oct 15 13:48:43 timetrex1 kernel: [108682.060090] sync obj 301 still has outstanding releases 0 0 0 262144 1
> Oct 15 13:48:43 timetrex1 kernel: [108682.060091] Modules linked in: arc4 md4 nls_utf8 cifs kvm_intel snd_hda_intel kvm snd_hda_codec rfcomm bnep bluetooth snd_hwdep crct10dif_pclmul snd_pcm crc32_pclmul ghash_clmulni_intel aesni_intel snd_page_alloc snd_seq_midi snd_seq_midi_event aes_x86_64 snd_rawmidi snd_seq snd_seq_device snd_timer lrw gf128mul snd soundcore glue_helper ablk_helper cryptd serio_raw i2c_piix4 mac_hid nfsd auth_rpcgss nfs_acl nfs lockd parport_pc ppdev lp sunrpc parport fscache qxl ttm drm_kms_helper psmouse drm floppy
> Oct 15 13:48:43 timetrex1 kernel: [108682.060112] CPU: 0 PID: 1369 Comm: Xorg Not tainted 3.13.0-34-generic #60-Ubuntu
> Oct 15 13:48:43 timetrex1 kernel: [108682.060114] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Oct 15 13:48:43 timetrex1 kernel: [108682.060115]  0000000000000009 ffff8800b8c598b0 ffffffff8171bd94 ffff8800b8c598f8
> Oct 15 13:48:43 timetrex1 kernel: [108682.060117]  ffff8800b8c598e8 ffffffff810676cd 0000000000000001 ffff88003595c740
> Oct 15 13:48:43 timetrex1 kernel: [108682.060119]  0000000000000001 ffff88003595c758 0000000000000000 ffff8800b8c59948
> Oct 15 13:48:43 timetrex1 kernel: [108682.060121] Call Trace:
> Oct 15 13:48:43 timetrex1 kernel: [108682.060139]  [<ffffffff8171bd94>] dump_stack+0x45/0x56
> Oct 15 13:48:43 timetrex1 kernel: [108682.060148]  [<ffffffff810676cd>] warn_slowpath_common+0x7d/0xa0
> Oct 15 13:48:43 timetrex1 kernel: [108682.060150]  [<ffffffff8106773c>] warn_slowpath_fmt+0x4c/0x50
> Oct 15 13:48:43 timetrex1 kernel: [108682.060155]  [<ffffffff8171f853>] ? schedule_hrtimeout_range+0x13/0x20
> Oct 15 13:48:43 timetrex1 kernel: [108682.060158]  [<ffffffffa00ca012>] qxl_sync_obj_wait+0x172/0x1f0 [qxl]
> Oct 15 13:48:43 timetrex1 kernel: [108682.060169]  [<ffffffffa00a7781>] ttm_bo_wait+0x91/0x190 [ttm]
> Oct 15 13:48:43 timetrex1 kernel: [108682.060173]  [<ffffffffa00a9214>] ttm_bo_evict+0x54/0x340 [ttm]
> Oct 15 13:48:43 timetrex1 kernel: [108682.060176]  [<ffffffffa00a96a8>] ttm_mem_evict_first+0x1a8/0x220 [ttm]
> Oct 15 13:48:43 timetrex1 kernel: [108682.060180]  [<ffffffffa00a99b0>] ttm_bo_mem_space+0x290/0x340 [ttm]
> Oct 15 13:48:43 timetrex1 kernel: [108682.060238]  [<ffffffffa00a9dca>] ttm_bo_move_buffer+0xba/0x130 [ttm]
> Oct 15 13:48:43 timetrex1 kernel: [108682.060242]  [<ffffffffa00a9f01>] ttm_bo_validate+0xc1/0x130 [ttm]
> Oct 15 13:48:43 timetrex1 kernel: [108682.060246]  [<ffffffffa00aa1d5>] ttm_bo_init+0x265/0x420 [ttm]
> Oct 15 13:48:43 timetrex1 kernel: [108682.060249]  [<ffffffffa00cc04e>] qxl_bo_create+0x14e/0x1d0 [qxl]
> Oct 15 13:48:43 timetrex1 kernel: [108682.060253]  [<ffffffffa00cbdb0>] ? qxl_fbdev_qobj_is_fb+0x30/0x30 [qxl]
> Oct 15 13:48:43 timetrex1 kernel: [108682.060256]  [<ffffffffa00cc777>] qxl_gem_object_create+0x57/0x100 [qxl]
> Oct 15 13:48:43 timetrex1 kernel: [108682.060261]  [<ffffffffa00cc86d>] qxl_gem_object_create_with_handle+0x4d/0x100 [qxl]
> Oct 15 13:48:43 timetrex1 kernel: [108682.060264]  [<ffffffffa00cf8ca>] qxl_alloc_ioctl+0x3a/0xa0 [qxl]
> Oct 15 13:48:43 timetrex1 kernel: [108682.060271]  [<ffffffff8117ce78>] ? __vma_link_file+0x48/0x80
> Oct 15 13:48:43 timetrex1 kernel: [108682.060292]  [<ffffffffa001cc22>] drm_ioctl+0x502/0x630 [drm]
> Oct 15 13:48:43 timetrex1 kernel: [108682.060299]  [<ffffffff8104f28f>] ? kvm_clock_read+0x1f/0x30
> Oct 15 13:48:43 timetrex1 kernel: [108682.060316]  [<ffffffff8101b7e9>] ? sched_clock+0x9/0x10
> Oct 15 13:48:43 timetrex1 kernel: [108682.060322]  [<ffffffff8109d1ad>] ? sched_clock_local+0x1d/0x80
> Oct 15 13:48:43 timetrex1 kernel: [108682.060327]  [<ffffffff811cfd10>] do_vfs_ioctl+0x2e0/0x4c0
> Oct 15 13:48:43 timetrex1 kernel: [108682.060329]  [<ffffffff8109ddf4>] ? vtime_account_user+0x54/0x60
> Oct 15 13:48:43 timetrex1 kernel: [108682.060331]  [<ffffffff811cff71>] SyS_ioctl+0x81/0xa0
> Oct 15 13:48:43 timetrex1 kernel: [108682.060336]  [<ffffffff8172c97f>] tracesys+0xe1/0xe6
> Oct 15 13:48:43 timetrex1 kernel: [108682.060337] ---[ end trace 5f3bb71d5aebd12e ]---
> Oct 15 13:48:43 timetrex1 kernel: [108682.060339] [TTM] Failed to expire sync object before buffer eviction
> Oct 15 13:48:43 timetrex1 kernel: [108682.060351] qxl 0000:00:02.0: object_init failed for (262144, 0x00000001)
> Oct 15 13:48:43 timetrex1 kernel: [108682.060353] [drm:qxl_gem_object_create] *ERROR* Failed to allocate GEM object (258964, 1, 4096, -12)
> Oct 15 13:48:43 timetrex1 kernel: [108682.060355] [drm:qxl_alloc_ioctl] *ERROR* qxl_alloc_ioctl: failed to create gem ret=-12


On 10/14/2014 08:07 AM, David Mansfield wrote:
> Hi All,
>
> I've been poking around trying to find the cause of an intermittent
> spice kms crash, which I think is the same as these three bugzilla bugs:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1038275
> https://bugzilla.redhat.com/show_bug.cgi?id=1050034
> https://bugzilla.redhat.com/show_bug.cgi?id=1039063
>
> and an email I sent on 11 Sep. 2014 (Subject: qxl dri related kernel oops).
>
> The code in question seems to be looping for about 150-300ms (in
> qxl_ttm.c:qxl_sync_obj_wait) and if I've understood the architecture
> correctly, it's waiting for the "host" side of QXL to release some objects.
>
> Also, in qxl_fence.c, there is this comment:
>
>     For some reason every so often qxl hw fails to release, things go
> wrong.
>
> In particular, my colleague hits this semi-regularly when using spice
> via a mobile-tether: i.e. high jitter, high latency and periodic network
> dropouts.
>
> Does the "qxl hw" have a discard policy when the client is "missing" and
> is the 150-300ms timing loop in the guest based on anything?
>
>

-- 
Mike

