[Spice-devel] Xorg indefinitely hangs in kernelspace

Frediano Ziglio fziglio at redhat.com
Fri Sep 6 20:27:41 UTC 2019


> 
> On Tue, 6 Aug 2019 21:00:10 +0300, Jaak Ristioja <jaak at ristioja.ee> wrote:
> > Hello!
> > 
> > I'm writing to report a crash in the QXL / DRM code in the Linux kernel.
> > I originally filed the issue on LaunchPad and more details can be found
> > there, although I doubt whether these details are useful.
> > 
> >   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1813620
> > 
> > I first experienced these issues with:
> > 
> > * Ubuntu 18.04 (probably kernel 4.15.something)
> > * Ubuntu 18.10 (kernel 4.18.0-13)
> > * Ubuntu 19.04 (kernel 5.0.0-13-generic)
> > * Ubuntu 19.04 (mainline kernel 5.1-rc7)
> > * Ubuntu 19.04 (mainline kernel 5.2.0-050200rc1-generic)
> > 
> > Here is the crash output from dmesg:
> > 
> > [354073.713350] INFO: task Xorg:920 blocked for more than 120 seconds.
> > [354073.717755]       Not tainted 5.2.0-050200rc1-generic #201905191930
> > [354073.722277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [354073.738332] Xorg            D    0   920    854 0x00404004
> > [354073.738334] Call Trace:
> > [354073.738340]  __schedule+0x2ba/0x650
> > [354073.738342]  schedule+0x2d/0x90
> > [354073.738343]  schedule_preempt_disabled+0xe/0x10
> > [354073.738345]  __ww_mutex_lock.isra.11+0x3e0/0x750
> > [354073.738346]  __ww_mutex_lock_slowpath+0x16/0x20
> > [354073.738347]  ww_mutex_lock+0x34/0x50
> > [354073.738352]  ttm_eu_reserve_buffers+0x1f9/0x2e0 [ttm]
> > [354073.738356]  qxl_release_reserve_list+0x67/0x150 [qxl]
> > [354073.738358]  ? qxl_bo_pin+0xaa/0x190 [qxl]
> > [354073.738359]  qxl_cursor_atomic_update+0x1b0/0x2e0 [qxl]
> > [354073.738367]  drm_atomic_helper_commit_planes+0xb9/0x220 [drm_kms_helper]
> > [354073.738371]  drm_atomic_helper_commit_tail+0x2b/0x70 [drm_kms_helper]
> > [354073.738374]  commit_tail+0x67/0x70 [drm_kms_helper]
> > [354073.738378]  drm_atomic_helper_commit+0x113/0x120 [drm_kms_helper]
> > [354073.738390]  drm_atomic_commit+0x4a/0x50 [drm]
> > [354073.738394]  drm_atomic_helper_update_plane+0xe9/0x100 [drm_kms_helper]
> > [354073.738402]  __setplane_atomic+0xd3/0x120 [drm]
> > [354073.738410]  drm_mode_cursor_universal+0x142/0x270 [drm]
> > [354073.738418]  drm_mode_cursor_common+0xcb/0x220 [drm]
> > [354073.738425]  ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
> > [354073.738432]  drm_mode_cursor2_ioctl+0xe/0x10 [drm]
> > [354073.738438]  drm_ioctl_kernel+0xb0/0x100 [drm]
> > [354073.738440]  ? ___sys_recvmsg+0x16c/0x200
> > [354073.738445]  drm_ioctl+0x233/0x410 [drm]
> > [354073.738452]  ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
> > [354073.738454]  ? timerqueue_add+0x57/0x90
> > [354073.738456]  ? enqueue_hrtimer+0x3c/0x90
> > [354073.738458]  do_vfs_ioctl+0xa9/0x640
> > [354073.738459]  ? fput+0x13/0x20
> > [354073.738461]  ? __sys_recvmsg+0x88/0xa0
> > [354073.738462]  ksys_ioctl+0x67/0x90
> > [354073.738463]  __x64_sys_ioctl+0x1a/0x20
> > [354073.738465]  do_syscall_64+0x5a/0x140
> > [354073.738467]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [354073.738468] RIP: 0033:0x7ffad14d3417
> > [354073.738472] Code: Bad RIP value.
> > [354073.738472] RSP: 002b:00007ffdd5679978 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
> > [354073.738473] RAX: ffffffffffffffda RBX: 000056428a474610 RCX: 00007ffad14d3417
> > [354073.738474] RDX: 00007ffdd56799b0 RSI: 00000000c02464bb RDI: 000000000000000e
> > [354073.738474] RBP: 00007ffdd56799b0 R08: 0000000000000040 R09: 0000000000000010
> > [354073.738475] R10: 000000000000003f R11: 0000000000003246 R12: 00000000c02464bb
> > [354073.738475] R13: 000000000000000e R14: 0000000000000000 R15: 000056428a4721d0
> > [354073.738511] INFO: task kworker/1:0:27625 blocked for more than 120 seconds.
> > [354073.745154]       Not tainted 5.2.0-050200rc1-generic #201905191930
> > [354073.751900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [354073.762197] kworker/1:0     D    0 27625      2 0x80004000
> > [354073.762205] Workqueue: events qxl_client_monitors_config_work_func [qxl]
> > [354073.762206] Call Trace:
> > [354073.762211]  __schedule+0x2ba/0x650
> > [354073.762214]  schedule+0x2d/0x90
> > [354073.762215]  schedule_preempt_disabled+0xe/0x10
> > [354073.762216]  __ww_mutex_lock.isra.11+0x3e0/0x750
> > [354073.762217]  ? __switch_to_asm+0x34/0x70
> > [354073.762218]  ? __switch_to_asm+0x40/0x70
> > [354073.762219]  ? __switch_to_asm+0x40/0x70
> > [354073.762220]  __ww_mutex_lock_slowpath+0x16/0x20
> > [354073.762221]  ww_mutex_lock+0x34/0x50
> > [354073.762235]  drm_modeset_lock+0x35/0xb0 [drm]
> > [354073.762243]  drm_modeset_lock_all_ctx+0x5d/0xe0 [drm]
> > [354073.762251]  drm_modeset_lock_all+0x5e/0xb0 [drm]
> > [354073.762252]  qxl_display_read_client_monitors_config+0x1e1/0x370 [qxl]
> > [354073.762254]  qxl_client_monitors_config_work_func+0x15/0x20 [qxl]
> > [354073.762256]  process_one_work+0x20f/0x410
> > [354073.762257]  worker_thread+0x34/0x400
> > [354073.762259]  kthread+0x120/0x140
> > [354073.762260]  ? process_one_work+0x410/0x410
> > [354073.762261]  ? __kthread_parkme+0x70/0x70
> > [354073.762262]  ret_from_fork+0x35/0x40
> > 
> 
> --- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> @@ -97,8 +97,9 @@ int ttm_eu_reserve_buffers(struct ww_acq
>  			   struct list_head *dups, bool del_lru)
>  {
>  	struct ttm_bo_global *glob;
> -	struct ttm_validate_buffer *entry;
> +	struct ttm_validate_buffer *entry, *last_entry;
>  	int ret;
> +	bool locked = false;
>  
>  	if (list_empty(list))
>  		return 0;
> @@ -112,7 +113,10 @@ int ttm_eu_reserve_buffers(struct ww_acq
>  	list_for_each_entry(entry, list, head) {
>  		struct ttm_buffer_object *bo = entry->bo;
>  
> +		last_entry = entry;
>  		ret = __ttm_bo_reserve(bo, intr, (ticket == NULL), ticket);
> +		if (!ret)
> +			locked = true;
>  		if (!ret && unlikely(atomic_read(&bo->cpu_writers) > 0)) {
>  			reservation_object_unlock(bo->resv);
>  
> @@ -151,6 +155,10 @@ int ttm_eu_reserve_buffers(struct ww_acq
>  				ret = 0;
>  			}
>  		}
> +		if (!ret)
> +			locked = true;
> +		else
> +			locked = false;
>  

Couldn't those four added lines simply be

	locked = !ret;

?
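I.e. the end of that hunk would then just read (a sketch only, assuming nothing else depends on the two separate branches):

 				ret = 0;
 			}
 		}
+		locked = !ret;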

>  		if (!ret && entry->num_shared)
>  			ret = reservation_object_reserve_shared(bo->resv,
> @@ -163,6 +171,8 @@ int ttm_eu_reserve_buffers(struct ww_acq
>  				ww_acquire_done(ticket);
>  				ww_acquire_fini(ticket);
>  			}
> +			if (locked)
> +				ttm_eu_backoff_reservation_reverse(list, entry);
>  			return ret;
>  		}
>  
> @@ -172,6 +182,8 @@ int ttm_eu_reserve_buffers(struct ww_acq
>  		list_del(&entry->head);
>  		list_add(&entry->head, list);
>  	}
> +	if (locked)
> +		ttm_eu_backoff_reservation_reverse(list, last_entry);
>  
>  	if (del_lru) {
>  		spin_lock(&glob->lru_lock);

Where does this patch come from? Has it already been posted somewhere?
Is it supposed to fix this issue?
Does it affect any other cards besides QXL?
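
For context, both hung tasks above are blocked in ww_mutex_lock(): Xorg via
qxl_cursor_atomic_update() -> qxl_release_reserve_list() -> ttm_eu_reserve_buffers(),
and the worker via drm_modeset_lock_all(). The wound/wait pattern those paths
are expected to follow looks roughly like the sketch below (based on the
in-kernel ww_mutex documentation, not the actual TTM/DRM code; the obj
structure and the lock_all()/unlock_all() helpers are made up for
illustration). The ttm_eu_backoff_reservation_reverse() calls added by the
patch look like an attempt to add the same kind of backoff on the error paths;
if any path returned while still holding a reservation, the next locker would
block forever, which would be consistent with the hung-task reports above.

#include <linux/list.h>
#include <linux/ww_mutex.h>

static DEFINE_WW_CLASS(demo_ww_class);

struct obj {
	struct ww_mutex lock;
	struct list_head head;	/* entry in the list given to lock_all() */
};

/* Lock every object in @list, backing off and retrying on -EDEADLK. */
static int lock_all(struct list_head *list, struct ww_acquire_ctx *ctx)
{
	struct obj *contended = NULL, *obj;
	int ret;

	ww_acquire_init(ctx, &demo_ww_class);
retry:
	list_for_each_entry(obj, list, head) {
		if (obj == contended) {
			/* Already taken by the slow path below. */
			contended = NULL;
			continue;
		}
		ret = ww_mutex_lock(&obj->lock, ctx);
		if (ret) {
			struct obj *failed = obj;

			/* Back off: drop every lock taken so far... */
			list_for_each_entry_continue_reverse(obj, list, head)
				ww_mutex_unlock(&obj->lock);
			if (contended)
				ww_mutex_unlock(&contended->lock);
			if (ret == -EDEADLK) {
				/* ...sleep on the contended lock, then retry. */
				ww_mutex_lock_slow(&failed->lock, ctx);
				contended = failed;
				goto retry;
			}
			ww_acquire_fini(ctx);
			return ret;
		}
	}
	ww_acquire_done(ctx);
	return 0;
}

/* Drop all locks taken by lock_all() and release the acquire context. */
static void unlock_all(struct list_head *list, struct ww_acquire_ctx *ctx)
{
	struct obj *obj;

	list_for_each_entry(obj, list, head)
		ww_mutex_unlock(&obj->lock);
	ww_acquire_fini(ctx);
}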

Frediano

