[RFC PATCH v2 0/2] Virtio-GPU suspend and resume
Kim, Dongwon
dongwon.kim at intel.com
Tue Jul 1 02:01:56 UTC 2025
Hi Dmitry,
This is great. I will test this and include the fix in the patch.
> -----Original Message-----
> From: Dmitry Osipenko <dmitry.osipenko at collabora.com>
> Sent: Monday, June 30, 2025 7:00 PM
> To: Kim, Dongwon <dongwon.kim at intel.com>; dri-
> devel at lists.freedesktop.org
> Cc: Kasireddy, Vivek <vivek.kasireddy at intel.com>
> Subject: Re: [RFC PATCH v2 0/2] Virtio-GPU suspend and resume
>
> On 6/18/25 01:41, Kim, Dongwon wrote:
> ...
> >> Have you figured out why 10ms workaround is needed?
> >
> > [Kim, Dongwon] Unfortunately, I don't know why it fails without the
> > delay. I wanted to narrow down further so enabled printk during
> > suspend and resume but hang didn't occur with the timing changes
> > caused by printks. I've also tried more deterministic methods that
> > make it wait based on some kinds of "status" but none of them have
> > worked so far. If you have any suggestions on possible condition we can
> > check instead of just sleeping, please let me know.
> > 10ms seems to be close to minimum to make it work 100% for several
> > days (rtcwake sleep and wake up every 5 sec).
>
> Was able to reproduce the hang and got a crash backtrace with
> no_console_suspend:
>
> [ 63.824827] PM: suspend entry (deep)
> [ 63.825041] Filesystems sync: 0.000 seconds
> [ 63.990951] Freezing user space processes
> [ 63.992488] Freezing user space processes completed (elapsed 0.001 seconds)
> [ 63.992775] OOM killer disabled.
> [ 63.992902] Freezing remaining freezable tasks
> [ 63.994099] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
> [ 64.002183] Oops: general protection fault, probably for non-canonical address 0x2abe0ea26847fb08: 0000 [#1] SMP NOPTI
> [ 64.003172] CPU: 9 UID: 0 PID: 178 Comm: kworker/9:2 Not tainted 6.15.4-00002-g01117b4373b2-dirty #123 PREEMPT(voluntary)
> [ 64.003614] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [ 64.004036] Workqueue: events virtio_gpu_dequeue_ctrl_func
> [ 64.004280] RIP: 0010:virtqueue_get_buf_ctx_split+0x86/0x130
> [ 64.004515] Code: 01 66 23 43 50 0f b7 c0 8b 74 c1 04 8b 44 c1 08 41 89 45 00 3b 73 58 0f 83 96 d7 20 ff 89 f0 48 c1 e0 04 48 03 83 80 00 00 00 <4c> 8b 20 4d 85 e4 0f 84 5a d7 20 ff 48 89 df e8 46 fc ff ff 0f b7
> [ 64.005227] RSP: 0018:ffffc90000b53d90 EFLAGS: 00010202
> [ 64.005430] RAX: 2abe0ea26847fb08 RBX: ffff888102d58a00 RCX: ffff8881255314c0
> [ 64.005698] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888102d58a00
> [ 64.005975] RBP: ffffc90000b53db0 R08: 8080808080808080 R09: ffff88885b470b40
> [ 64.006273] R10: ffff8881000508c8 R11: fefefefefefefeff R12: 0000000000000001
> [ 64.006907] R13: ffffc90000b53dfc R14: ffffc90000b53dfc R15: ffff8881032d0568
> [ 64.007205] FS: 0000000000000000(0000) GS:ffff8888d6650000(0000) knlGS:0000000000000000
> [ 64.007511] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 64.007732] CR2: 00007efedc4d3000 CR3: 00000001056e9000 CR4: 0000000000750ef0
> [ 64.008014] PKRU: 55555554
> [ 64.008123] Call Trace:
> [ 64.008223] <TASK>
> [ 64.008314] virtqueue_get_buf+0x46/0x60
> [ 64.008465] virtio_gpu_dequeue_ctrl_func+0x86/0x2a0
> [ 64.008655] process_one_work+0x18a/0x370
> [ 64.008823] worker_thread+0x31a/0x460
> [ 64.008971] ? _raw_spin_unlock_irqrestore+0x27/0x50
> [ 64.009176] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 64.009369] ? __pfx_worker_thread+0x10/0x10
> [ 64.009532] kthread+0x126/0x230
> [ 64.009662] ? _raw_spin_unlock_irq+0x1f/0x40
> [ 64.009836] ? __pfx_kthread+0x10/0x10
> [ 64.009986] ret_from_fork+0x3a/0x60
> [ 64.010156] ? __pfx_kthread+0x10/0x10
> [ 64.010318] ret_from_fork_asm+0x1a/0x30
> [ 64.010507] </TASK>
> [ 64.010616] Modules linked in:
> [ 64.010785] ---[ end trace 0000000000000000 ]---
>
> ==
>
> The trace shows that the virtio queue is still in use after it has been
> removed. This change fixes the crash; please test:
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
> index 03ab78b44ab3..48bb21f33306 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
> @@ -187,6 +187,10 @@ static int virtgpu_freeze(struct virtio_device *vdev)
>  	flush_work(&vgdev->ctrlq.dequeue_work);
>  	flush_work(&vgdev->cursorq.dequeue_work);
>  	flush_work(&vgdev->config_changed_work);
> +	wait_event(vgdev->ctrlq.ack_queue,
> +		   vgdev->ctrlq.vq->num_free == vgdev->ctrlq.vq->num_max);
> +	wait_event(vgdev->cursorq.ack_queue,
> +		   vgdev->cursorq.vq->num_free ==
> +		   vgdev->cursorq.vq->num_max);
>  	vdev->config->del_vqs(vdev);
>
>  	return 0;
>
> --
> Best regards,
> Dmitry
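
For readers following the thread, the idea behind the proposed change (wait
until every descriptor has been returned, and only then delete the queue) can
be sketched in userspace. This is an analogy under stated assumptions, not
kernel code: struct demo_queue and all demo_* functions are hypothetical, a
pthread condition variable stands in for ack_queue, and the loop in
demo_queue_freeze() plays the role of wait_event() with the condition
num_free == num_max.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue; only the counters matter here. */
struct demo_queue {
	pthread_mutex_t lock;
	pthread_cond_t ack;            /* plays the role of ack_queue */
	unsigned int num_free;         /* descriptors not in flight */
	unsigned int num_max;          /* ring size */
	bool deleted;                  /* set once the queue is torn down */
	unsigned int use_after_delete; /* completions seen after teardown */
};

void demo_queue_init(struct demo_queue *q, unsigned int num_max)
{
	assert(num_max > 0);
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->ack, NULL);
	q->num_free = num_max;
	q->num_max = num_max;
	q->deleted = false;
	q->use_after_delete = 0;
}

/* Put one buffer in flight; fails if the ring is full or already deleted. */
bool demo_queue_submit(struct demo_queue *q)
{
	bool ok = false;

	pthread_mutex_lock(&q->lock);
	if (!q->deleted && q->num_free > 0) {
		q->num_free--;
		ok = true;
	}
	pthread_mutex_unlock(&q->lock);
	return ok;
}

/* Completion path: return one buffer and wake anyone waiting in freeze. */
void demo_queue_complete(struct demo_queue *q)
{
	pthread_mutex_lock(&q->lock);
	if (q->deleted)
		q->use_after_delete++; /* the bug class seen in the backtrace */
	else if (q->num_free < q->num_max)
		q->num_free++;
	pthread_cond_broadcast(&q->ack);
	pthread_mutex_unlock(&q->lock);
}

/* Freeze path: block until the ring is fully drained, then delete it. */
void demo_queue_freeze(struct demo_queue *q)
{
	pthread_mutex_lock(&q->lock);
	while (q->num_free != q->num_max)
		pthread_cond_wait(&q->ack, &q->lock);
	q->deleted = true;
	pthread_mutex_unlock(&q->lock);
}

/* Worker thread body: completes buffers until nothing is in flight. */
void *demo_completer(void *arg)
{
	struct demo_queue *q = arg;

	for (;;) {
		pthread_mutex_lock(&q->lock);
		bool idle = q->deleted || q->num_free == q->num_max;
		pthread_mutex_unlock(&q->lock);
		if (idle)
			break;
		demo_queue_complete(q);
	}
	return NULL;
}
```

In this sketch, submitting a few buffers, starting demo_completer() in a
thread, and then calling demo_queue_freeze() should leave use_after_delete at
zero, because the teardown cannot proceed while anything is still in flight.
Whether the same condition is sufficient in the real driver is exactly what
the requested testing would need to confirm.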