[PATCH] radeon: add a force flush to delay work when radeon suspend
Christian König
christian.koenig at amd.com
Mon Jan 3 10:20:44 UTC 2022
On 25.12.21 at 03:56, 周雪梅 wrote:
> Although radeon waits on its fences for the GPU to finish processing the
> current batch on each ring, there is still a corner case in which the
> radeon lockup work queue may not be fully flushed by the time
> radeon_suspend_kms() calls pci_set_power_state() to put the device in
> the D3hot state.
>
> Per PCI spec rev 4.0, section 5.3.1.4.1 (D3hot State):
> > Configuration and Message requests are the only TLPs accepted by a
> > Function in the D3hot state. All other received Requests must be
> > handled as Unsupported Requests, and all received Completions may
> > optionally be handled as Unexpected Completions.
Well, first of all, this is completely the wrong place for this. The flush
belongs in the fence code, not here.
Beyond that, I don't think this is a good idea, since it might cause deadlocks.
Christian.
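The D3hot rule in the spec text quoted above can be sketched as a small classification function. This is a simplified model, not code from the kernel or the spec: the enum names are hypothetical, and TLPs are reduced to the coarse categories that the quoted text distinguishes. An MMIO register read issued by the driver travels as a Memory Read Request, so it falls into the "all other Requests" case. The CPU's view of a read that ends as an Unsupported Request depends on the platform. In the Oops in the patch description, it shows up as a paging fault in si_gpu_check_soft_reset().

```c
/* Hypothetical coarse TLP categories, limited to the distinctions the
 * quoted D3hot text makes. */
enum tlp_kind {
	TLP_CONFIG_REQUEST,
	TLP_MESSAGE_REQUEST,
	TLP_MEMORY_REQUEST,	/* e.g. an MMIO register read by the driver */
	TLP_IO_REQUEST,
	TLP_COMPLETION,
};

enum d3hot_handling {
	D3HOT_ACCEPT,
	D3HOT_UNSUPPORTED_REQUEST,
	D3HOT_MAYBE_UNEXPECTED_COMPLETION,	/* spec says "may optionally" */
};

/* How a Function in D3hot handles a received TLP, per the quoted text. */
static enum d3hot_handling d3hot_handle(enum tlp_kind kind)
{
	switch (kind) {
	case TLP_CONFIG_REQUEST:
	case TLP_MESSAGE_REQUEST:
		/* the only Requests accepted in D3hot */
		return D3HOT_ACCEPT;
	case TLP_COMPLETION:
		return D3HOT_MAYBE_UNEXPECTED_COMPLETION;
	default:
		/* every other received Request */
		return D3HOT_UNSUPPORTED_REQUEST;
	}
}
```

With this model, a lockup check that reads a status register after the device has entered D3hot would be classified as `d3hot_handle(TLP_MEMORY_REQUEST)`, which gives `D3HOT_UNSUPPORTED_REQUEST`.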
>
> The issue shows up in the following log:
>
> Unable to handle kernel paging request at virtual address 00008800e0008010
> CPU 0 kworker/0:3(131): Oops 0
> pc = [<ffffffff811bea5c>] ra = [<ffffffff81240844>] ps = 0000
> Tainted: G W
> pc is at si_gpu_check_soft_reset+0x3c/0x240
> ra is at si_dma_is_lockup+0x34/0xd0
> v0 = 0000000000000000 t0 = fff08800e0008010 t1 = 0000000000010000
> t2 = 0000000000008010 t3 = fff00007e3c00000 t4 = fff00007e3c00258
> t5 = 000000000000ffff t6 = 0000000000000001 t7 = fff00007ef078000
> s0 = fff00007e3c016e8 s1 = fff00007e3c00000 s2 = fff00007e3c00018
> s3 = fff00007e3c00000 s4 = fff00007fff59d80 s5 = 0000000000000000
> s6 = fff00007ef07bd98
> a0 = fff00007e3c00000 a1 = fff00007e3c016e8 a2 = 0000000000000008
> a3 = 0000000000000001 a4 = 8f5c28f5c28f5c29 a5 = ffffffff810f4338
> t8 = 0000000000000275 t9 = ffffffff809b66f8 t10 = ff6769c5d964b800
> t11= 000000000000b886 pv = ffffffff811bea20 at = 0000000000000000
> gp = ffffffff81d89690 sp = 00000000aa814126
> Disabling lock debugging due to kernel taint
> Trace:
> [<ffffffff81240844>] si_dma_is_lockup+0x34/0xd0
> [<ffffffff81119610>] radeon_fence_check_lockup+0xd0/0x290
> [<ffffffff80977010>] process_one_work+0x280/0x550
> [<ffffffff80977350>] worker_thread+0x70/0x7c0
> [<ffffffff80977410>] worker_thread+0x130/0x7c0
> [<ffffffff80982040>] kthread+0x200/0x210
> [<ffffffff809772e0>] worker_thread+0x0/0x7c0
> [<ffffffff80981f8c>] kthread+0x14c/0x210
> [<ffffffff80911658>] ret_from_kernel_thread+0x18/0x20
> [<ffffffff80981e40>] kthread+0x0/0x210
>
> Code: ad3e0008 43f0074a ad7e0018 ad9e0020 8c3001e8 40230101 <88210000> 4821ed21
>
> So force a flush of the lockup work queue to fix this problem.
>
> Reviewed-by: Su Weiqiang <suweiqiang at wxiat.com>
> Reviewed-by: Zhou Xuemei <zhouxuemei at wxiat.com>
> Signed-off-by: Xu Chenjiao <xuchenjiao at wxiat.com>
> ---
> drivers/gpu/drm/radeon/radeon_device.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index 59c8a6647ff2..cc1c07963116 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -1625,6 +1625,9 @@ int radeon_suspend_kms(struct drm_device *dev, bool suspend,
>  		if (r) {
>  			/* delay GPU reset to resume */
>  			radeon_fence_driver_force_completion(rdev, i);
> +		} else {
> +			/* finish executing delayed work */
> +			flush_delayed_work(&rdev->fence_drv[i].lockup_work);
>  		}
>  	}
> --
> 2.17.1
>
>
>
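The ordering that the patch tries to enforce can be modeled in single-threaded userspace C. Every name below (`fake_dev`, `fake_flush_delayed_work`, `NUM_RINGS`, and the others) is hypothetical. `fake_flush_delayed_work()` only imitates the documented effect of the kernel's `flush_delayed_work()`: a pending item runs immediately and has completed before the call returns. The model covers only the `else` branch that the patch adds, where the fence wait succeeded. It does not model the force-completion path or the deadlock concern raised in the review.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_RINGS 3	/* hypothetical; the driver loops over RADEON_NUM_RINGS */

enum pstate { D0, D3HOT };

/* Hypothetical stand-in for struct delayed_work: one pending callback. */
struct fake_delayed_work {
	bool pending;
	void (*fn)(void *arg);
	void *arg;
};

struct fake_dev {
	enum pstate state;
	int bad_mmio_reads;	/* register reads issued while in D3hot */
	struct fake_delayed_work lockup_work[NUM_RINGS];
};

/* Mimics flush_delayed_work(): run a pending item now, synchronously. */
static void fake_flush_delayed_work(struct fake_delayed_work *w)
{
	if (w->pending) {
		w->pending = false;
		w->fn(w->arg);
	}
}

/* Models the lockup check, which reads GPU status registers. */
static void lockup_check(void *arg)
{
	struct fake_dev *dev = arg;

	if (dev->state == D3HOT)
		dev->bad_mmio_reads++;	/* would be an Unsupported Request */
}

static void fake_dev_init(struct fake_dev *dev)
{
	dev->state = D0;
	dev->bad_mmio_reads = 0;
	for (size_t i = 0; i < NUM_RINGS; i++) {
		dev->lockup_work[i].pending = true;	/* work still queued */
		dev->lockup_work[i].fn = lockup_check;
		dev->lockup_work[i].arg = dev;
	}
}

/* Models suspend after every fence wait has succeeded. */
static void fake_suspend(struct fake_dev *dev, bool flush)
{
	for (size_t i = 0; i < NUM_RINGS; i++)
		if (flush)	/* the branch the patch adds */
			fake_flush_delayed_work(&dev->lockup_work[i]);
	dev->state = D3HOT;	/* stands in for pci_set_power_state() */
}

/* Work that is still pending runs later, after the power-down. */
static void run_leftover_work(struct fake_dev *dev)
{
	for (size_t i = 0; i < NUM_RINGS; i++)
		fake_flush_delayed_work(&dev->lockup_work[i]);
}
```

Without the flush, each pending lockup check runs after the transition to D3hot and counts as a bad register read. With the flush, every check runs while the device is still in D0. The fake work item never re-queues itself. A real work item that re-arms itself is not drained by a single flush, because `flush_delayed_work()` waits only for the last queued instance.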
More information about the amd-gfx mailing list