[PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable
Christian König
christian.koenig at amd.com
Fri Feb 10 08:07:54 UTC 2023
Hi Evan,
yeah, exactly that's what this warning should prevent. Allocating
buffers temporary for stuff like that is illegal during resume.
I strongly suggest to just remove the MES test. It's abusing the kernel
ring interface in a way we didn't want anyway and is currently replaced
by Shahanks work.
Regards,
Christian.
Am 10.02.23 um 05:12 schrieb Quan, Evan:
>
> [AMD Official Use Only - General]
>
> Hi Jack,
>
> Are you trying to fix the call trace popped up on resuming below?
>
> It seems mes created some bo for its self test and freed it up later
> at the final stage of the resuming process.
>
> All these happened before the in_suspend flag cleared. And that
> triggered the call trace.
>
> Is my understanding correct?
>
> [74084.799260] WARNING: CPU: 2 PID: 2891 at
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425
> amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
>
> [74084.811019] Modules linked in: nls_iso8859_1 amdgpu(OE) iommu_v2
> gpu_sched drm_buddy drm_ttm_helper ttm drm_display_helper
> drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect
> sysimgblt snd_sm
>
> [74084.811042] ip_tables x_tables autofs4 hid_logitech_hidpp
> hid_logitech_dj hid_generic e1000e usbhid ptp uas hid video i2c_i801
> ahci pps_core crc32_pclmul i2c_smbus usb_storage libahci wmi
>
> [74084.914519] CPU: 2 PID: 2891 Comm: kworker/u16:38 Tainted: G
> W IOE 6.0.0-custom #1
>
> [74084.923146] Hardware name: ASUS System Product Name/PRIME Z390-A,
> BIOS 2004 11/02/2021
>
> [74084.931074] Workqueue: events_unbound async_run_entry_fn
>
> [74084.936393] RIP: 0010:amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
>
> [74084.942422] Code: 00 4d 85 ed 74 08 49 c7 45 00 00 00 00 00 4d 85
> e4 74 08 49 c7 04 24 00 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc
> cc cc cc <0f> 0b e9 39 ff ff ff 3d 00 fe ff ff 0f 85 75 96 47 00 ebf
>
> [74084.961199] RSP: 0000:ffffbed6812ebb90 EFLAGS: 00010202
>
> [74084.966435] RAX: 0000000000000000 RBX: ffffbed6812ebc50 RCX:
> 0000000000000000
>
> [74084.973578] RDX: ffffbed6812ebc70 RSI: ffffbed6812ebc60 RDI:
> ffffbed6812ebc50
>
> [74084.980725] RBP: ffffbed6812ebbb8 R08: 0000000000000000 R09:
> 00000000000001ff
>
> [74084.987869] R10: ffffbed6812ebb40 R11: 0000000000000000 R12:
> ffffbed6812ebc70
>
> [74084.995015] R13: ffffbed6812ebc60 R14: ffff963a2945cc00 R15:
> ffff9639c7da5630
>
> [74085.002160] FS: 0000000000000000(0000) GS:ffff963d1dc80000(0000)
> knlGS:0000000000000000
>
> [74085.010262] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>
> [74085.016016] CR2: 0000000000000000 CR3: 0000000377c0a001 CR4:
> 00000000003706e0
>
> [74085.023164] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
>
> [74085.030307] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
>
> [74085.037453] Call Trace:
>
> [74085.039911] <TASK>
>
> [74085.042023] amdgpu_mes_self_test+0x385/0x460 [amdgpu]
>
> [74085.047293] mes_v11_0_late_init+0x44/0x50 [amdgpu]
>
> [74085.052291] amdgpu_device_ip_late_init+0x50/0x270 [amdgpu]
>
> [74085.058032] amdgpu_device_resume+0xb0/0x2d0 [amdgpu]
>
> [74085.063187] amdgpu_pmops_resume+0x37/0x70 [amdgpu]
>
> [74085.068162] pci_pm_resume+0x68/0x100
>
> [74085.071836] ? pci_legacy_resume+0x80/0x80
>
> [74085.075943] dpm_run_callback+0x4c/0x160
>
> [74085.079873] device_resume+0xad/0x210
>
> [74085.083546] async_resume+0x1e/0x40
>
> [74085.087046] async_run_entry_fn+0x30/0x120
>
> [74085.091152] process_one_work+0x21a/0x3f0
>
> [74085.095173] worker_thread+0x50/0x3e0
>
> [74085.098845] ? process_one_work+0x3f0/0x3f0
>
> [74085.103039] kthread+0xfa/0x130
>
> [74085.106189] ? kthread_complete_and_exit+0x20/0x20
>
> [74085.110993] ret_from_fork+0x1f/0x30
>
> [74085.114576] </TASK>
>
> [74085.116773] ---[ end trace 0000000000000000 ]---
>
> BR
>
> Evan
>
> *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> *On Behalf Of
> *Christian König
> *Sent:* Monday, February 6, 2023 5:00 PM
> *To:* Xiao, Jack <Jack.Xiao at amd.com>; Koenig, Christian
> <Christian.Koenig at amd.com>; amd-gfx at lists.freedesktop.org; Deucher,
> Alexander <Alexander.Deucher at amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA
> is unavailable
>
> Am 06.02.23 um 09:28 schrieb Xiao, Jack:
>
> [AMD Official Use Only - General]
>
> >> >> It's simply not allowed to free up resources
> during suspend since those can't be acquired again during resume.
>
> >> The in_suspend flag is set at the
> beginning of suspend and unset at the end of resume. It can’t
> filter the case you mentioned.
>
>
> Why not? This is exactly what it should do.
>
> [Jack] If freeing up resources during resume, it should not hit
> the issue you described. But only checking in_suspend flag would
> take these cases as warning.
>
>
> No, once more: Freeing up or allocating resources between suspend and
> resume is illegal!
>
> If you free up a resource during resume you should absolutely hit
> that, this is intentional!
>
> Regards,
> Christian.
>
> Regards,
>
> Jack
>
> *From:* Koenig, Christian <Christian.Koenig at amd.com>
> <mailto:Christian.Koenig at amd.com>
> *Sent:* Monday, February 6, 2023 4:06 PM
> *To:* Xiao, Jack <Jack.Xiao at amd.com> <mailto:Jack.Xiao at amd.com>;
> Christian König <ckoenig.leichtzumerken at gmail.com>
> <mailto:ckoenig.leichtzumerken at gmail.com>;
> amd-gfx at lists.freedesktop.org; Deucher, Alexander
> <Alexander.Deucher at amd.com> <mailto:Alexander.Deucher at amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when
> DMA is unavailable
>
> Am 06.02.23 um 08:23 schrieb Xiao, Jack:
>
> [AMD Official Use Only - General]
>
> >> Nope, that is not related to any hw state.
>
> can use other flag.
>
> >> It's simply not allowed to free up resources during suspend
> since those can't be acquired again during resume.
>
> The in_suspend flag is set at the beginning of suspend and
> unset at the end of resume. It can’t filter the case you
> mentioned.
>
>
> Why not? This is exactly what it should do.
>
> Do you know the root cause of these cases hitting the issue?
> So that we can get an exact point to warn the freeing up behavior.
>
>
> Well the root cause are programming errors. See between suspending
> and resuming you should not allocate nor free memory.
>
> Otherwise we can run into trouble. And this check here is one part
> of that, we should probably add another warning during allocation
> of memory. But this here is certainly correct.
>
> Regards,
> Christian.
>
> Thanks,
>
> Jack
>
> *From:* Christian König <ckoenig.leichtzumerken at gmail.com>
> <mailto:ckoenig.leichtzumerken at gmail.com>
> *Sent:* Friday, February 3, 2023 9:20 PM
> *To:* Xiao, Jack <Jack.Xiao at amd.com>
> <mailto:Jack.Xiao at amd.com>; Koenig, Christian
> <Christian.Koenig at amd.com> <mailto:Christian.Koenig at amd.com>;
> amd-gfx at lists.freedesktop.org; Deucher, Alexander
> <Alexander.Deucher at amd.com> <mailto:Alexander.Deucher at amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers
> when DMA is unavailable
>
> Nope, that is not related to any hw state.
>
> It's simply not allowed to free up resources during suspend
> since those can't be acquired again during resume.
>
> We had a couple of cases now where this was wrong. If you get
> a warning from that please fix the code which tried to free
> something during suspend instead.
>
> Regards,
> Christian.
>
> Am 03.02.23 um 07:04 schrieb Xiao, Jack:
>
> [AMD Official Use Only - General]
>
> >> It's simply illegal to free up memory during suspend.
>
> Why? In my understanding, the limit was caused by DMA
> shutdown.
>
> Regards,
>
> Jack
>
> *From:* Koenig, Christian <Christian.Koenig at amd.com>
> <mailto:Christian.Koenig at amd.com>
> *Sent:* Thursday, February 2, 2023 7:43 PM
> *To:* Xiao, Jack <Jack.Xiao at amd.com>
> <mailto:Jack.Xiao at amd.com>; amd-gfx at lists.freedesktop.org;
> Deucher, Alexander <Alexander.Deucher at amd.com>
> <mailto:Alexander.Deucher at amd.com>
> *Subject:* AW: [PATCH] drm/amdgpu: only WARN freeing
> buffers when DMA is unavailable
>
> Big NAK to this! This warning is not related in any way to
> the hw state.
>
> It's simply illegal to free up memory during suspend.
>
> Regards,
>
> Christian.
>
> ------------------------------------------------------------------------
>
> *Von:*Xiao, Jack <Jack.Xiao at amd.com>
> *Gesendet:* Donnerstag, 2. Februar 2023 10:54
> *An:* amd-gfx at lists.freedesktop.org
> <amd-gfx at lists.freedesktop.org>; Deucher, Alexander
> <Alexander.Deucher at amd.com>; Koenig, Christian
> <Christian.Koenig at amd.com>
> *Cc:* Xiao, Jack <Jack.Xiao at amd.com>
> *Betreff:* [PATCH] drm/amdgpu: only WARN freeing buffers
> when DMA is unavailable
>
> Reduce waringings, only warn when DMA is unavailable.
>
> Signed-off-by: Jack Xiao <Jack.Xiao at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 2d237f3d3a2e..e3e3764ea697 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct
> amdgpu_bo **bo, u64 *gpu_addr,
> if (*bo == NULL)
> return;
>
> - WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
> + WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
> +
> !amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);
>
> if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
> if (cpu_addr)
> --
> 2.37.3
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20230210/04a1ad35/attachment-0001.htm>
More information about the amd-gfx
mailing list