[PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Christian König christian.koenig at amd.com
Fri Feb 10 08:07:54 UTC 2023


Hi Evan,

Yeah, that's exactly what this warning is supposed to prevent. Allocating
temporary buffers for things like that is illegal during resume.
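A minimal userspace sketch of the rule, assuming only what this thread states: in_suspend is set at the beginning of suspend and cleared at the end of resume, so a self-test run from late init (like the MES self-test in the call trace quoted below) allocates and frees inside that window. It is a model, not driver code. mock_device, mock_bo_create_kernel(), mock_bo_free_kernel() and run_resume_scenario() are hypothetical, and the allocation-side warning models the extra check suggested later in this thread, not an existing amdgpu check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device; only the in_suspend name
 * mirrors a real amdgpu field. */
struct mock_device {
	bool in_suspend;
	int warnings;
};

/* Models WARN_ON(): logs and counts the hit, but keeps going. */
static void mock_warn_on(struct mock_device *dev, bool cond, const char *what)
{
	if (cond) {
		dev->warnings++;
		fprintf(stderr, "WARNING: %s while in_suspend is set\n", what);
	}
}

/* Hypothetical allocation helper carrying the allocation-side warning
 * suggested in the thread (an assumption, not existing driver code). */
static void *mock_bo_create_kernel(struct mock_device *dev, size_t size)
{
	mock_warn_on(dev, dev->in_suspend, "allocating a kernel BO");
	return malloc(size);
}

/* Hypothetical free helper mirroring the in_suspend check that
 * amdgpu_bo_free_kernel() performs. */
static void mock_bo_free_kernel(struct mock_device *dev, void **bo)
{
	if (*bo == NULL)
		return;
	mock_warn_on(dev, dev->in_suspend, "freeing a kernel BO");
	free(*bo);
	*bo = NULL; /* the mock clears the caller's pointer */
}

/* A self-test that creates and destroys a temporary buffer. */
static void mock_self_test(struct mock_device *dev)
{
	void *tmp = mock_bo_create_kernel(dev, 4096);

	mock_bo_free_kernel(dev, &tmp);
	assert(tmp == NULL);
}

/* Runs the self-test once at load time and once from late init during
 * resume; returns the number of warnings hit. */
int run_resume_scenario(void)
{
	struct mock_device dev = { .in_suspend = false, .warnings = 0 };

	mock_self_test(&dev);   /* driver load: outside the window, silent */

	dev.in_suspend = true;  /* suspend begins */
	/* ... suspend, then resume: hw init of the IP blocks ... */
	mock_self_test(&dev);   /* late init still runs inside the window */
	dev.in_suspend = false; /* cleared only at the very end of resume */

	return dev.warnings;
}
```

Under these assumptions, run_resume_scenario() returns 2: the self-test is silent at load time but trips both the allocation and the free warning when it runs during resume.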

I strongly suggest just removing the MES test. It abuses the kernel ring
interface in a way we didn't want anyway, and it is currently being
replaced by Shahank's work.

Regards,
Christian.

On 10.02.23 at 05:12, Quan, Evan wrote:
>
> [AMD Official Use Only - General]
>
> Hi Jack,
>
> Are you trying to fix the call trace below, which pops up on resume?
>
> It seems MES created a BO for its self-test and freed it later, at the
> final stage of the resume process.
>
> All of this happened before the in_suspend flag was cleared, and that
> triggered the call trace.
>
> Is my understanding correct?
>
> [74084.799260] WARNING: CPU: 2 PID: 2891 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
> [74084.811019] Modules linked in: nls_iso8859_1 amdgpu(OE) iommu_v2 gpu_sched drm_buddy drm_ttm_helper ttm drm_display_helper drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt snd_sm
> [74084.811042]  ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic e1000e usbhid ptp uas hid video i2c_i801 ahci pps_core crc32_pclmul i2c_smbus usb_storage libahci wmi
> [74084.914519] CPU: 2 PID: 2891 Comm: kworker/u16:38 Tainted: G        W IOE      6.0.0-custom #1
> [74084.923146] Hardware name: ASUS System Product Name/PRIME Z390-A, BIOS 2004 11/02/2021
> [74084.931074] Workqueue: events_unbound async_run_entry_fn
> [74084.936393] RIP: 0010:amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
> [74084.942422] Code: 00 4d 85 ed 74 08 49 c7 45 00 00 00 00 00 4d 85 e4 74 08 49 c7 04 24 00 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc <0f> 0b e9 39 ff ff ff 3d 00 fe ff ff 0f 85 75 96 47 00 ebf
> [74084.961199] RSP: 0000:ffffbed6812ebb90 EFLAGS: 00010202
> [74084.966435] RAX: 0000000000000000 RBX: ffffbed6812ebc50 RCX: 0000000000000000
> [74084.973578] RDX: ffffbed6812ebc70 RSI: ffffbed6812ebc60 RDI: ffffbed6812ebc50
> [74084.980725] RBP: ffffbed6812ebbb8 R08: 0000000000000000 R09: 00000000000001ff
> [74084.987869] R10: ffffbed6812ebb40 R11: 0000000000000000 R12: ffffbed6812ebc70
> [74084.995015] R13: ffffbed6812ebc60 R14: ffff963a2945cc00 R15: ffff9639c7da5630
> [74085.002160] FS:  0000000000000000(0000) GS:ffff963d1dc80000(0000) knlGS:0000000000000000
> [74085.010262] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [74085.016016] CR2: 0000000000000000 CR3: 0000000377c0a001 CR4: 00000000003706e0
> [74085.023164] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [74085.030307] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [74085.037453] Call Trace:
> [74085.039911]  <TASK>
> [74085.042023]  amdgpu_mes_self_test+0x385/0x460 [amdgpu]
> [74085.047293]  mes_v11_0_late_init+0x44/0x50 [amdgpu]
> [74085.052291]  amdgpu_device_ip_late_init+0x50/0x270 [amdgpu]
> [74085.058032]  amdgpu_device_resume+0xb0/0x2d0 [amdgpu]
> [74085.063187]  amdgpu_pmops_resume+0x37/0x70 [amdgpu]
> [74085.068162]  pci_pm_resume+0x68/0x100
> [74085.071836]  ? pci_legacy_resume+0x80/0x80
> [74085.075943]  dpm_run_callback+0x4c/0x160
> [74085.079873]  device_resume+0xad/0x210
> [74085.083546]  async_resume+0x1e/0x40
> [74085.087046]  async_run_entry_fn+0x30/0x120
> [74085.091152]  process_one_work+0x21a/0x3f0
> [74085.095173]  worker_thread+0x50/0x3e0
> [74085.098845]  ? process_one_work+0x3f0/0x3f0
> [74085.103039]  kthread+0xfa/0x130
> [74085.106189]  ? kthread_complete_and_exit+0x20/0x20
> [74085.110993]  ret_from_fork+0x1f/0x30
> [74085.114576]  </TASK>
> [74085.116773] ---[ end trace 0000000000000000 ]---
>
> BR
>
> Evan
>
> *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> *On Behalf Of*
> Christian König
> *Sent:* Monday, February 6, 2023 5:00 PM
> *To:* Xiao, Jack <Jack.Xiao at amd.com>; Koenig, Christian 
> <Christian.Koenig at amd.com>; amd-gfx at lists.freedesktop.org; Deucher, 
> Alexander <Alexander.Deucher at amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA 
> is unavailable
>
> On 06.02.23 at 09:28, Xiao, Jack wrote:
>
>     >>> It's simply not allowed to free up resources during suspend
>     >>> since those can't be acquired again during resume.
>     >>
>     >> The in_suspend flag is set at the beginning of suspend and unset
>     >> at the end of resume. It can’t filter the case you mentioned.
>     >
>     > Why not? This is exactly what it should do.
>
>     [Jack] Freeing up resources during resume should not hit the issue
>     you described, but checking only the in_suspend flag would still
>     report these cases as warnings.
>
>
> No, once more: Freeing up or allocating resources between suspend and 
> resume is illegal!
>
> If you free up a resource during resume, you should absolutely hit
> that; this is intentional!
>
> Regards,
> Christian.
>
>     Regards,
>
>     Jack
>
>     *From:* Koenig, Christian <Christian.Koenig at amd.com>
>     *Sent:* Monday, February 6, 2023 4:06 PM
>     *To:* Xiao, Jack <Jack.Xiao at amd.com>; Christian König
>     <ckoenig.leichtzumerken at gmail.com>; amd-gfx at lists.freedesktop.org;
>     Deucher, Alexander <Alexander.Deucher at amd.com>
>     *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when
>     DMA is unavailable
>
>     On 06.02.23 at 08:23, Xiao, Jack wrote:
>
>         >> Nope, that is not related to any hw state.
>
>         We can use another flag.
>
>         >> It's simply not allowed to free up resources during suspend
>         since those can't be acquired again during resume.
>
>         The in_suspend flag is set at the beginning of suspend and
>         unset at the end of resume. It can’t filter the case you
>         mentioned.
>
>
>     Why not? This is exactly what it should do.
>
>         Do you know the root cause of the cases hitting this issue?
>         Then we could pick the exact point at which to warn about freeing.
>
>
>     Well, the root cause is programming errors. Between suspending
>     and resuming you should neither allocate nor free memory.
>
>     Otherwise we can run into trouble. This check is one part of
>     enforcing that, and we should probably add another warning when
>     memory is allocated. But this check here is certainly correct.
>
>     Regards,
>     Christian.
>
>         Thanks,
>
>         Jack
>
>         *From:* Christian König <ckoenig.leichtzumerken at gmail.com>
>         *Sent:* Friday, February 3, 2023 9:20 PM
>         *To:* Xiao, Jack <Jack.Xiao at amd.com>; Koenig, Christian
>         <Christian.Koenig at amd.com>; amd-gfx at lists.freedesktop.org;
>         Deucher, Alexander <Alexander.Deucher at amd.com>
>         *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers
>         when DMA is unavailable
>
>         Nope, that is not related to any hw state.
>
>         It's simply not allowed to free up resources during suspend
>         since those can't be acquired again during resume.
>
>         We have had a couple of cases now where this was wrong. If you
>         get a warning from it, please fix the code that tried to free
>         something during suspend instead.
>
>         Regards,
>         Christian.
>
>         On 03.02.23 at 07:04, Xiao, Jack wrote:
>
>             >> It's simply illegal to free up memory during suspend.
>
>             Why? My understanding is that the restriction comes from
>             DMA being shut down.
>
>             Regards,
>
>             Jack
>
>             *From:* Koenig, Christian <Christian.Koenig at amd.com>
>             *Sent:* Thursday, February 2, 2023 7:43 PM
>             *To:* Xiao, Jack <Jack.Xiao at amd.com>;
>             amd-gfx at lists.freedesktop.org; Deucher, Alexander
>             <Alexander.Deucher at amd.com>
>             *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing
>             buffers when DMA is unavailable
>
>             Big NAK to this! This warning is not related in any way to
>             the hw state.
>
>             It's simply illegal to free up memory during suspend.
>
>             Regards,
>
>             Christian.
>
>             ------------------------------------------------------------------------
>
>             *From:* Xiao, Jack <Jack.Xiao at amd.com>
>             *Sent:* Thursday, February 2, 2023 10:54
>             *To:* amd-gfx at lists.freedesktop.org; Deucher, Alexander
>             <Alexander.Deucher at amd.com>; Koenig, Christian
>             <Christian.Koenig at amd.com>
>             *Cc:* Xiao, Jack <Jack.Xiao at amd.com>
>             *Subject:* [PATCH] drm/amdgpu: only WARN freeing buffers
>             when DMA is unavailable
>
>             Reduce warnings: only warn when DMA is unavailable.
>
>             Signed-off-by: Jack Xiao <Jack.Xiao at amd.com>
>             ---
>              drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
>              1 file changed, 2 insertions(+), 1 deletion(-)
>
>             diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>             index 2d237f3d3a2e..e3e3764ea697 100644
>             --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>             +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>             @@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
>                      if (*bo == NULL)
>                              return;
>
>             -        WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
>             +        WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
>             +                !amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);
>
>                      if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
>                              if (cpu_addr)
>             -- 
>             2.37.3
>
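A small userspace sketch comparing the original condition with the one proposed in the patch above. It reduces the SDMA block's status.hw to a plain boolean meaning "SDMA hardware is up", as the patch intends; warns_original(), warns_proposed() and count_suppressed_cases() are hypothetical names, not driver functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Original check: warn on any kernel BO free while in_suspend is set. */
static bool warns_original(bool in_suspend, bool sdma_hw_up)
{
	(void)sdma_hw_up; /* the original check ignores hardware state */
	return in_suspend;
}

/* Proposed check: warn only if SDMA hardware is also down. */
static bool warns_proposed(bool in_suspend, bool sdma_hw_up)
{
	return in_suspend && !sdma_hw_up;
}

/* Prints all four (in_suspend, sdma_hw) states and counts those where
 * the original check warns but the proposed one stays silent. */
int count_suppressed_cases(void)
{
	int suppressed = 0;

	for (int s = 0; s <= 1; s++) {
		for (int hw = 0; hw <= 1; hw++) {
			bool a = warns_original(s, hw);
			bool b = warns_proposed(s, hw);

			printf("in_suspend=%d sdma_hw=%d original=%d proposed=%d\n",
			       s, hw, a, b);
			if (a && !b)
				suppressed++;
		}
	}
	return suppressed;
}
```

count_suppressed_cases() returns 1: the patched check stays silent only when in_suspend is set and SDMA is already up. That is presumably the state during late init on resume, after the IP blocks have been brought back, i.e. the MES self-test case that Christian wants to keep flagging.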