[PATCH v2] drm/amdgpu: Fix the BO release clear memory warning

Christian König christian.koenig at amd.com
Fri Jun 7 08:10:25 UTC 2024


On 06.06.24 at 22:19, Mario Limonciello wrote:
> On 6/6/2024 15:04, Arunpravin Paneer Selvam wrote:
>> This happens when amdgpu_bo_release_notify runs before
>> amdgpu_ttm_set_buffer_funcs_status has enabled the buffer
>> funcs.
>>
>> Check that the buffer funcs are enabled before calling
>> amdgpu_fill_buffer to clear the memory.
>>
>> v2:(Christian)
>>    - Apply it only to GEM buffers. Since GEM buffers are only
>>      allocated/freed while the driver is loaded, we never run into
>>      the issue of clearing with buffer funcs disabled.
>>
>> Log snip:
>> [    6.036477] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off.
>> [    6.036667] ------------[ cut here ]------------
>> [    6.036668] WARNING: CPU: 3 PID: 370 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1355 amdgpu_bo_release_notify+0x201/0x220 [amdgpu]
>> [    6.036767] Modules linked in: hid_generic amdgpu(+) amdxcp drm_exec gpu_sched drm_buddy i2c_algo_bit usbhid drm_suballoc_helper drm_display_helper hid sd_mod cec rc_core drm_ttm_helper ahci ttm nvme libahci drm_kms_helper nvme_core r8169 xhci_pci libata t10_pi xhci_hcd realtek crc32_pclmul crc64_rocksoft mdio_devres crc64 drm crc32c_intel scsi_mod usbcore thunderbolt crc_t10dif i2c_piix4 libphy crct10dif_generic crct10dif_pclmul crct10dif_common scsi_common usb_common video wmi gpio_amdpt gpio_generic button
>> [    6.036793] CPU: 3 PID: 370 Comm: (udev-worker) Not tainted 6.8.7-dirty #1
>> [    6.036795] Hardware name: ASRock X670E Taichi/X670E Taichi, BIOS 2.10 03/26/2024
>> [    6.036796] RIP: 0010:amdgpu_bo_release_notify+0x201/0x220 [amdgpu]
>> [    6.036891] Code: 0b e9 af fe ff ff 48 ba ff ff ff ff ff ff ff 7f 31 f6 4c 89 e7 e8 7f 2f 7a d8 eb 98 e8 18 28 7a d8 eb b2 0f 0b e9 58 fe ff ff <0f> 0b eb a7 be 03 00 00 00 e8 e1 89 4e d8 eb 9b e8 aa 4d ad d8 66
>> [    6.036892] RSP: 0018:ffffbbe140d1f638 EFLAGS: 00010282
>> [    6.036894] RAX: 00000000ffffffea RBX: ffff90cba9e4e858 RCX: ffff90dabde38c28
>> [    6.036895] RDX: 0000000000000000 RSI: 00000000ffffdfff RDI: 0000000000000001
>> [    6.036896] RBP: ffff90cba980ef40 R08: 0000000000000000 R09: ffffbbe140d1f3c0
>> [    6.036896] R10: ffffbbe140d1f3b8 R11: 0000000000000003 R12: ffff90cba9e4e800
>> [    6.036897] R13: ffff90cba9e4e958 R14: ffff90cba980ef40 R15: 0000000000000258
>> [    6.036898] FS:  00007f2bd1679d00(0000) GS:ffff90da7e2c0000(0000) knlGS:0000000000000000
>> [    6.036899] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    6.036900] CR2: 000055a9b0f7299d CR3: 000000011bb6e000 CR4: 0000000000750ef0
>> [    6.036901] PKRU: 55555554
>> [    6.036901] Call Trace:
>> [    6.036903]  <TASK>
>> [    6.036904]  ? amdgpu_bo_release_notify+0x201/0x220 [amdgpu]
>> [    6.036998]  ? __warn+0x81/0x130
>> [    6.037002]  ? amdgpu_bo_release_notify+0x201/0x220 [amdgpu]
>> [    6.037095]  ? report_bug+0x171/0x1a0
>> [    6.037099]  ? handle_bug+0x3c/0x80
>> [    6.037101]  ? exc_invalid_op+0x17/0x70
>> [    6.037103]  ? asm_exc_invalid_op+0x1a/0x20
>> [    6.037107]  ? amdgpu_bo_release_notify+0x201/0x220 [amdgpu]
>> [    6.037199]  ? amdgpu_bo_release_notify+0x14a/0x220 [amdgpu]
>> [    6.037292]  ttm_bo_release+0xff/0x2e0 [ttm]
>> [    6.037297]  ? srso_alias_return_thunk+0x5/0xfbef5
>> [    6.037299]  ? srso_alias_return_thunk+0x5/0xfbef5
>> [    6.037301]  ? ttm_resource_move_to_lru_tail+0x140/0x1e0 [ttm]
>> [    6.037306]  amdgpu_bo_free_kernel+0xcb/0x120 [amdgpu]
>> [    6.037399]  dm_helpers_free_gpu_mem+0x41/0x80 [amdgpu]
>> [    6.037544]  dcn315_clk_mgr_construct+0x198/0x7e0 [amdgpu]
>> [    6.037692]  dc_clk_mgr_create+0x16e/0x5f0 [amdgpu]
>> [    6.037826]  dc_create+0x28a/0x650 [amdgpu]
>> [    6.037958]  amdgpu_dm_init.isra.0+0x2d5/0x1ec0 [amdgpu]
>> [    6.038085]  ? srso_alias_return_thunk+0x5/0xfbef5
>> [    6.038087]  ? prb_read_valid+0x1b/0x30
>> [    6.038089]  ? srso_alias_return_thunk+0x5/0xfbef5
>> [    6.038090]  ? console_unlock+0x78/0x120
>> [    6.038092]  ? srso_alias_return_thunk+0x5/0xfbef5
>> [    6.038094]  ? vprintk_emit+0x175/0x2c0
>> [    6.038095]  ? srso_alias_return_thunk+0x5/0xfbef5
>> [    6.038097]  ? srso_alias_return_thunk+0x5/0xfbef5
>> [    6.038098]  ? dev_printk_emit+0xa5/0xd0
>> [    6.038104]  dm_hw_init+0x12/0x30 [amdgpu]
>> [    6.038209]  amdgpu_device_init+0x1e50/0x2500 [amdgpu]
>> [    6.038308]  ? srso_alias_return_thunk+0x5/0xfbef5
>> [    6.038310]  ? srso_alias_return_thunk+0x5/0xfbef5
>> [    6.038313]  amdgpu_driver_load_kms+0x19/0x190 [amdgpu]
>> [    6.038409]  amdgpu_pci_probe+0x18b/0x510 [amdgpu]
>> [    6.038505]  local_pci_probe+0x42/0xa0
>> [    6.038508]  pci_device_probe+0xc7/0x240
>> [    6.038510]  really_probe+0x19b/0x3e0
>> [    6.038513]  ? __pfx___driver_attach+0x10/0x10
>> [    6.038514]  __driver_probe_device+0x78/0x160
>> [    6.038516]  driver_probe_device+0x1f/0x90
>> [    6.038517]  __driver_attach+0xd2/0x1c0
>> [    6.038519]  bus_for_each_dev+0x85/0xd0
>> [    6.038521]  bus_add_driver+0x116/0x220
>> [    6.038523]  driver_register+0x59/0x100
>> [    6.038525]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
>> [    6.038618]  do_one_initcall+0x58/0x320
>> [    6.038621]  do_init_module+0x60/0x230
>> [    6.038624]  init_module_from_file+0x89/0xe0
>> [    6.038628]  idempotent_init_module+0x120/0x2b0
>> [    6.038630]  __x64_sys_finit_module+0x5e/0xb0
>> [    6.038632]  do_syscall_64+0x84/0x1a0
>> [    6.038634]  ? do_syscall_64+0x90/0x1a0
>> [    6.038635]  ? srso_alias_return_thunk+0x5/0xfbef5
>> [    6.038637]  ? do_syscall_64+0x90/0x1a0
>> [    6.038638]  ? srso_alias_return_thunk+0x5/0xfbef5
>> [    6.038639]  ? do_syscall_64+0x90/0x1a0
>> [    6.038640]  ? srso_alias_return_thunk+0x5/0xfbef5
>> [    6.038642]  ? srso_alias_return_thunk+0x5/0xfbef5
>> [    6.038644]  entry_SYSCALL_64_after_hwframe+0x78/0x80
>> [    6.038645] RIP: 0033:0x7f2bd1e9d059
>> [    6.038647] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 1d 0d 00 f7 d8 64 89 01 48
>> [    6.038648] RSP: 002b:00007fffaf804878 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>> [    6.038650] RAX: ffffffffffffffda RBX: 000055a9b2328d60 RCX: 00007f2bd1e9d059
>> [    6.038650] RDX: 0000000000000000 RSI: 00007f2bd1fd0509 RDI: 0000000000000024
>> [    6.038651] RBP: 0000000000000000 R08: 0000000000000040 R09: 000055a9b23000a0
>> [    6.038652] R10: 0000000000000038 R11: 0000000000000246 R12: 00007f2bd1fd0509
>> [    6.038652] R13: 0000000000020000 R14: 000055a9b2326f90 R15: 0000000000000000
>> [    6.038655]  </TASK>
>> [    6.038656] ---[ end trace 0000000000000000 ]---
>>
>> Cc: <stable at vger.kernel.org> # 6.10+
>
> I think the stable tag really won't be needed and could be dropped 
> when this is committed.  This will presumably go into a -fixes PR for 
> 6.10.

Yeah, agreed. Just make sure that you push this into drm-misc-fixes so
that the patch makes it into 6.10.

Feel free to add Reviewed-by: Christian König <christian.koenig at amd.com>.

Regards,
Christian.

>
>> Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality")
>> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam at amd.com>
>> Suggested-by: Christian König <christian.koenig at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 --
>>   2 files changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> index 67c234bcf89f..3adaa4670103 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> @@ -108,6 +108,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
>>
>>       memset(&bp, 0, sizeof(bp));
>>       *obj = NULL;
>> +    flags |= AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
>>
>>       bp.size = size;
>>       bp.byte_align = alignment;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> index 8d8c39be6129..c556c8b653fa 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> @@ -604,8 +604,6 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>>       if (!amdgpu_bo_support_uswc(bo->flags))
>>           bo->flags &= ~AMDGPU_GEM_CREATE_CPU_GTT_USWC;
>>
>> -    bo->flags |= AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
>> -
>>       bo->tbo.bdev = &adev->mman.bdev;
>>       if (bp->domain & (AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA |
>>                 AMDGPU_GEM_DOMAIN_GDS))
>



More information about the amd-gfx mailing list