[PATCH] Revert "drm/radeon: use GEM references instead of TTMs"

Christian König christian.koenig at amd.com
Wed Oct 2 06:43:08 UTC 2024


Yes, that is a known issue with the driver at the moment.

It needs a three-line change to initialize the GEM functions earlier than
before. I'm currently working on this fix.
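
For illustration only, here is a small userspace sketch of the failure mode described in this thread: an object whose last reference is dropped before its function table has been set. It is not the driver fix and not kernel code. All names (model_obj, model_obj_funcs, model_obj_init, model_obj_put) are hypothetical stand-ins, and the -1 return value merely marks the point where the kernel would dereference NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_obj;

struct model_obj_funcs {
    void (*free)(struct model_obj *obj);
};

struct model_obj {
    int refcount;
    const struct model_obj_funcs *funcs; /* NULL until "initialized" */
    int freed;
};

static void model_free(struct model_obj *obj)
{
    obj->freed = 1;
}

static const struct model_obj_funcs model_funcs = { .free = model_free };

/*
 * set_funcs_early == 0 models an object that can be unreferenced before
 * its function table is set; set_funcs_early != 0 models setting the
 * table at creation time.
 */
static void model_obj_init(struct model_obj *obj, int set_funcs_early)
{
    assert(obj != NULL);
    obj->refcount = 1;
    obj->freed = 0;
    obj->funcs = set_funcs_early ? &model_funcs : NULL;
}

/*
 * Drop one reference. Returns 0 on success and -1 at the point where
 * the real code would oops on the NULL function table.
 */
static int model_obj_put(struct model_obj *obj)
{
    if (--obj->refcount > 0)
        return 0;
    if (obj->funcs == NULL)
        return -1; /* the kernel has no such check here and faults */
    obj->funcs->free(obj);
    return 0;
}
```

Calling model_obj_put() on an object set up with model_obj_init(&obj, 0) returns -1, while an object set up with model_obj_init(&obj, 1) is freed normally.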

Regards,
Christian.

On 01.10.24 at 15:50, Mingcong Bai wrote:
> Hi Huacai,
>
> On 2024-09-29 15:50, Huacai Chen wrote:
>> This reverts commit fd69ef05029f9beb7b031ef96e7a36970806a670.
>>
>> The original patch causes NULL pointer references:
>>
>> [   21.620856] CPU 3 Unable to handle kernel paging request at 
>> virtual address 0000000000000000, era == 9000000004bf61d8, ra == 
>> 9000000004bf61d4
>> [   21.717958] Oops[#1]:
>> [   21.803205] CPU: 3 UID: 0 PID: 706 Comm: Xorg Not tainted 6.11.0+ 
>> #1708
>> [   21.894451] Hardware name: Loongson 
>> Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS 
>> vUDK2018-LoongArch-V2.0.0-prebeta9 10/21/2022
>> [   21.996576] pc 9000000004bf61d8 ra 9000000004bf61d4 tp 
>> 9000000110560000 sp 9000000110563d40
>> [   22.094731] a0 000000000000002d a1 9000000000580788 a2 
>> 9000000000584d78 a3 9000000005678f40
>> [   22.193513] a4 9000000005678f38 a5 9000000110563b70 a6 
>> 0000000000000001 a7 0000000000000001
>> [   22.291993] t0 0000000000000000 t1 78315f0d31fceafb t2 
>> 0000000000000000 t3 00000000000003c4
>> [   22.389868] t4 9000000101d65840 t5 0000000000000003 t6 
>> 0000000000000003 t7 ffffffffffffffff
>> [   22.488326] t8 0000000000000001 u0 9000000120c31e20 s9 
>> 9000000110563ec0 s0 90000001107e0868
>> [   22.587345] s1 ffff80000230c000 s2 9000000120c31e48 s3 
>> 9000000120c31e00 s4 90000001063b0000
>> [   22.685908] s5 9000000120c31e20 s6 0000000000000122 s7 
>> 0000000000000100 s8 000055555c079570
>> [   22.785169]    ra: 9000000004bf61d4 drm_gem_object_free+0x24/0x70
>> [   22.881896]   ERA: 9000000004bf61d8 drm_gem_object_free+0x28/0x70
>> [   22.978212]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
>> [   23.076423]  PRMD: 00000004 (PPLV0 +PIE -PWE)
>> [   23.153679] [drm] amdgpu kernel modesetting enabled.
>> [   23.173074]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
>> [   23.365633]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
>> [   23.459680] ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
>> [   23.554473]  BADV: 0000000000000000
>> [   23.646222]  PRID: 0014c010 (Loongson-64bit, Loongson-3A5000)
>> [   23.740356] Modules linked in: amdgpu rfkill nft_fib_inet 
>> nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 
>> nf_reject_ipv6 nft_reject nft_ct drm_exec amdxcps
>> [   23.973584] Process Xorg (pid: 706, threadinfo=000000005fc343eb, 
>> task=000000007bdfdf49)
>> [   24.080528] Stack : 9000000120d86000 ffff8000021bb1c0 
>> 0000000000000000 ffff8000022a6bcc
>> [   24.188191]         0000000000000122 9000000120c31d08 
>> 900000010e04a400 9000000120c31e00
>> [   24.295420]         90000001063b0008 9000000120c31c00 
>> 90000001063b0000 ffff80000219c54c
>> [   24.402622]         00000000000000b4 90000001063b0170 
>> 90000001063b0008 9000000120c31c00
>> [   24.509242]         9000000120c31ce0 90000000043966f8 
>> 000055555c0922c0 000055555c082ac0
>> [   24.615887]         000055555597b000 0000000000000000 
>> 90000001034af840 90000001063f7928
>> [   24.723086]         90000001063b00d0 9000000120c31c00 
>> 90000001063b0008 9000000004396844
>> [   24.830582]         90000001017901a0 90000001017901a0 
>> 900000010e7e6718 00000000000a001b
>> [   24.937455]         90000001228b86c0 9000000003ad5904 
>> 000055555c082da0 0000000000000000
>> [   25.043806]         000055555c082ac0 90000001228b86c0 
>> 0000000000000000 9000000003acfb58
>> [   25.149701]         ...
>> [   25.248708] Call Trace:
>> [   25.248710] [<9000000004bf61d8>] drm_gem_object_free+0x28/0x70
>> [   25.447554] [<ffff8000021bb1bc>] radeon_bo_unref+0x3c/0x60 [radeon]
>> [   25.549201] [<ffff8000022a6bc8>] radeon_vm_fini+0x188/0x2c0 [radeon]
>> [   25.650751] [<ffff80000219c548>] 
>> radeon_driver_postclose_kms+0x188/0x1e0 [radeon]
>> [   25.753856] [<90000000043966f4>] drm_file_free+0x214/0x2a0
>> [   25.854893] [<9000000004396840>] drm_release+0xc0/0x160
>> [   25.954337] [<9000000003ad5900>] __fput+0x100/0x340
>> [   26.052437] [<9000000003acfb54>] sys_close+0x34/0xa0
>> [   26.148701] [<9000000004c04170>] do_syscall+0xb0/0x160
>>
>
> This does not appear to be a LoongArch-specific issue, as I was able to
> reproduce it on my Intel platform (H310 chipset, Pentium Gold G5620)
> with an AMD Radeon R7 240 (Oland) connected via HDMI.
>
> Happy to provide more testing results if needed, but below is the log 
> where the crash occurred:
>
> kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
> kernel: #PF: supervisor read access in kernel mode
> kernel: #PF: error_code(0x0000) - not-present page
> kernel: PGD 0 P4D 0
> kernel: Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> kernel: CPU: 3 UID: 0 PID: 952 Comm: ddcutil Not tainted 
> 6.11.0-aosc-main-11993-g3efc57369a0c #1
> kernel: Hardware name: System manufacturer System Product Name/PRIME 
> H310M-F R2.0, BIOS 1401 03/31/2020
> kernel: RIP: 0010:drm_gem_object_free+0x10/0x30
> kernel: Code: 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 
> 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 87 40 01 00 00 
> <48> 8b 00 48 85 c0 74 06 ff e0 cc 66 90 cc 0f 0b 31 >
> kernel: RSP: 0018:ffffb0f300b23de8 EFLAGS: 00010246
> kernel: RAX: 0000000000000000 RBX: ffff918b0487a000 RCX: 000000000000000c
> kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff918b1eee2468
> kernel: RBP: ffff918b197d9000 R08: 0000000000000000 R09: 0000000000000000
> kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff918b179cc000
> kernel: R13: ffff918b03ee0800 R14: ffff918b197d9048 R15: ffff918b197d92e0
> kernel: FS:  00007ffb58033b80(0000) GS:ffff918b32d80000(0000) 
> knlGS:0000000000000000
> kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: 0000000000000000 CR3: 000000011eda4005 CR4: 00000000003706f0
> kernel: Call Trace:
> kernel:  <TASK>
> kernel:  ? __die+0x23/0x80
> kernel:  ? page_fault_oops+0x14f/0x560
> kernel:  ? exc_page_fault+0x84/0x1c0
> kernel:  ? asm_exc_page_fault+0x26/0x30
> kernel:  ? drm_gem_object_free+0x10/0x30
> kernel:  radeon_bo_unref+0x64/0x80 [radeon]
> kernel:  radeon_vm_fini+0x1d0/0x260 [radeon]
> kernel:  radeon_driver_postclose_kms+0x164/0x190 [radeon]
> kernel:  drm_file_free+0x1f3/0x250
> kernel:  drm_release+0xaa/0x120
> kernel:  __fput+0xdc/0x2a0
> kernel:  __x64_sys_close+0x3c/0x80
> kernel:  do_syscall_64+0x64/0x150
> kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> kernel: RIP: 0033:0x7ffb57ef9430
> kernel: Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 
> 00 00 0f 1f 44 00 00 80 3d 39 8f 11 00 00 74 17 b8 03 00 00 00 0f 05 
> <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 >
> kernel: RSP: 002b:00007ffd59048868 EFLAGS: 00000202 ORIG_RAX: 
> 0000000000000003
> kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007ffb57ef9430
> kernel: RDX: 000000055c96b7fe RSI: 0000000000000001 RDI: 0000000000000003
> kernel: RBP: 0000000000000001 R08: 0000000000000007 R09: 000055c96b7fe430
> kernel: R10: a563eae46f2f347c R11: 0000000000000202 R12: 0000000000000000
> kernel: R13: 000055c9634e44b8 R14: 0000000000000010 R15: 000055c96347e698
> kernel:  </TASK>
> kernel: Modules linked in: joydev mousedev input_leds snd_soc_avs 
> snd_soc_hda_codec snd_hda_ext_core intel_rapl_msr iTCO_wdt 
> intel_rapl_common intel_pmc_bxt intel_uncore_frequency snd_soc_core >
> kernel:  drm_ttm_helper ttm video wmi hid_logitech_dj hid_generic 
> sunrpc coretemp i2c_dev
> kernel: CR2: 0000000000000000
> kernel: ---[ end trace 0000000000000000 ]---
> kernel: RIP: 0010:drm_gem_object_free+0x10/0x30
> kernel: Code: 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 
> 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 87 40 01 00 00 
> <48> 8b 00 48 85 c0 74 06 ff e0 cc 66 90 cc 0f 0b 31 >
> kernel: RSP: 0018:ffffb0f300b23de8 EFLAGS: 00010246
> kernel: RAX: 0000000000000000 RBX: ffff918b0487a000 RCX: 000000000000000c
> kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff918b1eee2468
> kernel: RBP: ffff918b197d9000 R08: 0000000000000000 R09: 0000000000000000
> kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff918b179cc000
> kernel: R13: ffff918b03ee0800 R14: ffff918b197d9048 R15: ffff918b197d92e0
> kernel: FS:  00007ffb58033b80(0000) GS:ffff918b32d80000(0000) 
> knlGS:0000000000000000
> kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: 0000000000000000 CR3: 000000011eda4005 CR4: 00000000003706f0
>
>> The root cause is that obj->funcs is NULL in drm_gem_object_free(). Only
>> the fbdev bo is created by radeon_gem_object_create() and has valid 'funcs'.
>>
>> Maybe there is a better way to fix this bug, but since the amdgpu driver
>> also uses ttm helpers in amdgpu_bo_ref()/amdgpu_bo_unref() now, I think
>> it is also reasonable to just revert the original commit.
>> ---
>>  drivers/gpu/drm/radeon/radeon_gem.c    | 2 +-
>>  drivers/gpu/drm/radeon/radeon_object.c | 7 +++++--
>>  2 files changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon_gem.c 
>> b/drivers/gpu/drm/radeon/radeon_gem.c
>> index 9735f4968b86..210e8d43bb23 100644
>> --- a/drivers/gpu/drm/radeon/radeon_gem.c
>> +++ b/drivers/gpu/drm/radeon/radeon_gem.c
>> @@ -88,7 +88,7 @@ static void radeon_gem_object_free(struct 
>> drm_gem_object *gobj)
>>
>>      if (robj) {
>>          radeon_mn_unregister(robj);
>> -        ttm_bo_put(&robj->tbo);
>> +        radeon_bo_unref(&robj);
>>      }
>>  }
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon_object.c 
>> b/drivers/gpu/drm/radeon/radeon_object.c
>> index d0e4b43d155c..450ff7daa46c 100644
>> --- a/drivers/gpu/drm/radeon/radeon_object.c
>> +++ b/drivers/gpu/drm/radeon/radeon_object.c
>> @@ -256,15 +256,18 @@ struct radeon_bo *radeon_bo_ref(struct 
>> radeon_bo *bo)
>>      if (bo == NULL)
>>          return NULL;
>>
>> -    drm_gem_object_get(&bo->tbo.base);
>> +    ttm_bo_get(&bo->tbo);
>>      return bo;
>>  }
>>
>>  void radeon_bo_unref(struct radeon_bo **bo)
>>  {
>> +    struct ttm_buffer_object *tbo;
>> +
>>      if ((*bo) == NULL)
>>          return;
>> -    drm_gem_object_put(&(*bo)->tbo.base);
>> +    tbo = &((*bo)->tbo);
>> +    ttm_bo_put(tbo);
>>      *bo = NULL;
>>  }
>
> Best Regards,
> Mingcong Bai


