<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div class="moz-cite-prefix">On 2023-09-25 05:32, Jesse Zhang wrote:<br>
</div>
<blockquote type="cite" cite="mid:20230925093230.3891970-1-jesse.zhang@amd.com">
<pre class="moz-quote-pre" wrap="">Fix the svm_bo refcount warning by checking the refcount before release.
[ 462.649530] ------------[ cut here ]------------
[ 462.649532] refcount_t: underflow; use-after-free.
[ 462.649536] WARNING: CPU: 7 PID: 1936 at lib/refcount.c:28 refcount_warn_saturate+0xf8/0x150
[ 462.649542] Modules linked in: amdgpu(E) amdxcp drm_buddy gpu_sched drm_suballoc_helper drm_ttm_helper ttm(E) drm_display_helper cec rc_core drm_kms_helper i2c_algo_bit syscopyarea sysfillrect sysimgblt rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs tls r8153_ecm cdc_ether usbnet r8152 mii joydev input_leds snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi hid_generic intel_rapl_msr snd_hda_intel intel_rapl_common snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec edac_mce_amd snd_hda_core usbhid snd_hwdep kvm_amd hid kvm snd_pcm sunrpc crct10dif_pclmul ghash_clmulni_intel snd_seq_midi sha512_ssse3 snd_seq_midi_event aesni_intel snd_rawmidi crypto_simd cryptd snd_seq rapl snd_seq_device snd_pci_acp5x snd_timer snd_rn_pci_acp3x wmi_bmof snd_acp_config snd snd_soc_acpi soundcore ccp snd_pci_acp3x k10temp mac_hid amd_pmc sch_fq_codel binfmt_misc msr parport_pc ppdev lp drm parport efi_pstore ip_tables x_tables autofs4 thunderbolt crc32_pclmul nvme i2c_piix4 nvme_core xhci_pci
[ 462.649576] xhci_pci_renesas video wmi
[ 462.649577] CPU: 7 PID: 1936 Comm: kworker/7:3 Tainted: G E 6.3.7+ #25
[ 462.649579] Hardware name: AMD Splinter/Splinter-PHX, BIOS WS43906N_857 09/04/2023
[ 462.649580] Workqueue: events svm_range_deferred_list_work [amdgpu]
[ 462.649771] RIP: 0010:refcount_warn_saturate+0xf8/0x150
[ 462.649773] Code: eb a1 0f b6 1d 7c 58 c7 01 80 fb 01 0f 87 11 64 83 00 83 e3 01 75 8c 48 c7 c7 a0 00 bc b6 c6 05 60 58 c7 01 01 e8 48 1f 9a ff <0f> 0b e9 72 ff ff ff 0f b6 1d 4b 58 c7 01 80 fb 01 0f 87 ce 63 83
[ 462.649773] RSP: 0018:ffffb6660603bd88 EFLAGS: 00010286
[ 462.649774] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
[ 462.649775] RDX: 0000000000000027 RSI: ffffb6660603bc48 RDI: ffff91f77e7e1548
[ 462.649776] RBP: ffffb6660603bd90 R08: 0000000000000003 R09: 0000000000000001
[ 462.649776] R10: 0000000000000001 R11: 0000000000000028 R12: ffff91f453fb2000
[ 462.649777] R13: 00000007f7cfc4c4 R14: ffff91f451f2f480 R15: 00000007f7cfc4c1
[ 462.649777] FS: 0000000000000000(0000) GS:ffff91f77e7c0000(0000) knlGS:0000000000000000
[ 462.649778] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 462.649778] CR2: 00007f7cfc4c9000 CR3: 0000000113c52000 CR4: 0000000000750ee0
[ 462.649779] PKRU: 55555554
[ 462.649779] Call Trace:
[ 462.649780] &lt;TASK&gt;
[ 462.649782] ? show_regs+0x6c/0x80
[ 462.649784] ? refcount_warn_saturate+0xf8/0x150
[ 462.649786] ? __warn+0x93/0x150
[ 462.649788] ? refcount_warn_saturate+0xf8/0x150
[ 462.649789] ? report_bug+0x1c6/0x1e0
[ 462.649791] ? irq_work_queue+0x14/0x50
[ 462.649794] ? handle_bug+0x46/0x80
[ 462.649796] ? exc_invalid_op+0x1d/0x80
[ 462.649797] ? asm_exc_invalid_op+0x1f/0x30
[ 462.649799] ? refcount_warn_saturate+0xf8/0x150
[ 462.649800] ? refcount_warn_saturate+0xf8/0x150
[ 462.649801] svm_range_free+0xeb/0xf0 [amdgpu]
[ 462.649907] svm_range_handle_list_op+0x1ae/0x1e0 [amdgpu]
[ 462.650000] svm_range_deferred_list_work+0x149/0x2c0 [amdgpu]
[ 462.650091] process_one_work+0x21c/0x430
[ 462.650094] worker_thread+0x4e/0x3c0
[ 462.650095] ? __pfx_worker_thread+0x10/0x10
[ 462.650096] kthread+0xf2/0x120
[ 462.650098] ? __pfx_kthread+0x10/0x10
[ 462.650099] ret_from_fork+0x29/0x50
[ 462.650101] &lt;/TASK&gt;
[ 462.650102] ---[ end trace 0000000000000000 ]---
Signed-off-by: Jesse Zhang <a class="moz-txt-link-rfc2396E" href="mailto:Jesse.Zhang@amd.com">&lt;Jesse.Zhang@amd.com&gt;</a>
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index fcdde9f451bb..44c3f22cb4a1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -436,7 +436,7 @@ void svm_range_bo_unref_async(struct svm_range_bo *svm_bo)
static void svm_range_bo_unref(struct svm_range_bo *svm_bo)
{
- if (svm_bo)
+ if (svm_bo && kref_read(&svm_bo->kref))</pre>
</blockquote>
<p>This only works around the issue. It looks like a use-after-free
bug or an svm_bo refcount leak, and we should fix the root cause.</p>
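<p>To illustrate the concern, here is a minimal user-space sketch, not
kernel code. <code>struct toy_ref</code>, <code>toy_put</code> and
<code>toy_put_guarded</code> are hypothetical stand-ins for
<code>struct kref</code>, <code>kref_put</code> and the guarded
<code>svm_range_bo_unref</code> in the patch. The <code>released</code> flag
marks the point where the real release callback would run, and the
<code>underflow</code> flag marks the point where refcount_t would warn.</p>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct kref; not the kernel API. */
struct toy_ref {
	int count;
	bool released;   /* set where the release callback would run */
	bool underflow;  /* set where refcount_t would print its warning */
};

/* Plain put: an extra put on a dead object is reported. */
static void toy_put(struct toy_ref *r)
{
	if (r->count == 0) {
		r->underflow = true;
		return;
	}
	if (--r->count == 0)
		r->released = true;
}

/* Mirrors the proposed guard: skip the put when the count reads zero. */
static void toy_put_guarded(struct toy_ref *r)
{
	if (r->count)
		toy_put(r);
}

/* True when the guard silenced an unbalanced put on a released object. */
static bool guard_hides_extra_put(void)
{
	struct toy_ref r = { .count = 1 };

	toy_put(&r);          /* balanced put: object is released here */
	toy_put_guarded(&r);  /* unbalanced put: skipped, nothing reported */
	assert(r.released);
	return r.released && !r.underflow;
}

/* True when the unguarded put reports the same unbalanced put. */
static bool plain_put_reports_extra_put(void)
{
	struct toy_ref r = { .count = 1 };

	toy_put(&r);
	toy_put(&r);          /* unbalanced put: flagged as underflow */
	return r.underflow;
}
```

<p>In this model the guard only hides the report; the unbalanced put still
happens. If the release callback frees the object, the guard's read of the
count would itself touch freed memory, so the caller that does the extra put
(or the path missing a get) still has to be found.</p>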
<p>The kernel appears to be from the tip of amd-staging-drm-next. Is this
a regression?<br>
</p>
<p>Regards,</p>
<p>Philip<br>
</p>
<blockquote type="cite" cite="mid:20230925093230.3891970-1-jesse.zhang@amd.com">
<pre class="moz-quote-pre" wrap="">
kref_put(&svm_bo->kref, svm_range_bo_release);
}
</pre>
</blockquote>
</body>
</html>