<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 2023-09-25 05:32, Jesse Zhang wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:20230925093230.3891970-1-jesse.zhang@amd.com">
      <pre class="moz-quote-pre" wrap="">Fix the svm_bo refcount warning by checking the refcount before release.

[  462.649530] ------------[ cut here ]------------
[  462.649532] refcount_t: underflow; use-after-free.
[  462.649536] WARNING: CPU: 7 PID: 1936 at lib/refcount.c:28 refcount_warn_saturate+0xf8/0x150
[  462.649542] Modules linked in: amdgpu(E) amdxcp drm_buddy gpu_sched drm_suballoc_helper drm_ttm_helper ttm(E) drm_display_helper cec rc_core drm_kms_helper i2c_algo_bit syscopyarea sysfillrect sysimgblt rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs tls r8153_ecm cdc_ether usbnet r8152 mii joydev input_leds snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi hid_generic intel_rapl_msr snd_hda_intel intel_rapl_common snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec edac_mce_amd snd_hda_core usbhid snd_hwdep kvm_amd hid kvm snd_pcm sunrpc crct10dif_pclmul ghash_clmulni_intel snd_seq_midi sha512_ssse3 snd_seq_midi_event aesni_intel snd_rawmidi crypto_simd cryptd snd_seq rapl snd_seq_device snd_pci_acp5x snd_timer snd_rn_pci_acp3x wmi_bmof snd_acp_config snd snd_soc_acpi soundcore ccp snd_pci_acp3x k10temp mac_hid amd_pmc sch_fq_codel binfmt_misc msr parport_pc ppdev lp drm parport efi_pstore ip_tables x_tables autofs4 thunderbolt crc32_pclmul nvme i2c_piix4 nvme_core xhci_pci
[  462.649576]  xhci_pci_renesas video wmi
[  462.649577] CPU: 7 PID: 1936 Comm: kworker/7:3 Tainted: G            E      6.3.7+ #25
[  462.649579] Hardware name: AMD Splinter/Splinter-PHX, BIOS WS43906N_857 09/04/2023
[  462.649580] Workqueue: events svm_range_deferred_list_work [amdgpu]
[  462.649771] RIP: 0010:refcount_warn_saturate+0xf8/0x150
[  462.649773] Code: eb a1 0f b6 1d 7c 58 c7 01 80 fb 01 0f 87 11 64 83 00 83 e3 01 75 8c 48 c7 c7 a0 00 bc b6 c6 05 60 58 c7 01 01 e8 48 1f 9a ff <0f> 0b e9 72 ff ff ff 0f b6 1d 4b 58 c7 01 80 fb 01 0f 87 ce 63 83
[  462.649773] RSP: 0018:ffffb6660603bd88 EFLAGS: 00010286
[  462.649774] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
[  462.649775] RDX: 0000000000000027 RSI: ffffb6660603bc48 RDI: ffff91f77e7e1548
[  462.649776] RBP: ffffb6660603bd90 R08: 0000000000000003 R09: 0000000000000001
[  462.649776] R10: 0000000000000001 R11: 0000000000000028 R12: ffff91f453fb2000
[  462.649777] R13: 00000007f7cfc4c4 R14: ffff91f451f2f480 R15: 00000007f7cfc4c1
[  462.649777] FS:  0000000000000000(0000) GS:ffff91f77e7c0000(0000) knlGS:0000000000000000
[  462.649778] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  462.649778] CR2: 00007f7cfc4c9000 CR3: 0000000113c52000 CR4: 0000000000750ee0
[  462.649779] PKRU: 55555554
[  462.649779] Call Trace:
[  462.649780]  <TASK>
[  462.649782]  ? show_regs+0x6c/0x80
[  462.649784]  ? refcount_warn_saturate+0xf8/0x150
[  462.649786]  ? __warn+0x93/0x150
[  462.649788]  ? refcount_warn_saturate+0xf8/0x150
[  462.649789]  ? report_bug+0x1c6/0x1e0
[  462.649791]  ? irq_work_queue+0x14/0x50
[  462.649794]  ? handle_bug+0x46/0x80
[  462.649796]  ? exc_invalid_op+0x1d/0x80
[  462.649797]  ? asm_exc_invalid_op+0x1f/0x30
[  462.649799]  ? refcount_warn_saturate+0xf8/0x150
[  462.649800]  ? refcount_warn_saturate+0xf8/0x150
[  462.649801]  svm_range_free+0xeb/0xf0 [amdgpu]
[  462.649907]  svm_range_handle_list_op+0x1ae/0x1e0 [amdgpu]
[  462.650000]  svm_range_deferred_list_work+0x149/0x2c0 [amdgpu]
[  462.650091]  process_one_work+0x21c/0x430
[  462.650094]  worker_thread+0x4e/0x3c0
[  462.650095]  ? __pfx_worker_thread+0x10/0x10
[  462.650096]  kthread+0xf2/0x120
[  462.650098]  ? __pfx_kthread+0x10/0x10
[  462.650099]  ret_from_fork+0x29/0x50
[  462.650101]  </TASK>
[  462.650102] ---[ end trace 0000000000000000 ]---

Signed-off-by: Jesse Zhang <a class="moz-txt-link-rfc2396E" href="mailto:Jesse.Zhang@amd.com">&lt;Jesse.Zhang@amd.com&gt;</a>
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index fcdde9f451bb..44c3f22cb4a1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -436,7 +436,7 @@ void svm_range_bo_unref_async(struct svm_range_bo *svm_bo)
 
 static void svm_range_bo_unref(struct svm_range_bo *svm_bo)
 {
-       if (svm_bo)
+       if (svm_bo && kref_read(&svm_bo->kref))</pre>
    </blockquote>
    <p>This just works around the issue. It looks like a use-after-free
      bug or an svm_bo refcount leak; we should fix the root cause.</p>
    <p>The kernel seems to be from the tip of amd-staging-drm-next. Is this a
      regression?<br>
    </p>
    <p>Regards,</p>
    <p>Philip<br>
    </p>
    <blockquote type="cite" cite="mid:20230925093230.3891970-1-jesse.zhang@amd.com">
      <pre class="moz-quote-pre" wrap="">
                kref_put(&svm_bo->kref, svm_range_bo_release);
 }
 
</pre>
    </blockquote>
  </body>
</html>