[PATCH] drm/amdkfd: Fix dmabuf's redundant eviction when unmapping
Eric Huang
jinhuieric.huang at amd.com
Tue Apr 4 14:36:07 UTC 2023
Here is the backtrace from Jira:
Thu Nov 10 13:10:23 2022] Scheduling eviction of pid 97784 in 0 jiffies
[Thu Nov 10 13:10:23 2022] WARNING: CPU: 173 PID: 97784 at
/var/lib/dkms/amdgpu/5.16.9.22.20-1438746~20.04/build/amd/amdgpu/../amdkfd/kfd_device.c:878
kgd2kfd_schedule_evict_and_restore_process+0x104/0x120 [amdgpu]
[Thu Nov 10 13:10:23 2022] Modules linked in: veth amdgpu(OE)
amddrm_ttm_helper(OE) amdttm(OE) iommu_v2 amd_sched(OE) amdkcl(OE)
xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user
xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge stp llc aufs
overlay binfmt_misc nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc
scsi_dh_alua intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd
kvm_amd kvm efi_pstore rapl ipmi_ssif ccp acpi_ipmi k10temp ipmi_si
ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel msr ip_tables x_tables
autofs4 btrfs blake2b_generic zstd_compress raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
crypto_simd cryptd ast drm_vram_helper drm_ttm_helper ttm mlx5_core
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[Thu Nov 10 13:10:23 2022] pci_hyperv_intf cec psample igb mlxfw
rc_core dca ahci xhci_pci tls drm i2c_algo_bit libahci xhci_pci_renesas
i2c_piix4
[Thu Nov 10 13:10:23 2022] CPU: 173 PID: 97784 Comm: onnxruntime_tes
Tainted: G W OE 5.13.0-30-generic #33~20.04.1-Ubuntu
[Thu Nov 10 13:10:23 2022] Hardware name: GIGABYTE
G482-Z53-YF/MZ52-G40-00, BIOS R12 05/13/2020
[Thu Nov 10 13:10:23 2022] RIP:
0010:kgd2kfd_schedule_evict_and_restore_process+0x104/0x120 [amdgpu]
[Thu Nov 10 13:10:23 2022] Code: 5e 5d c3 4c 89 e7 e8 cb c6 44 df eb e7
49 8b 45 60 48 89 ca 48 c7 c7 38 8b d7 c1 48 89 4d e0 8b b0 20 09 00 00
e8 87 ee 7e df <0f> 0b 48 8b 4d e0 eb 9f 41 be ea ff ff ff eb ba 41 be
ed ff ff ff
[Thu Nov 10 13:10:23 2022] RSP: 0018:ffffb25f2a173978 EFLAGS: 00010086
[Thu Nov 10 13:10:23 2022] RAX: 0000000000000000 RBX: 0000000000000001
RCX: 0000000000000027
[Thu Nov 10 13:10:23 2022] RDX: 0000000000000027 RSI: 00000000fffeffff
RDI: ffff95d06e4a09c8
[Thu Nov 10 13:10:23 2022] RBP: ffffb25f2a173998 R08: ffff95d06e4a09c0
R09: ffffb25f2a173750
[Thu Nov 10 13:10:23 2022] R10: 0000000000000001 R11: 0000000000000001
R12: ffff95c371d74580
[Thu Nov 10 13:10:23 2022] R13: ffff95b1cd3f2000 R14: 0000000000000000
R15: ffff95c371d74580
[Thu Nov 10 13:10:23 2022] FS: 00007fcaff268b00(0000)
GS:ffff95d06e480000(0000) knlGS:0000000000000000
[Thu Nov 10 13:10:23 2022] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Thu Nov 10 13:10:23 2022] CR2: 00007fc643980000 CR3: 00000003e9492000
CR4: 0000000000350ee0
[Thu Nov 10 13:10:23 2022] Call Trace:
[Thu Nov 10 13:10:23 2022] <TASK>
[Thu Nov 10 13:10:23 2022] amdkfd_fence_enable_signaling+0x46/0x50 [amdgpu]
[Thu Nov 10 13:10:23 2022] __dma_fence_enable_signaling+0x52/0xb0
[Thu Nov 10 13:10:23 2022] dma_fence_default_wait+0xa9/0x200
[Thu Nov 10 13:10:23 2022] dma_fence_wait_timeout+0xbd/0xe0
[Thu Nov 10 13:10:23 2022] amddma_resv_wait_timeout+0x6f/0xd0 [amdkcl]
[Thu Nov 10 13:10:23 2022] amdttm_bo_wait+0x39/0x50 [amdttm]
[Thu Nov 10 13:10:23 2022] amdgpu_bo_move+0x41e/0x7b0 [amdgpu]
[Thu Nov 10 13:10:23 2022] ? down_write+0x13/0x50
[Thu Nov 10 13:10:23 2022] ? unmap_mapping_pages+0x68/0x130
[Thu Nov 10 13:10:23 2022] ttm_bo_handle_move_mem+0x7f/0x120 [amdttm]
[Thu Nov 10 13:10:23 2022] amdttm_bo_validate+0xbf/0x100 [amdttm]
[Thu Nov 10 13:10:23 2022] kfd_mem_dmaunmap_attachment+0x131/0x140 [amdgpu]
[Thu Nov 10 13:10:23 2022] unmap_bo_from_gpuvm+0x67/0x80 [amdgpu]
[Thu Nov 10 13:10:23 2022]
amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x114/0x220 [amdgpu]
[Thu Nov 10 13:10:23 2022] ? __mod_memcg_lruvec_state+0x22/0xe0
[Thu Nov 10 13:10:23 2022] kfd_ioctl_unmap_memory_from_gpu+0xe8/0x270
[amdgpu]
[Thu Nov 10 13:10:23 2022] kfd_ioctl+0x23c/0x590 [amdgpu]
[Thu Nov 10 13:10:23 2022] ?
kfd_ioctl_get_process_apertures_new+0x330/0x330 [amdgpu]
[Thu Nov 10 13:10:23 2022] ? exit_to_user_mode_prepare+0x3d/0x1c0
[Thu Nov 10 13:10:23 2022] ? __fget_files+0xa7/0xd0
[Thu Nov 10 13:10:23 2022] __x64_sys_ioctl+0x91/0xc0
[Thu Nov 10 13:10:23 2022] do_syscall_64+0x61/0xb0
[Thu Nov 10 13:10:23 2022] ? do_syscall_64+0x6e/0xb0
[Thu Nov 10 13:10:23 2022] ? do_syscall_64+0x6e/0xb0
[Thu Nov 10 13:10:23 2022] ? do_syscall_64+0x6e/0xb0
[Thu Nov 10 13:10:23 2022] ? do_syscall_64+0x6e/0xb0
[Thu Nov 10 13:10:23 2022] ? asm_sysvec_apic_timer_interrupt+0xa/0x20
[Thu Nov 10 13:10:23 2022] entry_SYSCALL_64_after_hwframe+0x44/0xae
[Thu Nov 10 13:10:23 2022] RIP: 0033:0x7fcaff57b3ab
[Thu Nov 10 13:10:23 2022] Code: 0f 1e fa 48 8b 05 e5 7a 0d 00 64 c7 00
26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10
00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 7a 0d 00 f7 d8
64 89 01 48
[Thu Nov 10 13:10:23 2022] RSP: 002b:00007fffe41e0098 EFLAGS: 00000206
ORIG_RAX: 0000000000000010
[Thu Nov 10 13:10:23 2022] RAX: ffffffffffffffda RBX: 00007fcacc7f7f80
RCX: 00007fcaff57b3ab
[Thu Nov 10 13:10:23 2022] RDX: 00007fffe41e0120 RSI: 00000000c0184b19
RDI: 0000000000000003
[Thu Nov 10 13:10:23 2022] RBP: 00007fffe41e00d0 R08: 0000562e2d5730d0
R09: 0000000000000000
[Thu Nov 10 13:10:23 2022] R10: 0000562e2c928ec0 R11: 0000000000000206
R12: 0000000000000001
[Thu Nov 10 13:10:23 2022] R13: 00007fffe41e04b0 R14: 0000000000000000
R15: 0000562e2d3f5b20
[Thu Nov 10 13:10:23 2022] </TASK>
[Thu Nov 10 13:10:23 2022] ---[ end trace 1464f08f6be60b30 ]---
Regards,
Eric
On 2023-04-04 10:11, Felix Kuehling wrote:
> If we keep the BO in the GTT domain, it means it will not be updated
> if we validate it again later in kfd_mem_dmamap_dmabuf. This means
> we'll use stale DMA addresses when we update the page tables after
> evictions.
>
> I think we'll need to find a different way to avoid triggering the
> eviction fence on the original BO when changing the placement of the
> DMABuf import here. If you need help brainstorming here, please share
> a backtrace from the eviction generated with the debug_evictions
> module param.
>
> Regards,
> Felix
>
>
> Am 2023-04-03 um 13:59 schrieb Eric Huang:
>> dmabuf is allocated/mapped as GTT domain, when dma-unmapping dmabuf
>> changing placement to CPU will trigger memory eviction after calling
>> ttm_bo_validate, and the eviction will cause performance drop.
>> Keeping the correct domain will solve the issue.
>>
>> Signed-off-by: Eric Huang <jinhuieric.huang at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index a3b09edfd1bf..17b708acb447 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -642,7 +642,7 @@ kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment
>> *attachment)
>> struct ttm_operation_ctx ctx = {.interruptible = true};
>> struct amdgpu_bo *bo = attachment->bo_va->base.bo;
>> - amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
>> + amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
>> ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>> }
More information about the amd-gfx
mailing list