[PATCH 2/2] drm/amdkfd: fix KFDSVMRangeTest.PartialUnmapSysMemTest fails
Felix Kuehling
felix.kuehling at amd.com
Tue Aug 17 17:44:22 UTC 2021
On 2021-08-17 at 5:17 a.m., Yifan Zhang wrote:
> [ RUN ] KFDSVMRangeTest.PartialUnmapSysMemTest
> /home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDTestUtil.cpp:245: Failure
> Value of: (hsaKmtAllocMemory(m_Node, m_Size, m_Flags, &m_pBuf))
> Actual: 1
> Expected: HSAKMT_STATUS_SUCCESS
> Which is: 0
> /home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDTestUtil.cpp:248: Failure
> Value of: (hsaKmtMapMemoryToGPUNodes(m_pBuf, m_Size, __null, mapFlags, 1, &m_Node))
> Actual: 1
> Expected: HSAKMT_STATUS_SUCCESS
> Which is: 0
> /home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDTestUtil.cpp:306: Failure
> Expected: ((void *)__null) != (ptr), actual: NULL vs NULL
> Segmentation fault (core dumped)
> [ ] Profile: Full Test
> [ ] HW capabilities: 0x9
>
> kernel log:
>
> [ 102.029150] ret_from_fork+0x22/0x30
> [ 102.029158] ---[ end trace 15c34e782714f9a3 ]---
> [ 3613.603598] amdgpu: Address: 0x7f7149ccc000 already allocated by SVM
> [ 3613.610620] show_signal_msg: 27 callbacks suppressed
>
> There is a race with deferred actions from previous memory map
> changes (e.g. munmap). Flush pending deferred work to avoid this case.
>
> Signed-off-by: Yifan Zhang <yifan1.zhang at amd.com>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 3177c4a0e753..e1c4abb98b35 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1261,6 +1261,10 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
> return -EINVAL;
>
> #if IS_ENABLED(CONFIG_HSA_AMD_SVM)
> + /* Flush pending deferred work to avoid racing with deferred actions
> + * from previous memory map changes (e.g. munmap).
> + */
> + svm_range_list_lock_and_flush_work(svms, current->mm);
> mutex_lock(&svms->lock);
> if (interval_tree_iter_first(&svms->objects,
> args->va_addr >> PAGE_SHIFT,
> @@ -1271,6 +1275,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
> return -EADDRINUSE;
> }
> mutex_unlock(&svms->lock);
> + mmap_write_unlock(current->mm);
I think you can probably move the mmap_write_unlock up to just after the
mutex_lock above. There is no need to hold the mmap lock any longer than
that, and I believe the locking doesn't need to be strictly nested either.
With that fixed, the series is
Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
> #endif
> dev = kfd_device_by_id(args->gpu_id);
> if (!dev)
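The lock ordering suggested above can be sketched in userspace. This is
only an illustration under stated assumptions, not the kernel code: plain
pthread mutexes stand in for the mmap write lock and svms->lock, a small
array stands in for the interval tree, and every name here (svm_list,
lock_and_flush, check_range_free, ...) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RANGES 8

/* Hypothetical stand-in for an SVM range; stale means "unmapped, but the
 * deferred removal has not run yet". */
struct range { unsigned long start, last; bool stale; };

struct svm_list {
    pthread_mutex_t mm_lock;   /* assumption: stands in for the mmap write lock */
    pthread_mutex_t lock;      /* assumption: stands in for svms->lock */
    struct range ranges[MAX_RANGES];
    size_t nranges;
};

static void svm_list_init(struct svm_list *s)
{
    pthread_mutex_init(&s->mm_lock, NULL);
    pthread_mutex_init(&s->lock, NULL);
    s->nranges = 0;
}

static void svm_list_add(struct svm_list *s, unsigned long start,
                         unsigned long last)
{
    pthread_mutex_lock(&s->lock);
    assert(s->nranges < MAX_RANGES);
    s->ranges[s->nranges++] = (struct range){ start, last, false };
    pthread_mutex_unlock(&s->lock);
}

/* Models munmap: the range is only marked, removal is deferred. */
static void svm_list_mark_unmapped(struct svm_list *s, size_t idx)
{
    pthread_mutex_lock(&s->lock);
    s->ranges[idx].stale = true;
    pthread_mutex_unlock(&s->lock);
}

/* Applies the deferred removals and returns with mm_lock still held. */
static void lock_and_flush(struct svm_list *s)
{
    size_t kept = 0;

    pthread_mutex_lock(&s->mm_lock);
    pthread_mutex_lock(&s->lock);
    for (size_t i = 0; i < s->nranges; i++)
        if (!s->ranges[i].stale)
            s->ranges[kept++] = s->ranges[i];
    s->nranges = kept;
    pthread_mutex_unlock(&s->lock);
}

/* Returns 0 if [start, last] is free, -EADDRINUSE if it overlaps a range. */
static int check_range_free(struct svm_list *s, unsigned long start,
                            unsigned long last)
{
    int ret = 0;

    lock_and_flush(s);
    pthread_mutex_lock(&s->lock);
    /* mm_lock is released here, before the lookup: not strictly nested. */
    pthread_mutex_unlock(&s->mm_lock);
    for (size_t i = 0; i < s->nranges; i++) {
        if (s->ranges[i].start <= last && start <= s->ranges[i].last) {
            ret = -EADDRINUSE;
            break;
        }
    }
    pthread_mutex_unlock(&s->lock);
    return ret;
}
```

In this sketch the outer lock only has to cover the flush and the hand-off
to the inner lock; the lookup itself runs under the inner lock alone.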