[PATCH 2/2] drm/amdkfd: fix KFDSVMRangeTest.PartialUnmapSysMemTest fails

Felix Kuehling felix.kuehling at amd.com
Tue Aug 17 17:44:22 UTC 2021


Am 2021-08-17 um 5:17 a.m. schrieb Yifan Zhang:
> [ RUN      ] KFDSVMRangeTest.PartialUnmapSysMemTest
> /home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDTestUtil.cpp:245: Failure
> Value of: (hsaKmtAllocMemory(m_Node, m_Size, m_Flags, &m_pBuf))
>   Actual: 1
> Expected: HSAKMT_STATUS_SUCCESS
> Which is: 0
> /home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDTestUtil.cpp:248: Failure
> Value of: (hsaKmtMapMemoryToGPUNodes(m_pBuf, m_Size, __null, mapFlags, 1, &m_Node))
>   Actual: 1
> Expected: HSAKMT_STATUS_SUCCESS
> Which is: 0
> /home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDTestUtil.cpp:306: Failure
> Expected: ((void *)__null) != (ptr), actual: NULL vs NULL
> Segmentation fault (core dumped)
> [          ] Profile: Full Test
> [          ] HW capabilities: 0x9
>
> kernel log:
>
> [  102.029150]  ret_from_fork+0x22/0x30
> [  102.029158] ---[ end trace 15c34e782714f9a3 ]---
> [ 3613.603598] amdgpu: Address: 0x7f7149ccc000 already allocated by SVM
> [ 3613.610620] show_signal_msg: 27 callbacks suppressed
>
> These is race with deferred actions from previous memory map
> changes (e.g. munmap).Flush pending deffered work to avoid such case.
>
> Signed-off-by: Yifan Zhang <yifan1.zhang at amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 3177c4a0e753..e1c4abb98b35 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1261,6 +1261,10 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
>  		return -EINVAL;
>  
>  #if IS_ENABLED(CONFIG_HSA_AMD_SVM)
> +	/* Flush pending deferred work to avoid racing with deferred actions
> +	 * from previous memory map changes (e.g. munmap).
> +	 */
> +	svm_range_list_lock_and_flush_work(svms, current->mm);
>  	mutex_lock(&svms->lock);
>  	if (interval_tree_iter_first(&svms->objects,
>  				     args->va_addr >> PAGE_SHIFT,
> @@ -1271,6 +1275,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
>  		return -EADDRINUSE;
>  	}
>  	mutex_unlock(&svms->lock);
> +	mmap_write_unlock(current->mm);

I think you can probably drop the mmap_write_unlock just after the
mutex_lock above. There is no need to hold that lock any longer. And I
believe the locking doesn't need to be strictly nested either.

With that fixed, the series is

Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>


>  #endif
>  	dev = kfd_device_by_id(args->gpu_id);
>  	if (!dev)


More information about the amd-gfx mailing list