<html><head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> </head> <body> <div class="moz-cite-prefix">On 2024-04-18 10:37, Christian König wrote: </div> <blockquote type="cite" cite="mid:dbc3534e-dc5c-42bb-8e83-66d047481f9a@gmail.com"> Am 18.04.24 um 15:57 schrieb Philip Yang: <blockquote type="cite">RDMA device with limited scatter-gather ability requires contiguous VRAM buffer allocation for RDMA peer direct support. Add a new KFD alloc memory flag and store as bo alloc flag AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pin this bo to export for RDMA peerdirect access, this will set TTM_PL_FLAG_CONTIFUOUS flag, and ask VRAM buddy allocator to get contiguous VRAM. Remove the 2GB max memory block size limit for contiguous allocation. Signed-off-by: Philip Yang <a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com"><Philip.Yang@amd.com></a> --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++++ drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 9 +++++++-- include/uapi/linux/kfd_ioctl.h | 1 + 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 0ae9fd844623..ef9154043757 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1712,6 +1712,10 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( alloc_flags = AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE; alloc_flags |= (flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) ? AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED : 0; + + /* For contiguous VRAM allocation */ + if (flags & KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT) + alloc_flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS; } xcp_id = fpriv->xcp_id == AMDGPU_XCP_NO_PARTITION ? 0 : fpriv->xcp_id; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 4be8b091099a..2f2ae7177771 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -532,8 +532,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, BUG_ON(min_block_size < mm->chunk_size); - /* Limit maximum size to 2GiB due to SG table limitations */ - size = min(remaining_size, 2ULL << 30); + if (place->flags & TTM_PL_FLAG_CONTIGUOUS) + size = remaining_size; + else + /* Limit maximum size to 2GiB due to SG table limitations + * for no contiguous allocation. + */ + size = min(remaining_size, 2ULL << 30); </blockquote> Oh, I totally missed this in the first review. That won't work like that the sg table limit is still there even if the BO is contiguous. We could only fix up the VRAM P2P support to use multiple segments in the sg table. </blockquote> yes, you are right, I didn't test with buffer size > 4GB, struct scatterlist->offset, length is unsigned int, this limits each sg_table entry size < 4GB. I will do more testing, we should still get >4GB contiguous VRAM, will add another patch to fix it inside amdgpu_vram_mgr_alloc_sgt, to split it into multiple sg_table entries, and RDMA peerdirect app should be able to handle this case based on sg_table->nents. Regards, Philip <blockquote type="cite" cite="mid:dbc3534e-dc5c-42bb-8e83-66d047481f9a@gmail.com"> Regards, Christian. <blockquote type="cite"> if ((size >= (u64)pages_per_block << PAGE_SHIFT) && !(size & (((u64)pages_per_block << PAGE_SHIFT) - 1))) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index 2040a470ddb4..c1394c162d4e 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -407,6 +407,7 @@ struct kfd_ioctl_acquire_vm_args { #define KFD_IOC_ALLOC_MEM_FLAGS_COHERENT (1 << 26) #define KFD_IOC_ALLOC_MEM_FLAGS_UNCACHED (1 << 25) #define KFD_IOC_ALLOC_MEM_FLAGS_EXT_COHERENT (1 << 24) +#define KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT (1 << 23) /* Allocate memory for later SVM (shared virtual memory) mapping. * </blockquote> </blockquote> </body> </html>