[PATCH 1/1] drm/amdgpu: Optimize KFD page table reservation
Felix Kuehling
felix.kuehling at amd.com
Mon Nov 25 19:38:28 UTC 2019
Hi Xinhui,
I sent this patch in July and then forgot about it. Please review it.
You could use this as the basis for your heap-size improvement. Just
move the ESTIMATE_PT_SIZE macro into an appropriate header file so you
can use it in the KFD code.
Regards,
Felix
On 2019-11-25 2:35 p.m., Felix Kuehling wrote:
> Be less pessimistic about estimated page table use for KFD. Most
> allocations use 2MB pages and therefore need less VRAM for page
> tables. This allows more VRAM to be used for applications especially
> on large systems with many GPUs and hundreds of GB of system memory.
>
> Example: 8 GPUs with 32GB VRAM each + 256GB system memory = 512GB
> Old page table reservation per GPU: 1GB
> New page table reservation per GPU: 32MB
>
> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index a1ed8a8e3752..e43a95514b41 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -105,11 +105,24 @@ void amdgpu_amdkfd_gpuvm_init_mem_limits(void)
>  		(kfd_mem_limit.max_ttm_mem_limit >> 20));
>  }
>
> +/* Estimate page table size needed to represent a given memory size
> + *
> + * With 4KB pages, we need one 8 byte PTE for each 4KB of memory
> + * (factor 512, >> 9). With 2MB pages, we need one 8 byte PTE for 2MB
> + * of memory (factor 256K, >> 18). ROCm user mode tries to optimize
> + * for 2MB pages for TLB efficiency. However, small allocations and
> + * fragmented system memory still need some 4KB pages. We choose a
> + * compromise that should work in most cases without reserving too
> + * much memory for page tables unnecessarily (factor 16K, >> 14).
> + */
> +#define ESTIMATE_PT_SIZE(mem_size) ((mem_size) >> 14)
> +
>  static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>  		uint64_t size, u32 domain, bool sg)
>  {
> +	uint64_t reserved_for_pt =
> +		ESTIMATE_PT_SIZE(amdgpu_amdkfd_total_mem_size);
>  	size_t acc_size, system_mem_needed, ttm_mem_needed, vram_needed;
> -	uint64_t reserved_for_pt = amdgpu_amdkfd_total_mem_size >> 9;
>  	int ret = 0;
>  
>  	acc_size = ttm_bo_dma_acc_size(&adev->mman.bdev, size,
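
The numbers in the commit message can be checked with a small standalone user-space sketch. It is not part of the patch. It assumes the total is simply the VRAM of all GPUs plus system memory, as in the example above; the helper names example_total_mem, old_pt_reservation and new_pt_reservation and the GiB/MiB macros are made up for illustration, and only ESTIMATE_PT_SIZE is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same shift as the macro in the patch: on average one 8-byte PTE
 * per 128KB of memory (factor 16K, >> 14). */
#define ESTIMATE_PT_SIZE(mem_size) ((mem_size) >> 14)

/* Illustrative unit helpers, not from the patch. */
#define GiB (UINT64_C(1) << 30)
#define MiB (UINT64_C(1) << 20)

/* Hypothetical helper: total memory as summed in the commit message
 * example (VRAM of all GPUs plus system memory). */
static uint64_t example_total_mem(uint64_t num_gpus, uint64_t vram_per_gpu,
				  uint64_t system_mem)
{
	return num_gpus * vram_per_gpu + system_mem;
}

/* Old estimate: every page is 4KB, one 8-byte PTE each (>> 9). */
static uint64_t old_pt_reservation(uint64_t total_mem)
{
	return total_mem >> 9;
}

/* New estimate: the compromise factor chosen in the patch (>> 14). */
static uint64_t new_pt_reservation(uint64_t total_mem)
{
	return ESTIMATE_PT_SIZE(total_mem);
}

/* Compile-time check of the figures quoted in the commit message. */
static_assert((((8 * 32 + 256) * GiB) >> 9) == 1 * GiB,
	      "old reservation for 512GB should be 1GB");
static_assert(ESTIMATE_PT_SIZE((8 * 32 + 256) * GiB) == 32 * MiB,
	      "new reservation for 512GB should be 32MB");
```

With 8 GPUs of 32GB each and 256GB of system memory, example_total_mem() gives 512GB, old_pt_reservation() gives 1GB and new_pt_reservation() gives 32MB, matching the commit message. A pure 2MB-page mapping (>> 18) would need only 2MB for the same total, so the chosen factor leaves headroom for small allocations and fragmented system memory.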