[PATCH] drm/amdkfd: Page aligned memory reserve size

Felix Kuehling felix.kuehling at amd.com
Tue Jan 10 18:33:05 UTC 2023


Am 2023-01-10 um 12:11 schrieb Philip Yang:
> Use page aligned size to reserve memory usage because page aligned TTM
> BO size is used to unreserve memory usage, otherwise no page aligned
> size causes memory usage accounting unbalanced.
>
> Change vram_used definition type to int64_t to be able to trigger
> WARN_ONCE(adev && adev->kfd.vram_used < 0, "..."), to help debug the
> accouting issue with warning and backtrace.
>
> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h       | 2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++++++---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 7 +++++--
>   3 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index fb41869e357a..333780491867 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -97,7 +97,7 @@ struct amdgpu_amdkfd_fence {
>   
>   struct amdgpu_kfd_dev {
>   	struct kfd_dev *dev;
> -	uint64_t vram_used;
> +	int64_t vram_used;
>   	uint64_t vram_used_aligned;
>   	bool init_complete;
>   	struct work_struct reset_work;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 2a118669d0e3..7efee672bc41 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1598,6 +1598,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>   	struct amdgpu_bo *bo;
>   	struct drm_gem_object *gobj = NULL;
>   	u32 domain, alloc_domain;
> +	uint64_t aligned_size;
>   	u64 alloc_flags;
>   	int ret;
>   
> @@ -1653,13 +1654,15 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>   	 * the memory.
>   	 */
>   	if ((*mem)->aql_queue)
> -		size = size >> 1;
> +		size >>= 1;
> +
> +	aligned_size = PAGE_ALIGN(size);

Why do you need a new variable for this? Can't you just update size to 
be page-aligned here? Is the unaligned size still needed anywhere?


>   
>   	(*mem)->alloc_flags = flags;
>   
>   	amdgpu_sync_create(&(*mem)->sync);
>   
> -	ret = amdgpu_amdkfd_reserve_mem_limit(adev, size, flags);
> +	ret = amdgpu_amdkfd_reserve_mem_limit(adev, aligned_size, flags);
>   	if (ret) {
>   		pr_debug("Insufficient memory\n");
>   		goto err_reserve_limit;
> @@ -1725,7 +1728,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>   	/* Don't unreserve system mem limit twice */
>   	goto err_reserve_limit;
>   err_bo_create:
> -	amdgpu_amdkfd_unreserve_mem_limit(adev, size, flags);
> +	amdgpu_amdkfd_unreserve_mem_limit(adev, aligned_size, flags);
>   err_reserve_limit:
>   	mutex_destroy(&(*mem)->lock);
>   	if (gobj)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 6d291aa6386b..e11451100a20 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1127,8 +1127,11 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
>   	}
>   
>   	/* Update the VRAM usage count */
> -	if (flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
> -		WRITE_ONCE(pdd->vram_usage, pdd->vram_usage + args->size);
> +	if (flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
> +		if (flags & KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM)
> +			args->size >>= 1;

This will return the updated size to user mode. That's probably not what 
you want. It may be harmless, but technically it breaks the ABI. It 
would be better to use a local variable for the updated size.

Regards,
   Felix


> +		WRITE_ONCE(pdd->vram_usage, pdd->vram_usage + PAGE_ALIGN(args->size));
> +	}
>   
>   	mutex_unlock(&p->mutex);
>   


More information about the amd-gfx mailing list