[PATCH] drm/amdkfd: Page aligned memory reserve size

Philip Yang yangp at amd.com
Tue Jan 10 20:44:25 UTC 2023


On 2023-01-10 13:33, Felix Kuehling wrote:
> Am 2023-01-10 um 12:11 schrieb Philip Yang:
>> Use page aligned size to reserve memory usage because page aligned TTM
>> BO size is used to unreserve memory usage, otherwise no page aligned
>> size causes memory usage accounting unbalanced.
>>
>> Change vram_used definition type to int64_t to be able to trigger
>> WARN_ONCE(adev && adev->kfd.vram_used < 0, "..."), to help debug the
>> accouting issue with warning and backtrace.
>>
>> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h       | 2 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++++++---
>>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 7 +++++--
>>   3 files changed, 12 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> index fb41869e357a..333780491867 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> @@ -97,7 +97,7 @@ struct amdgpu_amdkfd_fence {
>>     struct amdgpu_kfd_dev {
>>       struct kfd_dev *dev;
>> -    uint64_t vram_used;
>> +    int64_t vram_used;
>>       uint64_t vram_used_aligned;
>>       bool init_complete;
>>       struct work_struct reset_work;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index 2a118669d0e3..7efee672bc41 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -1598,6 +1598,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>>       struct amdgpu_bo *bo;
>>       struct drm_gem_object *gobj = NULL;
>>       u32 domain, alloc_domain;
>> +    uint64_t aligned_size;
>>       u64 alloc_flags;
>>       int ret;
>>   @@ -1653,13 +1654,15 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>>        * the memory.
>>        */
>>       if ((*mem)->aql_queue)
>> -        size = size >> 1;
>> +        size >>= 1;
>> +
>> +    aligned_size = PAGE_ALIGN(size);
>
> Why do you need a new variable for this? Can't you just update size to 
> be page-aligned here? Is the unaligned size still needed anywhere?
amdgpu_gem_object_create ->...-> amdgpu_bo_create needs the original 
size for domain GWS etc, as the size is used as number of pages, not bytes.
>
>
>>         (*mem)->alloc_flags = flags;
>>         amdgpu_sync_create(&(*mem)->sync);
>>   -    ret = amdgpu_amdkfd_reserve_mem_limit(adev, size, flags);
>> +    ret = amdgpu_amdkfd_reserve_mem_limit(adev, aligned_size, flags);
>>       if (ret) {
>>           pr_debug("Insufficient memory\n");
>>           goto err_reserve_limit;
>> @@ -1725,7 +1728,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>>       /* Don't unreserve system mem limit twice */
>>       goto err_reserve_limit;
>>   err_bo_create:
>> -    amdgpu_amdkfd_unreserve_mem_limit(adev, size, flags);
>> +    amdgpu_amdkfd_unreserve_mem_limit(adev, aligned_size, flags);
>>   err_reserve_limit:
>>       mutex_destroy(&(*mem)->lock);
>>       if (gobj)
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> index 6d291aa6386b..e11451100a20 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> @@ -1127,8 +1127,11 @@ static int 
>> kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
>>       }
>>         /* Update the VRAM usage count */
>> -    if (flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
>> -        WRITE_ONCE(pdd->vram_usage, pdd->vram_usage + args->size);
>> +    if (flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
>> +        if (flags & KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM)
>> +            args->size >>= 1;
>
> This will return the updated size to user mode. That's probably not 
> what you want. It may be harmless, but technically it breaks the ABI. 
> It would be better to use a local variable for the updated size.

Submit v3 patch to fix this.

Thanks,

Philip

>
> Regards,
>   Felix
>
>
>> +        WRITE_ONCE(pdd->vram_usage, pdd->vram_usage + 
>> PAGE_ALIGN(args->size));
>> +    }
>>         mutex_unlock(&p->mutex);


More information about the amd-gfx mailing list