[PATCH] drm/amdkfd: Page aligned memory reserve size
Philip Yang
yangp at amd.com
Tue Jan 10 22:32:12 UTC 2023
On 2023-01-10 15:49, Felix Kuehling wrote:
> Am 2023-01-10 um 15:44 schrieb Philip Yang:
>>
>> On 2023-01-10 13:33, Felix Kuehling wrote:
>>> Am 2023-01-10 um 12:11 schrieb Philip Yang:
>>>> Use page aligned size to reserve memory usage because page aligned TTM
>>>> BO size is used to unreserve memory usage, otherwise no page aligned
>>>> size causes memory usage accounting unbalanced.
>>>>
>>>> Change vram_used definition type to int64_t to be able to trigger
>>>> WARN_ONCE(adev && adev->kfd.vram_used < 0, "..."), to help debug the
>>>> accouting issue with warning and backtrace.
>>>>
>>>> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +-
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++++++---
>>>> drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 +++++--
>>>> 3 files changed, 12 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>>> index fb41869e357a..333780491867 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>>> @@ -97,7 +97,7 @@ struct amdgpu_amdkfd_fence {
>>>> struct amdgpu_kfd_dev {
>>>> struct kfd_dev *dev;
>>>> - uint64_t vram_used;
>>>> + int64_t vram_used;
>>>> uint64_t vram_used_aligned;
>>>> bool init_complete;
>>>> struct work_struct reset_work;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> index 2a118669d0e3..7efee672bc41 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> @@ -1598,6 +1598,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>>>> struct amdgpu_bo *bo;
>>>> struct drm_gem_object *gobj = NULL;
>>>> u32 domain, alloc_domain;
>>>> + uint64_t aligned_size;
>>>> u64 alloc_flags;
>>>> int ret;
>>>> @@ -1653,13 +1654,15 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>>>> * the memory.
>>>> */
>>>> if ((*mem)->aql_queue)
>>>> - size = size >> 1;
>>>> + size >>= 1;
>>>> +
>>>> + aligned_size = PAGE_ALIGN(size);
>>>
>>> Why do you need a new variable for this? Can't you just update size
>>> to be page-aligned here? Is the unaligned size still needed anywhere?
>> amdgpu_gem_object_create ->...-> amdgpu_bo_create needs the original
>> size for domain GWS etc, as the size is used as number of pages, not
>> bytes.
>
> I don't think GWS is ever allocated through this code path. This type
> of memory is not exposed in the KFD ioctl API. KFD allocates a GWS BO
> using amdgpu_amdkfd_alloc_gws.
yes, as size is used in pr_debug, it is better to show the original size
in debug log to match application, also notice the pr_debug size should
base on aql_queue flag, will send v4 patch with the fix.
Regards,
Philip
>
> Regards,
> Felix
>
>
>>>
>>>
>>>> (*mem)->alloc_flags = flags;
>>>> amdgpu_sync_create(&(*mem)->sync);
>>>> - ret = amdgpu_amdkfd_reserve_mem_limit(adev, size, flags);
>>>> + ret = amdgpu_amdkfd_reserve_mem_limit(adev, aligned_size, flags);
>>>> if (ret) {
>>>> pr_debug("Insufficient memory\n");
>>>> goto err_reserve_limit;
>>>> @@ -1725,7 +1728,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>>>> /* Don't unreserve system mem limit twice */
>>>> goto err_reserve_limit;
>>>> err_bo_create:
>>>> - amdgpu_amdkfd_unreserve_mem_limit(adev, size, flags);
>>>> + amdgpu_amdkfd_unreserve_mem_limit(adev, aligned_size, flags);
>>>> err_reserve_limit:
>>>> mutex_destroy(&(*mem)->lock);
>>>> if (gobj)
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>>> index 6d291aa6386b..e11451100a20 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>>> @@ -1127,8 +1127,11 @@ static int
>>>> kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
>>>> }
>>>> /* Update the VRAM usage count */
>>>> - if (flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
>>>> - WRITE_ONCE(pdd->vram_usage, pdd->vram_usage + args->size);
>>>> + if (flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
>>>> + if (flags & KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM)
>>>> + args->size >>= 1;
>>>
>>> This will return the updated size to user mode. That's probably not
>>> what you want. It may be harmless, but technically it breaks the
>>> ABI. It would be better to use a local variable for the updated size.
>>
>> Submit v3 patch to fix this.
>>
>> Thanks,
>>
>> Philip
>>
>>>
>>> Regards,
>>> Felix
>>>
>>>
>>>> + WRITE_ONCE(pdd->vram_usage, pdd->vram_usage +
>>>> PAGE_ALIGN(args->size));
>>>> + }
>>>> mutex_unlock(&p->mutex);
More information about the amd-gfx
mailing list