[PATCH v2] drm/amdgpu: Modify the contiguous flags behaviour
Paneer Selvam, Arunpravin
arunpravin.paneerselvam at amd.com
Wed Apr 17 14:32:59 UTC 2024
Hi Christian,
On 4/17/2024 6:57 PM, Paneer Selvam, Arunpravin wrote:
> Hi Christian,
>
> On 4/17/2024 12:19 PM, Christian König wrote:
>> Am 17.04.24 um 08:21 schrieb Arunpravin Paneer Selvam:
>>> Now we have two flags for contiguous VRAM buffer allocation.
>>> If the application requests AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
>>> we set the ttm place flag TTM_PL_FLAG_CONTIGUOUS in the
>>> buffer's placement function.
>>>
>>> This patch will change the default behaviour of the two flags.
>>>
>>> When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS
>>> - This means contiguous is not mandatory.
>>> - We will try to allocate the buffer contiguously. If the
>>> allocation fails, we fall back to allocating individual pages.
>>>
>>> When we set TTM_PL_FLAG_CONTIGUOUS
>>> - This means contiguous allocation is mandatory.
>>> - We set this in amdgpu_bo_pin_restricted() before bo
>>> validation
>>> and check this flag in the vram manager file.
>>> - If this is set, we should allocate the buffer pages contiguously.
>>> If the allocation fails, we return -ENOSPC.
>>>
>>> v2:
>>> - keep the mem_flags and bo->flags check as is (Christian)
>>> - place the TTM_PL_FLAG_CONTIGUOUS flag setting into the
>>> amdgpu_bo_pin_restricted function placement range iteration
>>> loop (Christian)
>>> - rename find_pages to amdgpu_vram_mgr_calculate_pages_per_block
>>> (Christian)
>>> - keep the kernel BO allocation as is (Christian)
>>> - if BO pin vram allocation fails, we need to return -ENOSPC as
>>> RDMA cannot work with scattered VRAM pages (Philip)
>>>
>>> Signed-off-by: Arunpravin Paneer Selvam
>>> <Arunpravin.PaneerSelvam at amd.com>
>>> Suggested-by: Christian König <christian.koenig at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 ++-
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 57
>>> +++++++++++++++-----
>>> 2 files changed, 50 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> index 8bc79924d171..caaef7b1df49 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> @@ -153,8 +153,10 @@ void amdgpu_bo_placement_from_domain(struct
>>> amdgpu_bo *abo, u32 domain)
>>> else
>>> places[c].flags |= TTM_PL_FLAG_TOPDOWN;
>>> - if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
>>> + if (abo->tbo.type == ttm_bo_type_kernel &&
>>> + flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
>>> places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
>>> +
>>> c++;
>>> }
>>> @@ -966,6 +968,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo
>>> *bo, u32 domain,
>>> if (!bo->placements[i].lpfn ||
>>> (lpfn && lpfn < bo->placements[i].lpfn))
>>> bo->placements[i].lpfn = lpfn;
>>> +
>>> + if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
>>> + bo->placements[i].mem_type == TTM_PL_VRAM)
>>> + bo->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS;
>>> }
>>> r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>>
>> Nice work, up till here that looks exactly right as far as I can see.
>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> index 8db880244324..4be8b091099a 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> @@ -88,6 +88,29 @@ static inline u64
>>> amdgpu_vram_mgr_blocks_size(struct list_head *head)
>>> return size;
>>> }
>>> +static inline unsigned long
>>> +amdgpu_vram_mgr_calculate_pages_per_block(struct ttm_buffer_object
>>> *tbo,
>>> + const struct ttm_place *place,
>>> + unsigned long bo_flags)
>>> +{
>>> + unsigned long pages_per_block;
>>> +
>>> + if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) {
>>> + pages_per_block = ~0ul;
>>
>> If I understand it correctly this here enforces the allocation of a
>> contiguous buffer in the way that it says we should have only one
>> giant page for the whole BO.
> Yes, for contiguous allocations we don't have the 2MB limit, hence we
> set pages_per_block to the maximum to avoid restricting min_block_size
> to the 2MB limitation.
>>
>>> + } else {
>>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> + pages_per_block = HPAGE_PMD_NR;
>>> +#else
>>> + /* default to 2MB */
>>> + pages_per_block = 2UL << (20UL - PAGE_SHIFT);
>>> +#endif
>>> + pages_per_block = max_t(uint32_t, pages_per_block,
>>> + tbo->page_alignment);
>>> + }
>>> +
>>> + return pages_per_block;
>>> +}
>>> +
>>> /**
>>> * DOC: mem_info_vram_total
>>> *
>>> @@ -451,8 +474,10 @@ static int amdgpu_vram_mgr_new(struct
>>> ttm_resource_manager *man,
>>> struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
>>> struct amdgpu_device *adev = to_amdgpu_device(mgr);
>>> u64 vis_usage = 0, max_bytes, min_block_size;
>>> + struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo);
>>> struct amdgpu_vram_mgr_resource *vres;
>>> u64 size, remaining_size, lpfn, fpfn;
>>> + unsigned long bo_flags = bo->flags;
>>> struct drm_buddy *mm = &mgr->mm;
>>> struct drm_buddy_block *block;
>>> unsigned long pages_per_block;
>>> @@ -468,18 +493,9 @@ static int amdgpu_vram_mgr_new(struct
>>> ttm_resource_manager *man,
>>> if (tbo->type != ttm_bo_type_kernel)
>>> max_bytes -= AMDGPU_VM_RESERVED_VRAM;
>>> - if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>>> - pages_per_block = ~0ul;
>>> - } else {
>>> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> - pages_per_block = HPAGE_PMD_NR;
>>> -#else
>>> - /* default to 2MB */
>>> - pages_per_block = 2UL << (20UL - PAGE_SHIFT);
>>> -#endif
>>> - pages_per_block = max_t(uint32_t, pages_per_block,
>>> - tbo->page_alignment);
>>> - }
>>> + pages_per_block =
>>> + amdgpu_vram_mgr_calculate_pages_per_block(tbo, place,
>>> + bo_flags);
>>> vres = kzalloc(sizeof(*vres), GFP_KERNEL);
>>> if (!vres)
>>> @@ -498,7 +514,7 @@ static int amdgpu_vram_mgr_new(struct
>>> ttm_resource_manager *man,
>>> if (place->flags & TTM_PL_FLAG_TOPDOWN)
>>> vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
>>> - if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
>>> + if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
>>> vres->flags |= DRM_BUDDY_CONTIGUOUS_ALLOCATION;
>>
>> And this here just optimizes it in the way that it says try harder to
>> make it look contiguous.
> Here I removed the TTM_PL_FLAG_CONTIGUOUS check. AFAIU, in both cases
> (BO pinning and a normal contiguous request) the
> AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS flag is always set.
>>
>> Question is if that also works with pages_per_block of 2MiB or does
>> that only kick in when pages_per_block is maximal?
> AFAIU, if the AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS flag is set, we set
> pages_per_block to the maximum, thus we don't limit the BO. When
> pages_per_block is maximal, min_block_size would be
> tbo->page_alignment, and tbo->page_alignment would be the same value
> as the required size. The required size can be less than 2MB or more
> than 2MB. Below we have the check size >= pages_per_block; when
> pages_per_block is maximal we don't limit min_block_size.
A small correction: when we set pages_per_block to the maximum, we don't
set min_block_size at all; the buddy allocator will set min_block_size
to roundup_pow_of_two(size).
Thanks,
Arun.
>>
>>> if (fpfn || lpfn != mgr->mm.size)
>>> @@ -529,8 +545,21 @@ static int amdgpu_vram_mgr_new(struct
>>> ttm_resource_manager *man,
>>> min_block_size,
>>> &vres->blocks,
>>> vres->flags);
>>> - if (unlikely(r))
>>> + if (unlikely(r)) {
>>> + if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
>>> + !(place->flags & TTM_PL_FLAG_CONTIGUOUS)) {
>>> + /* Fallback to non contiguous allocation */
>>> + vres->flags &= ~DRM_BUDDY_CONTIGUOUS_ALLOCATION;
>>> + bo_flags &= ~AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
>>
>> Well I would say that this is a rather hacky implementation and
>> should be avoided if possible.
> sure, I will try to rewrite the code.
>>
>>> +
>>> + pages_per_block =
>>> + amdgpu_vram_mgr_calculate_pages_per_block(tbo,
>>> + place,
>>> + bo_flags);
>>
>> Rather move the AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS handling out of
>> amdgpu_vram_mgr_calculate_pages_per_block().
> sure.
>
> Thanks,
> Arun.
>>
>> Regards,
>> Christian.
>>
>>> + continue;
>>> + }
>>> goto error_free_blocks;
>>> + }
>>> if (size > remaining_size)
>>> remaining_size = 0;
>>
>
More information about the amd-gfx
mailing list