<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div class="moz-cite-prefix">On 2024-04-17 10:32, Paneer Selvam,
Arunpravin wrote:<br>
</div>
<blockquote type="cite" cite="mid:ea5d73f1-0bf9-41a1-95b7-6479ea4c6ad8@amd.com">Hi
Christian,
<br>
<br>
On 4/17/2024 6:57 PM, Paneer Selvam, Arunpravin wrote:
<br>
<blockquote type="cite">Hi Christian,
<br>
<br>
On 4/17/2024 12:19 PM, Christian König wrote:
<br>
<blockquote type="cite">Am 17.04.24 um 08:21 schrieb Arunpravin
Paneer Selvam:
<br>
<blockquote type="cite">Now we have two flags for contiguous
VRAM buffer allocation.
<br>
If the application requests AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
<br>
the buffer's placement function would set the TTM place flag
<br>
TTM_PL_FLAG_CONTIGUOUS.
<br>
<br>
This patch will change the default behaviour of the two
flags.
<br>
<br>
When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS
<br>
- This means a contiguous allocation is not mandatory.
<br>
- We will try to allocate the buffer contiguously. If the
<br>
allocation fails, we fall back to allocating the individual
pages.
<br>
<br>
When we set TTM_PL_FLAG_CONTIGUOUS
<br>
- This means a contiguous allocation is mandatory.
<br>
- We are setting this in amdgpu_bo_pin_restricted() before
BO validation
<br>
and we check this flag in the VRAM manager.
<br>
- If this is set, we should allocate the buffer pages
contiguously.
<br>
If the allocation fails, we return -ENOSPC.
<br>
<br>
v2:
<br>
- keep the mem_flags and bo->flags check as is (Christian)
<br>
- place the TTM_PL_FLAG_CONTIGUOUS flag setting into the
<br>
amdgpu_bo_pin_restricted function placement range iteration
<br>
loop (Christian)
<br>
- rename find_pages to amdgpu_vram_mgr_calculate_pages_per_block
<br>
(Christian)
<br>
- Keep the kernel BO allocation as is (Christian)
<br>
- If the BO pin VRAM allocation fails, we need to return -ENOSPC, as
<br>
RDMA cannot work with scattered VRAM pages (Philip)
<br>
<br>
Signed-off-by: Arunpravin Paneer Selvam
<a class="moz-txt-link-rfc2396E" href="mailto:Arunpravin.PaneerSelvam@amd.com"><Arunpravin.PaneerSelvam@amd.com></a>
<br>
Suggested-by: Christian König
<a class="moz-txt-link-rfc2396E" href="mailto:christian.koenig@amd.com"><christian.koenig@amd.com></a>
<br>
---
<br>
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 ++-
<br>
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 57
+++++++++++++++-----
<br>
2 files changed, 50 insertions(+), 15 deletions(-)
<br>
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
<br>
index 8bc79924d171..caaef7b1df49 100644
<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
<br>
@@ -153,8 +153,10 @@ void
amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32
domain)
<br>
else
<br>
places[c].flags |= TTM_PL_FLAG_TOPDOWN;
<br>
- if (flags &
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
<br>
+ if (abo->tbo.type == ttm_bo_type_kernel
&&
<br>
+ flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
<br>
places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
<br>
+
<br>
c++;
<br>
}
<br>
@@ -966,6 +968,10 @@ int amdgpu_bo_pin_restricted(struct
amdgpu_bo *bo, u32 domain,
<br>
if (!bo->placements[i].lpfn ||
<br>
(lpfn && lpfn <
bo->placements[i].lpfn))
<br>
bo->placements[i].lpfn = lpfn;
<br>
+
<br>
+ if (bo->flags &
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
<br>
+ bo->placements[i].mem_type == TTM_PL_VRAM)
<br>
+ bo->placements[i].flags |=
TTM_PL_FLAG_CONTIGUOUS;
<br>
}
<br>
r = ttm_bo_validate(&bo->tbo,
&bo->placement, &ctx);
<br>
</blockquote>
<br>
Nice work, up till here that looks exactly right as far as I
can see.
<br>
<br>
<blockquote type="cite">diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
<br>
index 8db880244324..4be8b091099a 100644
<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
<br>
@@ -88,6 +88,29 @@ static inline u64
amdgpu_vram_mgr_blocks_size(struct list_head *head)
<br>
return size;
<br>
}
<br>
+static inline unsigned long
<br>
+amdgpu_vram_mgr_calculate_pages_per_block(struct
ttm_buffer_object *tbo,
<br>
+ const struct ttm_place *place,
<br>
+ unsigned long bo_flags)
<br>
+{
<br>
+ unsigned long pages_per_block;
<br>
+
<br>
+ if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) {
<br>
+ pages_per_block = ~0ul;
<br>
</blockquote>
<br>
If I understand it correctly, this enforces the allocation
of a contiguous buffer by saying that we should have
only one giant page for the whole BO.
<br>
</blockquote>
Yes, for contiguous allocations we don't have the 2MB limit, hence we
set pages_per_block to the maximum to avoid restricting the
min_block_size to 2MB.
<br>
<blockquote type="cite">
<br>
<blockquote type="cite">+ } else {
<br>
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
<br>
+ pages_per_block = HPAGE_PMD_NR;
<br>
+#else
<br>
+ /* default to 2MB */
<br>
+ pages_per_block = 2UL << (20UL - PAGE_SHIFT);
<br>
+#endif
<br>
+ pages_per_block = max_t(uint32_t, pages_per_block,
<br>
+ tbo->page_alignment);
<br>
+ }
<br>
+
<br>
+ return pages_per_block;
<br>
+}
<br>
+
<br>
/**
<br>
* DOC: mem_info_vram_total
<br>
*
<br>
@@ -451,8 +474,10 @@ static int amdgpu_vram_mgr_new(struct
ttm_resource_manager *man,
<br>
struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
<br>
struct amdgpu_device *adev = to_amdgpu_device(mgr);
<br>
u64 vis_usage = 0, max_bytes, min_block_size;
<br>
+ struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo);
<br>
struct amdgpu_vram_mgr_resource *vres;
<br>
u64 size, remaining_size, lpfn, fpfn;
<br>
+ unsigned long bo_flags = bo->flags;
<br>
struct drm_buddy *mm = &mgr->mm;
<br>
struct drm_buddy_block *block;
<br>
unsigned long pages_per_block;
<br>
@@ -468,18 +493,9 @@ static int amdgpu_vram_mgr_new(struct
ttm_resource_manager *man,
<br>
if (tbo->type != ttm_bo_type_kernel)
<br>
max_bytes -= AMDGPU_VM_RESERVED_VRAM;
<br>
- if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
<br>
- pages_per_block = ~0ul;
<br>
- } else {
<br>
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
<br>
- pages_per_block = HPAGE_PMD_NR;
<br>
-#else
<br>
- /* default to 2MB */
<br>
- pages_per_block = 2UL << (20UL - PAGE_SHIFT);
<br>
-#endif
<br>
- pages_per_block = max_t(uint32_t, pages_per_block,
<br>
- tbo->page_alignment);
<br>
- }
<br>
+ pages_per_block =
<br>
+ amdgpu_vram_mgr_calculate_pages_per_block(tbo,
place,
<br>
+ bo_flags);
<br>
vres = kzalloc(sizeof(*vres), GFP_KERNEL);
<br>
if (!vres)
<br>
@@ -498,7 +514,7 @@ static int amdgpu_vram_mgr_new(struct
ttm_resource_manager *man,
<br>
if (place->flags & TTM_PL_FLAG_TOPDOWN)
<br>
vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
<br>
- if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
<br>
+ if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
<br>
vres->flags |= DRM_BUDDY_CONTIGUOUS_ALLOCATION;
<br>
</blockquote>
<br>
And this here just optimizes it in the way that it says try
harder to make it look contiguous.
<br>
</blockquote>
Here I removed the TTM_PL_FLAG_CONTIGUOUS check. AFAIU, in both cases
(BO pinning and a normal contiguous request)
<br>
the AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS flag is always set.
<br>
<blockquote type="cite">
<br>
The question is whether that also works with a pages_per_block of
2MiB, or does it only kick in when pages_per_block is maximal?
<br>
</blockquote>
AFAIU, if the AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS flag is set, we
set pages_per_block to the maximum, thus
<br>
we don't limit the BO. When we set pages_per_block to the
maximum, the min_block_size would be the tbo->page_alignment,
<br>
and this tbo->page_alignment would be the same value as the
required size. The required size can be less than 2MB or more than
2MB.
<br>
Below we have the check size >= pages_per_block; when
pages_per_block is maximal we don't limit the min_block_size.
<br>
</blockquote>
A small correction: when we set pages_per_block to the maximum, we
don't set the min_block_size ourselves; the buddy allocator will set
the min_block_size
<br>
to roundup_pow_of_two(size).
<br>
</blockquote>
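<p>Just to make sure I follow, here is roughly how I read the
min_block_size selection in amdgpu_vram_mgr_new() (a simplified
paraphrase of the allocation loop, not the exact code, so please
correct me if this is off):</p>
<pre>
	while (remaining_size) {
		/* start from the BO's own alignment */
		if (tbo->page_alignment)
			min_block_size = (u64)tbo->page_alignment << PAGE_SHIFT;
		else
			min_block_size = mgr->default_page_size;

		size = remaining_size;

		/* the 2MB "page" only raises min_block_size when the request
		 * is at least that large and a multiple of it; with
		 * pages_per_block set to ~0ul this check never triggers, so
		 * min_block_size stays at the BO alignment */
		if ((size >= ((u64)pages_per_block << PAGE_SHIFT)) &&
		    !(size & (((u64)pages_per_block << PAGE_SHIFT) - 1)))
			min_block_size = (u64)pages_per_block << PAGE_SHIFT;

		r = drm_buddy_alloc_blocks(mm, fpfn, lpfn, size, min_block_size,
					   &vres->blocks, vres->flags);
		...
	}
</pre>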
<p>If pages_per_block is set to 2MB and the
DRM_BUDDY_CONTIGUOUS_ALLOCATION flag is not set, does the buddy
allocator allocate from free 2MB blocks first, or does it split
larger blocks into smaller ones? We want to keep the larger blocks
for contiguous allocations.</p>
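<p>My rough reading of the non-TOPDOWN freelist path (paraphrased
from memory of drm_buddy.c, not the verbatim source, so please
correct me if this is wrong):</p>
<pre>
	/* scan the free lists starting at the requested order, so a free
	 * 2MB block would be used before touching anything larger ... */
	for (i = order; i <= mm->max_order; ++i) {
		block = list_first_entry_or_null(&mm->free_list[i],
						 struct drm_buddy_block, link);
		if (block)
			break;
	}
	/* ... and only when no block of the requested order is free, a
	 * larger block is taken and split down to that order. */
</pre>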
<p>Regards,</p>
<p>Philip </p>
<blockquote type="cite" cite="mid:ea5d73f1-0bf9-41a1-95b7-6479ea4c6ad8@amd.com">
<br>
Thanks,
<br>
Arun.
<br>
<blockquote type="cite">
<blockquote type="cite">
<br>
<blockquote type="cite"> if (fpfn || lpfn !=
mgr->mm.size)
<br>
@@ -529,8 +545,21 @@ static int amdgpu_vram_mgr_new(struct
ttm_resource_manager *man,
<br>
min_block_size,
<br>
&vres->blocks,
<br>
vres->flags);
<br>
- if (unlikely(r))
<br>
+ if (unlikely(r)) {
<br>
+ if (bo_flags &
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
<br>
+ !(place->flags &
TTM_PL_FLAG_CONTIGUOUS)) {
<br>
+ /* Fallback to non contiguous allocation */
<br>
+ vres->flags &=
~DRM_BUDDY_CONTIGUOUS_ALLOCATION;
<br>
+ bo_flags &=
~AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
<br>
</blockquote>
<br>
Well I would say that this is a rather hacky implementation
and should be avoided if possible.
<br>
</blockquote>
Sure, I will try to rewrite the code.
<br>
<blockquote type="cite">
<br>
<blockquote type="cite">+
<br>
+ pages_per_block =
<br>
+ amdgpu_vram_mgr_calculate_pages_per_block(tbo,
<br>
+ place,
<br>
+ bo_flags);
<br>
</blockquote>
<br>
Rather move the AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS handling out
of amdgpu_vram_mgr_calculate_pages_per_block().
<br>
</blockquote>
Sure.
<br>
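<p>For example, something along these lines (just a rough sketch of
the suggested split, reusing the code from the patch above; not the
actual v3 change):</p>
<pre>
static inline unsigned long
amdgpu_vram_mgr_calculate_pages_per_block(struct ttm_buffer_object *tbo)
{
	unsigned long pages_per_block;

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	pages_per_block = HPAGE_PMD_NR;
#else
	/* default to 2MB */
	pages_per_block = 2UL << (20UL - PAGE_SHIFT);
#endif

	return max_t(uint32_t, pages_per_block, tbo->page_alignment);
}

/* ...and in amdgpu_vram_mgr_new(): */
	if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
		/* no 2MB cap, one block may cover the whole BO */
		pages_per_block = ~0ul;
	else
		pages_per_block = amdgpu_vram_mgr_calculate_pages_per_block(tbo);
</pre>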
<br>
Thanks,
<br>
Arun.
<br>
<blockquote type="cite">
<br>
Regards,
<br>
Christian.
<br>
<br>
<blockquote type="cite">+ continue;
<br>
+ }
<br>
goto error_free_blocks;
<br>
+ }
<br>
if (size > remaining_size)
<br>
remaining_size = 0;
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
</body>
</html>