<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 2024-04-17 10:32, Paneer Selvam,
      Arunpravin wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:ea5d73f1-0bf9-41a1-95b7-6479ea4c6ad8@amd.com">Hi
      Christian,
      <br>
      <br>
      On 4/17/2024 6:57 PM, Paneer Selvam, Arunpravin wrote:
      <br>
      <blockquote type="cite">Hi Christian,
        <br>
        <br>
        On 4/17/2024 12:19 PM, Christian König wrote:
        <br>
        <blockquote type="cite">Am 17.04.24 um 08:21 schrieb Arunpravin
          Paneer Selvam:
          <br>
          <blockquote type="cite">Now we have two flags for contiguous
            VRAM buffer allocation.
            <br>
            If the application request for
            AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
            <br>
            it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in
            the
            <br>
            buffer's placement function.
            <br>
            <br>
            This patch will change the default behaviour of the two
            flags.
            <br>
            <br>
            When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS
            <br>
            - This means contiguous is not mandatory.
            <br>
            - we will try to allocate the contiguous buffer. Say if the
            <br>
               allocation fails, we fallback to allocate the individual
            pages.
            <br>
            <br>
            When we setTTM_PL_FLAG_CONTIGUOUS
            <br>
            - This means contiguous allocation is mandatory.
            <br>
            - we are setting this in amdgpu_bo_pin_restricted() before
            bo validation
            <br>
               and check this flag in the vram manager file.
            <br>
            - if this is set, we should allocate the buffer pages
            contiguously.
            <br>
               the allocation fails, we return -ENOSPC.
            <br>
            <br>
            v2:
            <br>
               - keep the mem_flags and bo->flags check as
            is(Christian)
            <br>
               - place the TTM_PL_FLAG_CONTIGUOUS flag setting into the
            <br>
                 amdgpu_bo_pin_restricted function placement range
            iteration
            <br>
                 loop(Christian)
            <br>
               - rename find_pages with
            amdgpu_vram_mgr_calculate_pages_per_block
            <br>
                 (Christian)
            <br>
               - Keep the kernel BO allocation as is(Christain)
            <br>
               - If BO pin vram allocation failed, we need to return
            -ENOSPC as
            <br>
                 RDMA cannot work with scattered VRAM pages(Philip)
            <br>
            <br>
            Signed-off-by: Arunpravin Paneer Selvam
            <a class="moz-txt-link-rfc2396E" href="mailto:Arunpravin.PaneerSelvam@amd.com"><Arunpravin.PaneerSelvam@amd.com></a>
            <br>
            Suggested-by: Christian König
            <a class="moz-txt-link-rfc2396E" href="mailto:christian.koenig@amd.com"><christian.koenig@amd.com></a>
            <br>
            ---
            <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_object.c   |  8 ++-
            <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 57
            +++++++++++++++-----
            <br>
              2 files changed, 50 insertions(+), 15 deletions(-)
            <br>
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
            b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
            <br>
            index 8bc79924d171..caaef7b1df49 100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
            <br>
            @@ -153,8 +153,10 @@ void
            amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32
            domain)
            <br>
                      else
            <br>
                          places[c].flags |= TTM_PL_FLAG_TOPDOWN;
            <br>
              -        if (flags &
            AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
            <br>
            +        if (abo->tbo.type == ttm_bo_type_kernel
            &&
            <br>
            +            flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
            <br>
                          places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
            <br>
            +
            <br>
                      c++;
            <br>
                  }
            <br>
              @@ -966,6 +968,10 @@ int amdgpu_bo_pin_restricted(struct
            amdgpu_bo *bo, u32 domain,
            <br>
                      if (!bo->placements[i].lpfn ||
            <br>
                          (lpfn && lpfn <
            bo->placements[i].lpfn))
            <br>
                          bo->placements[i].lpfn = lpfn;
            <br>
            +
            <br>
            +        if (bo->flags &
            AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
            <br>
            +            bo->placements[i].mem_type == TTM_PL_VRAM)
            <br>
            +            bo->placements[i].flags |=
            TTM_PL_FLAG_CONTIGUOUS;
            <br>
                  }
            <br>
                    r = ttm_bo_validate(&bo->tbo,
            &bo->placement, &ctx);
            <br>
          </blockquote>
          <br>
          Nice work, up till here that looks exactly right as far as I
          can see.
          <br>
          <br>
          <blockquote type="cite">diff --git
            a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
            b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
            <br>
            index 8db880244324..4be8b091099a 100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
            <br>
            @@ -88,6 +88,29 @@ static inline u64
            amdgpu_vram_mgr_blocks_size(struct list_head *head)
            <br>
                  return size;
            <br>
              }
            <br>
              +static inline unsigned long
            <br>
            +amdgpu_vram_mgr_calculate_pages_per_block(struct
            ttm_buffer_object *tbo,
            <br>
            +                      const struct ttm_place *place,
            <br>
            +                      unsigned long bo_flags)
            <br>
            +{
            <br>
            +    unsigned long pages_per_block;
            <br>
            +
            <br>
            +    if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) {
            <br>
            +        pages_per_block = ~0ul;
            <br>
          </blockquote>
          <br>
          If I understand it correctly this here enforces the allocation
          of a contiguous buffer in the way that it says we should have
          only one giant page for the whole BO.
          <br>
        </blockquote>
        yes, for contiguous we don't have the 2MB limit, hence we are
        setting to maximum to avoid restricting the min_block_size to
        2MB limitation.
        <br>
        <blockquote type="cite">
          <br>
          <blockquote type="cite">+    } else {
            <br>
            +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
            <br>
            +        pages_per_block = HPAGE_PMD_NR;
            <br>
            +#else
            <br>
            +        /* default to 2MB */
            <br>
            +        pages_per_block = 2UL << (20UL - PAGE_SHIFT);
            <br>
            +#endif
            <br>
            +        pages_per_block = max_t(uint32_t, pages_per_block,
            <br>
            +                    tbo->page_alignment);
            <br>
            +    }
            <br>
            +
            <br>
            +    return pages_per_block;
            <br>
            +}
            <br>
            +
            <br>
              /**
            <br>
               * DOC: mem_info_vram_total
            <br>
               *
            <br>
            @@ -451,8 +474,10 @@ static int amdgpu_vram_mgr_new(struct
            ttm_resource_manager *man,
            <br>
                  struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
            <br>
                  struct amdgpu_device *adev = to_amdgpu_device(mgr);
            <br>
                  u64 vis_usage = 0, max_bytes, min_block_size;
            <br>
            +    struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo);
            <br>
                  struct amdgpu_vram_mgr_resource *vres;
            <br>
                  u64 size, remaining_size, lpfn, fpfn;
            <br>
            +    unsigned long bo_flags = bo->flags;
            <br>
                  struct drm_buddy *mm = &mgr->mm;
            <br>
                  struct drm_buddy_block *block;
            <br>
                  unsigned long pages_per_block;
            <br>
            @@ -468,18 +493,9 @@ static int amdgpu_vram_mgr_new(struct
            ttm_resource_manager *man,
            <br>
                  if (tbo->type != ttm_bo_type_kernel)
            <br>
                      max_bytes -= AMDGPU_VM_RESERVED_VRAM;
            <br>
              -    if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
            <br>
            -        pages_per_block = ~0ul;
            <br>
            -    } else {
            <br>
            -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
            <br>
            -        pages_per_block = HPAGE_PMD_NR;
            <br>
            -#else
            <br>
            -        /* default to 2MB */
            <br>
            -        pages_per_block = 2UL << (20UL - PAGE_SHIFT);
            <br>
            -#endif
            <br>
            -        pages_per_block = max_t(uint32_t, pages_per_block,
            <br>
            -                    tbo->page_alignment);
            <br>
            -    }
            <br>
            +    pages_per_block =
            <br>
            +        amdgpu_vram_mgr_calculate_pages_per_block(tbo,
            place,
            <br>
            +                              bo_flags);
            <br>
                    vres = kzalloc(sizeof(*vres), GFP_KERNEL);
            <br>
                  if (!vres)
            <br>
            @@ -498,7 +514,7 @@ static int amdgpu_vram_mgr_new(struct
            ttm_resource_manager *man,
            <br>
                  if (place->flags & TTM_PL_FLAG_TOPDOWN)
            <br>
                      vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
            <br>
              -    if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
            <br>
            +    if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
            <br>
                      vres->flags |= DRM_BUDDY_CONTIGUOUS_ALLOCATION;
            <br>
          </blockquote>
          <br>
          And this here just optimizes it in the way that it says try
          harder to make it look contiguous.
          <br>
        </blockquote>
        Here I removed the TTM_PL_FLAG_CONTIGUOUS. AFAIU, in both cases
        (BO pinning and normal contiguous request)
        <br>
        this flag AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS is always set.
        <br>
        <blockquote type="cite">
          <br>
          Question is if that also works with pages_per_block of 2MiB or
          does that only kick in when pages_per_block is maximal?
          <br>
        </blockquote>
        AFAIU, if this flag AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS is set, we
        are setting the pages_per_block as maximal, thus
        <br>
        we dont limit the BO. when we set the pages_per_block as
        maximal, the min_block_size would be the tbo->page_alignment,
        <br>
        and this tbo->page_alignment would be the same value as the
        required size. Required size can be less than 2MB or more than
        2MB.
        <br>
        Below we have check size >= pages_per_block, when the
        pages_per_block is maximal we don't limit the min_block_size.
        <br>
      </blockquote>
      a small correction, when we set the pages_per_block as maximal, we
      don't set the min_block_size, the buddy allocator will set the
      min_block_size
      <br>
      as roundup_pow_of_two(size).
      <br>
    </blockquote>
    <p>If setting 2MB pages_per_block, without
      DRM_BUDDY_CONTIGUOUS_ALLOCATION flag, does buddy alloc from free
      2MB blocks first, or buddy split larger blocks into smaller
      blocks, as we want to keep larger block for contiguous allocation.</p>
    <p>Regards,</p>
    <p>Philip  </p>
    <blockquote type="cite" cite="mid:ea5d73f1-0bf9-41a1-95b7-6479ea4c6ad8@amd.com">
      <br>
      Thanks,
      <br>
      Arun.
      <br>
      <blockquote type="cite">
        <blockquote type="cite">
          <br>
          <blockquote type="cite">        if (fpfn || lpfn !=
            mgr->mm.size)
            <br>
            @@ -529,8 +545,21 @@ static int amdgpu_vram_mgr_new(struct
            ttm_resource_manager *man,
            <br>
                                     min_block_size,
            <br>
                                     &vres->blocks,
            <br>
                                     vres->flags);
            <br>
            -        if (unlikely(r))
            <br>
            +        if (unlikely(r)) {
            <br>
            +            if (bo_flags &
            AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
            <br>
            +                !(place->flags &
            TTM_PL_FLAG_CONTIGUOUS)) {
            <br>
            +                /* Fallback to non contiguous allocation */
            <br>
            +                vres->flags &=
            ~DRM_BUDDY_CONTIGUOUS_ALLOCATION;
            <br>
            +                bo_flags &=
            ~AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
            <br>
          </blockquote>
          <br>
          Well I would say that this is a rather hacky implementation
          and should be avoided if possible.
          <br>
        </blockquote>
        sure, I will try to rewrite the code.
        <br>
        <blockquote type="cite">
          <br>
          <blockquote type="cite">+
            <br>
            +                pages_per_block =
            <br>
            + amdgpu_vram_mgr_calculate_pages_per_block(tbo,
            <br>
            +                                          place,
            <br>
            +                                          bo_flags);
            <br>
          </blockquote>
          <br>
          Rather move the AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS handling out
          of amdgpu_vram_mgr_calculate_pages_per_block().
          <br>
        </blockquote>
        sure.
        <br>
        <br>
        Thanks,
        <br>
        Arun.
        <br>
        <blockquote type="cite">
          <br>
          Regards,
          <br>
          Christian.
          <br>
          <br>
          <blockquote type="cite">+                continue;
            <br>
            +            }
            <br>
                          goto error_free_blocks;
            <br>
            +        }
            <br>
                        if (size > remaining_size)
            <br>
                          remaining_size = 0;
            <br>
          </blockquote>
          <br>
        </blockquote>
        <br>
      </blockquote>
      <br>
    </blockquote>
  </body>
</html>