<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 2024-04-16 02:50, Paneer Selvam,
      Arunpravin wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:a8fd7808-a4ae-4e1e-8aae-be22bd4fd3b1@amd.com">
      <br>
      <br>
      On 4/16/2024 3:32 AM, Philip Yang wrote:
      <br>
      <blockquote type="cite">
        <br>
        <br>
        On 2024-04-14 10:57, Arunpravin Paneer Selvam wrote:
        <br>
        <blockquote type="cite">Now we have two flags for contiguous
          VRAM buffer allocation.
          <br>
          If the application request for
          AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
          <br>
          it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the
          <br>
          buffer's placement function.
          <br>
          <br>
          This patch will change the default behaviour of the two flags.
          <br>
        </blockquote>
        This change will simplify the KFD best effort contiguous VRAM
        allocation, because KFD doesn't need set new GEM_ flag.
        <br>
        <blockquote type="cite">When we set
          AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS
          <br>
          - This means contiguous is not mandatory.
          <br>
        </blockquote>
        <br>
        AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS used in couple of places. For
        page table BO, it is fine as BO size is page size 4K. For 64KB
        reserved BOs and F/W size related BOs, do all allocation happen
        at driver initialization before the VRAM is fragmented?
        <br>
        <br>
        <blockquote type="cite">- we will try to allocate the contiguous
          buffer. Say if the
          <br>
             allocation fails, we fallback to allocate the individual
          pages.
          <br>
          <br>
          When we setTTM_PL_FLAG_CONTIGUOUS
          <br>
          - This means contiguous allocation is mandatory.
          <br>
          - we are setting this in amdgpu_bo_pin_restricted() before bo
          validation
          <br>
             and check this flag in the vram manager file.
          <br>
          - if this is set, we should allocate the buffer pages
          contiguously.
          <br>
             the allocation fails, we return -ENOSPC.
          <br>
          <br>
          Signed-off-by: Arunpravin Paneer
          Selvam<a class="moz-txt-link-rfc2396E" href="mailto:Arunpravin.PaneerSelvam@amd.com"><Arunpravin.PaneerSelvam@amd.com></a>
          <br>
          Suggested-by: Christian König<a class="moz-txt-link-rfc2396E" href="mailto:christian.koenig@amd.com"><christian.koenig@amd.com></a>
          <br>
          ---
          <br>
            drivers/gpu/drm/amd/amdgpu/amdgpu_object.c   | 14 +++--
          <br>
            drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 57
          +++++++++++++++-----
          <br>
            2 files changed, 49 insertions(+), 22 deletions(-)
          <br>
          <br>
          diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
          b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
          <br>
          index 8bc79924d171..41926d631563 100644
          <br>
          --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
          <br>
          +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
          <br>
          @@ -153,8 +153,6 @@ void
          amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32
          domain)
          <br>
                    else
          <br>
                        places[c].flags |= TTM_PL_FLAG_TOPDOWN;
          <br>
            -        if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
          <br>
          -            places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
          <br>
                    c++;
          <br>
                }
          <br>
            @@ -899,6 +897,8 @@ int amdgpu_bo_pin_restricted(struct
          amdgpu_bo *bo, u32 domain,
          <br>
            {
          <br>
                struct amdgpu_device *adev =
          amdgpu_ttm_adev(bo->tbo.bdev);
          <br>
                struct ttm_operation_ctx ctx = { false, false };
          <br>
          +    struct ttm_place *places = bo->placements;
          <br>
          +    u32 c = 0;
          <br>
                int r, i;
          <br>
                  if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))
          <br>
          @@ -921,16 +921,10 @@ int amdgpu_bo_pin_restricted(struct
          amdgpu_bo *bo, u32 domain,
          <br>
                  if (bo->tbo.pin_count) {
          <br>
                    uint32_t mem_type =
          bo->tbo.resource->mem_type;
          <br>
          -        uint32_t mem_flags =
          bo->tbo.resource->placement;
          <br>
                      if (!(domain &
          amdgpu_mem_type_to_domain(mem_type)))
          <br>
                        return -EINVAL;
          <br>
            -        if ((mem_type == TTM_PL_VRAM) &&
          <br>
          -            (bo->flags &
          AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) &&
          <br>
          -            !(mem_flags & TTM_PL_FLAG_CONTIGUOUS))
          <br>
          -            return -EINVAL;
          <br>
          -
          <br>
        </blockquote>
        This looks like a bug before, but with this patch, the check
        makes sense and is needed.
        <br>
        <blockquote type="cite">          ttm_bo_pin(&bo->tbo);
          <br>
                      if (max_offset != 0) {
          <br>
          @@ -968,6 +962,10 @@ int amdgpu_bo_pin_restricted(struct
          amdgpu_bo *bo, u32 domain,
          <br>
                        bo->placements[i].lpfn = lpfn;
          <br>
                }
          <br>
            +    if (domain & AMDGPU_GEM_DOMAIN_VRAM &&
          <br>
          +        !WARN_ON(places[c].mem_type != TTM_PL_VRAM))
          <br>
          +        places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
          <br>
          +
          <br>
        </blockquote>
        <br>
        If BO pinned is not allocated with
        AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, should pin and return
        scattered pages because the RDMA support scattered dmabuf.
        Christian also pointed this out.
        <br>
        <br>
        If (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS
        &&
        <br>
        <br>
            bo->placements[i].mem_type == TTM_PL_VRAM)
        <br>
                o->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS;
        <br>
        <blockquote type="cite">      r =
          ttm_bo_validate(&bo->tbo, &bo->placement,
          &ctx);
          <br>
                if (unlikely(r)) {
          <br>
                    dev_err(adev->dev, "%p pin failed\n", bo);
          <br>
          diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
          b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
          <br>
          index 8db880244324..ddbf302878f6 100644
          <br>
          --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
          <br>
          +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
          <br>
          @@ -88,6 +88,30 @@ static inline u64
          amdgpu_vram_mgr_blocks_size(struct list_head *head)
          <br>
                return size;
          <br>
            }
          <br>
            +static inline unsigned long
          <br>
          +amdgpu_vram_find_pages_per_block(struct ttm_buffer_object
          *tbo,
          <br>
          +                 const struct ttm_place *place,
          <br>
          +                 unsigned long bo_flags)
          <br>
          +{
          <br>
          +    unsigned long pages_per_block;
          <br>
          +
          <br>
          +    if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS ||
          <br>
          +        place->flags & TTM_PL_FLAG_CONTIGUOUS) {
          <br>
          +        pages_per_block = ~0ul;
          <br>
          +    } else {
          <br>
          +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
          <br>
          +        pages_per_block = HPAGE_PMD_NR;
          <br>
          +#else
          <br>
          +        /* default to 2MB */
          <br>
          +        pages_per_block = 2UL << (20UL - PAGE_SHIFT);
          <br>
          +#endif
          <br>
          +        pages_per_block = max_t(uint32_t, pages_per_block,
          <br>
          +                    tbo->page_alignment);
          <br>
          +    }
          <br>
          +
          <br>
          +    return pages_per_block;
          <br>
          +}
          <br>
          +
          <br>
            /**
          <br>
             * DOC: mem_info_vram_total
          <br>
             *
          <br>
          @@ -451,8 +475,10 @@ static int amdgpu_vram_mgr_new(struct
          ttm_resource_manager *man,
          <br>
                struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
          <br>
                struct amdgpu_device *adev = to_amdgpu_device(mgr);
          <br>
                u64 vis_usage = 0, max_bytes, min_block_size;
          <br>
          +    struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo);
          <br>
                struct amdgpu_vram_mgr_resource *vres;
          <br>
                u64 size, remaining_size, lpfn, fpfn;
          <br>
          +    unsigned long bo_flags = bo->flags;
          <br>
                struct drm_buddy *mm = &mgr->mm;
          <br>
                struct drm_buddy_block *block;
          <br>
                unsigned long pages_per_block;
          <br>
          @@ -468,18 +494,8 @@ static int amdgpu_vram_mgr_new(struct
          ttm_resource_manager *man,
          <br>
                if (tbo->type != ttm_bo_type_kernel)
          <br>
                    max_bytes -= AMDGPU_VM_RESERVED_VRAM;
          <br>
            -    if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
          <br>
          -        pages_per_block = ~0ul;
          <br>
          -    } else {
          <br>
          -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
          <br>
          -        pages_per_block = HPAGE_PMD_NR;
          <br>
          -#else
          <br>
          -        /* default to 2MB */
          <br>
          -        pages_per_block = 2UL << (20UL - PAGE_SHIFT);
          <br>
          -#endif
          <br>
          -        pages_per_block = max_t(uint32_t, pages_per_block,
          <br>
          -                    tbo->page_alignment);
          <br>
          -    }
          <br>
          +    pages_per_block =
          <br>
          +        amdgpu_vram_find_pages_per_block(tbo, place,
          bo_flags);
          <br>
                  vres = kzalloc(sizeof(*vres), GFP_KERNEL);
          <br>
                if (!vres)
          <br>
          @@ -498,7 +514,8 @@ static int amdgpu_vram_mgr_new(struct
          ttm_resource_manager *man,
          <br>
                if (place->flags & TTM_PL_FLAG_TOPDOWN)
          <br>
                    vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
          <br>
            -    if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
          <br>
          +    if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS ||
          <br>
          +        place->flags & TTM_PL_FLAG_CONTIGUOUS)
          <br>
                    vres->flags |= DRM_BUDDY_CONTIGUOUS_ALLOCATION;
          <br>
                  if (fpfn || lpfn != mgr->mm.size)
          <br>
          @@ -529,8 +546,20 @@ static int amdgpu_vram_mgr_new(struct
          ttm_resource_manager *man,
          <br>
                                   min_block_size,
          <br>
                                   &vres->blocks,
          <br>
                                   vres->flags);
          <br>
          -        if (unlikely(r))
          <br>
          +        if (unlikely(r)) {
          <br>
        </blockquote>
        <br>
        If pin BO failed to allocate contiguous VRAM, this should return
        failure to application, as RDMA cannot work with scattered VRAM
        pages.
        <br>
        <br>
      </blockquote>
      here we can return -ENOSPC when the BO is pinned (i.e
      TTM_PL_FLAG_CONTIGUOUS is set). Please find the below modified
      condition,
      <br>
      if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
      <br>
          !(place->flags & TTM_PL_FLAG_CONTIGUOUS)
      <br>
    </blockquote>
    <p>Yes, this change looks good for me.</p>
    <p>Thanks</p>
    <p>Philip<br>
    </p>
    <blockquote type="cite" cite="mid:a8fd7808-a4ae-4e1e-8aae-be22bd4fd3b1@amd.com">Hence if
      the TTM_PL_FLAG_CONTIGUOUS flag is set, we don't proceed for
      allocating individual pages.
      <br>
      <blockquote type="cite">
        <br>
        Regards,
        <br>
        <br>
        Philip
        <br>
        <br>
        <blockquote type="cite">+            if (bo_flags &
          AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) {
          <br>
          +                /* Fallback to non contiguous allocation */
          <br>
          +                vres->flags &=
          ~DRM_BUDDY_CONTIGUOUS_ALLOCATION;
          <br>
          +                bo_flags &=
          ~AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
          <br>
          +
          <br>
          +                pages_per_block =
          <br>
          +                    amdgpu_vram_find_pages_per_block(tbo,
          <br>
          +                                     place,
          <br>
          +                                     bo_flags);
          <br>
          +                continue;
          <br>
          +            }
          <br>
                        goto error_free_blocks;
          <br>
          +        }
          <br>
                      if (size > remaining_size)
          <br>
                        remaining_size = 0;
          <br>
        </blockquote>
      </blockquote>
      <br>
    </blockquote>
  </body>
</html>