<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 2024-01-11 11:54, Felix Kuehling
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:c76951c2-6abc-49a2-be86-301aa87457ce@amd.com">
      <br>
      On 2024-01-10 17:01, Philip Yang wrote:
      <br>
      <blockquote type="cite">While partially migrating an svm range to
        system memory, clear the dma_addr VRAM
        <br>
        domain flag; otherwise a future split will compute incorrect
        vram_pages
        <br>
        and actual_loc.
        <br>
        <br>
        After range splitting, set the new and old range actual_loc:
        <br>
        new range actual_loc is 0 if new->vram_pages is 0.
        <br>
        old range actual_loc is 0 if old->vram_pages -
        new->vram_pages == 0.
        <br>
        <br>
        The new range takes an svm_bo ref only if its vram_pages is not 0.
        <br>
        <br>
        Signed-off-by: Philip Yang <a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com"><Philip.Yang@amd.com></a>
        <br>
        ---
        <br>
          drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 20
        +++++++++++++++++++-
        <br>
          drivers/gpu/drm/amd/amdkfd/kfd_svm.c     | 24
        ++++++++++++++----------
        <br>
          2 files changed, 33 insertions(+), 11 deletions(-)
        <br>
        <br>
        diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
        b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
        <br>
        index f856901055d3..dae05f70257b 100644
        <br>
        --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
        <br>
        +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
        <br>
        @@ -563,18 +563,30 @@ svm_migrate_copy_to_ram(struct
        amdgpu_device *adev, struct svm_range *prange,
        <br>
                      struct migrate_vma *migrate, struct dma_fence
        **mfence,
        <br>
                      dma_addr_t *scratch, uint64_t npages)
        <br>
          {
        <br>
        +    struct kfd_process *p = container_of(prange->svms,
        struct kfd_process, svms);
        <br>
              struct device *dev = adev->dev;
        <br>
        +    dma_addr_t *dma_addr;
        <br>
              uint64_t *src;
        <br>
              dma_addr_t *dst;
        <br>
              struct page *dpage;
        <br>
              uint64_t i = 0, j;
        <br>
              uint64_t addr;
        <br>
        +    s32 gpuidx;
        <br>
        +    u64 offset;
        <br>
              int r = 0;
        <br>
                pr_debug("svms 0x%p [0x%lx 0x%lx]\n", prange->svms,
        prange->start,
        <br>
                   prange->last);
        <br>
          -    addr = prange->start << PAGE_SHIFT;
        <br>
      </blockquote>
      <br>
      Is this another bug fix for partial migration? If so, it may be
      worth making that a separate patch.
      <br>
    </blockquote>
    <p>Yes, it is another bug I just noticed. The addr is passed along to
      the system page allocation with migrate_vma, but addr is ignored on
      the normal path and only used on the shmem path, so it may not
      matter in practice. I will put this into a separate patch anyway.<br>
    </p>
    <blockquote type="cite" cite="mid:c76951c2-6abc-49a2-be86-301aa87457ce@amd.com">
      <br>
      <br>
      <blockquote type="cite">+    gpuidx =
        kfd_process_gpuidx_from_gpuid(p, prange->actual_loc);
        <br>
        +    if (gpuidx < 0) {
        <br>
        +        pr_debug("no GPU id 0x%x found\n",
        prange->actual_loc);
        <br>
        +        return -EINVAL;
        <br>
        +    }
        <br>
        +
        <br>
        +    addr = migrate->start;
        <br>
        +    offset = (addr >> PAGE_SHIFT) - prange->start;
        <br>
        +    dma_addr = prange->dma_addr[gpuidx];
        <br>
                src = (uint64_t *)(scratch + npages);
        <br>
              dst = scratch;
        <br>
        @@ -623,6 +635,12 @@ svm_migrate_copy_to_ram(struct
        amdgpu_device *adev, struct svm_range *prange,
        <br>
                      goto out_oom;
        <br>
                  }
        <br>
          +        /* Clear VRAM flag when page is migrated to ram, to
        count vram
        <br>
        +         * pages correctly when spliting the range.
        <br>
        +         */
        <br>
        +        if (dma_addr && (dma_addr[offset + i] &
        SVM_RANGE_VRAM_DOMAIN))
        <br>
        +            dma_addr[offset + i] = 0;
        <br>
        +
        <br>
      </blockquote>
      <br>
      I'm not a big fan of messing with the DMA arrays here, but I don't
      have a good alternative. I think what bothers me is how the DMA
      address array and the handling of the vram page count are now
      spread out across so many places. It feels fragile.
      <br>
      <br>
      Maybe it would be good to add a helper in kfd_svm.c:
      svm_get_dma_addr_for_page_count(prange, offset). That way you can
      keep the choice of gpuid and offset calculation in one place in
      kfd_svm.c, close to svm_range_copy_array.
      <br>
    </blockquote>
    <p>vram page counting is only used when splitting a range, so it is a
      good idea to add the helper close to svm_range_split and
      svm_range_copy_array rather than in the migration path.</p>
    <p>Regards,</p>
    <p>Philip<br>
    </p>
    <blockquote type="cite" cite="mid:c76951c2-6abc-49a2-be86-301aa87457ce@amd.com">
      <br>
      Other than that, the patch looks good to me.
      <br>
      <br>
      Regards,
      <br>
        Felix
      <br>
      <br>
      <br>
      <blockquote type="cite">          pr_debug_ratelimited("dma
        mapping dst to 0x%llx, pfn 0x%lx\n",
        <br>
                               dst[i] >> PAGE_SHIFT,
        page_to_pfn(dpage));
        <br>
          diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
        b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
        <br>
        index cc24f30f88fb..35ee9e648cca 100644
        <br>
        --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
        <br>
        +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
        <br>
        @@ -362,7 +362,6 @@ svm_range *svm_range_new(struct
        svm_range_list *svms, uint64_t start,
        <br>
              INIT_LIST_HEAD(&prange->child_list);
        <br>
              atomic_set(&prange->invalid, 0);
        <br>
              prange->validate_timestamp = 0;
        <br>
        -    prange->vram_pages = 0;
        <br>
              mutex_init(&prange->migrate_mutex);
        <br>
              mutex_init(&prange->lock);
        <br>
          @@ -980,9 +979,14 @@ svm_range_split_pages(struct svm_range
        *new, struct svm_range *old,
        <br>
                  if (r)
        <br>
                      return r;
        <br>
              }
        <br>
        -    if (old->actual_loc)
        <br>
        +    if (old->actual_loc && new->vram_pages) {
        <br>
                  old->vram_pages -= new->vram_pages;
        <br>
        -
        <br>
        +        new->actual_loc = old->actual_loc;
        <br>
        +        if (!old->vram_pages)
        <br>
        +            old->actual_loc = 0;
        <br>
        +    }
        <br>
        +    pr_debug("new->vram_pages 0x%llx loc 0x%x
        old->vram_pages 0x%llx loc 0x%x\n",
        <br>
        +         new->vram_pages, new->actual_loc,
        old->vram_pages, old->actual_loc);
        <br>
              return 0;
        <br>
          }
        <br>
          @@ -1002,13 +1006,14 @@ svm_range_split_nodes(struct svm_range
        *new, struct svm_range *old,
        <br>
                  new->offset = old->offset + npages;
        <br>
              }
        <br>
          -    new->svm_bo = svm_range_bo_ref(old->svm_bo);
        <br>
        -    new->ttm_res = old->ttm_res;
        <br>
        -
        <br>
        -    spin_lock(&new->svm_bo->list_lock);
        <br>
        -    list_add(&new->svm_bo_list,
        &new->svm_bo->range_list);
        <br>
        -    spin_unlock(&new->svm_bo->list_lock);
        <br>
        +    if (new->vram_pages) {
        <br>
        +        new->svm_bo = svm_range_bo_ref(old->svm_bo);
        <br>
        +        new->ttm_res = old->ttm_res;
        <br>
          +        spin_lock(&new->svm_bo->list_lock);
        <br>
        +        list_add(&new->svm_bo_list,
        &new->svm_bo->range_list);
        <br>
        +        spin_unlock(&new->svm_bo->list_lock);
        <br>
        +    }
        <br>
              return 0;
        <br>
          }
        <br>
          @@ -1058,7 +1063,6 @@ svm_range_split_adjust(struct svm_range
        *new, struct svm_range *old,
        <br>
              new->flags = old->flags;
        <br>
              new->preferred_loc = old->preferred_loc;
        <br>
              new->prefetch_loc = old->prefetch_loc;
        <br>
        -    new->actual_loc = old->actual_loc;
        <br>
              new->granularity = old->granularity;
        <br>
              new->mapped_to_gpu = old->mapped_to_gpu;
        <br>
              bitmap_copy(new->bitmap_access, old->bitmap_access,
        MAX_GPU_INSTANCE);
        <br>
      </blockquote>
    </blockquote>
  </body>
</html>