<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <div class="moz-cite-prefix">On 2022-07-25 10:34, Felix Kuehling
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:044e7f8e-9fa5-d78b-39f8-84bce155d4c6@amd.com">Am
      2022-07-25 um 08:23 schrieb Philip Yang:
      <br>
      <blockquote type="cite">This will be used to split giant svm range
        into smaller ranges, to
        <br>
        support VRAM overcommitment by giant range and improve GPU retry
        fault
        <br>
        recover on giant range.
        <br>
        <br>
        Signed-off-by: Philip Yang <a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com"><Philip.Yang@amd.com></a>
        <br>
        ---
        <br>
          drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  2 ++
        <br>
          drivers/gpu/drm/amd/amdkfd/kfd_svm.c     | 15 +++++++++++++++
        <br>
          drivers/gpu/drm/amd/amdkfd/kfd_svm.h     |  3 +++
        <br>
          3 files changed, 20 insertions(+)
        <br>
        <br>
        diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
        b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
        <br>
        index 9667015a6cbc..b1f87aa6138b 100644
        <br>
        --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
        <br>
        +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
        <br>
        @@ -1019,6 +1019,8 @@ int svm_migrate_init(struct amdgpu_device
        *adev)
        <br>
               
        amdgpu_amdkfd_reserve_system_mem(SVM_HMM_PAGE_STRUCT_SIZE(size));
        <br>
          +    svm_range_set_max_pages(adev);
        <br>
        +
        <br>
              pr_info("HMM registered %ldMB device memory\n", size
        >> 20);
        <br>
                return 0;
        <br>
        diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
        b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
        <br>
        index b592aee6d9d6..cf9565ddddf8 100644
        <br>
        --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
        <br>
        +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
        <br>
        @@ -46,6 +46,11 @@
        <br>
           */
        <br>
          #define AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING    (2UL *
        NSEC_PER_MSEC)
        <br>
          +/* A giant svm range is split into smaller ranges based on this
        value. It is the
        <br>
        + * minimum of 1/32 of the VRAM size across all dGPUs/APUs, clamped
        between 2MB and 1GB and aligned to 2MB.
        <br>
        + */
        <br>
        +uint64_t max_svm_range_pages;
        <br>
        +
        <br>
          struct criu_svm_metadata {
        <br>
              struct list_head list;
        <br>
              struct kfd_criu_svm_range_priv_data data;
        <br>
        @@ -1869,6 +1874,16 @@ static struct svm_range
        *svm_range_clone(struct svm_range *old)
        <br>
                return new;
        <br>
          }
        <br>
        +__init void svm_range_set_max_pages(struct amdgpu_device *adev)
        <br>
      </blockquote>
      <br>
      Why is this marked as __init? This can run much later than module
      init.
      <br>
      <br>
    </blockquote>
    <p>Thanks, this was originally called from amdgpu_amdkfd_init at module
      init time; I forgot to remove __init after moving the call into
      svm_migrate_init, which is called from pci_probe.</p>
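    <p>For v2 the fix is simply to drop the annotation from the declaration
      in kfd_svm.h and from the definition in kfd_svm.c, i.e. something like
      this (a sketch of the intended change, not the final patch):</p>
    <pre>-__init void svm_range_set_max_pages(struct amdgpu_device *adev);
+void svm_range_set_max_pages(struct amdgpu_device *adev);</pre>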
    <p>Regards,</p>
    <p>Philip<br>
    </p>
    <blockquote type="cite" cite="mid:044e7f8e-9fa5-d78b-39f8-84bce155d4c6@amd.com">
      <br>
      <blockquote type="cite">+{
        <br>
        +    uint64_t pages;
        <br>
        +
        <br>
        +    /* 1/32 VRAM size in pages */
        <br>
        +    pages = adev->gmc.real_vram_size >> 17;
        <br>
        +    pages = clamp(pages, 1ULL << 9, 1ULL << 18);
        <br>
        +    max_svm_range_pages = min_not_zero(max_svm_range_pages,
        pages);
        <br>
        +    max_svm_range_pages = ALIGN(max_svm_range_pages, 1ULL
        << 9);
        <br>
      </blockquote>
      <br>
      I'd recommend updating max_svm_range_pages with a single
      WRITE_ONCE to avoid race conditions with GPU hot-plug.
      <br>
      <br>
      Regards,
      <br>
        Felix
      <br>
      <br>
      <br>
      <blockquote type="cite">+}
        <br>
            /**
        <br>
           * svm_range_add - add svm range and handle overlap
        <br>
        diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
        b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
        <br>
        index eab7f6d3b13c..346a41bf8dbf 100644
        <br>
        --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
        <br>
        +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
        <br>
        @@ -204,6 +204,9 @@ void
        svm_range_list_lock_and_flush_work(struct svm_range_list *svms,
        struct mm_s
        <br>
          #define KFD_IS_SVM_API_SUPPORTED(dev) ((dev)->pgmap.type !=
        0)
        <br>
            void svm_range_bo_unref_async(struct svm_range_bo *svm_bo);
        <br>
        +
        <br>
        +__init void svm_range_set_max_pages(struct amdgpu_device
        *adev);
        <br>
        +
        <br>
          #else
        <br>
            struct kfd_process;
        <br>
      </blockquote>
    </blockquote>
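    <p>Regarding the WRITE_ONCE suggestion above, a rough sketch of how the
      update could be published with a single store, reusing the clamp logic
      from the patch (untested, not the final v2):</p>
    <pre>void svm_range_set_max_pages(struct amdgpu_device *adev)
{
        uint64_t max_pages;
        uint64_t pages;

        /* 1/32 of VRAM size in 4KB pages: >> 5 for the 1/32 and >> 12 for
         * the page size, hence >> 17 in total
         */
        pages = adev->gmc.real_vram_size >> 17;
        /* clamp to 512..262144 pages, i.e. 2MB..1GB */
        pages = clamp(pages, 1ULL << 9, 1ULL << 18);

        /* compute the new minimum locally, align it to 512 pages (2MB), and
         * publish it with a single store so a concurrent reader never sees
         * an intermediate value during GPU hot-plug
         */
        max_pages = min_not_zero(max_svm_range_pages, pages);
        max_pages = ALIGN(max_pages, 1ULL << 9);
        WRITE_ONCE(max_svm_range_pages, max_pages);
}</pre>
    <p>Computing into a local variable first means readers never observe the
      unaligned value between the min_not_zero and ALIGN steps.</p>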
  </body>
</html>