<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 2022-05-18 17:31, Felix Kuehling
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:63fe88f7-fa3c-67be-73ab-8fed555e4c52@amd.com">On
      2022-05-18 13:55, philip yang wrote:
      <br>
      <blockquote type="cite">
        <br>
        <br>
        On 2022-05-17 19:11, Alex Sierra wrote:
        <br>
        <blockquote type="cite">TTM used to track the "acc_size" of all
          BOs internally. We needed to
          <br>
          keep track of it in our memory reservation to avoid TTM
          running out
          <br>
          of memory in its own accounting. However, that "acc_size"
          accounting
          <br>
          has since been removed from TTM. Therefore we don't really
          need to
          <br>
          track it any more.
          <br>
        </blockquote>
        <br>
        acc_size is size of amdgpu_bo data structure plus size of pages
        array and dma_address array, it is needed for each BO, so should
        track as system_mem_needed. It can be removed from
        ttm_mem_needed as this is not allocated by TTM as GTT memory.
        <br>
        <br>
      </blockquote>
      You have a point, I didn't think of that. The fact that TTM isn't
      tracking the data structure sizes any more doesn't mean, we
      shouldn't account for it in our own system memory usage.
      <br>
      <br>
      That said, do we actually have DMA address arrays for VRAM
      allocations?
      <br>
      <br>
      Also, acc_size doesn't track the extra dmabuf BOs we create for
      DMA mappings on multiple GPUs. So I'm not sure how useful the
      acc_size tracking is at this point. The system memory limit is
      currently 15/16 of total memory. Maybe that leaves enough reserve
      for data structure sizes?
      <br>
    </blockquote>
    Based on the calculation, set 15/16 free system memory limit is
    enough to prevent OOM killer, as acc_size is up to 1/64 of memory
    size on 8GPU system, so it is safe to simplify and remove data
    structure acc_size.
    <p>One nit-pick below. This patch is Reviewed-by: Philip Yang
      <a class="moz-txt-link-rfc2396E" href="mailto:philip.yang@amd.com"><philip.yang@amd.com></a></p>
    <blockquote type="cite" cite="mid:63fe88f7-fa3c-67be-73ab-8fed555e4c52@amd.com">
      <br>
      Regards,
      <br>
        Felix
      <br>
      <br>
      <br>
      <blockquote type="cite">Regards,
        <br>
        <br>
        Philip
        <br>
        <br>
        <blockquote type="cite">Signed-off-by: Alex
          Sierra<a class="moz-txt-link-rfc2396E" href="mailto:alex.sierra@amd.com"><alex.sierra@amd.com></a>
          <br>
          ---
          <br>
            .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 57
          ++++++-------------
          <br>
            1 file changed, 16 insertions(+), 41 deletions(-)
          <br>
          <br>
          diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
          b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
          <br>
          index fada3b149361..e985cf9c7ec0 100644
          <br>
          --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
          <br>
          +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
          <br>
          @@ -108,17 +108,8 @@ void
          amdgpu_amdkfd_reserve_system_mem(uint64_t size)
          <br>
             * compromise that should work in most cases without
          reserving too
          <br>
             * much memory for page tables unnecessarily (factor 16K,
          >> 14).
          <br>
             */
          <br>
          -#define ESTIMATE_PT_SIZE(mem_size) ((mem_size) >> 14)
          <br>
          -
          <br>
          -static size_t amdgpu_amdkfd_acc_size(uint64_t size)
          <br>
          -{
          <br>
          -    size >>= PAGE_SHIFT;
          <br>
          -    size *= sizeof(dma_addr_t) + sizeof(void *);
          <br>
            -    return __roundup_pow_of_two(sizeof(struct amdgpu_bo)) +
          <br>
          -        __roundup_pow_of_two(sizeof(struct ttm_tt)) +
          <br>
          -        PAGE_ALIGN(size);
          <br>
          -}
          <br>
          +#define ESTIMATE_PT_SIZE(mem_size) ((mem_size) >> 14)
          <br>
              /**
          <br>
             * amdgpu_amdkfd_reserve_mem_limit() - Decrease available
          memory by size
          <br>
        </blockquote>
      </blockquote>
    </blockquote>
    <p>Remove "including any reserved for control structures" from
      function description.</p>
    <p>Regards,</p>
    <p>Philip<br>
    </p>
    <blockquote type="cite" cite="mid:63fe88f7-fa3c-67be-73ab-8fed555e4c52@amd.com">
      <blockquote type="cite">
        <blockquote type="cite">@@ -136,28 +127,22 @@ static int
          amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
          <br>
            {
          <br>
                uint64_t reserved_for_pt =
          <br>
                    ESTIMATE_PT_SIZE(amdgpu_amdkfd_total_mem_size);
          <br>
          -    size_t acc_size, system_mem_needed, ttm_mem_needed,
          vram_needed;
          <br>
          +    size_t system_mem_needed, ttm_mem_needed, vram_needed;
          <br>
                int ret = 0;
          <br>
            -    acc_size = amdgpu_amdkfd_acc_size(size);
          <br>
          -
          <br>
          +    system_mem_needed = 0;
          <br>
          +    ttm_mem_needed = 0;
          <br>
                vram_needed = 0;
          <br>
                if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_GTT) {
          <br>
          -        system_mem_needed = acc_size + size;
          <br>
          -        ttm_mem_needed = acc_size + size;
          <br>
          +        system_mem_needed = size;
          <br>
          +        ttm_mem_needed = size;
          <br>
                } else if (alloc_flag &
          KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
          <br>
          -        system_mem_needed = acc_size;
          <br>
          -        ttm_mem_needed = acc_size;
          <br>
                    vram_needed = size;
          <br>
                } else if (alloc_flag &
          KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
          <br>
          -        system_mem_needed = acc_size + size;
          <br>
          -        ttm_mem_needed = acc_size;
          <br>
          -    } else if (alloc_flag &
          <br>
          -           (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
          <br>
          -            KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
          <br>
          -        system_mem_needed = acc_size;
          <br>
          -        ttm_mem_needed = acc_size;
          <br>
          -    } else {
          <br>
          +        system_mem_needed = size;
          <br>
          +    } else if (!(alloc_flag &
          <br>
          +                (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
          <br>
          +                 KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP))) {
          <br>
                    pr_err("%s: Invalid BO type %#x\n", __func__,
          alloc_flag);
          <br>
                    return -ENOMEM;
          <br>
                }
          <br>
          @@ -193,28 +178,18 @@ static int
          amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
          <br>
            static void unreserve_mem_limit(struct amdgpu_device *adev,
          <br>
                    uint64_t size, u32 alloc_flag)
          <br>
            {
          <br>
          -    size_t acc_size;
          <br>
          -
          <br>
          -    acc_size = amdgpu_amdkfd_acc_size(size);
          <br>
          -
          <br>
                spin_lock(&kfd_mem_limit.mem_limit_lock);
          <br>
                  if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_GTT) {
          <br>
          -        kfd_mem_limit.system_mem_used -= (acc_size + size);
          <br>
          -        kfd_mem_limit.ttm_mem_used -= (acc_size + size);
          <br>
          +        kfd_mem_limit.system_mem_used -= size;
          <br>
          +        kfd_mem_limit.ttm_mem_used -= size;
          <br>
                } else if (alloc_flag &
          KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
          <br>
          -        kfd_mem_limit.system_mem_used -= acc_size;
          <br>
          -        kfd_mem_limit.ttm_mem_used -= acc_size;
          <br>
                    adev->kfd.vram_used -= size;
          <br>
                } else if (alloc_flag &
          KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
          <br>
          -        kfd_mem_limit.system_mem_used -= (acc_size + size);
          <br>
          -        kfd_mem_limit.ttm_mem_used -= acc_size;
          <br>
          -    } else if (alloc_flag &
          <br>
          -           (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
          <br>
          -            KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
          <br>
          -        kfd_mem_limit.system_mem_used -= acc_size;
          <br>
          -        kfd_mem_limit.ttm_mem_used -= acc_size;
          <br>
          -    } else {
          <br>
          +        kfd_mem_limit.system_mem_used -= size;
          <br>
          +    } else if (!(alloc_flag &
          <br>
          +                (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
          <br>
          +                 KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP))) {
          <br>
                    pr_err("%s: Invalid BO type %#x\n", __func__,
          alloc_flag);
          <br>
                    goto release;
          <br>
                }
          <br>
        </blockquote>
      </blockquote>
    </blockquote>
  </body>
</html>