<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 2023-08-11 21:39, Lazar, Lijo wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:BYAPR12MB4614B9FB9A931ACD336212929711A@BYAPR12MB4614.namprd12.prod.outlook.com">
      
      <br>
      <div>
        <div style="" dir="auto">A dynamic partition switch could happen
          later.  The switch could still be successful in terms of
          hardware,</div>
      </div>
    </blockquote>
    [JZ] Only the render node assignment is skipped, which removes the
    partition's visibility in user space. The xcp is still generated as
    usual, so the switch should work as usual.<br>
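    <p>A minimal userspace sketch of the behaviour described above, not
    the driver code itself: the allocation loop treats -ENOSPC as a
    warning and returns 0, and the register loop stops at the first
    partition without a ddev. All <code>fake_*</code> names, the
    <code>MAX_XCP</code> value of 8 and <code>FREE_DRM_NODES</code> are
    assumptions made for illustration only.</p>
    <pre>
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_XCP 8          /* hypothetical partition count for this sketch */
#define FREE_DRM_NODES 3   /* hypothetical number of free drm nodes */

struct fake_ddev { int minor; };
struct fake_xcp  { struct fake_ddev *ddev; };

static struct fake_ddev pool[FREE_DRM_NODES];
static int used;

/* Stand-in for amdgpu_xcp_drm_dev_alloc(): -ENOSPC once the pool is empty. */
static int fake_drm_dev_alloc(struct fake_ddev **out)
{
	if (used >= FREE_DRM_NODES)
		return -ENOSPC;
	pool[used].minor = 128 + used;
	*out = &pool[used++];
	return 0;
}

/* Mirrors the patched alloc loop: running out of nodes is not a load failure. */
static int xcp_dev_alloc(struct fake_xcp *xcp)
{
	struct fake_ddev *p_ddev;
	int i, ret;

	assert(xcp != NULL);
	for (i = 1; i < MAX_XCP; i++) {
		ret = fake_drm_dev_alloc(&p_ddev);
		if (ret == -ENOSPC) {
			fprintf(stderr,
				"Skip xcp node #%d when out of drm node resource.\n", i);
			return 0;
		} else if (ret) {
			return ret;
		}
		xcp[i].ddev = p_ddev;
	}
	return 0;
}

/* Mirrors the patched register loop: stop at the first xcp without a ddev. */
static int xcp_dev_register(const struct fake_xcp *xcp)
{
	int i, registered = 0;

	for (i = 1; i < MAX_XCP; i++) {
		if (!xcp[i].ddev)
			break;
		registered++;	/* the real code would call drm_dev_register() here */
	}
	return registered;
}
```
    </pre>
    <p>With a zero-initialised <code>struct fake_xcp xcp[MAX_XCP]</code>,
    calling <code>xcp_dev_alloc(xcp)</code> followed by
    <code>xcp_dev_register(xcp)</code> leaves the remaining partitions
    with a NULL ddev, which is the state the later checks have to
    tolerate.</p>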
    <blockquote type="cite" cite="mid:BYAPR12MB4614B9FB9A931ACD336212929711A@BYAPR12MB4614.namprd12.prod.outlook.com">
      <div>
        <div style="" dir="auto"> and hence gives a false feeling of
          success even if there are no render nodes available for any
          app to make use of the partition<span style="font-size: 12pt;">.</span></div>
      </div>
    </blockquote>
    [JZ] From the driver's perspective the switch is a real success; the
    last one is treated as harvested in user space. There is a warning
    in the kernel log, and the final solution for more than 64 nodes is
    ongoing.<br>
    <blockquote type="cite" cite="mid:BYAPR12MB4614B9FB9A931ACD336212929711A@BYAPR12MB4614.namprd12.prod.outlook.com">
      <div>
        <div style="" dir="auto"><br>
        </div>
        <div style="" dir="auto">Also, a kfd node is not expe<span>cted
            to have a valid xcp pointer on devices without partition.</span></div>
      </div>
    </blockquote>
    [JZ] This won't affect the <span>xcp pointer, only </span><span>ddev.</span>
    <blockquote type="cite" cite="mid:BYAPR12MB4614B9FB9A931ACD336212929711A@BYAPR12MB4614.namprd12.prod.outlook.com">
      <div>
        <div style="" dir="auto"><span> This access could then break:
            gpu->xcp->ddev.</span></div>
      </div>
    </blockquote>
    [JZ] <span>Added a skip when ddev == NULL.</span>
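    <p>A hedged sketch of a guard that covers both points raised above,
    assuming the concern is that the xcp pointer itself can be NULL on
    devices without partitions. The <code>fake_*</code> types and
    <code>kfd_node_skip_topology()</code> are hypothetical stand-ins, not
    existing kernel symbols.</p>
    <pre>
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct fake_ddev     { int minor; };
struct fake_xcp      { struct fake_ddev *ddev; };
struct fake_kfd_node { struct fake_xcp *xcp; };

/*
 * Returns true when the node should be left out of the topology:
 * it belongs to a partition (xcp != NULL) that has no drm node.
 * A node without any xcp (non-partitioned device) is never skipped,
 * and its xcp pointer is never dereferenced.
 */
static bool kfd_node_skip_topology(const struct fake_kfd_node *gpu)
{
	return gpu->xcp && !gpu->xcp->ddev;
}
```
    </pre>
    <p>Checking <code>gpu->xcp</code> before <code>gpu->xcp->ddev</code>
    keeps the non-partitioned case on the normal path while still
    skipping partitions that received no drm node.</p>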
    <blockquote type="cite" cite="mid:BYAPR12MB4614B9FB9A931ACD336212929711A@BYAPR12MB4614.namprd12.prod.outlook.com">
      <div>
        <div style="" dir="auto"><br>
        </div>
        <div id="ms-outlook-mobile-signature" dir="auto">Thanks,<br>
          Lijo</div>
        <hr style="display:inline-block;width:98%" tabindex="-1">
        <div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b>
            amd-gfx <a class="moz-txt-link-rfc2396E" href="mailto:amd-gfx-bounces@lists.freedesktop.org"><amd-gfx-bounces@lists.freedesktop.org></a> on
            behalf of James Zhu <a class="moz-txt-link-rfc2396E" href="mailto:James.Zhu@amd.com"><James.Zhu@amd.com></a><br>
            <b>Sent:</b> Saturday, August 12, 2023 2:36:27 AM<br>
            <b>To:</b> <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
            <a class="moz-txt-link-rfc2396E" href="mailto:amd-gfx@lists.freedesktop.org"><amd-gfx@lists.freedesktop.org></a><br>
            <b>Cc:</b> Lin, Amber <a class="moz-txt-link-rfc2396E" href="mailto:Amber.Lin@amd.com"><Amber.Lin@amd.com></a>; Zhu, James
            <a class="moz-txt-link-rfc2396E" href="mailto:James.Zhu@amd.com"><James.Zhu@amd.com></a>; Kasiviswanathan, Harish
            <a class="moz-txt-link-rfc2396E" href="mailto:Harish.Kasiviswanathan@amd.com"><Harish.Kasiviswanathan@amd.com></a>; Koenig, Christian
            <a class="moz-txt-link-rfc2396E" href="mailto:Christian.Koenig@amd.com"><Christian.Koenig@amd.com></a><br>
            <b>Subject:</b> [PATCH v3] drm/amdgpu: skip xcp drm device
            allocation when out of drm resource</font>
          <div> </div>
        </div>
        <div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
              <div class="PlainText">Return 0 when drm device alloc
                fails with -ENOSPC in<br>
                order to allow amdgpu driver loading. But an xcp
                without<br>
                a drm device node assigned won't be visible in user
                space.<br>
                This helps amdgpu driver loading on systems which have
                more<br>
                than 64 nodes, the current limitation.<br>
                <br>
                A proposal to add more drm nodes is being discussed in
                public;<br>
                it would support up to 2^20 nodes in total.<br>
                kernel drm:<br>
                <a href="https://lore.kernel.org/lkml/20230724211428.3831636-1-michal.winiarski@intel.com/T/" moz-do-not-send="true" class="moz-txt-link-freetext">https://lore.kernel.org/lkml/20230724211428.3831636-1-michal.winiarski@intel.com/T/</a><br>
                libdrm:<br>
                <a href="https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305" moz-do-not-send="true" class="moz-txt-link-freetext">https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305</a><br>
                <br>
                Signed-off-by: James Zhu <a class="moz-txt-link-rfc2396E" href="mailto:James.Zhu@amd.com"><James.Zhu@amd.com></a><br>
                Acked-by: Christian König
                <a class="moz-txt-link-rfc2396E" href="mailto:christian.koenig@amd.com"><christian.koenig@amd.com></a><br>
                <br>
                -v2: added warning message<br>
                -v3: use dev_warn<br>
                ---<br>
                 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c   | 13
                ++++++++++++-<br>
                 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10
                +++++++++-<br>
                 2 files changed, 21 insertions(+), 2 deletions(-)<br>
                <br>
                diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
                b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c<br>
                index 9c9cca129498..565a1fa436d4 100644<br>
                --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c<br>
                +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c<br>
                @@ -239,8 +239,13 @@ static int
                amdgpu_xcp_dev_alloc(struct amdgpu_device *adev)<br>
                 <br>
                         for (i = 1; i < MAX_XCP; i++) {<br>
                                 ret =
                amdgpu_xcp_drm_dev_alloc(&p_ddev);<br>
                -               if (ret)<br>
                +               if (ret == -ENOSPC) {<br>
                +                       dev_warn(adev->dev,<br>
                +                       "Skip xcp node #%d when out of
                drm node resource.\n", i);<br>
                +                       return 0;<br>
                +               } else if (ret) {<br>
                                         return ret;<br>
                +               }<br>
                 <br>
                                 /* Redirect all IOCTLs to the primary
                device */<br>
                                 adev->xcp_mgr->xcp[i].rdev =
                p_ddev->render->dev;<br>
                @@ -328,6 +333,9 @@ int amdgpu_xcp_dev_register(struct
                amdgpu_device *adev,<br>
                                 return 0;<br>
                 <br>
                         for (i = 1; i < MAX_XCP; i++) {<br>
                +               if (!adev->xcp_mgr->xcp[i].ddev)<br>
                +                       break;<br>
                +<br>
                                 ret =
                drm_dev_register(adev->xcp_mgr->xcp[i].ddev,
                ent->driver_data);<br>
                                 if (ret)<br>
                                         return ret;<br>
                @@ -345,6 +353,9 @@ void amdgpu_xcp_dev_unplug(struct
                amdgpu_device *adev)<br>
                                 return;<br>
                 <br>
                         for (i = 1; i < MAX_XCP; i++) {<br>
                +               if (!adev->xcp_mgr->xcp[i].ddev)<br>
                +                       break;<br>
                +<br>
                                 p_ddev =
                adev->xcp_mgr->xcp[i].ddev;<br>
                                 drm_dev_unplug(p_ddev);<br>
                                 p_ddev->render->dev =
                adev->xcp_mgr->xcp[i].rdev;<br>
                diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
                b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c<br>
                index 3b0749390388..310df98ba46a 100644<br>
                --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c<br>
                +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c<br>
                @@ -1969,8 +1969,16 @@ int
                kfd_topology_add_device(struct kfd_node *gpu)<br>
                         int i;<br>
                         const char *asic_name =
                amdgpu_asic_name[gpu->adev->asic_type];<br>
                 <br>
                +<br>
                         gpu_id = kfd_generate_gpu_id(gpu);<br>
                -       pr_debug("Adding new GPU (ID: 0x%x) to
                topology\n", gpu_id);<br>
                +       if (!gpu->xcp->ddev) {<br>
                +               dev_warn(gpu->adev->dev,<br>
                +               "Won't add GPU (ID: 0x%x) to topology
                since it has no drm node assigned.\n",<br>
                +               gpu_id);<br>
                +               return 0;<br>
                +       } else {<br>
                +               pr_debug("Adding new GPU (ID: 0x%x) to
                topology\n", gpu_id);<br>
                +       }<br>
                 <br>
                         /* Check to see if this gpu device exists in
                the topology_device_list.<br>
                          * If so, assign the gpu to that device,<br>
                -- <br>
                2.34.1<br>
                <br>
              </div>
            </span></font></div>
      </div>
    </blockquote>
  </body>
</html>