<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div class="moz-cite-prefix">On 2023-08-11 21:39, Lazar, Lijo wrote:<br>
</div>
<blockquote type="cite" cite="mid:BYAPR12MB4614B9FB9A931ACD336212929711A@BYAPR12MB4614.namprd12.prod.outlook.com">
<div>
<div style="" dir="auto">A dynamic partition switch could happen
later. The switch could still be successful in terms of
hardware,</div>
</div>
</blockquote>
[JZ] The patch only skips the render node assignment and removes
visibility in user space. The xcp is still generated as usual, so the
switch should work as usual.<br>
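<p>A minimal user-space sketch of that behaviour, not the driver code itself: the allocation loop treats -ENOSPC as "stop handing out nodes" rather than as a failure, so the caller still sees success. The fake_* types and functions and the value of MAX_XCP are assumptions made for illustration; only the loop shape and the warning text follow the patch.</p>
<pre>
```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_XCP 8 /* name taken from the patch; the value here is an assumption */

struct fake_ddev { int minor; };              /* stand-in for struct drm_device */
struct fake_xcp  { struct fake_ddev *ddev; }; /* stand-in for one xcp slot */

/* Hypothetical pool of drm minors: hands out at most free_minors devices. */
struct fake_pool {
	struct fake_ddev devs[MAX_XCP];
	int used;
	int free_minors;
};

/* Stand-in for amdgpu_xcp_drm_dev_alloc(): -ENOSPC once the pool is exhausted. */
static int fake_drm_dev_alloc(struct fake_pool *pool, struct fake_ddev **out)
{
	if (pool->used >= pool->free_minors || pool->used >= MAX_XCP)
		return -ENOSPC;
	pool->devs[pool->used].minor = pool->used;
	*out = &pool->devs[pool->used++];
	return 0;
}

/*
 * Same shape as the patched loop: slot 0 stands for the primary device and
 * is not allocated here; -ENOSPC ends the loop but is reported as success.
 */
static int fake_xcp_dev_alloc(struct fake_pool *pool, struct fake_xcp *xcp)
{
	for (int i = 1; i < MAX_XCP; i++) {
		struct fake_ddev *ddev = NULL;
		int ret = fake_drm_dev_alloc(pool, &ddev);

		if (ret == -ENOSPC) {
			fprintf(stderr,
				"Skip xcp node #%d when out of drm node resource.\n", i);
			return 0;
		} else if (ret) {
			return ret;
		}
		xcp[i].ddev = ddev;
	}
	return 0;
}
```
</pre>
<p>With three free minors in the pool, slots 1 to 3 get a ddev, slot 4 and above keep ddev == NULL, and the function still returns 0. Those NULL slots are the ones that stay invisible to user space.</p>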
<blockquote type="cite" cite="mid:BYAPR12MB4614B9FB9A931ACD336212929711A@BYAPR12MB4614.namprd12.prod.outlook.com">
<div>
<div style="" dir="auto"> and hence gives a false feeling of
success even if there are no render nodes available for any
app to make use of the partition<span style="font-size: 12pt;">.</span></div>
</div>
</blockquote>
[JZ] From the driver's perspective the switch is a real success; user
space can treat the last partition as harvested. There is a warning in
the kernel log, and the final solution for more than 64 nodes is ongoing.<br>
<blockquote type="cite" cite="mid:BYAPR12MB4614B9FB9A931ACD336212929711A@BYAPR12MB4614.namprd12.prod.outlook.com">
<div>
<div style="" dir="auto"><br>
</div>
<div style="" dir="auto">Also, a kfd node is not expected
to have a valid xcp pointer on devices without partition.</div>
</div>
</blockquote>
[JZ] This won't affect the <span>xcp pointer, only </span><span>ddev.</span>
<blockquote type="cite" cite="mid:BYAPR12MB4614B9FB9A931ACD336212929711A@BYAPR12MB4614.namprd12.prod.outlook.com">
<div>
<div style="" dir="auto"><span> The access gpu->xcp->ddev
could then break.</span></div>
</div>
</blockquote>
[JZ] <span>Added a skip when ddev == NULL.</span>
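<p>One possible shape for such a skip, as a sketch only: the helper below also tolerates a NULL xcp pointer, under the assumption (taken from the review comment, not confirmed against the driver) that a NULL xcp means a non-partitioned device that should still be added to the topology. The fake_* names are hypothetical stand-ins, not kernel types.</p>
<pre>
```c
#include <stdbool.h>
#include <stddef.h>

struct fake_ddev     { int minor; };              /* stand-in for struct drm_device */
struct fake_xcp      { struct fake_ddev *ddev; }; /* stand-in for one xcp slot */
struct fake_kfd_node { struct fake_xcp *xcp; };   /* stand-in for struct kfd_node */

/* Returns true when the node should be left out of the topology. */
static bool fake_skip_topology_add(const struct fake_kfd_node *gpu)
{
	/* Assumption: no xcp means no partitioning, so the node is never skipped. */
	if (!gpu->xcp)
		return false;

	/* Partitioned node without a drm device: not visible in user space. */
	return gpu->xcp->ddev == NULL;
}
```
</pre>
<p>Checking gpu->xcp before gpu->xcp->ddev keeps the test safe on devices where the xcp pointer is not set.</p>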
<blockquote type="cite" cite="mid:BYAPR12MB4614B9FB9A931ACD336212929711A@BYAPR12MB4614.namprd12.prod.outlook.com">
<div>
<div style="" dir="auto"><br>
</div>
<div id="ms-outlook-mobile-signature" dir="auto">Thanks,<br>
Lijo</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b>
amd-gfx <a class="moz-txt-link-rfc2396E" href="mailto:amd-gfx-bounces@lists.freedesktop.org"><amd-gfx-bounces@lists.freedesktop.org></a> on
behalf of James Zhu <a class="moz-txt-link-rfc2396E" href="mailto:James.Zhu@amd.com"><James.Zhu@amd.com></a><br>
<b>Sent:</b> Saturday, August 12, 2023 2:36:27 AM<br>
<b>To:</b> <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
<a class="moz-txt-link-rfc2396E" href="mailto:amd-gfx@lists.freedesktop.org"><amd-gfx@lists.freedesktop.org></a><br>
<b>Cc:</b> Lin, Amber <a class="moz-txt-link-rfc2396E" href="mailto:Amber.Lin@amd.com"><Amber.Lin@amd.com></a>; Zhu, James
<a class="moz-txt-link-rfc2396E" href="mailto:James.Zhu@amd.com"><James.Zhu@amd.com></a>; Kasiviswanathan, Harish
<a class="moz-txt-link-rfc2396E" href="mailto:Harish.Kasiviswanathan@amd.com"><Harish.Kasiviswanathan@amd.com></a>; Koenig, Christian
<a class="moz-txt-link-rfc2396E" href="mailto:Christian.Koenig@amd.com"><Christian.Koenig@amd.com></a><br>
<b>Subject:</b> [PATCH v3] drm/amdgpu: skip xcp drm device
allocation when out of drm resource</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">Return 0 when drm device allocation
fails with -ENOSPC, in<br>
order to allow the amdgpu driver to load. An xcp
without<br>
a drm device node assigned won't be visible in user
space.<br>
This helps the amdgpu driver load on systems that have
more<br>
than 64 nodes, the current limit.<br>
<br>
A proposal to add more drm nodes is being discussed
publicly;<br>
it would support up to 2^20 nodes in total.<br>
kernel drm:<br>
<a href="https://lore.kernel.org/lkml/20230724211428.3831636-1-michal.winiarski@intel.com/T/" moz-do-not-send="true" class="moz-txt-link-freetext">https://lore.kernel.org/lkml/20230724211428.3831636-1-michal.winiarski@intel.com/T/</a><br>
libdrm:<br>
<a href="https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305" moz-do-not-send="true" class="moz-txt-link-freetext">https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305</a><br>
<br>
Signed-off-by: James Zhu <a class="moz-txt-link-rfc2396E" href="mailto:James.Zhu@amd.com"><James.Zhu@amd.com></a><br>
Acked-by: Christian König
<a class="moz-txt-link-rfc2396E" href="mailto:christian.koenig@amd.com"><christian.koenig@amd.com></a><br>
<br>
-v2: added warning message<br>
-v3: use dev_warn<br>
---<br>
drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 13
++++++++++++-<br>
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10
+++++++++-<br>
2 files changed, 21 insertions(+), 2 deletions(-)<br>
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c<br>
index 9c9cca129498..565a1fa436d4 100644<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c<br>
@@ -239,8 +239,13 @@ static int
amdgpu_xcp_dev_alloc(struct amdgpu_device *adev)<br>
<br>
for (i = 1; i < MAX_XCP; i++) {<br>
ret =
amdgpu_xcp_drm_dev_alloc(&p_ddev);<br>
- if (ret)<br>
+ if (ret == -ENOSPC) {<br>
+ dev_warn(adev->dev,<br>
+ "Skip xcp node #%d when out of
drm node resource.", i);<br>
+ return 0;<br>
+ } else if (ret) {<br>
return ret;<br>
+ }<br>
<br>
/* Redirect all IOCTLs to the primary
device */<br>
adev->xcp_mgr->xcp[i].rdev =
p_ddev->render->dev;<br>
@@ -328,6 +333,9 @@ int amdgpu_xcp_dev_register(struct
amdgpu_device *adev,<br>
return 0;<br>
<br>
for (i = 1; i < MAX_XCP; i++) {<br>
+ if (!adev->xcp_mgr->xcp[i].ddev)<br>
+ break;<br>
+<br>
ret =
drm_dev_register(adev->xcp_mgr->xcp[i].ddev,
ent->driver_data);<br>
if (ret)<br>
return ret;<br>
@@ -345,6 +353,9 @@ void amdgpu_xcp_dev_unplug(struct
amdgpu_device *adev)<br>
return;<br>
<br>
for (i = 1; i < MAX_XCP; i++) {<br>
+ if (!adev->xcp_mgr->xcp[i].ddev)<br>
+ break;<br>
+<br>
p_ddev =
adev->xcp_mgr->xcp[i].ddev;<br>
drm_dev_unplug(p_ddev);<br>
p_ddev->render->dev =
adev->xcp_mgr->xcp[i].rdev;<br>
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c<br>
index 3b0749390388..310df98ba46a 100644<br>
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c<br>
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c<br>
@@ -1969,8 +1969,16 @@ int
kfd_topology_add_device(struct kfd_node *gpu)<br>
int i;<br>
const char *asic_name =
amdgpu_asic_name[gpu->adev->asic_type];<br>
<br>
+<br>
gpu_id = kfd_generate_gpu_id(gpu);<br>
- pr_debug("Adding new GPU (ID: 0x%x) to
topology\n", gpu_id);<br>
+ if (!gpu->xcp->ddev) {<br>
+ dev_warn(gpu->adev->dev,<br>
+ "Won't add GPU (ID: 0x%x) to topology
since it has no drm node assigned.",<br>
+ gpu_id);<br>
+ return 0;<br>
+ } else {<br>
+ pr_debug("Adding new GPU (ID: 0x%x) to
topology\n", gpu_id);<br>
+ }<br>
<br>
/* Check to see if this gpu device exists in
the topology_device_list.<br>
* If so, assign the gpu to that device,<br>
-- <br>
2.34.1<br>
<br>
</div>
</span></font></div>
</div>
</blockquote>
</body>
</html>