<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
<div>
<div style="" dir="auto">A dynamic partition switch could happen later. The switch could still succeed at the hardware level and hence give a false sense of success, even though no render nodes are available for any app to make use of the partition.</div>
<div style="" dir="auto"><br>
</div>
<div style="" dir="auto">Also, a kfd node is not expected to have a valid xcp pointer on devices without partitions, so the gpu->xcp->ddev access could break there.</div>
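<div style="" dir="auto"><br>
</div>
<div style="" dir="auto">To illustrate the second point, here is a minimal sketch of a check that tolerates a missing xcp pointer. The mock_* structs and the helper name are hypothetical stand-ins, not the real kernel definitions, and this is only one possible shape for such a guard, not a proposed final fix.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * relevant to this discussion are modelled. */
struct mock_drm_device { int minor; };
struct mock_xcp { struct mock_drm_device *ddev; };
struct mock_kfd_node { struct mock_xcp *xcp; };

/*
 * Returns true only when the node belongs to a partition (xcp != NULL)
 * that has no drm device assigned. A node without an xcp pointer
 * (assumed here to mean a non-partitioned device) is never reported
 * as lacking a drm node, and xcp is never dereferenced when NULL.
 */
static bool mock_kfd_node_lacks_drm_node(const struct mock_kfd_node *gpu)
{
	assert(gpu != NULL);
	return gpu->xcp != NULL && gpu->xcp->ddev == NULL;
}
```

<div style="" dir="auto">With a check of this kind, a node whose xcp pointer is NULL would continue down the normal topology path instead of faulting on the dereference.</div>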
<div style="" dir="auto"><br>
</div>
<div id="ms-outlook-mobile-signature" dir="auto">Thanks,<br>
Lijo</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of James Zhu <James.Zhu@amd.com><br>
<b>Sent:</b> Saturday, August 12, 2023 2:36:27 AM<br>
<b>To:</b> amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org><br>
<b>Cc:</b> Lin, Amber <Amber.Lin@amd.com>; Zhu, James <James.Zhu@amd.com>; Kasiviswanathan, Harish <Harish.Kasiviswanathan@amd.com>; Koenig, Christian <Christian.Koenig@amd.com><br>
<b>Subject:</b> [PATCH v3] drm/amdgpu: skip xcp drm device allocation when out of drm resource</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">Return 0 when drm device allocation fails with -ENOSPC in<br>
order to allow the amdgpu driver to load. An xcp without a<br>
drm device node assigned won't be visible in user space.<br>
This helps the amdgpu driver load on systems which have more<br>
than 64 nodes, the current limit.<br>
<br>
A proposal to add more drm nodes, which would support up to<br>
2^20 nodes in total, is being discussed in public.<br>
kernel drm:<br>
<a href="https://lore.kernel.org/lkml/20230724211428.3831636-1-michal.winiarski@intel.com/T/">https://lore.kernel.org/lkml/20230724211428.3831636-1-michal.winiarski@intel.com/T/</a><br>
libdrm:<br>
<a href="https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305">https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305</a><br>
<br>
Signed-off-by: James Zhu <James.Zhu@amd.com><br>
Acked-by: Christian König <christian.koenig@amd.com><br>
<br>
-v2: added warning message<br>
-v3: use dev_warn<br>
---<br>
drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 13 ++++++++++++-<br>
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 +++++++++-<br>
2 files changed, 21 insertions(+), 2 deletions(-)<br>
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c<br>
index 9c9cca129498..565a1fa436d4 100644<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c<br>
@@ -239,8 +239,13 @@ static int amdgpu_xcp_dev_alloc(struct amdgpu_device *adev)<br>
<br>
for (i = 1; i < MAX_XCP; i++) {<br>
ret = amdgpu_xcp_drm_dev_alloc(&p_ddev);<br>
- if (ret)<br>
+ if (ret == -ENOSPC) {<br>
+ dev_warn(adev->dev,<br>
+ "Skip xcp node #%d when out of drm node resource.", i);<br>
+ return 0;<br>
+ } else if (ret) {<br>
return ret;<br>
+ }<br>
<br>
/* Redirect all IOCTLs to the primary device */<br>
adev->xcp_mgr->xcp[i].rdev = p_ddev->render->dev;<br>
@@ -328,6 +333,9 @@ int amdgpu_xcp_dev_register(struct amdgpu_device *adev,<br>
return 0;<br>
<br>
for (i = 1; i < MAX_XCP; i++) {<br>
+ if (!adev->xcp_mgr->xcp[i].ddev)<br>
+ break;<br>
+<br>
ret = drm_dev_register(adev->xcp_mgr->xcp[i].ddev, ent->driver_data);<br>
if (ret)<br>
return ret;<br>
@@ -345,6 +353,9 @@ void amdgpu_xcp_dev_unplug(struct amdgpu_device *adev)<br>
return;<br>
<br>
for (i = 1; i < MAX_XCP; i++) {<br>
+ if (!adev->xcp_mgr->xcp[i].ddev)<br>
+ break;<br>
+<br>
p_ddev = adev->xcp_mgr->xcp[i].ddev;<br>
drm_dev_unplug(p_ddev);<br>
p_ddev->render->dev = adev->xcp_mgr->xcp[i].rdev;<br>
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c<br>
index 3b0749390388..310df98ba46a 100644<br>
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c<br>
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c<br>
@@ -1969,8 +1969,16 @@ int kfd_topology_add_device(struct kfd_node *gpu)<br>
int i;<br>
const char *asic_name = amdgpu_asic_name[gpu->adev->asic_type];<br>
<br>
+<br>
gpu_id = kfd_generate_gpu_id(gpu);<br>
- pr_debug("Adding new GPU (ID: 0x%x) to topology\n", gpu_id);<br>
+ if (!gpu->xcp->ddev) {<br>
+ dev_warn(gpu->adev->dev,<br>
+ "Won't add GPU (ID: 0x%x) to topology since it has no drm node assigned.",<br>
+ gpu_id);<br>
+ return 0;<br>
+ } else {<br>
+ pr_debug("Adding new GPU (ID: 0x%x) to topology\n", gpu_id);<br>
+ }<br>
<br>
/* Check to see if this gpu device exists in the topology_device_list.<br>
* If so, assign the gpu to that device,<br>
-- <br>
2.34.1<br>
<br>
</div>
</span></font></div>
</div>
</body>
</html>