<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="en-CN" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<div>
<p class="MsoNormal"><br>
On 2025/5/8 17:27, Christian König wrote:<br>
><br>
> On 5/8/25 07:09, Samuel Zhang wrote:<br>
>> For virtual machine with vGPUs in SRIOV single device mode and XGMI<br>
>> is enabled, XGMI physical node ids may change when waking up from<br>
>> hiberation with different vGPU devices. So update XGMI info on resume.<br>
>><br>
>> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com><br>
>> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com><br>
>> ---<br>
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++<br>
>> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 4 ++++<br>
>> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 6 ++++++<br>
>> 3 files changed, 34 insertions(+)<br>
>><br>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
>> index d477a901af84..843a3b0a9a07 100644<br>
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
>> @@ -4478,6 +4478,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,<br>
>> r = adev->gfxhub.funcs->get_xgmi_info(adev);<br>
>> if (r)<br>
>> return r;<br>
>> + adev->gmc.xgmi.prev_physical_node_id = adev->gmc.xgmi.physical_node_id;<br>
>> }<br>
>> <br>
>> /* enable PCIE atomic ops */<br>
>> @@ -5040,6 +5041,26 @@ int amdgpu_device_suspend(struct drm_device *dev, bool notify_clients)<br>
>> return 0;<br>
>> }<br>
>> <br>
>> +static int amdgpu_device_update_xgmi_info(struct amdgpu_device *adev)<br>
>> +{<br>
>> + int r;<br>
>> +<br>
>> + /* Get xgmi info again for sriov to detect device changes */<br>
>> + if (amdgpu_sriov_vf(adev) &&<br>
>> + !(adev->flags & AMD_IS_APU) &&<br>
>> + adev->gmc.xgmi.supported &&<br>
>> + !adev->gmc.xgmi.connected_to_cpu) {<br>
>> + adev->gmc.xgmi.prev_physical_node_id = adev->gmc.xgmi.physical_node_id;<br>
>> + r = adev->gfxhub.funcs->get_xgmi_info(adev);<br>
>> + if (r)<br>
>> + return r;<br>
>> +<br>
>> + dev_info(adev->dev, "xgmi node, old id %d, new id %d\n",<br>
>> + adev->gmc.xgmi.prev_physical_node_id, adev->gmc.xgmi.physical_node_id);<br>
>> + }<br>
>> + return 0;<br>
>> +}<br>
>> +<br>
>> /**<br>
>> * amdgpu_device_resume - initiate device resume<br>
>> *<br>
>> @@ -5059,6 +5080,9 @@ int amdgpu_device_resume(struct drm_device *dev, bool notify_clients)<br>
>> r = amdgpu_virt_request_full_gpu(adev, true);<br>
>> if (r)<br>
>> return r;<br>
>> + r = amdgpu_device_update_xgmi_info(adev);<br>
>> + if (r)<br>
>> + return r;<br>
>> }<br>
>> <br>
>> if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)<br>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h<br>
>> index 32dabba4062f..1387901576f1 100644<br>
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h<br>
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h<br>
>> @@ -89,6 +89,7 @@ struct amdgpu_xgmi {<br>
>> u64 node_segment_size;<br>
>> /* physical node (0-3) */<br>
>> unsigned physical_node_id;<br>
>> + unsigned prev_physical_node_id;<br>
>> /* number of nodes (0-4) */<br>
>> unsigned num_physical_nodes;<br>
>> /* gpu list in the same hive */<br>
>> @@ -101,6 +102,9 @@ struct amdgpu_xgmi {<br>
>> uint8_t max_width;<br>
>> };<br>
>> <br>
>> +#define amdgpu_xmgi_is_node_changed(adev) \<br>
>> + (adev->gmc.xgmi.prev_physical_node_id != adev->gmc.xgmi.physical_node_id)<br>
><br>
> Please drop that function and the related checks.<br>
><br>
> If this is necessary we want to update the relevant parameters all the time and not just when something changed.<br>
<br>
<br>
Hi @Lazar, Lijo <<a href="mailto:Lijo.Lazar@amd.com">mailto:Lijo.Lazar@amd.com</a>> and @Koenig, Christian
<br>
<<a href="mailto:Christian.Koenig@amd.com">mailto:Christian.Koenig@amd.com</a>>, you have different opinion on if the
<br>
check is needed. Could you align on this? Thank you!<br>
<br>
Regards<br>
Sam<br>
<br>
<br>
><br>
> Regards,<br>
> Christian.<br>
><br>
>> +<br>
>> struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev);<br>
>> void amdgpu_put_xgmi_hive(struct amdgpu_hive_info *hive);<br>
>> int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct amdgpu_device *adev);<br>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c<br>
>> index 59385da80185..7c0ca2721eb3 100644<br>
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c<br>
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c<br>
>> @@ -2533,6 +2533,12 @@ static int gmc_v9_0_resume(struct amdgpu_ip_block *ip_block)<br>
>> struct amdgpu_device *adev = ip_block->adev;<br>
>> int r;<br>
>> <br>
>> + if (amdgpu_xmgi_is_node_changed(adev)) {<br>
>> + adev->vm_manager.vram_base_offset = adev->gfxhub.funcs->get_mc_fb_offset(adev);<br>
>> + adev->vm_manager.vram_base_offset +=<br>
>> + adev->gmc.xgmi.physical_node_id * adev->gmc.xgmi.node_segment_size;<br>
>> + }<br>
>> +<br>
>> /* If a reset is done for NPS mode switch, read the memory range<br>
>> * information again.<br>
>> */<o:p></o:p></p>
</div>
</div>
</body>
</html>