[PATCH v8 0/4] enable xgmi node migration support for hibernate on SRIOV

Deng, Emily Emily.Deng at amd.com
Mon May 26 01:55:53 UTC 2025


@Koenig, Christian <Christian.Koenig at amd.com> and @Lazar, Lijo <Lijo.Lazar at amd.com>

Could you help review these changes again?



Best wishes

Emily Deng



>-----Original Message-----

>From: Samuel Zhang <guoqing.zhang at amd.com>

>Sent: Thursday, May 22, 2025 6:41 PM

>To: amd-gfx at lists.freedesktop.org

>Cc: Zhao, Victor <Victor.Zhao at amd.com>; Chang, HaiJun

><HaiJun.Chang at amd.com>; Zhang, GuoQing (Sam) <GuoQing.Zhang at amd.com>;

>Koenig, Christian <Christian.Koenig at amd.com>; Deucher, Alexander

><Alexander.Deucher at amd.com>; Zhang, Owen(SRDC) <Owen.Zhang2 at amd.com>;

>Ma, Qing (Mark) <Qing.Ma at amd.com>; Lazar, Lijo <Lijo.Lazar at amd.com>; Deng,

>Emily <Emily.Deng at amd.com>

>Subject: [PATCH v8 0/4] enable xgmi node migration support for hibernate on SRIOV

>

>In SRIOV and VM environments, a customer may need to switch to new vGPU indexes

>after hibernating and then resuming the VM. For GPUs with XGMI, `vram_start` will

>change in this case, so the FB aperture GPU addresses of VRAM BOs will also change.

>These GPU addresses need to be updated on resume. But they are spread all over

>the KMD codebase, and updating each of them is error-prone and not acceptable.

>

>The solution is to use the pdb0 page table to cover both VRAM and GART memory and

>to use pdb0 virtual GPU addresses instead. When the GPU indexes change, the virtual

>GPU addresses do not change.

>

>For PSP and SMU, pdb0 GPU addresses do not work, so the original FB aperture GPU

>addresses are used instead. These need to be updated on resume with changed

>vGPUs.

>

>v2:

>- remove physical_node_id_changed

>- set vram_start to 0 to switch cached gpu addr to gart aperture

>- cleanup pdb0 patch

>v3:

>- remove gmc_v9_0_init_sw_mem_ranges() call

>- remove vram_offset member

>- add 4 refactoring patches to remove cached gpu addr

>- cleanup pdb0 patch

>v4:

>- remove gmc_v9_0_mc_init() call and `refresh` update.

>- do not set `fb_start` in mmhub_v1_8_get_fb_location() when pdb0 enabled.

>v5:

>- add amdgpu_virt_xgmi_migrate_enabled() check

>- move vram_base_offset update to pdb0 patch

>- remove 4 refactoring patches to remove cached gpu addr

>- add patch to fix IH not working issue when resume with new VF

>v6: per Lijo feedback

>- rename amdgpu_device_update_xgmi_info() to amdgpu_virt_resume()

>- merge xgmi node and vram_base_offset update, IH fix into amdgpu_virt_resume()

>- remove 2 unnecessary gpu addr update changes

>v7: per Christian feedback

>- remove pdb0_enabled and add gmc_v9_0_is_pdb0_enabled()

>- remove amdgpu_gmc_vram_location() call in amdgpu_gmc_sysvm_location()

>- remove check in mmhub_v1_8_get_fb_location() and update fb_start/fb_end on

>resume

>v8:

>- use cached fb_start in amdgpu_bo_fb_aper_addr()

>- remove fb_start/fb_end update in amdgpu_virt_resume() and

>amdgpu_gmc_sysvm_location()

>- use vram_start to set regVM_CONTEXT0_PAGE_TABLE_START_ADDR_*

>- move check to the callsite of amdgpu_virt_resume()

>- add gmc.xgmi.node_segment_size check in amdgpu_virt_xgmi_migrate_enabled()

>- rename gmc_v9_0_is_pdb0_enabled() to amdgpu_gmc_is_pdb0_enabled()

>

>Samuel Zhang (4):

>  drm/amdgpu: update xgmi info and vram_base_offset on resume

>  drm/amdgpu: update GPU addresses for SMU and PSP

>  drm/amdgpu: enable pdb0 for hibernation on SRIOV

>  drm/amdgpu: fix fence fallback timer expired error

>

> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 38 ++++++++++++++++++++++

> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c    | 28 ++++++++++++----

> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    |  2 +-

> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +

> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 20 ++++++++++++

> drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  1 +

> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c    | 23 +++++++++++++

> drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c  |  3 ++

> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   |  7 ++++

> drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c   |  8 +++--

> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c      | 13 +++++---

> drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c    |  6 ++--

> drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  | 18 ++++++++++

> 13 files changed, 151 insertions(+), 17 deletions(-)

>

>--

>2.43.5
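
For readers following the thread, the address argument in the cover letter can be illustrated with a small toy model. This is only a sketch under stated assumptions, not code from the series: every `toy_*` name is hypothetical, the FB aperture base is assumed to be `physical_node_id * node_segment_size`, and the pdb0-covered VRAM range is assumed to start at virtual address 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model; these are NOT the amdgpu structures. */
struct toy_gpu {
    uint64_t node_segment_size; /* bytes of VRAM per XGMI node */
    uint32_t physical_node_id;  /* may change after hibernate + resume */
};

/* FB aperture base of this node (toy assumption: id * segment size). */
static uint64_t toy_fb_start(const struct toy_gpu *g)
{
    return (uint64_t)g->physical_node_id * g->node_segment_size;
}

/* FB aperture address of a buffer located bo_offset bytes into VRAM. */
static uint64_t toy_fb_aper_addr(const struct toy_gpu *g, uint64_t bo_offset)
{
    return toy_fb_start(g) + bo_offset;
}

/* pdb0-style virtual address: does not depend on the node id.
 * Toy assumption: the virtual VRAM range starts at 0. */
static uint64_t toy_pdb0_addr(uint64_t bo_offset)
{
    return bo_offset;
}

/* What the page table resolves a pdb0-style address to on this node. */
static uint64_t toy_pdb0_translate(const struct toy_gpu *g, uint64_t va)
{
    assert(va < g->node_segment_size); /* va must fall inside this node's VRAM */
    return toy_fb_start(g) + va;
}

/* One hibernate/resume cycle with a changed node id. Returns 1 if the
 * cached pdb0-style address still reaches the buffer while the cached
 * FB aperture address has gone stale. */
static int toy_resume_demo(void)
{
    struct toy_gpu g = { .node_segment_size = 1ULL << 34, .physical_node_id = 1 };
    const uint64_t bo_offset = 0x1000;

    uint64_t cached_fb = toy_fb_aper_addr(&g, bo_offset);
    uint64_t cached_va = toy_pdb0_addr(bo_offset);

    g.physical_node_id = 3; /* VM resumed on a different vGPU index */

    int va_ok = toy_pdb0_translate(&g, cached_va) == toy_fb_aper_addr(&g, bo_offset);
    int fb_stale = cached_fb != toy_fb_aper_addr(&g, bo_offset);
    return va_ok && fb_stale;
}
```

In this model only the translation target moves when the node id changes, so addresses cached as pdb0-style values stay valid. Consumers that must keep using FB aperture addresses (PSP and SMU in the cover letter) need their addresses recomputed on resume.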

