[PATCH v8 0/4] enable xgmi node migration support for hibernate on SRIOV
Zhang, Owen(SRDC)
Owen.Zhang2 at amd.com
Tue May 27 06:43:31 UTC 2025
[AMD Official Use Only - AMD Internal Distribution Only]
Ping @Lazar, Lijo <Lijo.Lazar at amd.com>, @Koenig, Christian <Christian.Koenig at amd.com>…
Please review the updated patch in advance so that we can discuss your suggestions in tomorrow's meeting. Thanks for your great support.
Rgds/Owen
From: Deng, Emily <Emily.Deng at amd.com>
Sent: Monday, May 26, 2025 9:56 AM
To: Zhang, GuoQing (Sam) <GuoQing.Zhang at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Lazar, Lijo <Lijo.Lazar at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>
Cc: Zhao, Victor <Victor.Zhao at amd.com>; Chang, HaiJun <HaiJun.Chang at amd.com>; Zhang, GuoQing (Sam) <GuoQing.Zhang at amd.com>; Zhang, Owen(SRDC) <Owen.Zhang2 at amd.com>; Ma, Qing (Mark) <Qing.Ma at amd.com>; amd-gfx at lists.freedesktop.org
Subject: RE: [PATCH v8 0/4] enable xgmi node migration support for hibernate on SRIOV
@Koenig, Christian <Christian.Koenig at amd.com> and @Lazar, Lijo <Lijo.Lazar at amd.com>
Could you help review these changes again?
Best wishes
Emily Deng
>-----Original Message-----
>From: Samuel Zhang <guoqing.zhang at amd.com>
>Sent: Thursday, May 22, 2025 6:41 PM
>To: amd-gfx at lists.freedesktop.org
>Cc: Zhao, Victor <Victor.Zhao at amd.com>; Chang, HaiJun <HaiJun.Chang at amd.com>;
>Zhang, GuoQing (Sam) <GuoQing.Zhang at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>;
>Deucher, Alexander <Alexander.Deucher at amd.com>; Zhang, Owen(SRDC) <Owen.Zhang2 at amd.com>;
>Ma, Qing (Mark) <Qing.Ma at amd.com>; Lazar, Lijo <Lijo.Lazar at amd.com>;
>Deng, Emily <Emily.Deng at amd.com>
>Subject: [PATCH v8 0/4] enable xgmi node migration support for hibernate on SRIOV
>
>In SRIOV and VM environments, a customer may need to switch to new vGPU indexes
>after hibernating and then resuming the VM. For GPUs with XGMI, `vram_start`
>changes in this case, so the FB aperture GPU addresses of VRAM BOs change as well.
>These GPU addresses need to be updated on resume, but they are spread all
>over the KMD codebase, and updating each of them is error-prone and not acceptable.
>
>The solution is to use the pdb0 page table to cover both VRAM and GART memory and to
>use pdb0 virtual GPU addresses instead. When the GPU indexes change, the virtual GPU
>addresses do not change.
>
>For PSP and SMU, pdb0 GPU addresses do not work, so the original FB aperture GPU
>address is used instead. These addresses need to be updated on resume with changed
>vGPUs.
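
To make the address reasoning above concrete, here is a minimal sketch in C. It is a toy model and not amdgpu code: `struct toy_gpu` and every `toy_*` function are hypothetical names, and the assumption that a node's VRAM starts at `physical_node_id * node_segment_size` is a simplification made for illustration. It only shows why an FB aperture address moves when the node index changes, while a pdb0-style virtual address can stay fixed and leave the remapping to the page table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model for illustration only; none of this is amdgpu code. */
struct toy_gpu {
	uint64_t node_segment_size; /* bytes of VRAM per XGMI node (assumed uniform) */
	uint32_t physical_node_id;  /* may change when the VM resumes on a new vGPU */
};

/* Assumed model: the node's VRAM starts at node id * segment size. */
static uint64_t toy_vram_start(const struct toy_gpu *gpu)
{
	return (uint64_t)gpu->physical_node_id * gpu->node_segment_size;
}

/* FB aperture address of a BO: moves whenever vram_start moves. */
static uint64_t toy_fb_aper_addr(const struct toy_gpu *gpu, uint64_t bo_offset)
{
	assert(bo_offset < gpu->node_segment_size);
	return toy_vram_start(gpu) + bo_offset;
}

/* pdb0-style virtual address: independent of the node id in this model. */
static uint64_t toy_pdb0_virt_addr(uint64_t bo_offset)
{
	return bo_offset;
}

/* What the page table resolves a virtual address to on the current node. */
static uint64_t toy_pdb0_translate(const struct toy_gpu *gpu, uint64_t virt)
{
	assert(virt < gpu->node_segment_size);
	return toy_vram_start(gpu) + virt;
}
```

In this model, changing `physical_node_id` changes the result of `toy_fb_aper_addr()` for the same BO offset, so any consumer holding that address (PSP and SMU in the description above) has to recompute it on resume. The value from `toy_pdb0_virt_addr()` stays the same, and only `toy_pdb0_translate()` resolves it to a new location.
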
>
>v2:
>- remove physical_node_id_changed
>- set vram_start to 0 to switch cached gpu addr to gart aperture
>- cleanup pdb0 patch
>v3:
>- remove gmc_v9_0_init_sw_mem_ranges() call
>- remove vram_offset member
>- add 4 refactoring patches to remove cached gpu addr
>- cleanup pdb0 patch
>v4:
>- remove gmc_v9_0_mc_init() call and `refresh` update.
>- do not set `fb_start` in mmhub_v1_8_get_fb_location() when pdb0 enabled.
>v5:
>- add amdgpu_virt_xgmi_migrate_enabled() check
>- move vram_base_offset update to pdb0 patch
>- remove 4 refactoring patches to remove cached gpu addr
>- add patch to fix IH not working issue when resume with new VF
>v6: per Lijo feedback
>- rename amdgpu_device_update_xgmi_info() to amdgpu_virt_resume()
>- merge xgmi node and vram_base_offset update, IH fix into amdgpu_virt_resume()
>- remove 2 unnecessary gpu addr update changes
>v7: per Christian feedback
>- remove pdb0_enabled and add gmc_v9_0_is_pdb0_enabled()
>- remove amdgpu_gmc_vram_location() call in amdgpu_gmc_sysvm_location()
>- remove check in mmhub_v1_8_get_fb_location() and update fb_start/fb_end on
>resume
>v8:
>- use cached fb_start in amdgpu_bo_fb_aper_addr()
>- remove fb_start/fb_end update in amdgpu_virt_resume() and
>amdgpu_gmc_sysvm_location()
>- use vram_start to set regVM_CONTEXT0_PAGE_TABLE_START_ADDR_*
>- move check to the callsite of amdgpu_virt_resume()
>- add gmc.xgmi.node_segment_size check in amdgpu_virt_xgmi_migrate_enabled()
>- rename gmc_v9_0_is_pdb0_enabled() to amdgpu_gmc_is_pdb0_enabled()
>
>Samuel Zhang (4):
> drm/amdgpu: update xgmi info and vram_base_offset on resume
> drm/amdgpu: update GPU addresses for SMU and PSP
> drm/amdgpu: enable pdb0 for hibernation on SRIOV
> drm/amdgpu: fix fence fallback timer expired error
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 38 ++++++++++++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 28 ++++++++++++----
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 20 ++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 23 +++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 3 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 7 ++++
> drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c | 8 +++--
> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 13 +++++---
> drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 6 ++--
> drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 18 ++++++++++
> 13 files changed, 151 insertions(+), 17 deletions(-)
>
>--
>2.43.5