<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;
        mso-ligatures:none;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style>
</head>
<body lang="en-CN" link="#467886" vlink="#96607D" style="word-wrap:break-word">
<div class="WordSection1">
<div>
<p class="MsoNormal"><br>
On 2025/5/19 21:57, Christian König wrote:<br>
> On 5/19/25 10:20, Samuel Zhang wrote:<br>
>> When switching to new GPU index after hibernation and then resume,<br>
>> VRAM offset of each VRAM BO will be changed, and the cached gpu<br>
>> addresses needed to updated.<br>
>><br>
>> This is to enable pdb0 and switch to use pdb0-based virtual gpu<br>
>> address by default in amdgpu_bo_create_reserved(). since the virtual<br>
>> addresses do not change, this can avoid the need to update all<br>
>> cached gpu addresses all over the codebase.<br>
>><br>
>> Signed-off-by: Emily Deng <Emily.Deng@amd.com><br>
>> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com><br>
>> ---<br>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c  | 32 ++++++++++++++++++------<br>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h  |  1 +<br>
>>   drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c |  2 +-<br>
>>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    | 10 +++++---<br>
>>   drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c  |  6 +++--<br>
>>   5 files changed, 37 insertions(+), 14 deletions(-)<br>
>><br>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c<br>
>> index d1fa5e8e3937..265d6c777af5 100644<br>
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c<br>
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c<br>
>> @@ -38,6 +38,8 @@<br>
>>   #include <drm/drm_drv.h><br>
>>   #include <drm/ttm/ttm_tt.h><br>
>>   <br>
>> +static const u64 four_gb = 0x100000000ULL;<br>
>> +<br>
>>   /**<br>
>>    * amdgpu_gmc_pdb0_alloc - allocate vram for pdb0<br>
>>    *<br>
>> @@ -249,15 +251,24 @@ void amdgpu_gmc_sysvm_location(struct amdgpu_device *adev, struct amdgpu_gmc *mc<br>
>>   {<br>
>>       u64 hive_vram_start = 0;<br>
>>       u64 hive_vram_end = mc->xgmi.node_segment_size * mc->xgmi.num_physical_nodes - 1;<br>
>> -    mc->vram_start = mc->xgmi.node_segment_size * mc->xgmi.physical_node_id;<br>
>> -    mc->vram_end = mc->vram_start + mc->xgmi.node_segment_size - 1;<br>
>> -    mc->gart_start = hive_vram_end + 1;<br>
>> +<br>
>> +    if (amdgpu_virt_xgmi_migrate_enabled(adev)) {<br>
>> +            /* set mc->vram_start to 0 to switch the returned GPU address of<br>
>> +             * amdgpu_bo_create_reserved() from FB aperture to GART aperture.<br>
>> +             */<br>
>> +            amdgpu_gmc_vram_location(adev, mc, 0);<br>
> This function does a lot more than just setting mc->vram_start and mc->vram_end.<br>
><br>
> You should probably just update the two setting and not call amdgpu_gmc_vram_location() at all.<br>
<br>
I tried only setting mc->vram_start and mc->vram_end. But KMD load will <br>
fail with following error logs.<br>
<br>
[  329.314346] amdgpu 0000:09:00.0: amdgpu: VRAM: 196288M <br>
0x0000000000000000 - 0x0000002FEBFFFFFF (196288M used)<br>
[  329.314348] amdgpu 0000:09:00.0: amdgpu: GART: 512M <br>
0x0000018000000000 - 0x000001801FFFFFFF<br>
[  329.314385] [drm] Detected VRAM RAM=196288M, BAR=262144M<br>
[  329.314386] [drm] RAM width 8192bits HBM<br>
[  329.314546] amdgpu 0000:09:00.0: amdgpu: (-22) failed to allocate <br>
kernel bo<br>
[  329.315013] [drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP <br>
block <gmc_v9_0> failed -22<br>
[  329.315846] amdgpu 0000:09:00.0: amdgpu: amdgpu_device_ip_init failed<br>
<br>
<br>
It seems like setting mc->visible_vram_size and mc->visible_vram_size <br>
fields are also needed. In this case call amdgpu_gmc_vram_location() is <br>
better than inline the logic, I think.<br>
<br>
<br>
><br>
>> +    } else {<br>
>> +            mc->vram_start = mc->xgmi.node_segment_size * mc->xgmi.physical_node_id;<br>
>> +            mc->vram_end = mc->vram_start + mc->xgmi.node_segment_size - 1;<br>
>> +            dev_info(adev->dev, "VRAM: %lluM 0x%016llX - 0x%016llX (%lluM used)\n",<br>
>> +                            mc->mc_vram_size >> 20, mc->vram_start,<br>
>> +                            mc->vram_end, mc->real_vram_size >> 20);<br>
>> +    }<br>
>> +    /* node_segment_size may not 4GB aligned on SRIOV, align up is needed. */<br>
>> +    mc->gart_start = ALIGN(hive_vram_end + 1, four_gb);<br>
>>       mc->gart_end = mc->gart_start + mc->gart_size - 1;<br>
>>       mc->fb_start = hive_vram_start;<br>
>>       mc->fb_end = hive_vram_end;<br>
>> -    dev_info(adev->dev, "VRAM: %lluM 0x%016llX - 0x%016llX (%lluM used)\n",<br>
>> -                    mc->mc_vram_size >> 20, mc->vram_start,<br>
>> -                    mc->vram_end, mc->real_vram_size >> 20);<br>
>>       dev_info(adev->dev, "GART: %lluM 0x%016llX - 0x%016llX\n",<br>
>>                       mc->gart_size >> 20, mc->gart_start, mc->gart_end);<br>
>>   }<br>
>> @@ -276,7 +287,6 @@ void amdgpu_gmc_sysvm_location(struct amdgpu_device *adev, struct amdgpu_gmc *mc<br>
>>   void amdgpu_gmc_gart_location(struct amdgpu_device *adev, struct amdgpu_gmc *mc,<br>
>>                             enum amdgpu_gart_placement gart_placement)<br>
>>   {<br>
>> -    const uint64_t four_gb = 0x100000000ULL;<br>
>>       u64 size_af, size_bf;<br>
>>       /*To avoid the hole, limit the max mc address to AMDGPU_GMC_HOLE_START*/<br>
>>       u64 max_mc_address = min(adev->gmc.mc_mask, AMDGPU_GMC_HOLE_START - 1);<br>
>> @@ -1068,6 +1078,14 @@ void amdgpu_gmc_init_pdb0(struct amdgpu_device *adev)<br>
>>       flags |= AMDGPU_PTE_FRAG((adev->gmc.vmid0_page_table_block_size + 9*1));<br>
>>       flags |= AMDGPU_PDE_PTE_FLAG(adev);<br>
>>   <br>
>> +    if (amdgpu_virt_xgmi_migrate_enabled(adev)) {<br>
>> +            /* always start from current device so that the GART address can keep<br>
>> +             * consistent when hibernate-resume with different GPUs.<br>
>> +             */<br>
>> +            vram_addr = adev->vm_manager.vram_base_offset;<br>
>> +            vram_end = vram_addr + vram_size;<br>
>> +    }<br>
>> +<br>
>>       /* The first n PDE0 entries are used as PTE,<br>
>>        * pointing to vram<br>
>>        */<br>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h<br>
>> index bd7fc123b8f9..46fac7ca7dfa 100644<br>
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h<br>
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h<br>
>> @@ -307,6 +307,7 @@ struct amdgpu_gmc {<br>
>>       struct amdgpu_bo                *pdb0_bo;<br>
>>       /* CPU kmapped address of pdb0*/<br>
>>       void                            *ptr_pdb0;<br>
>> +    bool pdb0_enabled;<br>
> This isn't needed, just always check (adev->gmc.xgmi.connected_to_cpu || amdgpu_virt_xgmi_migrate_enabled(adev)), make a function for that if necessary.<br>
<br>
Ok, I will update it in the next patch version.<br>
<br>
<br>
><br>
>>   <br>
>>       /* MALL size */<br>
>>       u64 mall_size;<br>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c<br>
>> index cb25f7f0dfc1..e6165f6d0763 100644<br>
>> --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c<br>
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c<br>
>> @@ -180,7 +180,7 @@ gfxhub_v1_2_xcc_init_system_aperture_regs(struct amdgpu_device *adev,<br>
>>               /* In the case squeezing vram into GART aperture, we don't use<br>
>>                * FB aperture and AGP aperture. Disable them.<br>
>>                */<br>
>> -            if (adev->gmc.pdb0_bo) {<br>
>> +            if (adev->gmc.pdb0_bo && !amdgpu_virt_xgmi_migrate_enabled(adev)) {<br>
>>                       WREG32_SOC15(GC, GET_INST(GC, i), regMC_VM_FB_LOCATION_TOP, 0);<br>
>>                       WREG32_SOC15(GC, GET_INST(GC, i), regMC_VM_FB_LOCATION_BASE, 0x00FFFFFF);<br>
>>                       WREG32_SOC15(GC, GET_INST(GC, i), regMC_VM_AGP_TOP, 0);<br>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c<br>
>> index 59385da80185..04fb99c64b37 100644<br>
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c<br>
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c<br>
>> @@ -1682,6 +1682,8 @@ static int gmc_v9_0_early_init(struct amdgpu_ip_block *ip_block)<br>
>>               adev->gmc.private_aperture_start + (4ULL << 30) - 1;<br>
>>       adev->gmc.noretry_flags = AMDGPU_VM_NORETRY_FLAGS_TF;<br>
>>   <br>
>> +    adev->gmc.pdb0_enabled = adev->gmc.xgmi.connected_to_cpu ||<br>
>> +            amdgpu_virt_xgmi_migrate_enabled(adev);<br>
>>       return 0;<br>
>>   }<br>
>>   <br>
>> @@ -1726,7 +1728,7 @@ static void gmc_v9_0_vram_gtt_location(struct amdgpu_device *adev,<br>
>>   <br>
>>       /* add the xgmi offset of the physical node */<br>
>>       base += adev->gmc.xgmi.physical_node_id * adev->gmc.xgmi.node_segment_size;<br>
>> -    if (adev->gmc.xgmi.connected_to_cpu) {<br>
>> +    if (adev->gmc.pdb0_enabled) {<br>
>>               amdgpu_gmc_sysvm_location(adev, mc);<br>
>>       } else {<br>
>>               amdgpu_gmc_vram_location(adev, mc, base);<br>
>> @@ -1841,7 +1843,7 @@ static int gmc_v9_0_gart_init(struct amdgpu_device *adev)<br>
>>               return 0;<br>
>>       }<br>
>>   <br>
>> -    if (adev->gmc.xgmi.connected_to_cpu) {<br>
>> +    if (adev->gmc.pdb0_enabled) {<br>
>>               adev->gmc.vmid0_page_table_depth = 1;<br>
>>               adev->gmc.vmid0_page_table_block_size = 12;<br>
>>       } else {<br>
>> @@ -1867,7 +1869,7 @@ static int gmc_v9_0_gart_init(struct amdgpu_device *adev)<br>
>>               if (r)<br>
>>                       return r;<br>
>>   <br>
>> -            if (adev->gmc.xgmi.connected_to_cpu)<br>
>> +            if (adev->gmc.pdb0_enabled)<br>
>>                       r = amdgpu_gmc_pdb0_alloc(adev);<br>
>>       }<br>
>>   <br>
>> @@ -2372,7 +2374,7 @@ static int gmc_v9_0_gart_enable(struct amdgpu_device *adev)<br>
>>   {<br>
>>       int r;<br>
>>   <br>
>> -    if (adev->gmc.xgmi.connected_to_cpu)<br>
>> +    if (adev->gmc.pdb0_enabled)<br>
>>               amdgpu_gmc_init_pdb0(adev);<br>
>>   <br>
>>       if (adev->gart.bo == NULL) {<br>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c<br>
>> index 84cde1239ee4..18e80aa78aff 100644<br>
>> --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c<br>
>> +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c<br>
>> @@ -45,8 +45,10 @@ static u64 mmhub_v1_8_get_fb_location(struct amdgpu_device *adev)<br>
>>       top &= MC_VM_FB_LOCATION_TOP__FB_TOP_MASK;<br>
>>       top <<= 24;<br>
>>   <br>
>> -    adev->gmc.fb_start = base;<br>
>> -    adev->gmc.fb_end = top;<br>
>> +    if (!amdgpu_virt_xgmi_migrate_enabled(adev)) {<br>
>> +            adev->gmc.fb_start = base;<br>
>> +            adev->gmc.fb_end = top;<br>
>> +    }<br>
> We should probably avoid calling this in the first place.<br>
><br>
> The function gmc_v9_0_vram_gtt_location() should probably be adjusted.<br>
<br>
mmhub_v1_8_get_fb_location() is called by the new <br>
amdgpu_bo_fb_aper_addr() as well, not just gmc_v9_0_vram_gtt_location().<br>
mmhub_v1_8_get_fb_location() is supposed to be a query api according to <br>
its name. having such side effect is very surprising.<br>
<br>
Another approach is set the right fb_start and fb_end in the new <br>
amdgpu_virt_resume(), like updating vram_base_offset.<br>
<br>
Which approach do you prefer? Or any better suggestions? Thank you.<br>
<br>
<br>
Regards<br>
Sam<br>
<br>
<br>
<br>
><br>
> Regards,<br>
> Christian.<br>
><br>
>>   <br>
>>       return base;<br>
>>   }<o:p></o:p></p>
</div>
</div>
</body>
</html>