[PATCH v6 3/4] drm/amdgpu: enable pdb0 for hibernation on SRIOV

Wed May 21 08:06:01 UTC 2025

On 5/20/25 07:10, Zhang, GuoQing (Sam) wrote:
>>> +    if (amdgpu_virt_xgmi_migrate_enabled(adev)) {
>>> +            /* set mc->vram_start to 0 to switch the returned GPU address of
>>> +             * amdgpu_bo_create_reserved() from FB aperture to GART aperture.
>>> +             */
>>> +            amdgpu_gmc_vram_location(adev, mc, 0);
>> This function does a lot more than just setting mc->vram_start and mc->vram_end.
>>
>> You should probably just update the two settings and not call amdgpu_gmc_vram_location() at all.
> 
> I tried setting only mc->vram_start and mc->vram_end, but the KMD load then
> fails with the following error logs.
> 
> [  329.314346] amdgpu 0000:09:00.0: amdgpu: VRAM: 196288M 0x0000000000000000 - 0x0000002FEBFFFFFF (196288M used)
> [  329.314348] amdgpu 0000:09:00.0: amdgpu: GART: 512M 0x0000018000000000 - 0x000001801FFFFFFF
> [  329.314385] [drm] Detected VRAM RAM=196288M, BAR=262144M
> [  329.314386] [drm] RAM width 8192bits HBM
> [  329.314546] amdgpu 0000:09:00.0: amdgpu: (-22) failed to allocate kernel bo
> [  329.315013] [drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP block <gmc_v9_0> failed -22
> [  329.315846] amdgpu 0000:09:00.0: amdgpu: amdgpu_device_ip_init failed
> 
> 
> It seems that the mc->visible_vram_size and mc->real_vram_size fields
> also need to be set. In that case, calling amdgpu_gmc_vram_location() is
> better than inlining the logic, I think.

Yeah, exactly that is not a good idea.

The mc->visible_vram_size and mc->real_vram_size should have been initialized by gmc_v9_0_mc_init(). Why didn't that happen?

>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
>>> index 84cde1239ee4..18e80aa78aff 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
>>> @@ -45,8 +45,10 @@ static u64 mmhub_v1_8_get_fb_location(struct amdgpu_device *adev)
>>>       top &= MC_VM_FB_LOCATION_TOP__FB_TOP_MASK;
>>>       top <<= 24;
>>>  
>>> -    adev->gmc.fb_start = base;
>>> -    adev->gmc.fb_end = top;
>>> +    if (!amdgpu_virt_xgmi_migrate_enabled(adev)) {
>>> +            adev->gmc.fb_start = base;
>>> +            adev->gmc.fb_end = top;
>>> +    }
>> We should probably avoid calling this in the first place.
>>
>> The function gmc_v9_0_vram_gtt_location() should probably be adjusted.
> 
> mmhub_v1_8_get_fb_location() is called by the new
> amdgpu_bo_fb_aper_addr() as well, not just gmc_v9_0_vram_gtt_location().

Oh, that is probably a bad idea. The function amdgpu_bo_fb_aper_addr() should only rely on cached data.
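
A minimal standalone sketch of an address helper that relies only on cached data. The struct and function names are hypothetical stand-ins, not the amdgpu code, and the address math is deliberately simplified:

```c
#include <stdint.h>

/* Hypothetical stand-in for the cached fields kept in the GMC state. */
struct gmc_cache {
	uint64_t fb_start;   /* cached FB aperture base */
	uint64_t fb_end;     /* cached FB aperture top */
	uint64_t vram_start; /* MC address where VRAM is mapped */
};

/*
 * Hypothetical helper: translate a VRAM MC address into an FB aperture
 * address. It takes a const pointer and reads only cached values, so it
 * touches no registers and cannot modify driver state.
 */
static uint64_t fb_aper_addr(const struct gmc_cache *gmc, uint64_t mc_addr)
{
	return mc_addr - gmc->vram_start + gmc->fb_start;
}
```

Because the helper cannot write to the state, whatever is cached in fb_start and fb_end stays valid no matter how often the address is queried.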

> mmhub_v1_8_get_fb_location() is supposed to be a query API according to
> its name. Having such a side effect is very surprising.
> 
> Another approach is to set the right fb_start and fb_end in the new
> amdgpu_virt_resume(), like updating vram_base_offset.

That is probably better. And skip setting fb_start and fb_end in amdgpu_gmc_sysvm_location() for this use case.

That was done only because we re-program those registers on bare metal.
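
Roughly, the split could look like the toy model below. This is a sketch under assumptions: the struct, both function names and the xgmi_migrate flag are hypothetical, and the real amdgpu_gmc_sysvm_location() computes more than is shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of the cached GMC address state. */
struct gmc_state {
	uint64_t fb_start, fb_end;
	uint64_t vram_start, vram_end;
};

/*
 * Hypothetical init-time step: mirror the VRAM range into fb_start and
 * fb_end only when the driver re-programs the FB location itself
 * (bare metal). In the migrate case the cached values are left alone.
 */
static void sysvm_location(struct gmc_state *gmc, bool xgmi_migrate)
{
	if (!xgmi_migrate) {
		gmc->fb_start = gmc->vram_start;
		gmc->fb_end = gmc->vram_end;
	}
}

/*
 * Hypothetical resume-time step: refresh the cache once with the FB
 * range that is valid after resume (passed in here as plain values).
 */
static void virt_resume_update_fb(struct gmc_state *gmc,
				  uint64_t hw_base, uint64_t hw_top)
{
	gmc->fb_start = hw_base;
	gmc->fb_end = hw_top;
}
```

With this split the resume path is the only place that rewrites fb_start and fb_end in the migrate case, and the query helpers can stay free of side effects.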

Regards,
Christian.

> 
> Which approach do you prefer? Or any better suggestions? Thank you.
> 
> 
> Regards
> Sam
> 
> 
> 
>>
>> Regards,
>> Christian.
>>
>>>  
>>>       return base;
>>>   }
>