[PATCH] drm/amdgpu: Check resize bar register when system uses large bar

Mario Limonciello mario.limonciello at amd.com
Sat Dec 30 02:22:52 UTC 2023


On 12/20/2023 19:58, Ma, Jun wrote:
> Hi Christian,
> 
> 
> On 12/20/2023 10:10 PM, Christian König wrote:
>> Am 19.12.23 um 06:58 schrieb Ma Jun:
>>> Print a warning message if the system can't access
>>> the resize bar register when using large bar.
>>
>> Well pretty clear NAK, we have embedded use cases where this would
>> trigger an incorrect warning.
>>
>> What should that be good for in the first place?
>>
> Some customer platforms do not enable mmconfig for various reasons, such as
> a BIOS bug, and therefore cannot access the GPU's extended configuration
> space through mmio.
> 
> Therefore, when the system enters the d3cold state and resumes,
> the amdgpu driver fails to resume because the extended configuration
> space registers of the GPU can't be restored. At this point, we usually
> only see some failure messages printed by the amdgpu driver in dmesg, so
> it is difficult to find the root cause.
> 
> So I thought it would be helpful to print some warning messages at
> the beginning to identify problems quickly.

This doesn't have review comments yet because of the holidays, but I think
this is a scalable solution to that specific issue:

https://lore.kernel.org/linux-pci/20231215220343.22523-1-mario.limonciello@amd.com/

Can you try it on one of these affected systems and see whether it helps?
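
For anyone following along, here is a rough userspace sketch of the failure
mode described above. It is not kernel code and not part of either patch. It
walks a PCIe extended capability list in a plain byte buffer, assuming the
standard header layout (capability ID in bits 15:0, next pointer in bits
31:20) and the Resizable BAR capability ID 0x15. The helper names
(find_ext_cap, put_ext_cap, read_dword) are made up for illustration. When
only the first 256 bytes of config space are reachable, as without mmconfig,
the lookup returns 0 even though the device implements the capability.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CFG_SPACE_SIZE     256   /* conventional PCI config space */
#define CFG_SPACE_EXP_SIZE 4096  /* PCIe config space incl. extended area */
#define EXT_CAP_ID_REBAR   0x15  /* Resizable BAR extended capability ID */

#define EXT_CAP_ID(h)   ((h) & 0xffffu)          /* bits 15:0  */
#define EXT_CAP_NEXT(h) (((h) >> 20) & 0xffcu)   /* bits 31:20 */

/* Read a little-endian dword from the config space image. */
static uint32_t read_dword(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg[off] |
	       ((uint32_t)cfg[off + 1] << 8) |
	       ((uint32_t)cfg[off + 2] << 16) |
	       ((uint32_t)cfg[off + 3] << 24);
}

/* Hypothetical helper: write an extended capability header (version 1). */
static void put_ext_cap(uint8_t *cfg, size_t off, uint16_t id, uint16_t next)
{
	uint32_t h = (uint32_t)id | (1u << 16) | ((uint32_t)next << 20);

	cfg[off]     = (uint8_t)(h & 0xff);
	cfg[off + 1] = (uint8_t)((h >> 8) & 0xff);
	cfg[off + 2] = (uint8_t)((h >> 16) & 0xff);
	cfg[off + 3] = (uint8_t)((h >> 24) & 0xff);
}

/*
 * Return the offset of the extended capability cap_id, or 0 if not found.
 * cfg_size is how much of config space is actually accessible: 256 when
 * only the legacy access method works, 4096 when mmconfig is available.
 */
static int find_ext_cap(const uint8_t *cfg, size_t cfg_size, uint16_t cap_id)
{
	size_t pos = CFG_SPACE_SIZE;
	int ttl = (CFG_SPACE_EXP_SIZE - CFG_SPACE_SIZE) / 8; /* loop guard */

	/* Extended space unreachable: nothing can be found. */
	if (cfg_size <= CFG_SPACE_SIZE)
		return 0;

	while (ttl-- > 0 && pos >= CFG_SPACE_SIZE && pos + 4 <= cfg_size) {
		uint32_t header = read_dword(cfg, pos);

		/* All zeros or all ones means no (more) capabilities. */
		if (header == 0 || header == 0xffffffffu)
			return 0;
		if (EXT_CAP_ID(header) == cap_id)
			return (int)pos;
		pos = EXT_CAP_NEXT(header);
	}
	return 0;
}
```

With a ReBAR header placed in the buffer via put_ext_cap(), find_ext_cap()
finds it when cfg_size is CFG_SPACE_EXP_SIZE and returns 0 when cfg_size is
CFG_SPACE_SIZE, which is the same symptom the proposed warning keys off.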

> 
> Regards,
> Ma Jun
> 
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Ma Jun <Jun.Ma2 at amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +++++++++-
>>>    1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index 4b694696930e..e7aedb4bd66e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -1417,6 +1417,12 @@ void amdgpu_device_wb_free(struct amdgpu_device *adev, u32 wb)
>>>    		__clear_bit(wb, adev->wb.used);
>>>    }
>>>    
>>> +static inline void amdgpu_find_rb_register(struct amdgpu_device *adev)
>>> +{
>>> +	if (!pci_find_ext_capability(adev->pdev, PCI_EXT_CAP_ID_REBAR))
>>> +		DRM_WARN("System can't access the resize bar register, please check!!\n");
>>> +}
>>> +
>>>    /**
>>>     * amdgpu_device_resize_fb_bar - try to resize FB BAR
>>>     *
>>> @@ -1444,8 +1450,10 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
>>>    
>>>    	/* skip if the bios has already enabled large BAR */
>>>    	if (adev->gmc.real_vram_size &&
>>> -	    (pci_resource_len(adev->pdev, 0) >= adev->gmc.real_vram_size))
>>> +	    (pci_resource_len(adev->pdev, 0) >= adev->gmc.real_vram_size)) {
>>> +		amdgpu_find_rb_register(adev);
>>>    		return 0;
>>> +	}
>>>    
>>>    	/* Check if the root BUS has 64bit memory resources */
>>>    	root = adev->pdev->bus;
>>


