[PATCH] drm/amdgpu: Check resize bar register when system uses large bar

Christian König ckoenig.leichtzumerken at gmail.com
Fri Jan 5 13:39:50 UTC 2024


On 21.12.23 at 02:58, Ma, Jun wrote:
> Hi Christian,
>
>
> On 12/20/2023 10:10 PM, Christian König wrote:
>> On 19.12.23 at 06:58, Ma Jun wrote:
>>> Print a warning message if the system can't access
>>> the resize bar register when using large bar.
>> Well pretty clear NAK, we have embedded use cases where this would
>> trigger an incorrect warning.
>>
>> What should that be good for in the first place?
>>
> Some customer platforms do not enable mmconfig for various reasons, such as
> a BIOS bug, and therefore cannot access the GPU's extended configuration
> space through MMIO.
>
> Therefore, when the system enters the d3cold state and resumes,
> the amdgpu driver fails to resume because the GPU's extended
> configuration space registers can't be restored. At this point, we
> usually only see some failure messages from the amdgpu driver in dmesg,
> which makes it difficult to find the root cause.
>
> So I thought it would be helpful to print some warning messages at
> the beginning to identify problems quickly.

Interesting bug, but we can't do this here. We have a couple of devices 
where the REBAR cap isn't enabled for some reason (or not correctly 
enabled).

In this case this would print a warning even when there isn't anything 
wrong.
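
To illustrate the distinction above, here is a minimal userspace sketch
(not kernel code) of an extended capability walk. It models two cases
that a bare "REBAR capability not found" result cannot tell apart:
extended config space being unreachable, and the space being reachable
but the device simply not exposing REBAR. All model_* names are
hypothetical and exist only for this example. The header layout (ID in
bits 15:0, next offset in bits 31:20) follows the PCIe extended
capability format. Whether a config-space-size check like this would be
suitable for amdgpu is an assumption, not something proposed in this
thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SPACE_SIZE      256    /* conventional config space */
#define CFG_SPACE_EXP_SIZE  4096   /* PCIe extended config space */
#define EXT_CAP_ID_AER      0x0001 /* Advanced Error Reporting */
#define EXT_CAP_ID_REBAR    0x0015 /* Resizable BAR */

/* Hypothetical stand-in for a PCI device; not a kernel structure. */
struct model_dev {
	uint8_t cfg[CFG_SPACE_EXP_SIZE];
	int cfg_size; /* 256 if extended space is unreachable, else 4096 */
};

static void model_init(struct model_dev *dev, int cfg_size)
{
	memset(dev->cfg, 0, sizeof(dev->cfg));
	dev->cfg_size = cfg_size;
}

/*
 * Store an extended capability header at 'off'. The model keeps values
 * in host byte order, which is enough for a self-consistent example.
 */
static void model_put_ext_cap(struct model_dev *dev, uint16_t off,
			      uint16_t id, uint16_t next)
{
	uint32_t header = (uint32_t)id | (1u << 16) | ((uint32_t)next << 20);

	assert(off >= CFG_SPACE_SIZE && off + 4 <= CFG_SPACE_EXP_SIZE);
	memcpy(&dev->cfg[off], &header, sizeof(header));
}

static uint32_t model_read32(const struct model_dev *dev, uint16_t off)
{
	uint32_t v;

	memcpy(&v, &dev->cfg[off], sizeof(v));
	return v;
}

/* Return the offset of extended capability 'id', or 0 if not found. */
static uint16_t model_find_ext_cap(const struct model_dev *dev, uint16_t id)
{
	uint16_t pos = CFG_SPACE_SIZE;
	int ttl = (CFG_SPACE_EXP_SIZE - CFG_SPACE_SIZE) / 8; /* bound the walk */

	if (dev->cfg_size <= CFG_SPACE_SIZE)
		return 0; /* extended space is not reachable at all */

	while (ttl-- > 0) {
		uint32_t header = model_read32(dev, pos);

		if (header == 0)
			break; /* empty list */
		if ((header & 0xffff) == id)
			return pos;
		pos = (header >> 20) & 0xffc;
		if (pos < CFG_SPACE_SIZE)
			break; /* a next offset of 0 ends the list */
	}
	return 0;
}

enum model_diag {
	MODEL_EXT_CFG_UNREACHABLE, /* e.g. the platform has no mmconfig */
	MODEL_REBAR_ABSENT,        /* extended space works, no REBAR cap */
	MODEL_REBAR_PRESENT,
};

/* Separate "cannot reach extended space" from "REBAR is not there". */
static enum model_diag model_diagnose(const struct model_dev *dev)
{
	if (dev->cfg_size <= CFG_SPACE_SIZE)
		return MODEL_EXT_CFG_UNREACHABLE;
	if (!model_find_ext_cap(dev, EXT_CAP_ID_REBAR))
		return MODEL_REBAR_ABSENT;
	return MODEL_REBAR_PRESENT;
}
```

In this model only MODEL_EXT_CFG_UNREACHABLE corresponds to the
platform problem described above. MODEL_REBAR_ABSENT is the case where
a warning keyed on the REBAR lookup alone would be a false positive.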

What we could potentially do is check for the MSI extension, which
should always be there if I'm not completely mistaken.

But how does this hardware platform even work without the extended mmio
space? I mean we can't even enable/disable MSI in that configuration if 
I'm not completely mistaken.

Regards,
Christian.

>
> Regards,
> Ma Jun
>
>> Regards,
>> Christian.
>>
>>> Signed-off-by: Ma Jun <Jun.Ma2 at amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +++++++++-
>>>    1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index 4b694696930e..e7aedb4bd66e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -1417,6 +1417,12 @@ void amdgpu_device_wb_free(struct amdgpu_device *adev, u32 wb)
>>>    		__clear_bit(wb, adev->wb.used);
>>>    }
>>>    
>>> +static inline void amdgpu_find_rb_register(struct amdgpu_device *adev)
>>> +{
>>> +	if (!pci_find_ext_capability(adev->pdev, PCI_EXT_CAP_ID_REBAR))
>>> +		DRM_WARN("System can't access the resize bar register,please check!!\n");
>>> +}
>>> +
>>>    /**
>>>     * amdgpu_device_resize_fb_bar - try to resize FB BAR
>>>     *
>>> @@ -1444,8 +1450,10 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
>>>    
>>>    	/* skip if the bios has already enabled large BAR */
>>>    	if (adev->gmc.real_vram_size &&
>>> -	    (pci_resource_len(adev->pdev, 0) >= adev->gmc.real_vram_size))
>>> +	    (pci_resource_len(adev->pdev, 0) >= adev->gmc.real_vram_size)) {
>>> +		amdgpu_find_rb_register(adev);
>>>    		return 0;
>>> +	}
>>>    
>>>    	/* Check if the root BUS has 64bit memory resources */
>>>    	root = adev->pdev->bus;


