[PATCH v2] drm/amgpu: Check return value of amdgpu_device_baco_enter/exit
Ma, Jun
majun at amd.com
Thu Feb 29 11:10:01 UTC 2024
Hi Lijo,
On 2/29/2024 3:33 PM, Lazar, Lijo wrote:
>
>
> On 2/29/2024 11:49 AM, Ma Jun wrote:
>> Check return value of amdgpu_device_baco_enter/exit and print
>> warning message because these errors may cause runtime resume failure
>>
>> Signed-off-by: Ma Jun <Jun.Ma2 at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 29 ++++++++++++++++------
>> 1 file changed, 22 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index e68bd6f8a6a4..4928b588cd12 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -6000,15 +6000,24 @@ int amdgpu_device_baco_enter(struct drm_device *dev)
>> {
>> struct amdgpu_device *adev = drm_to_adev(dev);
>> struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
>> + int ret = 0;
>>
>> - if (!amdgpu_device_supports_baco(dev))
>> - return -ENOTSUPP;
>> + if (!amdgpu_device_supports_baco(dev)) {
>> + ret = -ENOTSUPP;
>> + goto baco_error;
>> + }
>>
>> if (ras && adev->ras_enabled &&
>> adev->nbio.funcs->enable_doorbell_interrupt)
>> adev->nbio.funcs->enable_doorbell_interrupt(adev, false);
>>
>> - return amdgpu_dpm_baco_enter(adev);
>> + ret = amdgpu_dpm_baco_enter(adev);
>> +
>> +baco_error:
>> + if (ret)
>> + dev_warn(adev->dev, "warning: device fails to enter baco. ret=%d\n", ret);
>> +
>
> This doesn't look like a real case, moreover the warning message is
In fact this is a case that actually happened.
When amdgpu_device_supports_baco returns with error because of some reasons,
device will enter runtime suspend without calling the amdgpu_device_baco_enter()
and without any warning message being printed. Then, device is usually fails
to resume.
So, I add this message as a warning to help us find the real reason.
> misleading. If the device doesn't support baco, driver is not supposed
> to call it for runpm purpose -
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2664
>
I changed this code in another patch.
https://lists.freedesktop.org/archives/amd-gfx/2024-February/104929.html
Regards,
Ma Jun
> Thanks,
> Lijo
>
>> + return ret;
>> }
>>
>> int amdgpu_device_baco_exit(struct drm_device *dev)
>> @@ -6017,12 +6026,14 @@ int amdgpu_device_baco_exit(struct drm_device *dev)
>> struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
>> int ret = 0;
>>
>> - if (!amdgpu_device_supports_baco(dev))
>> - return -ENOTSUPP;
>> + if (!amdgpu_device_supports_baco(dev)) {
>> + ret = -ENOTSUPP;
>> + goto baco_error;
>> + }
>>
>> ret = amdgpu_dpm_baco_exit(adev);
>> if (ret)
>> - return ret;
>> + goto baco_error;
>>
>> if (ras && adev->ras_enabled &&
>> adev->nbio.funcs->enable_doorbell_interrupt)
>> @@ -6032,7 +6043,11 @@ int amdgpu_device_baco_exit(struct drm_device *dev)
>> adev->nbio.funcs->clear_doorbell_interrupt)
>> adev->nbio.funcs->clear_doorbell_interrupt(adev);
>>
>> - return 0;
>> +baco_error:
>> + if (ret)
>> + dev_warn(adev->dev, "warning: device fails to exit from baco. ret=%d\n", ret);
>> +
>> + return ret;
>> }
>>
>> /**
More information about the amd-gfx
mailing list