[PATCH v3 3/3] drm/amd: Retry delayed work handler if sensor is busy
Lazar, Lijo
lijo.lazar at amd.com
Mon Dec 18 05:21:13 UTC 2023
On 12/16/2023 1:25 AM, Mario Limonciello wrote:
> The SW CTF delayed work handler triggers a shutdown if a sensor
> read failed for any reason.
>
> The specific circumstance of a busy sensor should be retried
> however to ensure that a good value can be returned.
>
> Signed-off-by: Mario Limonciello <mario.limonciello at amd.com>
> ---
> drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> index 963cf6e76935..5eb46b6bad43 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> @@ -1182,6 +1182,12 @@ static void smu_swctf_delayed_work_handler(struct work_struct *work)
> if (hotspot_tmp / 1000 < range->software_shutdown_temp)
> return;
> break;
> + case -EBUSY:

In patch 1, -EBUSY is currently returned:

1) On a RAS interrupt - A RAS interrupt will eventually result in a reset
of the device. All processes running on the device will be suspended
before that, so a reschedule here won't be necessary.

2) Only on arcturus, aldebaran and SMU v13.0.6 - Aldebaran and SMU
v13.0.6 don't use SW CTF (on aldebaran the SW CTF limit is set in such a
way that it won't be hit). I don't know about SW CTF usage on arcturus.

Thanks,
Lijo

> + dev_warn(adev->dev, "Unable to read hotspot sensor, retrying in %d ms\n",
> + AMDGPU_SWCTF_EXTRA_DELAY);
> + schedule_delayed_work(&smu->swctf_delayed_work,
> + msecs_to_jiffies(AMDGPU_SWCTF_EXTRA_DELAY));
> + return;
> default:
> dev_err(adev->dev, "Failed to read hotspot temperature: %d\n", r);
> }
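
For reference, the decision the patched handler makes can be sketched as a standalone C function. This is only an illustration of the switch in the hunk above, not driver code: the enum, the function name and the parameter names are hypothetical, and it assumes the hotspot reading is in millidegrees Celsius while the shutdown limit is in degrees, as the division by 1000 suggests.

```c
#include <errno.h>

/* Hypothetical names for illustration; none of these exist in amdgpu. */
enum swctf_action {
	SWCTF_NO_ACTION,	/* temperature is back under the limit */
	SWCTF_RETRY,		/* sensor was busy: re-arm the delayed work */
	SWCTF_SHUTDOWN,		/* still too hot, or the read failed outright */
};

/*
 * r:            return code of the hotspot sensor read
 * hotspot_mdeg: reading, assumed millidegrees Celsius (valid only if r == 0)
 * shutdown_deg: software shutdown limit, assumed degrees Celsius
 */
static enum swctf_action swctf_decide(int r, int hotspot_mdeg, int shutdown_deg)
{
	switch (r) {
	case 0:
		/* Good read: nothing to do if we are below the limit. */
		if (hotspot_mdeg / 1000 < shutdown_deg)
			return SWCTF_NO_ACTION;
		break;
	case -EBUSY:
		/* New in this patch: a busy sensor is retried later. */
		return SWCTF_RETRY;
	default:
		/* Any other read failure falls through to shutdown. */
		break;
	}
	return SWCTF_SHUTDOWN;
}
```

With this model, a read of -EBUSY never reaches the shutdown path, which is why it matters which conditions in patch 1 actually return -EBUSY.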