[PATCH 2/2] drm/amd/pm: fix the deadlock observed on performance_level setting
Quan, Evan
Evan.Quan at amd.com
Wed Jan 26 04:03:56 UTC 2022
[AMD Official Use Only]
> -----Original Message-----
> From: Alex Deucher <alexdeucher at gmail.com>
> Sent: Tuesday, January 25, 2022 11:35 PM
> To: Quan, Evan <Evan.Quan at amd.com>
> Cc: amd-gfx list <amd-gfx at lists.freedesktop.org>; Deucher, Alexander
> <Alexander.Deucher at amd.com>
> Subject: Re: [PATCH 2/2] drm/amd/pm: fix the deadlock observed on
> performance_level setting
>
> On Tue, Jan 25, 2022 at 3:57 AM Evan Quan <evan.quan at amd.com> wrote:
> >
> > The sub-routine(amdgpu_gfx_off_ctrl) tried to obtain the lock
> > adev->pm.mutex which was actually hold by
> amdgpu_dpm_force_performance_level.
> > A deadlock happened then.
> >
> > Signed-off-by: Evan Quan <evan.quan at amd.com>
> > Change-Id: Id692829381dedc6380f5464d74107d696f7abca1
>
> I think in the long term, we should fix up the logic to allow us to keep the lock
> across the whole function, but for now,
[Quan, Evan] Agreed. But I have not figured out what's the proper way to do that. In my opinion we could either enforce a different lock inside the function.
Or we move this interface somewhere upper layer(e.g. amdgpu_device.c) and add some lock protections there.
But as you said, let's land this a temporary fix for now.
BR
Evan
>
> Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
>
> > ---
> > drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 50
> > ++++++++++-------------------
> > 1 file changed, 17 insertions(+), 33 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > index 5fc33893a68c..ef574c96b41c 100644
> > --- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > +++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > @@ -692,25 +692,16 @@ void amdgpu_dpm_set_power_state(struct
> amdgpu_device *adev,
> > amdgpu_dpm_compute_clocks(adev); }
> >
> > -static enum amd_dpm_forced_level
> > amdgpu_dpm_get_performance_level_locked(struct amdgpu_device
> *adev)
> > +enum amd_dpm_forced_level
> amdgpu_dpm_get_performance_level(struct
> > +amdgpu_device *adev)
> > {
> > const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
> > enum amd_dpm_forced_level level;
> >
> > + mutex_lock(&adev->pm.mutex);
> > if (pp_funcs->get_performance_level)
> > level = pp_funcs->get_performance_level(adev-
> >powerplay.pp_handle);
> > else
> > level = adev->pm.dpm.forced_level;
> > -
> > - return level;
> > -}
> > -
> > -enum amd_dpm_forced_level
> amdgpu_dpm_get_performance_level(struct
> > amdgpu_device *adev) -{
> > - enum amd_dpm_forced_level level;
> > -
> > - mutex_lock(&adev->pm.mutex);
> > - level = amdgpu_dpm_get_performance_level_locked(adev);
> > mutex_unlock(&adev->pm.mutex);
> >
> > return level;
> > @@ -725,23 +716,16 @@ int
> amdgpu_dpm_force_performance_level(struct amdgpu_device *adev,
> > AMD_DPM_FORCED_LEVEL_PROFILE_MIN_SCLK |
> > AMD_DPM_FORCED_LEVEL_PROFILE_MIN_MCLK |
> > AMD_DPM_FORCED_LEVEL_PROFILE_PEAK;
> > - int ret = 0;
> >
> > if (!pp_funcs->force_performance_level)
> > return 0;
> >
> > - mutex_lock(&adev->pm.mutex);
> > -
> > - if (adev->pm.dpm.thermal_active) {
> > - ret = -EINVAL;
> > - goto out;
> > - }
> > + if (adev->pm.dpm.thermal_active)
> > + return -EINVAL;
> >
> > - current_level = amdgpu_dpm_get_performance_level_locked(adev);
> > - if (current_level == level) {
> > - ret = 0;
> > - goto out;
> > - }
> > + current_level = amdgpu_dpm_get_performance_level(adev);
> > + if (current_level == level)
> > + return 0;
> >
> > if (adev->asic_type == CHIP_RAVEN) {
> > if (!(adev->apu_flags & AMD_APU_IS_RAVEN2)) { @@
> > -755,10 +739,8 @@ int amdgpu_dpm_force_performance_level(struct
> amdgpu_device *adev,
> > }
> >
> > if (!(current_level & profile_mode_mask) &&
> > - (level == AMD_DPM_FORCED_LEVEL_PROFILE_EXIT)) {
> > - ret = -EINVAL;
> > - goto out;
> > - }
> > + (level == AMD_DPM_FORCED_LEVEL_PROFILE_EXIT))
> > + return -EINVAL;
> >
> > if (!(current_level & profile_mode_mask) &&
> > (level & profile_mode_mask)) { @@ -780,17 +762,19 @@ int
> > amdgpu_dpm_force_performance_level(struct amdgpu_device *adev,
> > AMD_PG_STATE_GATE);
> > }
> >
> > + mutex_lock(&adev->pm.mutex);
> > +
> > if (pp_funcs->force_performance_level(adev->powerplay.pp_handle,
> > - level))
> > - ret = -EINVAL;
> > + level)) {
> > + mutex_unlock(&adev->pm.mutex);
> > + return -EINVAL;
> > + }
> >
> > - if (!ret)
> > - adev->pm.dpm.forced_level = level;
> > + adev->pm.dpm.forced_level = level;
> >
> > -out:
> > mutex_unlock(&adev->pm.mutex);
> >
> > - return ret;
> > + return 0;
> > }
> >
> > int amdgpu_dpm_get_pp_num_states(struct amdgpu_device *adev,
> > --
> > 2.29.0
> >
More information about the amd-gfx
mailing list