[PATCH V2 1/2] drm/amdgpu/gfx: fix ref counting for ring based profile handling
Alex Deucher
alexdeucher at gmail.com
Fri Mar 14 13:00:12 UTC 2025
On Fri, Mar 14, 2025 at 6:31 AM Lazar, Lijo <lijo.lazar at amd.com> wrote:
>
>
>
> On 3/12/2025 7:49 PM, Alex Deucher wrote:
> > We need to make sure the workload profile ref counts are
> > balanced. This isn't currently the case because we can
> > increment the count on submissions, but the decrement may
> > be delayed as work comes in. Track when we enable the
> > workload profile so the references are balanced.
> >
> > v2: switch to a mutex and active flag
> >
> > Fixes: 8fdb3958e396 ("drm/amdgpu/gfx: add ring helpers for setting workload profile")
> > Cc: Yang Wang <kevinyang.wang at amd.com>
> > Cc: Kenneth Feng <kenneth.feng at amd.com>
> > Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 30 ++++++++++++++++---------
> > drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 2 ++
> > 2 files changed, 22 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > index 984e6ff6e4632..099329d15b9ff 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > @@ -2160,11 +2160,16 @@ void amdgpu_gfx_profile_idle_work_handler(struct work_struct *work)
> > for (i = 0; i < (AMDGPU_MAX_COMPUTE_RINGS * AMDGPU_MAX_GC_INSTANCES); ++i)
> > fences += amdgpu_fence_count_emitted(&adev->gfx.compute_ring[i]);
> > if (!fences && !atomic_read(&adev->gfx.total_submission_cnt)) {
> > - r = amdgpu_dpm_switch_power_profile(adev, profile, false);
> > - if (r)
> > - dev_warn(adev->dev, "(%d) failed to disable %s power profile mode\n", r,
> > - profile == PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
> > - "fullscreen 3D" : "compute");
> > + mutex_lock(&adev->gfx.workload_profile_mutex);
> > + if (adev->gfx.workload_profile_active) {
> > + r = amdgpu_dpm_switch_power_profile(adev, profile, false);
> > + if (r)
> > + dev_warn(adev->dev, "(%d) failed to disable %s power profile mode\n", r,
> > + profile == PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
> > + "fullscreen 3D" : "compute");
> > + adev->gfx.workload_profile_active = false;
> > + }
> > + mutex_unlock(&adev->gfx.workload_profile_mutex);
> > } else {
> > schedule_delayed_work(&adev->gfx.idle_work, GFX_PROFILE_IDLE_TIMEOUT);
> > }
> > @@ -2184,11 +2189,16 @@ void amdgpu_gfx_profile_ring_begin_use(struct amdgpu_ring *ring)
> > atomic_inc(&adev->gfx.total_submission_cnt);
> >
> > if (!cancel_delayed_work_sync(&adev->gfx.idle_work)) {
>
> I guess this has the same problem as mentioned in the earlier patch.
> AFAIU, this will switch profile only if no idle work is scheduled. If a
> begin_use call comes when idle_work is being executed, there is a chance
> that idle_work completes amdgpu_dpm_switch_power_profile(adev, profile,
> false). Then this would skip changing the profile.
I think that problem exists already independent of the ref counting.
I suppose there isn't actually a need to make the workload change
dependent on the cancelling of the delayed work. I'll send out some
patches to address this.
Alex
>
> Thanks,
> Lijo
>
> > - r = amdgpu_dpm_switch_power_profile(adev, profile, true);
> > - if (r)
> > - dev_warn(adev->dev, "(%d) failed to disable %s power profile mode\n", r,
> > - profile == PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
> > - "fullscreen 3D" : "compute");
> > + mutex_lock(&adev->gfx.workload_profile_mutex);
> > + if (!adev->gfx.workload_profile_active) {
> > + r = amdgpu_dpm_switch_power_profile(adev, profile, true);
> > + if (r)
> > + dev_warn(adev->dev, "(%d) failed to disable %s power profile mode\n", r,
> > + profile == PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
> > + "fullscreen 3D" : "compute");
> > + adev->gfx.workload_profile_active = true;
> > + }
> > + mutex_unlock(&adev->gfx.workload_profile_mutex);
> > }
> > }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > index ddf4533614bac..75af4f25a133b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > @@ -483,6 +483,8 @@ struct amdgpu_gfx {
> >
> > atomic_t total_submission_cnt;
> > struct delayed_work idle_work;
> > + bool workload_profile_active;
> > + struct mutex workload_profile_mutex;
> > };
> >
> > struct amdgpu_gfx_ras_reg_entry {
>
More information about the amd-gfx
mailing list