[PATCH 1/5] drm/amdgpu/gfx: add ring helpers for setting workload profile

Fri Jan 10 15:03:04 UTC 2025

On Thu, Jan 9, 2025 at 10:30 PM Lazar, Lijo <lijo.lazar at amd.com> wrote:
>
>
>
> On 1/9/2025 10:36 PM, Alex Deucher wrote:
> > On Thu, Jan 9, 2025 at 12:59 AM Lazar, Lijo <lijo.lazar at amd.com> wrote:
> >>
> >>
> >>
> >> On 1/9/2025 4:26 AM, Alex Deucher wrote:
> >>> Add helpers to switch the workload profile dynamically when
> >>> commands are submitted.  This allows us to switch to
> >>> the FULLSCREEN3D or COMPUTE profile when work is submitted.
> >>> Add a delayed work handler to delay switching out of the
> >>> selected profile if additional work comes in.  This works
> >>> the same as the VIDEO profile for VCN.  This lets dynamically
> >>> enable workload profiles on the fly and then move back
> >>> to the default when there is no work.
> >>>
> >>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
> >>> ---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 57 +++++++++++++++++++++++++
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 11 +++++
> >>>  2 files changed, 68 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> >>> index 6d5d81f0dc4e7..c542617121393 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> >>> @@ -2110,6 +2110,63 @@ void amdgpu_gfx_enforce_isolation_ring_end_use(struct amdgpu_ring *ring)
> >>>       mutex_unlock(&adev->enforce_isolation_mutex);
> >>>  }
> >>>
> >>> +void amdgpu_gfx_profile_idle_work_handler(struct work_struct *work)
> >>> +{
> >>> +     struct amdgpu_device *adev =
> >>> +             container_of(work, struct amdgpu_device, gfx.idle_work.work);
> >>> +     enum PP_SMC_POWER_PROFILE profile;
> >>> +     u32 i, fences = 0;
> >>> +     int r;
> >>> +
> >>> +     if (adev->gfx.num_gfx_rings)
> >>> +             profile = PP_SMC_POWER_PROFILE_FULLSCREEN3D;
> >>> +     else
> >>> +             profile = PP_SMC_POWER_PROFILE_COMPUTE;
> >>
> >> Since profile selection is in generic code, it makes sense to first
> >> check if the profile is supported for the family. Otherwise, this needs
> >> to be passed by the respective GFX family.
> >
> > The generic code already handles this.  If you select an unsupported
> > profile, it's ignored when the mask is updated.
> >
>
> That is strange. Does that mean user never gets an error if user
> attempts to set an unsupported profile?

If you use sysfs, you can only select from the available options
supported by the chip so there is no way to select a non-supported
profile.  For the internal driver API, we just silently ignore it.

>
> Another problem is this could override the user set profile now. Is that
> intended? In the current logic, whenever user sets a profile, the
> current profile is removed. With this one, another profile gets added
> and the user preference could be ignored depending on the priority.

Yes, I think.  For VCN we already select the video profile in a
similar manner and for ROCm we already select the compute profile,
this just extends that to gfx.  This doesn't really change the
behavior compared to the current state of the driver.  At the moment
we default to fullscreen3d on navi chips and on MI chips we always
enable compute when ROCm is active.  The change here is that we
eventually fall back to the bootup profile by default when the GPU is
idle.  This allows PMFW to enable additional power saving features
while still providing a boost when applications are running.

Alex

>
> Thanks,
> Lijo
>
> > Alex
> >
> >>
> >> Thanks,
> >> Lijo
> >>
> >>> +
> >>> +     for (i = 0; i < AMDGPU_MAX_GFX_RINGS; ++i)
> >>> +             fences += amdgpu_fence_count_emitted(&adev->gfx.gfx_ring[i]);
> >>> +     for (i = 0; i < (AMDGPU_MAX_COMPUTE_RINGS * AMDGPU_MAX_GC_INSTANCES); ++i)
> >>> +             fences += amdgpu_fence_count_emitted(&adev->gfx.compute_ring[i]);
> >>> +     if (!fences && !atomic_read(&adev->gfx.total_submission_cnt)) {
> >>> +             r = amdgpu_dpm_switch_power_profile(adev, profile, false);
> >>> +             if (r)
> >>> +                     dev_warn(adev->dev, "(%d) failed to disable %s power profile mode\n", r,
> >>> +                              profile == PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
> >>> +                              "fullscreen 3D" : "compute");
> >>> +     } else {
> >>> +             schedule_delayed_work(&adev->gfx.idle_work, GFX_PROFILE_IDLE_TIMEOUT);
> >>> +     }
> >>> +}
> >>> +
> >>> +void amdgpu_gfx_profile_ring_begin_use(struct amdgpu_ring *ring)
> >>> +{
> >>> +     struct amdgpu_device *adev = ring->adev;
> >>> +     enum PP_SMC_POWER_PROFILE profile;
> >>> +     int r;
> >>> +
> >>> +     if (adev->gfx.num_gfx_rings)
> >>> +             profile = PP_SMC_POWER_PROFILE_FULLSCREEN3D;
> >>> +     else
> >>> +             profile = PP_SMC_POWER_PROFILE_COMPUTE;
> >>> +
> >>> +     atomic_inc(&adev->gfx.total_submission_cnt);
> >>> +
> >>> +     if (!cancel_delayed_work_sync(&adev->gfx.idle_work)) {
> >>> +             r = amdgpu_dpm_switch_power_profile(adev, profile, true);
> >>> +             if (r)
> >>> +                     dev_warn(adev->dev, "(%d) failed to disable %s power profile mode\n", r,
> >>> +                              profile == PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
> >>> +                              "fullscreen 3D" : "compute");
> >>> +     }
> >>> +}
> >>> +
> >>> +void amdgpu_gfx_profile_ring_end_use(struct amdgpu_ring *ring)
> >>> +{
> >>> +     atomic_dec(&ring->adev->gfx.total_submission_cnt);
> >>> +
> >>> +     schedule_delayed_work(&ring->adev->gfx.idle_work, GFX_PROFILE_IDLE_TIMEOUT);
> >>> +}
> >>> +
> >>>  /*
> >>>   * debugfs for to enable/disable gfx job submission to specific core.
> >>>   */
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> >>> index 7f9e261f47f11..6c84598caec21 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> >>> @@ -57,6 +57,9 @@ enum amdgpu_gfx_pipe_priority {
> >>>  #define AMDGPU_GFX_QUEUE_PRIORITY_MINIMUM  0
> >>>  #define AMDGPU_GFX_QUEUE_PRIORITY_MAXIMUM  15
> >>>
> >>> +/* 1 second timeout */
> >>> +#define GFX_PROFILE_IDLE_TIMEOUT     msecs_to_jiffies(1000)
> >>> +
> >>>  enum amdgpu_gfx_partition {
> >>>       AMDGPU_SPX_PARTITION_MODE = 0,
> >>>       AMDGPU_DPX_PARTITION_MODE = 1,
> >>> @@ -477,6 +480,9 @@ struct amdgpu_gfx {
> >>>       bool                            kfd_sch_inactive[MAX_XCP];
> >>>       unsigned long                   enforce_isolation_jiffies[MAX_XCP];
> >>>       unsigned long                   enforce_isolation_time[MAX_XCP];
> >>> +
> >>> +     atomic_t                        total_submission_cnt;
> >>> +     struct delayed_work             idle_work;
> >>>  };
> >>>
> >>>  struct amdgpu_gfx_ras_reg_entry {
> >>> @@ -585,6 +591,11 @@ void amdgpu_gfx_cleaner_shader_init(struct amdgpu_device *adev,
> >>>  void amdgpu_gfx_enforce_isolation_handler(struct work_struct *work);
> >>>  void amdgpu_gfx_enforce_isolation_ring_begin_use(struct amdgpu_ring *ring);
> >>>  void amdgpu_gfx_enforce_isolation_ring_end_use(struct amdgpu_ring *ring);
> >>> +
> >>> +void amdgpu_gfx_profile_idle_work_handler(struct work_struct *work);
> >>> +void amdgpu_gfx_profile_ring_begin_use(struct amdgpu_ring *ring);
> >>> +void amdgpu_gfx_profile_ring_end_use(struct amdgpu_ring *ring);
> >>> +
> >>>  void amdgpu_debugfs_gfx_sched_mask_init(struct amdgpu_device *adev);
> >>>  void amdgpu_debugfs_compute_sched_mask_init(struct amdgpu_device *adev);
> >>>
> >>
>