[PATCH 2/6] drm/amd/pm: Add arcturus throttler translation

Sider, Graham Graham.Sider at amd.com
Fri May 21 21:32:34 UTC 2021


Would this be referring to tools that may parse /sys/class/.../device/gpu_metrics or the actual gpu_metrics_vX_Y structs? For the latter, if there are tools whose parsing depends on the vX_Y version, I agree that we would not want to break those.

Since most ASICs currently use different versions, we would have to create a duplicate struct for each gpu_metrics version in use, unless I'm misunderstanding. I'm not sure if this is what you had in mind; let me know.

Best,
Graham

-----Original Message-----
From: Alex Deucher <alexdeucher at gmail.com> 
Sent: Friday, May 21, 2021 4:15 PM
To: Sider, Graham <Graham.Sider at amd.com>
Cc: amd-gfx list <amd-gfx at lists.freedesktop.org>; Kasiviswanathan, Harish <Harish.Kasiviswanathan at amd.com>; Sakhnovitch, Elena (Elen) <Elena.Sakhnovitch at amd.com>
Subject: Re: [PATCH 2/6] drm/amd/pm: Add arcturus throttler translation

On Fri, May 21, 2021 at 1:39 PM Sider, Graham <Graham.Sider at amd.com> wrote:
>
> Hi Alex,
>
> Are you referring to bumping the gpu_metrics_vX_Y version number? Different ASICs already use different version numbers, so I'm not sure how feasible this might be (e.g. arcturus == gpu_metrics_v1_1, navi1x == gpu_metrics_v1_3, vangogh == gpu_metrics_v2_1).
>
> Technically speaking, no new fields have been added to any of the gpu_metrics versions; only the representation of the throttle_status field has changed. Let me know your thoughts on this.
>
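
As an illustration of the "change in representation" described in the quoted message, the translation can be pictured as a remap of bit positions from an ASIC-specific layout to a common one. The sketch below is table-driven rather than the per-bit __assign_bit() sequence in the patch, and every bit position, type, and function name in it is hypothetical; the real THROTTLER_*_BIT and INDEP_THROTTLER_*_BIT values are defined in the driver headers and are not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical bit positions, chosen only for illustration.
 * They are NOT the values used by the amdgpu driver.
 */
enum { DEMO_DEP_TEMP_EDGE = 0, DEMO_DEP_TEMP_HOTSPOT = 1, DEMO_DEP_PPT0 = 9 };
enum { DEMO_INDEP_PPT0 = 0, DEMO_INDEP_TEMP_EDGE = 3, DEMO_INDEP_TEMP_HOTSPOT = 4 };

/* One entry per throttler: where the bit lives in each layout. */
struct demo_bit_map {
	uint8_t dep_bit;   /* position in the ASIC-dependent status word */
	uint8_t indep_bit; /* position in the ASIC-independent status word */
};

static const struct demo_bit_map demo_map[] = {
	{ DEMO_DEP_TEMP_EDGE,    DEMO_INDEP_TEMP_EDGE },
	{ DEMO_DEP_TEMP_HOTSPOT, DEMO_INDEP_TEMP_HOTSPOT },
	{ DEMO_DEP_PPT0,         DEMO_INDEP_PPT0 },
};

/*
 * Copy each mapped bit from its dependent position to its independent
 * position. Bits with no table entry are dropped from the result.
 */
static uint32_t demo_remap_throttler_bits(uint32_t dep_status,
					  const struct demo_bit_map *map,
					  size_t count)
{
	uint32_t indep_status = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (dep_status & (UINT32_C(1) << map[i].dep_bit))
			indep_status |= UINT32_C(1) << map[i].indep_bit;
	}

	return indep_status;
}
```

With this demo table, a raw value with bits 0 and 9 set comes back with bits 3 and 0 set, so a tool that still decodes the word using the old positions would report the wrong throttlers.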

I don't know if we have any existing tools out there that parse this data, but if so, they would interpret it incorrectly after this change.  If we bump the version, at least the tools will know how to handle it.
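
A rough sketch of what "the tools will know how to handle it" could look like on the consumer side. It assumes the gpu_metrics blob begins with a small header carrying a size and revision numbers; the struct layout, field names, enum, and the revision threshold passed in by the caller are all illustrative assumptions, not taken from this thread or confirmed against the driver headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed leading header of a metrics blob (illustrative layout). */
struct demo_metrics_header {
	uint16_t structure_size;
	uint8_t  format_revision;
	uint8_t  content_revision;
};

enum demo_throttle_layout {
	DEMO_LAYOUT_UNKNOWN,        /* blob too short or inconsistent */
	DEMO_LAYOUT_ASIC_DEPENDENT, /* raw firmware bit positions */
	DEMO_LAYOUT_INDEPENDENT,    /* translated, common bit positions */
};

/*
 * Decide how to interpret throttle_status from the blob's header.
 * first_indep_content_rev is a hypothetical threshold: the first content
 * revision (for the format in question) that carries translated bits.
 */
static enum demo_throttle_layout
demo_pick_throttle_layout(const uint8_t *buf, size_t len,
			  uint8_t first_indep_content_rev)
{
	struct demo_metrics_header hdr;

	if (len < sizeof(hdr))
		return DEMO_LAYOUT_UNKNOWN;

	/* memcpy avoids unaligned access on the raw byte buffer. */
	memcpy(&hdr, buf, sizeof(hdr));

	/* Reject blobs that claim to be larger than what was read. */
	if (hdr.structure_size > len)
		return DEMO_LAYOUT_UNKNOWN;

	if (hdr.content_revision >= first_indep_content_rev)
		return DEMO_LAYOUT_INDEPENDENT;

	return DEMO_LAYOUT_ASIC_DEPENDENT;
}
```

Without a version bump, a check of this kind has nothing to key on, and old and new representations are indistinguishable to the tool.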

Alex


> Best,
> Graham
>
> -----Original Message-----
> From: Alex Deucher <alexdeucher at gmail.com>
> Sent: Friday, May 21, 2021 10:27 AM
> To: Sider, Graham <Graham.Sider at amd.com>
> Cc: amd-gfx list <amd-gfx at lists.freedesktop.org>; Kasiviswanathan, 
> Harish <Harish.Kasiviswanathan at amd.com>; Sakhnovitch, Elena (Elen) 
> <Elena.Sakhnovitch at amd.com>
> Subject: Re: [PATCH 2/6] drm/amd/pm: Add arcturus throttler 
> translation
>
> A general comment on the patch series: do you want to bump the metrics table version, since the meaning of the throttler status has changed?
>
> Alex
>
> On Thu, May 20, 2021 at 10:30 AM Graham Sider <Graham.Sider at amd.com> wrote:
> >
> > Perform dependent to independent throttle status translation for 
> > arcturus.
> > ---
> >  .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 62 ++++++++++++++++---
> >  1 file changed, 53 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> > index 1735a96dd307..7c01c0bf2073 100644
> > --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> > @@ -540,6 +540,49 @@ static int arcturus_freqs_in_same_level(int32_t frequency1,
> >         return (abs(frequency1 - frequency2) <= EPSILON);
> >  }
> >
> > +static uint32_t arcturus_get_indep_throttler_status(
> > +                                       unsigned long dep_throttler_status)
> > +{
> > +       unsigned long indep_throttler_status = 0;
> > +
> > +       __assign_bit(INDEP_THROTTLER_TEMP_EDGE_BIT, &indep_throttler_status,
> > +                 test_bit(THROTTLER_TEMP_EDGE_BIT, &dep_throttler_status));
> > +       __assign_bit(INDEP_THROTTLER_TEMP_HOTSPOT_BIT, &indep_throttler_status,
> > +                 test_bit(THROTTLER_TEMP_HOTSPOT_BIT, &dep_throttler_status));
> > +       __assign_bit(INDEP_THROTTLER_TEMP_MEM_BIT, &indep_throttler_status,
> > +                 test_bit(THROTTLER_TEMP_MEM_BIT, &dep_throttler_status));
> > +       __assign_bit(INDEP_THROTTLER_TEMP_VR_GFX_BIT, &indep_throttler_status,
> > +                 test_bit(THROTTLER_TEMP_VR_GFX_BIT, &dep_throttler_status));
> > +       __assign_bit(INDEP_THROTTLER_TEMP_VR_MEM_BIT, &indep_throttler_status,
> > +                 test_bit(THROTTLER_TEMP_VR_MEM_BIT, &dep_throttler_status));
> > +       __assign_bit(INDEP_THROTTLER_TEMP_VR_SOC_BIT, &indep_throttler_status,
> > +                 test_bit(THROTTLER_TEMP_VR_SOC_BIT, &dep_throttler_status));
> > +       __assign_bit(INDEP_THROTTLER_TDC_GFX_BIT, &indep_throttler_status,
> > +                 test_bit(THROTTLER_TDC_GFX_BIT, &dep_throttler_status));
> > +       __assign_bit(INDEP_THROTTLER_TDC_SOC_BIT, &indep_throttler_status,
> > +                 test_bit(THROTTLER_TDC_SOC_BIT, &dep_throttler_status));
> > +       __assign_bit(INDEP_THROTTLER_PPT0_BIT, &indep_throttler_status,
> > +                 test_bit(THROTTLER_PPT0_BIT, &dep_throttler_status));
> > +       __assign_bit(INDEP_THROTTLER_PPT1_BIT, &indep_throttler_status,
> > +                 test_bit(THROTTLER_PPT1_BIT, &dep_throttler_status));
> > +       __assign_bit(INDEP_THROTTLER_PPT2_BIT, &indep_throttler_status,
> > +                 test_bit(THROTTLER_PPT2_BIT, &dep_throttler_status));
> > +       __assign_bit(INDEP_THROTTLER_PPT3_BIT, &indep_throttler_status,
> > +                 test_bit(THROTTLER_PPT3_BIT, &dep_throttler_status));
> > +       __assign_bit(INDEP_THROTTLER_PPM_BIT, &indep_throttler_status,
> > +                 test_bit(THROTTLER_PPM_BIT, &dep_throttler_status));
> > +       __assign_bit(INDEP_THROTTLER_FIT_BIT, &indep_throttler_status,
> > +                 test_bit(THROTTLER_FIT_BIT, &dep_throttler_status));
> > +       __assign_bit(INDEP_THROTTLER_APCC_BIT, &indep_throttler_status,
> > +                 test_bit(THROTTLER_APCC_BIT, &dep_throttler_status));
> > +       __assign_bit(INDEP_THROTTLER_VRHOT0_BIT, &indep_throttler_status,
> > +                 test_bit(THROTTLER_VRHOT0_BIT, &dep_throttler_status));
> > +       __assign_bit(INDEP_THROTTLER_VRHOT1_BIT, &indep_throttler_status,
> > +                 test_bit(THROTTLER_VRHOT1_BIT, &dep_throttler_status));
> > +
> > +       return (uint32_t)indep_throttler_status;
> > +}
> > +
> >  static int arcturus_get_smu_metrics_data(struct smu_context *smu,
> >                                          MetricsMember_t member,
> >                                          uint32_t *value)
> > @@ -629,7 +672,7 @@ static int arcturus_get_smu_metrics_data(struct smu_context *smu,
> >                         SMU_TEMPERATURE_UNITS_PER_CENTIGRADES;
> >                 break;
> >         case METRICS_THROTTLER_STATUS:
> > -               *value = metrics->ThrottlerStatus;
> > +               *value = arcturus_get_indep_throttler_status(metrics->ThrottlerStatus);
> >                 break;
> >         case METRICS_CURR_FANSPEED:
> >                 *value = metrics->CurrFanSpeed;
> > @@ -2213,13 +2256,13 @@ static const struct throttling_logging_label {
> >         uint32_t feature_mask;
> >         const char *label;
> >  } logging_label[] = {
> > -       {(1U << THROTTLER_TEMP_HOTSPOT_BIT), "GPU"},
> > -       {(1U << THROTTLER_TEMP_MEM_BIT), "HBM"},
> > -       {(1U << THROTTLER_TEMP_VR_GFX_BIT), "VR of GFX rail"},
> > -       {(1U << THROTTLER_TEMP_VR_MEM_BIT), "VR of HBM rail"},
> > -       {(1U << THROTTLER_TEMP_VR_SOC_BIT), "VR of SOC rail"},
> > -       {(1U << THROTTLER_VRHOT0_BIT), "VR0 HOT"},
> > -       {(1U << THROTTLER_VRHOT1_BIT), "VR1 HOT"},
> > +       {(1U << INDEP_THROTTLER_TEMP_HOTSPOT_BIT), "GPU"},
> > +       {(1U << INDEP_THROTTLER_TEMP_MEM_BIT), "HBM"},
> > +       {(1U << INDEP_THROTTLER_TEMP_VR_GFX_BIT), "VR of GFX rail"},
> > +       {(1U << INDEP_THROTTLER_TEMP_VR_MEM_BIT), "VR of HBM rail"},
> > +       {(1U << INDEP_THROTTLER_TEMP_VR_SOC_BIT), "VR of SOC rail"},
> > +       {(1U << INDEP_THROTTLER_VRHOT0_BIT), "VR0 HOT"},
> > +       {(1U << INDEP_THROTTLER_VRHOT1_BIT), "VR1 HOT"},
> >  };
> >  static void arcturus_log_thermal_throttling_event(struct smu_context *smu)
> >  {
> > @@ -2314,7 +2357,8 @@ static ssize_t arcturus_get_gpu_metrics(struct smu_context *smu,
> >         gpu_metrics->current_vclk0 = metrics.CurrClock[PPCLK_VCLK];
> >         gpu_metrics->current_dclk0 = metrics.CurrClock[PPCLK_DCLK];
> >
> > -       gpu_metrics->throttle_status = metrics.ThrottlerStatus;
> > +       gpu_metrics->throttle_status =
> > +                       arcturus_get_indep_throttler_status(metrics.ThrottlerStatus);
> >
> >         gpu_metrics->current_fan_speed = metrics.CurrFanSpeed;
> >
> > --
> > 2.17.1
> >

