[Intel-gfx] [PATCH v3] drm/i915/hwmon: Use 0 to designate disabled PL1 power limit
Rodrigo Vivi
rodrigo.vivi at intel.com
Mon Apr 3 17:29:32 UTC 2023
On Fri, Mar 31, 2023 at 07:41:46PM -0700, Ashutosh Dixit wrote:
> On ATSM the PL1 limit is disabled at power up. The previous uapi assumed
> that the PL1 limit is always enabled and therefore did not have a notion of
> a disabled PL1 limit. This results in erroneous PL1 limit values when the
> PL1 limit is disabled. For example at power up, the disabled ATSM PL1 limit
> was previously shown as 0 which means a low PL1 limit whereas the limit
> being disabled actually implies a high effective PL1 limit value.
>
> To get round this problem, the PL1 limit uapi is expanded to include a
> special value 0 to designate a disabled PL1 limit. A read value of 0 means
> that the PL1 power limit is disabled, writing 0 disables the limit.
>
> The link between this patch and the bugs mentioned below is as follows:
> * Because on ATSM the PL1 power limit is disabled on power up and there
> were no means to enable it, we previously implemented the means to
> enable the limit when the PL1 hwmon entry (power1_max) was written to.
> * Now there is an IGT, igt@i915_hwmon@hwmon_write, which (a) reads orig value
> from all hwmon sysfs (b) does a bunch of random writes and finally (c)
> restores the orig value read. On ATSM since the orig value is 0, when
> the IGT restores the 0 value, the PL1 limit is now enabled with a value
> of 0.
> * PL1 limit of 0 implies a low PL1 limit which causes GPU freq to fall to
> 100 MHz. This causes GuC FW load and several IGTs to start timing out
> and gives rise to these Intel CI bugs. After this patch, writing 0 would
> disable the PL1 limit instead of enabling it, avoiding the freq drop
> issue.
>
> v2: Add explanation for bugs mentioned below (Rodrigo)
> v3: Eliminate race during PL1 disable and verify (Tvrtko)
> Change return to -ENODEV if verify fails (Tvrtko)
>
> Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
> Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8060
> Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com>
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
pushed to drm-intel-next
> ---
> .../ABI/testing/sysfs-driver-intel-i915-hwmon | 4 ++-
> drivers/gpu/drm/i915/i915_hwmon.c | 26 +++++++++++++++++++
> 2 files changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon b/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
> index 2d6a472eef885..8d7d8f05f6cd0 100644
> --- a/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
> +++ b/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
> @@ -14,7 +14,9 @@ Description: RW. Card reactive sustained (PL1/Tau) power limit in microwatts.
>
> The power controller will throttle the operating frequency
> if the power averaged over a window (typically seconds)
> - exceeds this limit.
> + exceeds this limit. A read value of 0 means that the PL1
> + power limit is disabled, writing 0 disables the
> + limit. Writing values > 0 will enable the power limit.
>
> Only supported for particular Intel i915 graphics platforms.
>
> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
> index 596dd2c070106..8e7dccc8d3a0e 100644
> --- a/drivers/gpu/drm/i915/i915_hwmon.c
> +++ b/drivers/gpu/drm/i915/i915_hwmon.c
> @@ -349,6 +349,8 @@ hwm_power_is_visible(const struct hwm_drvdata *ddat, u32 attr, int chan)
> }
> }
>
> +#define PL1_DISABLE 0
> +
> /*
> * HW allows arbitrary PL1 limits to be set but silently clamps these values to
> * "typical but not guaranteed" min/max values in rg.pkg_power_sku. Follow the
> @@ -362,6 +364,14 @@ hwm_power_max_read(struct hwm_drvdata *ddat, long *val)
> intel_wakeref_t wakeref;
> u64 r, min, max;
>
> + /* Check if PL1 limit is disabled */
> + with_intel_runtime_pm(ddat->uncore->rpm, wakeref)
> + r = intel_uncore_read(ddat->uncore, hwmon->rg.pkg_rapl_limit);
> + if (!(r & PKG_PWR_LIM_1_EN)) {
> + *val = PL1_DISABLE;
> + return 0;
> + }
> +
> *val = hwm_field_read_and_scale(ddat,
> hwmon->rg.pkg_rapl_limit,
> PKG_PWR_LIM_1,
> @@ -385,8 +395,24 @@ static int
> hwm_power_max_write(struct hwm_drvdata *ddat, long val)
> {
> struct i915_hwmon *hwmon = ddat->hwmon;
> + intel_wakeref_t wakeref;
> u32 nval;
>
> + /* Disable PL1 limit and verify, because the limit cannot be disabled on all platforms */
> + if (val == PL1_DISABLE) {
> + mutex_lock(&hwmon->hwmon_lock);
> + with_intel_runtime_pm(ddat->uncore->rpm, wakeref) {
> + intel_uncore_rmw(ddat->uncore, hwmon->rg.pkg_rapl_limit,
> + PKG_PWR_LIM_1_EN, 0);
> + nval = intel_uncore_read(ddat->uncore, hwmon->rg.pkg_rapl_limit);
> + }
> + mutex_unlock(&hwmon->hwmon_lock);
> +
> + if (nval & PKG_PWR_LIM_1_EN)
> + return -ENODEV;
> + return 0;
> + }
> +
> /* Computation in 64-bits to avoid overflow. Round to nearest. */
> nval = DIV_ROUND_CLOSEST_ULL((u64)val << hwmon->scl_shift_power, SF_POWER);
> nval = PKG_PWR_LIM_1_EN | REG_FIELD_PREP(PKG_PWR_LIM_1, nval);
> --
> 2.38.0
>
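For readers following the uapi change, the read/write semantics described in the commit message can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the driver code: the `model_*` names, the `can_disable` flag, and the bit layout (enable bit 15, limit in bits 14:0) are hypothetical, and the scaling, clamping, locking and runtime-PM handling from the patch are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register layout for the model only (assumption). */
#define MODEL_PL1_EN      (1u << 15)
#define MODEL_PL1_MASK    0x7fffu
#define MODEL_PL1_DISABLE 0

/* Stand-in for the hardware; can_disable models platforms where
 * clearing the enable bit has no effect (assumption). */
struct model_hw {
	uint32_t rapl_limit;
	bool can_disable;
};

/* Read: a cleared enable bit is reported as 0, meaning "disabled". */
long model_power_max_read(const struct model_hw *hw)
{
	if (!(hw->rapl_limit & MODEL_PL1_EN))
		return MODEL_PL1_DISABLE;
	return (long)(hw->rapl_limit & MODEL_PL1_MASK);
}

/* Write: 0 clears the enable bit and verifies the result;
 * any other value sets the limit field and enables the limit. */
int model_power_max_write(struct model_hw *hw, long val)
{
	if (val == MODEL_PL1_DISABLE) {
		if (hw->can_disable)
			hw->rapl_limit &= ~MODEL_PL1_EN;
		/* Verify, since the limit cannot be disabled everywhere. */
		return (hw->rapl_limit & MODEL_PL1_EN) ? -ENODEV : 0;
	}
	hw->rapl_limit = MODEL_PL1_EN | ((uint32_t)val & MODEL_PL1_MASK);
	return 0;
}

/* Save/restore sequence in the style of the IGT: read the original
 * value, write a scratch value, restore the original, read back. */
long model_save_restore(struct model_hw *hw, long scratch)
{
	long orig = model_power_max_read(hw);

	model_power_max_write(hw, scratch);
	model_power_max_write(hw, orig);
	return model_power_max_read(hw);
}
```

In this model, a device that starts with the limit disabled reads back 0 after `model_save_restore()` and its enable bit stays clear, which is the behaviour the patch aims for; before the change, restoring 0 would have enabled the limit with a value of 0.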