[Intel-gfx] [PATCH v3] drm/i915/hwmon: Use 0 to designate disabled PL1 power limit

Rodrigo Vivi rodrigo.vivi at intel.com
Mon Apr 3 17:29:32 UTC 2023


On Fri, Mar 31, 2023 at 07:41:46PM -0700, Ashutosh Dixit wrote:
> On ATSM the PL1 limit is disabled at power up. The previous uapi assumed
> that the PL1 limit is always enabled and therefore had no notion of a
> disabled PL1 limit. This results in erroneous PL1 limit values when the
> PL1 limit is disabled. For example, at power up the disabled ATSM PL1 limit
> was previously shown as 0, which means a low PL1 limit, whereas a disabled
> limit actually implies a high effective PL1 limit value.
> 
> To get round this problem, the PL1 limit uapi is expanded to include a
> special value 0 to designate a disabled PL1 limit. A read value of 0 means
> that the PL1 power limit is disabled, writing 0 disables the limit.
> 
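
A minimal userspace sketch of how a consumer of power1_max might interpret the
value under the semantics described above. parse_power1_max() and
pl1_is_disabled() are hypothetical helper names used only for illustration;
the only fact taken from the patch is that a value of 0 means "disabled" and
that the unit is microwatts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Per the patch description: 0 designates a disabled PL1 limit. */
#define PL1_DISABLED_UW 0L

/*
 * Parse the text read from a power1_max attribute (microwatts).
 * Returns 0 on success, -EINVAL on malformed or negative input.
 * Hypothetical helper, not part of any existing library.
 */
static int parse_power1_max(const char *text, long *microwatts)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(text, &end, 10);
	if (errno || end == text || val < 0)
		return -EINVAL;
	/* Accept a single trailing newline, as sysfs reads usually have one. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*microwatts = val;
	return 0;
}

/* A value of 0 must be read as "no limit", not as "limit of 0 W". */
static bool pl1_is_disabled(long microwatts)
{
	return microwatts == PL1_DISABLED_UW;
}

/* Usage: the same attribute text yields either "disabled" or a real limit. */
static void pl1_parse_demo(void)
{
	long uw = -1;

	assert(parse_power1_max("0\n", &uw) == 0);
	assert(pl1_is_disabled(uw));

	assert(parse_power1_max("120000000\n", &uw) == 0);
	assert(!pl1_is_disabled(uw));
}
```

The point of the separate pl1_is_disabled() check is that a tool comparing
limits numerically would otherwise treat 0 as the lowest possible limit, which
is the opposite of what the hardware does in that state.
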
> The link between this patch and the bugs mentioned below is as follows:
> * Because on ATSM the PL1 power limit is disabled on power up and there
>   were no means to enable it, we previously implemented the means to
>   enable the limit when the PL1 hwmon entry (power1_max) was written to.
> * Now there is an IGT, igt@i915_hwmon@hwmon_write, which (a) reads the
>   original value from all hwmon sysfs entries, (b) does a bunch of random
>   writes and finally (c) restores the original value read. On ATSM, since
>   the original value is 0, when the IGT restores the 0 value, the PL1
>   limit is now enabled with a value of 0.
> * A PL1 limit of 0 implies a low PL1 limit which causes GPU freq to fall to
>   100 MHz. This causes GuC FW load and several IGTs to start timing out
>   and gives rise to these Intel CI bugs. After this patch, writing 0
>   disables the PL1 limit instead of enabling it, avoiding the freq drop
>   issue.
> 
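
The save/write/restore sequence described in the list above can be sketched
with a toy model. This is an illustration under stated assumptions, not the
driver code: struct pl1_model and all function names are hypothetical, the
model ignores the min/max clamping done by the real read path, and the write
value of 50000000 uW is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the PL1 state; not the real register layout. */
struct pl1_model {
	bool enabled;	/* models the PL1 enable bit */
	long limit_uw;	/* models the PL1 limit field, in microwatts */
};

/* ATSM power-up state as described above: limit disabled, field reads 0. */
static struct pl1_model pl1_power_up(void)
{
	struct pl1_model m = { .enabled = false, .limit_uw = 0 };
	return m;
}

/* Old uapi: read reports the limit field regardless of the enable bit. */
static long old_read(const struct pl1_model *m)
{
	return m->limit_uw;
}

/* Old uapi: every write enables the limit, including a write of 0. */
static void old_write(struct pl1_model *m, long val)
{
	m->enabled = true;
	m->limit_uw = val;
}

/* New uapi: a disabled limit reads back as 0. */
static long new_read(const struct pl1_model *m)
{
	return m->enabled ? m->limit_uw : 0;
}

/* New uapi: writing 0 disables the limit, values > 0 enable it. */
static void new_write(struct pl1_model *m, long val)
{
	if (val == 0) {
		m->enabled = false;
		return;
	}
	m->enabled = true;
	m->limit_uw = val;
}

/* The failure state: limit enabled with a value of 0. */
static bool pl1_stuck_at_zero(const struct pl1_model *m)
{
	return m->enabled && m->limit_uw == 0;
}

/* Replay steps (a), (b), (c) under both sets of semantics. */
static void pl1_save_restore_demo(void)
{
	struct pl1_model before = pl1_power_up();
	struct pl1_model after = pl1_power_up();
	long orig;

	orig = old_read(&before);		/* (a) reads 0 */
	old_write(&before, 50000000);		/* (b) arbitrary write */
	old_write(&before, orig);		/* (c) restores 0 */
	assert(pl1_stuck_at_zero(&before));	/* enabled with limit 0 */

	orig = new_read(&after);		/* (a) reads 0 (disabled) */
	new_write(&after, 50000000);		/* (b) arbitrary write */
	new_write(&after, orig);		/* (c) writes 0: disable */
	assert(!pl1_stuck_at_zero(&after));	/* back to disabled */
}
```

Under the old semantics the restore step leaves the model in a state it was
never in at power up, which is the round-trip asymmetry this patch removes.
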
> v2: Add explanation for bugs mentioned below (Rodrigo)
> v3: Eliminate race during PL1 disable and verify (Tvrtko)
>     Change return to -ENODEV if verify fails (Tvrtko)
> 
> Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
> Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8060
> Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com>
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi at intel.com>

pushed to drm-intel-next

> ---
>  .../ABI/testing/sysfs-driver-intel-i915-hwmon |  4 ++-
>  drivers/gpu/drm/i915/i915_hwmon.c             | 26 +++++++++++++++++++
>  2 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon b/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
> index 2d6a472eef885..8d7d8f05f6cd0 100644
> --- a/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
> +++ b/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
> @@ -14,7 +14,9 @@ Description:	RW. Card reactive sustained  (PL1/Tau) power limit in microwatts.
>  
>  		The power controller will throttle the operating frequency
>  		if the power averaged over a window (typically seconds)
> -		exceeds this limit.
> +		exceeds this limit. A read value of 0 means that the PL1
> +		power limit is disabled, writing 0 disables the
> +		limit. Writing values > 0 will enable the power limit.
>  
>  		Only supported for particular Intel i915 graphics platforms.
>  
> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
> index 596dd2c070106..8e7dccc8d3a0e 100644
> --- a/drivers/gpu/drm/i915/i915_hwmon.c
> +++ b/drivers/gpu/drm/i915/i915_hwmon.c
> @@ -349,6 +349,8 @@ hwm_power_is_visible(const struct hwm_drvdata *ddat, u32 attr, int chan)
>  	}
>  }
>  
> +#define PL1_DISABLE 0
> +
>  /*
>   * HW allows arbitrary PL1 limits to be set but silently clamps these values to
>   * "typical but not guaranteed" min/max values in rg.pkg_power_sku. Follow the
> @@ -362,6 +364,14 @@ hwm_power_max_read(struct hwm_drvdata *ddat, long *val)
>  	intel_wakeref_t wakeref;
>  	u64 r, min, max;
>  
> +	/* Check if PL1 limit is disabled */
> +	with_intel_runtime_pm(ddat->uncore->rpm, wakeref)
> +		r = intel_uncore_read(ddat->uncore, hwmon->rg.pkg_rapl_limit);
> +	if (!(r & PKG_PWR_LIM_1_EN)) {
> +		*val = PL1_DISABLE;
> +		return 0;
> +	}
> +
>  	*val = hwm_field_read_and_scale(ddat,
>  					hwmon->rg.pkg_rapl_limit,
>  					PKG_PWR_LIM_1,
> @@ -385,8 +395,24 @@ static int
>  hwm_power_max_write(struct hwm_drvdata *ddat, long val)
>  {
>  	struct i915_hwmon *hwmon = ddat->hwmon;
> +	intel_wakeref_t wakeref;
>  	u32 nval;
>  
> +	/* Disable PL1 limit and verify, because the limit cannot be disabled on all platforms */
> +	if (val == PL1_DISABLE) {
> +		mutex_lock(&hwmon->hwmon_lock);
> +		with_intel_runtime_pm(ddat->uncore->rpm, wakeref) {
> +			intel_uncore_rmw(ddat->uncore, hwmon->rg.pkg_rapl_limit,
> +					 PKG_PWR_LIM_1_EN, 0);
> +			nval = intel_uncore_read(ddat->uncore, hwmon->rg.pkg_rapl_limit);
> +		}
> +		mutex_unlock(&hwmon->hwmon_lock);
> +
> +		if (nval & PKG_PWR_LIM_1_EN)
> +			return -ENODEV;
> +		return 0;
> +	}
> +
>  	/* Computation in 64-bits to avoid overflow. Round to nearest. */
>  	nval = DIV_ROUND_CLOSEST_ULL((u64)val << hwmon->scl_shift_power, SF_POWER);
>  	nval = PKG_PWR_LIM_1_EN | REG_FIELD_PREP(PKG_PWR_LIM_1, nval);
> -- 
> 2.38.0
> 
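
For readers following the write path in the diff above, here is a rough
standalone sketch of the "disable, then verify" step. Everything in it is an
assumption made for illustration: struct fake_rapl_reg, the en_locked flag and
the FAKE_PL1_EN bit position are hypothetical stand-ins, not the i915
definitions. The locking and runtime PM handling that the patch wraps around
the read-modify-write and the read back are left out because the model is
single-threaded.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical enable bit; the real bit is defined by the driver. */
#define FAKE_PL1_EN (UINT32_C(1) << 15)

/* Fake register. 'en_locked' models a platform where the bit cannot be cleared. */
struct fake_rapl_reg {
	uint32_t val;
	bool en_locked;
};

static uint32_t fake_read(const struct fake_rapl_reg *r)
{
	return r->val;
}

/* Read-modify-write: clear the bits in 'clear', then set the bits in 'set'. */
static void fake_rmw(struct fake_rapl_reg *r, uint32_t clear, uint32_t set)
{
	uint32_t next = (r->val & ~clear) | set;

	/* A locked platform silently keeps its current enable bit. */
	if (r->en_locked)
		next |= r->val & FAKE_PL1_EN;
	r->val = next;
}

/*
 * Clear the enable bit, then read back to check that it stuck.
 * Returns 0 on success or -ENODEV if the limit is still enabled.
 */
static int pl1_disable_and_verify(struct fake_rapl_reg *r)
{
	fake_rmw(r, FAKE_PL1_EN, 0);
	if (fake_read(r) & FAKE_PL1_EN)
		return -ENODEV;
	return 0;
}

/* Usage: one platform honours the disable, the other does not. */
static void pl1_disable_demo(void)
{
	struct fake_rapl_reg ok = { .val = FAKE_PL1_EN | 0x123, .en_locked = false };
	struct fake_rapl_reg locked = { .val = FAKE_PL1_EN | 0x123, .en_locked = true };

	assert(pl1_disable_and_verify(&ok) == 0);
	assert(fake_read(&ok) == 0x123);	/* limit field untouched */

	assert(pl1_disable_and_verify(&locked) == -ENODEV);
	assert(fake_read(&locked) & FAKE_PL1_EN);
}
```

The read back is what lets the caller learn that a write of 0 had no effect,
instead of reporting success while the limit stays enabled.
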

