[Intel-gfx] [PATCH] drm/i915/guc: Disable PL1 power limit when loading GuC firmware

Belgaumkar, Vinay vinay.belgaumkar at intel.com
Sat Mar 25 00:06:33 UTC 2023


On 3/24/2023 4:31 PM, Dixit, Ashutosh wrote:
> On Fri, 24 Mar 2023 11:15:02 -0700, Belgaumkar, Vinay wrote:
> Hi Vinay,
>
> Thanks for the review. Comments inline below.
Sorry about asking the same questions all over again :) Didn't look at 
previous versions.
>
>> On 3/15/2023 8:59 PM, Ashutosh Dixit wrote:
>>> On dGfx, the PL1 power limit being enabled and set to a low value results
>>> in a low GPU operating freq. It also negates the freq raise operation which
>>> is done before GuC firmware load. As a result GuC firmware load can time
>>> out. Such timeouts were seen in the GL #8062 bug below (where the PL1 power
>>> limit was enabled and set to a low value). Therefore disable the PL1 power
>>> limit when allowed by HW when loading GuC firmware.
>> v3 label missing in subject.
>>> v2:
>>>    - Take mutex (to disallow writes to power1_max) across GuC reset/fw load
>>>    - Add hwm_power_max_restore to error return code path
>>>
>>> v3 (Jani N):
>>>    - Add/remove explanatory comments
>>>    - Function renames
>>>    - Type corrections
>>>    - Locking annotation
>>>
>>> Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
>>> Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/uc/intel_uc.c |  9 +++++++
>>>    drivers/gpu/drm/i915/i915_hwmon.c     | 39 +++++++++++++++++++++++++++
>>>    drivers/gpu/drm/i915/i915_hwmon.h     |  7 +++++
>>>    3 files changed, 55 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>>> index 4ccb4be4c9cba..aa8e35a5636a0 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>>> @@ -18,6 +18,7 @@
>>>    #include "intel_uc.h"
>>>      #include "i915_drv.h"
>>> +#include "i915_hwmon.h"
>>>      static const struct intel_uc_ops uc_ops_off;
>>>    static const struct intel_uc_ops uc_ops_on;
>>> @@ -461,6 +462,7 @@ static int __uc_init_hw(struct intel_uc *uc)
>>> 	struct intel_guc *guc = &uc->guc;
>>> 	struct intel_huc *huc = &uc->huc;
>>> 	int ret, attempts;
>>> +	bool pl1en;
>> Init to 'false' here
> See next comment.
>
>>
>>> 		GEM_BUG_ON(!intel_uc_supports_guc(uc));
>>> 	GEM_BUG_ON(!intel_uc_wants_guc(uc));
>>> @@ -491,6 +493,9 @@ static int __uc_init_hw(struct intel_uc *uc)
>>> 	else
>>> 		attempts = 1;
>>>    +	/* Disable a potentially low PL1 power limit to allow freq to be
>>> raised */
>>> +	i915_hwmon_power_max_disable(gt->i915, &pl1en);
>>> +
>>> 	intel_rps_raise_unslice(&uc_to_gt(uc)->rps);
>>> 		while (attempts--) {
>>> @@ -547,6 +552,8 @@ static int __uc_init_hw(struct intel_uc *uc)
>>> 		intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
>>> 	}
>>>    +	i915_hwmon_power_max_restore(gt->i915, pl1en);
>>> +
>>> 	guc_info(guc, "submission %s\n", str_enabled_disabled(intel_uc_uses_guc_submission(uc)));
>>> 	guc_info(guc, "SLPC %s\n", str_enabled_disabled(intel_uc_uses_guc_slpc(uc)));
>>>    @@ -563,6 +570,8 @@ static int __uc_init_hw(struct intel_uc *uc)
>>> 	/* Return GT back to RPn */
>>> 	intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
>>>    +	i915_hwmon_power_max_restore(gt->i915, pl1en);
>> if (pl1en)
>>
>>      i915_hwmon_power_max_enable().
> IMO it's better not to have checks in the main __uc_init_hw() function (if
> we do this we'll need to add 2 checks in __uc_init_hw()). If you really
> want we could do something like this inside
> i915_hwmon_power_max_disable/i915_hwmon_power_max_restore. But for now I
> am not making any changes.
ok.
>
> (I can send a patch with the changes if you want to take a look but IMO it
> will add more logic/code but without real benefits (it will save a rmw if
> the limit was already disabled, but IMO this code is called so infrequently
> (only during GuC resets) as to not have any significant impact)).
>
>>> +
>>> 	__uc_sanitize(uc);
>>> 		if (!ret) {
>>> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
>>> index ee63a8fd88fc1..769b5bda4d53f 100644
>>> --- a/drivers/gpu/drm/i915/i915_hwmon.c
>>> +++ b/drivers/gpu/drm/i915/i915_hwmon.c
>>> @@ -444,6 +444,45 @@ hwm_power_write(struct hwm_drvdata *ddat, u32 attr, int chan, long val)
>>> 	}
>>>    }
>>>    +void i915_hwmon_power_max_disable(struct drm_i915_private *i915, bool
>>> *old)
>> Shouldn't we call this i915_hwmon_package_pl1_disable()?
> I did think of using "pl1" in the function name but then decided to retain
> "power_max" because other hwmon functions for PL1 limit also use
> "power_max" (hwm_power_max_read/hwm_power_max_write) and currently
> "hwmon_power_max" is mapped to the PL1 limit. So "power_max" is used to
> show that all these functions deal with the PL1 power limit.
>
> There is a comment in __uc_init_hw() explaining "power_max" means the PL1
> power limit.
ok.
>
>>> +	__acquires(i915->hwmon->hwmon_lock)
>>> +{
>>> +	struct i915_hwmon *hwmon = i915->hwmon;
>>> +	intel_wakeref_t wakeref;
>>> +	u32 r;
>>> +
>>> +	if (!hwmon || !i915_mmio_reg_valid(hwmon->rg.pkg_rapl_limit))
>>> +		return;
>>> +
>>> +	/* Take mutex to prevent concurrent hwm_power_max_write */
>>> +	mutex_lock(&hwmon->hwmon_lock);
>>> +
>>> +	with_intel_runtime_pm(hwmon->ddat.uncore->rpm, wakeref)
>>> +		r = intel_uncore_rmw(hwmon->ddat.uncore,
>>> +				     hwmon->rg.pkg_rapl_limit,
>>> +				     PKG_PWR_LIM_1_EN, 0);
>> Most of this code (lock and rmw parts) is already inside static void
>> hwm_locked_with_pm_intel_uncore_rmw() , can we reuse that here?
> This was the case in v1 of the patch:
>
> https://patchwork.freedesktop.org/patch/526393/?series=115003&rev=1
>
> But now this cannot be done because if you notice we acquire the mutex in
> i915_hwmon_power_max_disable() and release the mutex in
> i915_hwmon_power_max_restore().
>
> I explained the reason why this the mutex is handled this way in my reply
> to Jani Nikula here:
>
> https://patchwork.freedesktop.org/patch/526598/?series=115003&rev=2
>
> Quoting below:
>
> ```
>>> +	/* hwmon_lock mutex is unlocked in hwm_power_max_restore */
>> Not too happy about that... any better ideas?
> Afais, taking the mutex is the only fully correct solution (when we disable
> the power limit, userspace can go re-enable it). Examples of partly
> incorrect solutions (which don't take the mutex) include:
>
> a. Don't take the mutex, don't do anything, ignore any changes to the value
>     if it has changed during GuC reset/fw load (just overwrite the changed
>     value). Con: changed value is lost.
>
> b. Detect if the value has changed (the limit has been re-enabled) after we
>     have disabled the limit and in that case skip restoring the value. But
>     then someone can say why do we allow enabling the PL1 limit since we
>     want to disable it.
>
> Both these are very unlikely scenarios so they might work. But I would
> first like to explore if holding a mutex across GuC reset is prolebmatic
> since that is /the/ correct solution. But if anyone comes up with a reason
> why that cannot be done we can look at these other not completely correct
> options.

Well, one reason is that this is adding a lot of duplicate/non-reusable 
code needlessly. If it gets re-used elsewhere, that could lead to some 
weird situations where the lock could be held for an extended period of 
time and introduce dependencies. Also, how/why would the user modify 
this PL1 during guc load? The sysfs interfaces are not even ready at 
this point? Even if we consider this during a resume, the terminal will 
not be available to the user.

Thanks,

Vinay.

> ```
>
>>> +
>>> +	*old = !!(r & PKG_PWR_LIM_1_EN);
>>> +}
>>> +
>>> +void i915_hwmon_power_max_restore(struct drm_i915_private *i915, bool old)
>>> +	__releases(i915->hwmon->hwmon_lock)
>> We can just call this i915_hwmon_power_max_enable() and call whenever the
>> old value was actually enabled. That way, we have proper mirror functions.
> As I explained that would mean adding two checks in the main __uc_init_hw()
> function which I am trying to avoid. So we have disable/restore pair.
>
>>> +{
>>> +	struct i915_hwmon *hwmon = i915->hwmon;
>>> +	intel_wakeref_t wakeref;
>>> +
>>> +	if (!hwmon || !i915_mmio_reg_valid(hwmon->rg.pkg_rapl_limit))
>>> +		return;
>>> +
>>> +	with_intel_runtime_pm(hwmon->ddat.uncore->rpm, wakeref)
>>> +		intel_uncore_rmw(hwmon->ddat.uncore,
>>> +				 hwmon->rg.pkg_rapl_limit,
>>> +				 PKG_PWR_LIM_1_EN,
>>> +				 old ? PKG_PWR_LIM_1_EN : 0);
>> 3rd param should be 0 here, else we will end up clearing other bits.
> No see intel_uncore_rmw(), it will only clear the PKG_PWR_LIM_1_EN bit, so
> the code here is correct. intel_uncore_rmw() does:
>
>          val = (old & ~clear) | set;
Ok, just confusing, since you are also setting it with the 4th param.
>
> So for now I am not making any changes, if you feel strongly about
> something one way or another let me know. Anyway these comments should help
> you understand the patch better so take a look and we can go from there.
>
> Thanks.
> --
> Ashutosh
>
>>> +
>>> +	mutex_unlock(&hwmon->hwmon_lock);
>>> +}
>>> +
>>>    static umode_t
>>>    hwm_energy_is_visible(const struct hwm_drvdata *ddat, u32 attr)
>>>    {
>>> diff --git a/drivers/gpu/drm/i915/i915_hwmon.h b/drivers/gpu/drm/i915/i915_hwmon.h
>>> index 7ca9cf2c34c96..0fcb7de844061 100644
>>> --- a/drivers/gpu/drm/i915/i915_hwmon.h
>>> +++ b/drivers/gpu/drm/i915/i915_hwmon.h
>>> @@ -7,14 +7,21 @@
>>>    #ifndef __I915_HWMON_H__
>>>    #define __I915_HWMON_H__
>>>    +#include <linux/types.h>
>>> +
>>>    struct drm_i915_private;
>>> +struct intel_gt;
>>>      #if IS_REACHABLE(CONFIG_HWMON)
>>>    void i915_hwmon_register(struct drm_i915_private *i915);
>>>    void i915_hwmon_unregister(struct drm_i915_private *i915);
>>> +void i915_hwmon_power_max_disable(struct drm_i915_private *i915, bool *old);
>>> +void i915_hwmon_power_max_restore(struct drm_i915_private *i915, bool old);
>>>    #else
>>>    static inline void i915_hwmon_register(struct drm_i915_private *i915) { };
>>>    static inline void i915_hwmon_unregister(struct drm_i915_private *i915) { };
>>> +static inline void i915_hwmon_power_max_disable(struct drm_i915_private *i915, bool *old) { };
>>> +static inline void i915_hwmon_power_max_restore(struct drm_i915_private *i915, bool old) { };
>>>    #endif
>>>      #endif /* __I915_HWMON_H__ */


More information about the Intel-gfx mailing list