[Intel-xe] [PATCH v2 2/6] drm/xe/hwmon: Expose power attributes
Andi Shyti
andi.shyti at linux.intel.com
Thu Jun 29 14:09:31 UTC 2023
Hi Matt and Badal,
[...]
> > +};
> > +
> > +enum hwm_reg_operation {
> > + reg_read,
> > + reg_write,
> > + reg_rmw,
>
> Upper case?
>
> > +};
> > +
>
> s/hwm_reg/xe_hwm_reg
I agree with Matt here... throughout this series of patches the
naming is very confusing, sometimes starting with xe_*, sometimes
with hwm_*, sometimes with hwmon_*... there is no consistency.
Please, Badal, choos a consistent prefix and stick with it for
every function and global definition.
Matt suggested xe_hwmon_*, I'm fine with it.
[...]
> > struct hwm_drvdata {
> > struct xe_hwmon *hwmon;
> > struct device *hwmon_dev;
> > + struct xe_gt *gt;
> > char name[12];
> > + bool reset_in_progress;
> > + wait_queue_head_t waitq;
> > };
> >
> > struct xe_hwmon {
> > struct hwm_drvdata ddat;
> > struct mutex hwmon_lock;
> > + int scl_shift_power;
> > };
> >
>
> Same as previous patch, 1 struct seems like a better idea to me.
I made the same comment in the previous patch.
[...]
> > + switch (reg_name) {
> > + case pkg_rapl_limit:
> > + if (xe->info.platform == XE_DG2)
> > + return PCU_CR_PACKAGE_RAPL_LIMIT;
> > + else if (xe->info.platform == XE_PVC)
> > + return PVC_GT0_PACKAGE_RAPL_LIMIT;
> > + break;
> > + case pkg_power_sku:
> > + if (xe->info.platform == XE_DG2)
> > + return PCU_CR_PACKAGE_POWER_SKU;
> > + else if (xe->info.platform == XE_PVC)
> > + return PVC_GT0_PACKAGE_POWER_SKU;
> > + break;
> > + case pkg_power_sku_unit:
> > + if (xe->info.platform == XE_DG2)
> > + return PCU_CR_PACKAGE_POWER_SKU_UNIT;
> > + else if (xe->info.platform == XE_PVC)
> > + return PVC_GT0_PACKAGE_POWER_SKU_UNIT;
> > + break;
> > + default:
>
> BUG_ON or WARN_ON saying not possible?
MISSING_CASE() is in i915_utils.h, perhaps we can move it to a
more generic place... it would be at handy here.
> > +static int hwm_power_max_read(struct hwm_drvdata *ddat, long *value)
>
> The return value is always 0, why not return value?
>
> s/hwm_power_max_read/xe_hwmon_power_max_read
>
> > +{
> > + struct xe_hwmon *hwmon = ddat->hwmon;
> > + u32 reg_val;
> > + u64 r, min, max;
> > +
> > + xe_device_mem_access_get(gt_to_xe(ddat->gt));
> > +
>
> Same as above, use xe_device_assert_mem_access.
>
> > + process_hwmon_reg(ddat, pkg_rapl_limit, reg_read, ®_val, 0, 0);
> > + /* Check if PL1 limit is disabled */
> > + if (!(reg_val & PKG_PWR_LIM_1_EN)) {
> > + *value = PL1_DISABLE;
> > + xe_device_mem_access_put(gt_to_xe(ddat->gt));
> > + return 0;
> > + }
> > +
> > + reg_val = REG_FIELD_GET(PKG_PWR_LIM_1, reg_val);
> > + *value = mul_u64_u32_shr(reg_val, SF_POWER, hwmon->scl_shift_power);
> > +
> > + process_hwmon_reg_read64(ddat, pkg_power_sku, &r);
> > + min = REG_FIELD_GET(PKG_MIN_PWR, r);
> > + min = mul_u64_u32_shr(min, SF_POWER, hwmon->scl_shift_power);
> > + max = REG_FIELD_GET(PKG_MAX_PWR, r);
> > + max = mul_u64_u32_shr(max, SF_POWER, hwmon->scl_shift_power);
> > +
> > + if (min && max)
> > + *value = clamp_t(u64, *value, min, max);
> > +
> > + xe_device_mem_access_put(gt_to_xe(ddat->gt));
> > + return 0;
> > +}
> > +
> > +static int hwm_power_max_write(struct hwm_drvdata *ddat, long value)
> > +{
> > + struct xe_hwmon *hwmon = ddat->hwmon;
> > + DEFINE_WAIT(wait);
> > + int ret = 0;
> > + u32 nval;
> > +
> > + /* Block waiting for GuC reset to complete when needed */
> > + for (;;) {
with a do...while() you shouldn't need a for(;;)... your choice,
not going to beat on that.
> > + mutex_lock(&hwmon->hwmon_lock);
> > +
> > + prepare_to_wait(&ddat->waitq, &wait, TASK_INTERRUPTIBLE);
> > +
> > + if (!hwmon->ddat.reset_in_progress)
> > + break;
> > +
> > + if (signal_pending(current)) {
> > + ret = -EINTR;
> > + break;
cough! cough! unlock! cough! cough!
> > + }
> > +
> > + mutex_unlock(&hwmon->hwmon_lock);
> > +
> > + schedule();
> > + }
> > + finish_wait(&ddat->waitq, &wait);
> > + if (ret)
> > + goto unlock;
>
> Anyway to not open code this? We similar in with
> xe_guc_submit_reset_wait, could we expose a global reset in progress in
> function which we can expose at the gt level?
>
> > +
> > + xe_device_mem_access_get(gt_to_xe(ddat->gt));
> > +
>
> This certainly is an outer most thing, e.g. doing this under
> hwmon->hwmon_lock seems dangerous. Again the upper levels of the stack
> should do the xe_device_mem_access_get, which it does. Just pointing out
> doing xe_device_mem_access_get/put under a lock isn't a good idea.
>
> Also the the loop which acquires hwmon->hwmon_lock is confusing too.
>
> > + /* Disable PL1 limit and verify, as limit cannot be disabled on all platforms */
> > + if (value == PL1_DISABLE) {
> > + process_hwmon_reg(ddat, pkg_rapl_limit, reg_rmw, &nval,
> > + PKG_PWR_LIM_1_EN, 0);
> > + process_hwmon_reg(ddat, pkg_rapl_limit, reg_read, &nval,
> > + PKG_PWR_LIM_1_EN, 0);
> > +
> > + if (nval & PKG_PWR_LIM_1_EN)
> > + ret = -ENODEV;
> > + goto exit;
cough! cough! lock! cough! cough!
> > + }
> > +
> > + /* Computation in 64-bits to avoid overflow. Round to nearest. */
> > + nval = DIV_ROUND_CLOSEST_ULL((u64)value << hwmon->scl_shift_power, SF_POWER);
> > + nval = PKG_PWR_LIM_1_EN | REG_FIELD_PREP(PKG_PWR_LIM_1, nval);
> > +
> > + process_hwmon_reg(ddat, pkg_rapl_limit, reg_rmw, &nval,
> > + PKG_PWR_LIM_1_EN | PKG_PWR_LIM_1, nval);
> > +exit:
> > + xe_device_mem_access_put(gt_to_xe(ddat->gt));
> > +unlock:
> > + mutex_unlock(&hwmon->hwmon_lock);
> > +
mmhhh???... jokes apart this is so error prone that it will
deadlock as soon as someone will think of editing this file :)
It worried me already at the first part.
Please, as Matt said, have a more linear locking here.
[...]
Andi
More information about the Intel-xe
mailing list