[PATCH v7 3/4] drm/xe/hwmon: Update xe_hwmon_process_reg
Lucas De Marchi
lucas.demarchi at intel.com
Thu Apr 18 13:14:54 UTC 2024
On Tue, Apr 09, 2024 at 02:52:07PM GMT, Rodrigo Vivi wrote:
>On Tue, Apr 09, 2024 at 12:52:34PM -0500, Lucas De Marchi wrote:
>> On Fri, Apr 05, 2024 at 06:31:26PM +0530, Karthik Poosa wrote:
>> > Return u64 from xe_hwmon_process_reg, instead of a void return.
>> > u64* input pointer not needed with this change.
>> >
>> > With this caller can directly assign return value to a variable without
>> > need of explicit initialization and pass by reference.
>> >
>> > v2:
>> > - Fix checkpatch warnings.
>> >
>> > v3:
>> > - Rebase
>> > - Removed unncessary break statements.
>> >
>> > Signed-off-by: Karthik Poosa <karthik.poosa at intel.com>
>> > Suggested-by: Lucas De Marchi <lucas.demarchi at intel.com>
>> > Cc: Badal Nilawar <badal.nilawar at intel.com>
>>
>> Applied the other patches. This one I'm putting on hold to think about.
>>
>> I'm not sure the approach in that hwmon in general is good with the
>> xe_hwmon_get_reg() + xe_hwmon_process_reg(). It seems it's even taking
>> some pm refs when it doesn't need (to decide if attribute is visible).
>
>I believe this approach is fine.
>We do need to earlier get pm refs if we believe that there will be mmio
>operations underneath. Better more then less in this case.
see this example:
static umode_t xe_hwmon_attributes_visible(struct kobject *kobj,
struct attribute *attr, int index)
{
struct device *dev = kobj_to_dev(kobj);
struct xe_hwmon *hwmon = dev_get_drvdata(dev);
int ret = 0;
xe_pm_runtime_get(gt_to_xe(hwmon->gt));
ret = xe_reg_is_valid(xe_hwmon_get_reg(hwmon, REG_PKG_RAPL_LIMIT, index)) ? attr->mode : 0;
xe_pm_runtime_put(gt_to_xe(hwmon->gt));
return ret;
}
that xe_pm_runtime_get() is totally bogus. We are basically doing a
SW-only check, calling xe_hwmon_get_reg() and xe_reg_is_valid().
It's only xe_hwmon_process_reg() that actually read/write/rmw anything.
I think we are following the approach "upon any entry from sysfs,
xe_pm_runtime_get(), lock hwmon, etc", which I don't like.
Besides that... xe_hwmon_process_reg() is an umbrella function for no
good reason. The caller hardcodes the OP. It could very well have called
the right function and just didn't because xe_hwmon_process_reg() also
bundles getting the reg.
taking xe_hwmon_power_max_read as example, why can't we
split the get_reg() / read_reg() like below?
diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c
index 453e601ddd5e..8ce8d9a66df9 100644
--- a/drivers/gpu/drm/xe/xe_hwmon.c
+++ b/drivers/gpu/drm/xe/xe_hwmon.c
@@ -160,10 +160,22 @@ static void xe_hwmon_process_reg(struct xe_hwmon *hwmon, enum xe_hwmon_reg hwmon
static void xe_hwmon_power_max_read(struct xe_hwmon *hwmon, int channel, long *value)
{
u64 reg_val, min, max;
+ struct xe_reg reg_power_sku, reg_rapl_limit;
+
+ reg_rapl_limit = xe_hwmon_get_reg(hwmon, REG_PKG_RAPL_LIMIT, channel);
+ reg_power_sku = xe_hwmon_get_reg(hwmon, REG_PKG_POWER_SKU, channel);
+
+ /* XXX could probably be xe_gt_assert() or add a warning */
+ if (!xe_reg_is_valid(reg_power_sku) || !xe_reg_is_valid(reg_rapl_limit))
+ return;
mutex_lock(&hwmon->hwmon_lock);
- xe_hwmon_process_reg(hwmon, REG_PKG_RAPL_LIMIT, REG_READ32, ®_val, 0, 0, channel);
+ reg_val = xe_mmio_read32(hwmon->gt, reg_rapl_limit);
/* Check if PL1 limit is disabled */
if (!(reg_val & PKG_PWR_LIM_1_EN)) {
*value = PL1_DISABLE;
@@ -173,7 +185,7 @@ static void xe_hwmon_power_max_read(struct xe_hwmon *hwmon, int channel, long *v
reg_val = REG_FIELD_GET(PKG_PWR_LIM_1, reg_val);
*value = mul_u64_u32_shr(reg_val, SF_POWER, hwmon->scl_shift_power);
- xe_hwmon_process_reg(hwmon, REG_PKG_POWER_SKU, REG_READ64, ®_val, 0, 0, channel);
+ reg_val = xe_mmio_read64_2x32(hwmon->gt, reg_power_sku);
min = REG_FIELD_GET(PKG_MIN_PWR, reg_val);
min = mul_u64_u32_shr(min, SF_POWER, hwmon->scl_shift_power);
max = REG_FIELD_GET(PKG_MAX_PWR, reg_val);
Yes, it's longer, but it also uncovers one issue that could have been a
silent error: if there's a programming mistake and
xe_hwmon_process_reg() returns a valid register for REG_PKG_RAPL_LIMIT
but not REG_PKG_POWER_SKU, we'd basically use the value from the first
register in the calculation, which is wrong and with no
warning/error/whatever. Btw, should we change the visible()
method to cover that?
I's my personal feeling. I don't like the pattern xe_hwmon.c is using
since it's very easy to introduce buggy code. If someone wants to give a
r-b and merge this particular patch, fine. But I do think this pattern
should be changed.
Lucas De Marchi
>
>>
>> Lucas De Marchi
More information about the Intel-xe
mailing list