[PATCH v7 3/4] drm/xe/hwmon: Update xe_hwmon_process_reg

Thu Apr 18 13:14:54 UTC 2024

On Tue, Apr 09, 2024 at 02:52:07PM GMT, Rodrigo Vivi wrote:
>On Tue, Apr 09, 2024 at 12:52:34PM -0500, Lucas De Marchi wrote:
>> On Fri, Apr 05, 2024 at 06:31:26PM +0530, Karthik Poosa wrote:
>> > Return u64 from xe_hwmon_process_reg, instead of a void return.
>> > u64* input pointer not needed with this change.
>> >
>> > With this caller can directly assign return value to a variable without
>> > need of explicit initialization and pass by reference.
>> >
>> > v2:
>> > - Fix checkpatch warnings.
>> >
>> > v3:
>> > - Rebase
>> > - Removed unncessary break statements.
>> >
>> > Signed-off-by: Karthik Poosa <karthik.poosa at intel.com>
>> > Suggested-by: Lucas De Marchi <lucas.demarchi at intel.com>
>> > Cc: Badal Nilawar <badal.nilawar at intel.com>
>>
>> Applied the other patches.  This one I'm putting on hold to think about.
>>
>> I'm not sure the approach in that hwmon in general is good with the
>> xe_hwmon_get_reg() + xe_hwmon_process_reg(). It seems it's even taking
>> some pm refs when it doesn't need (to decide if attribute is visible).
>
>I believe this approach is fine.
>We do need to earlier get pm refs if we believe that there will be mmio
>operations underneath. Better more then less in this case.

see this example:

static umode_t xe_hwmon_attributes_visible(struct kobject *kobj,                                         
                                            struct attribute *attr, int index)                            
{                                                                                                        
         struct device *dev = kobj_to_dev(kobj);                                                          
         struct xe_hwmon *hwmon = dev_get_drvdata(dev);                                                   
         int ret = 0;                                                                                     
                                                                                                          
         xe_pm_runtime_get(gt_to_xe(hwmon->gt));                                                          
                                                                                                          
         ret = xe_reg_is_valid(xe_hwmon_get_reg(hwmon, REG_PKG_RAPL_LIMIT, index)) ? attr->mode : 0;      
                                                                                                          
         xe_pm_runtime_put(gt_to_xe(hwmon->gt));                                                          
                                                                                                          
         return ret;                                                                                      
}                                                                                                        
                                                                                                          
that xe_pm_runtime_get() is totally bogus. We are basically doing a
SW-only check, calling xe_hwmon_get_reg() and xe_reg_is_valid().
It's only xe_hwmon_process_reg() that actually read/write/rmw anything.
I think we are following the approach "upon any entry from sysfs,
xe_pm_runtime_get(), lock hwmon, etc", which I don't like.

Besides that...  xe_hwmon_process_reg() is an umbrella function for no
good reason. The caller hardcodes the OP. It could very well have called
the right function and just didn't because xe_hwmon_process_reg() also
bundles getting the reg.

taking xe_hwmon_power_max_read as example, why can't we
split the get_reg() / read_reg() like below?

diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c
index 453e601ddd5e..8ce8d9a66df9 100644
--- a/drivers/gpu/drm/xe/xe_hwmon.c
+++ b/drivers/gpu/drm/xe/xe_hwmon.c
@@ -160,10 +160,22 @@ static void xe_hwmon_process_reg(struct xe_hwmon *hwmon, enum xe_hwmon_reg hwmon
  static void xe_hwmon_power_max_read(struct xe_hwmon *hwmon, int channel, long *value)
  {
         u64 reg_val, min, max;
+       struct xe_reg reg_power_sku, reg_rapl_limit;
+
+       reg_rapl_limit = xe_hwmon_get_reg(hwmon, REG_PKG_RAPL_LIMIT, channel);
+       reg_power_sku = xe_hwmon_get_reg(hwmon, REG_PKG_POWER_SKU, channel);
+
+       /* XXX could probably be xe_gt_assert() or add a warning */
+       if (!xe_reg_is_valid(reg_power_sku) || !xe_reg_is_valid(reg_rapl_limit))
+               return;

         mutex_lock(&hwmon->hwmon_lock);

-       xe_hwmon_process_reg(hwmon, REG_PKG_RAPL_LIMIT, REG_READ32, &reg_val, 0, 0, channel);
+       reg_val = xe_mmio_read32(hwmon->gt, reg_rapl_limit);
         /* Check if PL1 limit is disabled */
         if (!(reg_val & PKG_PWR_LIM_1_EN)) {
                 *value = PL1_DISABLE;
@@ -173,7 +185,7 @@ static void xe_hwmon_power_max_read(struct xe_hwmon *hwmon, int channel, long *v
         reg_val = REG_FIELD_GET(PKG_PWR_LIM_1, reg_val);
         *value = mul_u64_u32_shr(reg_val, SF_POWER, hwmon->scl_shift_power);

-       xe_hwmon_process_reg(hwmon, REG_PKG_POWER_SKU, REG_READ64, &reg_val, 0, 0, channel);
+       reg_val = xe_mmio_read64_2x32(hwmon->gt, reg_power_sku);
         min = REG_FIELD_GET(PKG_MIN_PWR, reg_val);
         min = mul_u64_u32_shr(min, SF_POWER, hwmon->scl_shift_power);
         max = REG_FIELD_GET(PKG_MAX_PWR, reg_val);


Yes, it's longer, but it also uncovers one issue that could have been a
silent error: if there's a programming mistake and
xe_hwmon_process_reg() returns a valid register for REG_PKG_RAPL_LIMIT
but not REG_PKG_POWER_SKU, we'd basically use the value from the first
register in the calculation, which is wrong and with no
warning/error/whatever. Btw, should we change the visible()
method to cover that?

I's my personal feeling. I don't like the pattern xe_hwmon.c is using
since it's very easy to introduce buggy code. If someone wants to give a
r-b and merge this particular patch, fine. But I do think this pattern
should be changed.

Lucas De Marchi

>
>>
>> Lucas De Marchi