[PATCH v1] drm/xe/hwmon: expose package and vram temperature

Nilawar, Badal badal.nilawar at intel.com
Fri Jan 24 14:57:14 UTC 2025


On 24-01-2025 18:03, Raag Jadav wrote:
> On Fri, Jan 24, 2025 at 05:29:16PM +0530, Nilawar, Badal wrote:
>> On 24-01-2025 11:46, Riana Tauro wrote:
>>> Hi Raag
>>>
>>> On 1/23/2025 8:21 AM, Raag Jadav wrote:
>>>> On Tue, Jan 21, 2025 at 01:56:05PM +0530, Riana Tauro wrote:
>>>>> Hi Raag
>>>>>
>>>>> On 1/8/2025 2:54 PM, Raag Jadav wrote:
>>>>>> Add hwmon support for temp1_input and temp2_input attributes, which will
>>>>>> expose package and VRAM temperature in millidegree Celsius. With this in
>>>>>> place, we can monitor temperature using the lm-sensors tool.
>>>>>>
>>>>>> Signed-off-by: Raag Jadav <raag.jadav at intel.com>
>>>>>> ---
>>>>>>     .../ABI/testing/sysfs-driver-intel-xe-hwmon   | 16 +++++
>>>>>>     drivers/gpu/drm/xe/regs/xe_mchbar_regs.h      |  3 +
>>>>>>     drivers/gpu/drm/xe/regs/xe_pcode_regs.h       |  2 +
>>>>>>     drivers/gpu/drm/xe/xe_hwmon.c                 | 63 +++++++++++++++++++
>>>>>>     4 files changed, 84 insertions(+)
>>>>>>
>>>>>> diff --git a/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon b/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon
>>>>>> index d792a56f59ac..998cfb0ee1a6 100644
>>>>>> --- a/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon
>>>>>> +++ b/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon
>>>>>> @@ -108,3 +108,19 @@ Contact: intel-xe at lists.freedesktop.org
>>>>>>     Description:    RO. Package current voltage in millivolt.
>>>>>>             Only supported for particular Intel Xe graphics platforms.
>>>>>> +
>>>>>> +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/temp1_input
>>>>>> +Date:        April 2025
>>>>>> +KernelVersion:    6.15
>>>>>> +Contact:    intel-xe at lists.freedesktop.org
>>>>>> +Description:    RO. Package temperature in millidegree Celsius.
>>>>>> +
>>>>>> +        Only supported for particular Intel Xe graphics platforms.
>>>>>> +
>>>>>> +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/temp2_input
>>>>>> +Date:        April 2025
>>>>>> +KernelVersion:    6.15
>>>>>> +Contact:    intel-xe at lists.freedesktop.org
>>>>>> +Description:    RO. VRAM temperature in millidegree Celsius.
>>>>>> +
>>>>>> +        Only supported for particular Intel Xe graphics platforms.
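The two ABI entries above report plain integers in millidegree Celsius. As a rough illustration of what a consumer such as lm-sensors does with them, here is a minimal userspace sketch. The function names are hypothetical, and the caller supplies the file path because the hwmon<i> directory varies from system to system.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a hwmon temp*_input attribute into *millideg.
 * The file holds a single integer in millidegree Celsius.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_temp_millideg(const char *path, long *millideg)
{
	assert(path != NULL && millideg != NULL);

	FILE *f = fopen(path, "r");
	if (!f)
		return -1;

	int ok = (fscanf(f, "%ld", millideg) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* 1 degree Celsius is 1000 millidegree Celsius. */
static double millideg_to_celsius(long millideg)
{
	return millideg / 1000.0;
}
```

For example, a file containing 45000 reads back as 45000 and converts to 45.0 degrees Celsius.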
>>>>>> diff --git a/drivers/gpu/drm/xe/regs/xe_mchbar_regs.h b/drivers/gpu/drm/xe/regs/xe_mchbar_regs.h
>>>>>> index 519dd1067a19..f5e5234857c1 100644
>>>>>> --- a/drivers/gpu/drm/xe/regs/xe_mchbar_regs.h
>>>>>> +++ b/drivers/gpu/drm/xe/regs/xe_mchbar_regs.h
>>>>>> @@ -34,6 +34,9 @@
>>>>>>     #define PCU_CR_PACKAGE_ENERGY_STATUS        XE_REG(MCHBAR_MIRROR_BASE_SNB + 0x593c)
>>>>>> +#define PCU_CR_PACKAGE_TEMPERATURE        XE_REG(MCHBAR_MIRROR_BASE_SNB + 0x5978)
>>>>>> +#define   TEMP_MASK                REG_GENMASK(7, 0)
>>>>>> +
>>>>>>     #define PCU_CR_PACKAGE_RAPL_LIMIT        XE_REG(MCHBAR_MIRROR_BASE_SNB + 0x59a0)
>>>>>>     #define   PKG_PWR_LIM_1                REG_GENMASK(14, 0)
>>>>>>     #define   PKG_PWR_LIM_1_EN            REG_BIT(15)
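Here is a minimal sketch of decoding the new register field. It assumes that the 8-bit TEMP_MASK field (bits 7:0) holds an unsigned whole-degree Celsius value with no offset. The patch's inclusion of <linux/units.h> hints at a degree-to-millidegree conversion, but the actual read path in xe_hwmon.c is not shown in this excerpt. The SKETCH_* macros are hypothetical userspace stand-ins for REG_GENMASK and MILLIDEGREE_PER_DEGREE.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for REG_GENMASK(h, l) on 32-bit values. */
#define SKETCH_GENMASK(h, l) \
	(((~0u) >> (31 - (h))) & ~((1u << (l)) - 1u))

/* Mirrors TEMP_MASK from the patch: temperature lives in bits 7:0. */
#define SKETCH_TEMP_MASK SKETCH_GENMASK(7, 0)

/* Same value as MILLIDEGREE_PER_DEGREE in <linux/units.h>. */
#define SKETCH_MILLIDEGREE_PER_DEGREE 1000L

/*
 * Convert a raw temperature register value to millidegree Celsius.
 * ASSUMPTION: the masked field is an unsigned whole-degree value with
 * no offset. The mask starts at bit 0, so no shift is needed.
 */
static long temp_reg_to_millideg(uint32_t reg_val)
{
	uint32_t deg = reg_val & SKETCH_TEMP_MASK;

	return (long)deg * SKETCH_MILLIDEGREE_PER_DEGREE;
}
```

Bits above the mask are ignored, so a raw value of 0xABCD002D decodes to 45 degrees, which is 45000 millidegrees.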
>>>>>> diff --git a/drivers/gpu/drm/xe/regs/xe_pcode_regs.h b/drivers/gpu/drm/xe/regs/xe_pcode_regs.h
>>>>>> index 0b0b49d850ae..8846eb9ce2a4 100644
>>>>>> --- a/drivers/gpu/drm/xe/regs/xe_pcode_regs.h
>>>>>> +++ b/drivers/gpu/drm/xe/regs/xe_pcode_regs.h
>>>>>> @@ -21,6 +21,8 @@
>>>>>>     #define BMG_PACKAGE_POWER_SKU            XE_REG(0x138098)
>>>>>>     #define BMG_PACKAGE_POWER_SKU_UNIT XE_REG(0x1380dc)
>>>>>>     #define BMG_PACKAGE_ENERGY_STATUS        XE_REG(0x138120)
>>>>>> +#define BMG_VRAM_TEMPERATURE            XE_REG(0x1382c0)
>>>>>> +#define BMG_PACKAGE_TEMPERATURE            XE_REG(0x138434)
>>>>> indentation.
>>>> It's a git quirk, you won't see it in file.
>>>>
>>>>> Also, you are using the same register for DG2, so it should have a common name.
>>>> Just following the conventions.
>>> I did not find this convention in the file.
>>> BMG_VRAM_TEMPERATURE is used for both DG2 and BMG but has a BMG prefix,
>>> which doesn't seem right.
>>>>>>     #define BMG_PACKAGE_RAPL_LIMIT            XE_REG(0x138440)
>>>>>>     #define BMG_PLATFORM_ENERGY_STATUS XE_REG(0x138458)
>>>>>>     #define BMG_PLATFORM_POWER_LIMIT        XE_REG(0x138460)
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c
>>>>>> index fde56dad3ab7..5b5c844adf4a 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_hwmon.c
>>>>>> +++ b/drivers/gpu/drm/xe/xe_hwmon.c
>>>>>> @@ -6,6 +6,7 @@
>>>>>>     #include <linux/hwmon-sysfs.h>
>>>>>>     #include <linux/hwmon.h>
>>>>>>     #include <linux/types.h>
>>>>>> +#include <linux/units.h>
>>>>>>     #include <drm/drm_managed.h>
>>>>>>     #include "regs/xe_gt_regs.h"
>>>>>> @@ -20,6 +21,7 @@
>>>>>>     #include "xe_pm.h"
>>>>>>     enum xe_hwmon_reg {
>>>>>> +    REG_TEMP,
>>>>> Add this to the end of the enum.
>>>>>>         REG_PKG_RAPL_LIMIT,
>>>>>>         REG_PKG_POWER_SKU,
>>>>>>         REG_PKG_POWER_SKU_UNIT,
>>>>>> @@ -39,6 +41,11 @@ enum xe_hwmon_channel {
>>>>>>         CHANNEL_MAX,
>>>>>>     };
>>>>>> +enum xe_hwmon_temp {
>>>>>> +    TEMP_PKG,
>>>>>> +    TEMP_VRAM,
>>>>>> +};
>>>>> Can't the existing channel enum be used here?
>>>> Nope, that'd break the indexes.
>>> @badal/@karthik Are multiple indexes for the same channel okay?
>>>
>>> In the current code, only channel 1 is exposed for power on DG2 and
>>> channel 0 is skipped. Does something like that need to be done here too?
>>
>> Thanks for looping me in on this. Yes, channel 0 represents card-specific
>> attributes and channel 1 represents package-specific attributes, and that
>> convention should be followed here.
>> With that, BMG_PACKAGE_TEMPERATURE should go under CHANNEL_PKG. For
>> BMG_VRAM_TEMPERATURE, a new channel (channel 3) should be added to enum
>> xe_hwmon_channel.
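Here is a minimal userspace sketch of the channel layout described above. It uses the BMG register offsets only; DG2 reads package temperature through PCU_CR_PACKAGE_TEMPERATURE instead. CHANNEL_VRAM, temp_reg_for_channel() and temp_input_mode() are hypothetical names. temp_input_mode() stands in for a hwmon is_visible callback and hides any channel that has no temperature source.

```c
#include <assert.h>
#include <stdint.h>

/* Channel layout as described in the thread; CHANNEL_VRAM is hypothetical. */
enum xe_hwmon_channel {
	CHANNEL_CARD,   /* card-specific attributes */
	CHANNEL_PKG,    /* package-specific attributes */
	CHANNEL_VRAM,   /* sketch: new channel for VRAM temperature */
	CHANNEL_MAX,
};

/* Offsets copied from the xe_pcode_regs.h hunk of the patch. */
#define BMG_PACKAGE_TEMPERATURE_OFS 0x138434u
#define BMG_VRAM_TEMPERATURE_OFS    0x1382c0u

/* Return the temperature register for a channel, or 0 if it has none. */
static uint32_t temp_reg_for_channel(int channel)
{
	switch (channel) {
	case CHANNEL_PKG:
		return BMG_PACKAGE_TEMPERATURE_OFS;
	case CHANNEL_VRAM:
		return BMG_VRAM_TEMPERATURE_OFS;
	default:
		return 0; /* e.g. CHANNEL_CARD: no card temperature register here */
	}
}

/* is_visible-style mode: read-only (0444) if backed by a register, else hidden. */
static unsigned int temp_input_mode(int channel)
{
	assert(channel >= 0 && channel < CHANNEL_MAX);
	return temp_reg_for_channel(channel) ? 0444 : 0;
}
```

In this sketch CHANNEL_VRAM has enum value 2. Because hwmon numbers temp attributes from 1 in sysfs, it would surface as temp3_input.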
> And how does that work with hwmon_channel_info?

Check the curr_crit implementation:
HWMON_CHANNEL_INFO(curr, HWMON_C_LABEL, HWMON_C_CRIT | HWMON_C_LABEL)
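The curr_crit pattern works because HWMON_CHANNEL_INFO takes one config word per channel, terminated by 0. The sysfs index comes from each word's position in that list, and temp and curr attributes are numbered from 1. The userspace model below mimics that layout for temperature, using a hypothetical three-entry list: card, package, and VRAM. The SK_* bits and helpers are stand-ins, not the kernel's HWMON_T_* values or functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical attribute bits; the kernel's HWMON_T_* values differ. */
#define SK_T_INPUT (1u << 0)
#define SK_T_LABEL (1u << 1)

/* Userspace model of struct hwmon_channel_info. */
struct sk_channel_info {
	const char *type;        /* attribute prefix, e.g. "temp" */
	const uint32_t *config;  /* one word per channel, 0-terminated */
};

/* Mimics HWMON_CHANNEL_INFO(): append the terminating 0 to the list. */
#define SK_CHANNEL_INFO(stype, ...) \
	((struct sk_channel_info){ #stype, (const uint32_t[]){ __VA_ARGS__, 0 } })

/* Count channels by stopping at the first 0 config word. */
static int sk_num_channels(const struct sk_channel_info *info)
{
	int n = 0;

	while (info->config[n])
		n++;
	return n;
}

/* temp/curr attributes are 1-based in sysfs: channel c -> "<type><c+1>_<attr>". */
static int sk_attr_name(char *buf, size_t len,
			const struct sk_channel_info *info,
			int channel, const char *attr)
{
	assert(channel >= 0 && channel < sk_num_channels(info));
	return snprintf(buf, len, "%s%d_%s", info->type, channel + 1, attr);
}
```

A 0 terminates the list, so a channel that exposes nothing useful still needs a non-zero placeholder entry. The curr example does exactly this with HWMON_C_LABEL for channel 0, and the driver's is_visible callback can then hide it.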

Regards,
Badal

>
> Raag

