[PATCH v1] drm/xe/hwmon: expose package and vram temperature
Nilawar, Badal
badal.nilawar at intel.com
Fri Jan 24 14:57:14 UTC 2025
On 24-01-2025 18:03, Raag Jadav wrote:
> On Fri, Jan 24, 2025 at 05:29:16PM +0530, Nilawar, Badal wrote:
>> On 24-01-2025 11:46, Riana Tauro wrote:
>>> Hi Raag
>>>
>>> On 1/23/2025 8:21 AM, Raag Jadav wrote:
>>>> On Tue, Jan 21, 2025 at 01:56:05PM +0530, Riana Tauro wrote:
>>>>> Hi Raag
>>>>>
>>>>> On 1/8/2025 2:54 PM, Raag Jadav wrote:
>>>>>> Add hwmon support for temp1_input and temp2_input attributes, which will
>>>>>> expose package and vram temperature in millidegree Celsius. With this in
>>>>>> place we can monitor temperature using lm-sensors tool.
>>>>>>
>>>>>> Signed-off-by: Raag Jadav <raag.jadav at intel.com>
>>>>>> ---
>>>>>> .../ABI/testing/sysfs-driver-intel-xe-hwmon | 16 +++++
>>>>>> drivers/gpu/drm/xe/regs/xe_mchbar_regs.h | 3 +
>>>>>> drivers/gpu/drm/xe/regs/xe_pcode_regs.h | 2 +
>>>>>> drivers/gpu/drm/xe/xe_hwmon.c | 63 +++++++++++++++++++
>>>>>> 4 files changed, 84 insertions(+)
>>>>>>
>>>>>> diff --git a/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon b/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon
>>>>>> index d792a56f59ac..998cfb0ee1a6 100644
>>>>>> --- a/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon
>>>>>> +++ b/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon
>>>>>> @@ -108,3 +108,19 @@ Contact: intel-xe at lists.freedesktop.org
>>>>>> Description: RO. Package current voltage in millivolt.
>>>>>> Only supported for particular Intel Xe graphics platforms.
>>>>>> +
>>>>>> +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/temp1_input
>>>>>> +Date: April 2025
>>>>>> +KernelVersion: 6.15
>>>>>> +Contact: intel-xe at lists.freedesktop.org
>>>>>> +Description: RO. Package temperature in millidegree Celsius.
>>>>>> +
>>>>>> + Only supported for particular Intel Xe graphics platforms.
>>>>>> +
>>>>>> +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/temp2_input
>>>>>> +Date: April 2025
>>>>>> +KernelVersion: 6.15
>>>>>> +Contact: intel-xe at lists.freedesktop.org
>>>>>> +Description: RO. VRAM temperature in millidegree Celsius.
>>>>>> +
>>>>>> + Only supported for particular Intel Xe graphics platforms.
>>>>>> diff --git a/drivers/gpu/drm/xe/regs/xe_mchbar_regs.h b/drivers/gpu/drm/xe/regs/xe_mchbar_regs.h
>>>>>> index 519dd1067a19..f5e5234857c1 100644
>>>>>> --- a/drivers/gpu/drm/xe/regs/xe_mchbar_regs.h
>>>>>> +++ b/drivers/gpu/drm/xe/regs/xe_mchbar_regs.h
>>>>>> @@ -34,6 +34,9 @@
>>>>>> #define PCU_CR_PACKAGE_ENERGY_STATUS XE_REG(MCHBAR_MIRROR_BASE_SNB + 0x593c)
>>>>>> +#define PCU_CR_PACKAGE_TEMPERATURE XE_REG(MCHBAR_MIRROR_BASE_SNB + 0x5978)
>>>>>> +#define TEMP_MASK REG_GENMASK(7, 0)
>>>>>> +
>>>>>> #define PCU_CR_PACKAGE_RAPL_LIMIT XE_REG(MCHBAR_MIRROR_BASE_SNB + 0x59a0)
>>>>>> #define PKG_PWR_LIM_1 REG_GENMASK(14, 0)
>>>>>> #define PKG_PWR_LIM_1_EN REG_BIT(15)
>>>>>> diff --git a/drivers/gpu/drm/xe/regs/xe_pcode_regs.h b/drivers/gpu/drm/xe/regs/xe_pcode_regs.h
>>>>>> index 0b0b49d850ae..8846eb9ce2a4 100644
>>>>>> --- a/drivers/gpu/drm/xe/regs/xe_pcode_regs.h
>>>>>> +++ b/drivers/gpu/drm/xe/regs/xe_pcode_regs.h
>>>>>> @@ -21,6 +21,8 @@
>>>>>> #define BMG_PACKAGE_POWER_SKU XE_REG(0x138098)
>>>>>> #define BMG_PACKAGE_POWER_SKU_UNIT XE_REG(0x1380dc)
>>>>>> #define BMG_PACKAGE_ENERGY_STATUS XE_REG(0x138120)
>>>>>> +#define BMG_VRAM_TEMPERATURE XE_REG(0x1382c0)
>>>>>> +#define BMG_PACKAGE_TEMPERATURE XE_REG(0x138434)
>>>>> indentation.
>>>> It's a git quirk, you won't see it in file.
>>>>
>>>>> Also you are using the same for DG2. Should have a common name
>>>> Just following the conventions.
>>> Did not find this convention in the file.
>>> BMG_VRAM_TEMPERATURE is used in both dg2 and bmg and has a bmg prefix.
>>> Doesn't seem right
>>>>>> #define BMG_PACKAGE_RAPL_LIMIT XE_REG(0x138440)
>>>>>> #define BMG_PLATFORM_ENERGY_STATUS XE_REG(0x138458)
>>>>>> #define BMG_PLATFORM_POWER_LIMIT XE_REG(0x138460)
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c
>>>>>> index fde56dad3ab7..5b5c844adf4a 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_hwmon.c
>>>>>> +++ b/drivers/gpu/drm/xe/xe_hwmon.c
>>>>>> @@ -6,6 +6,7 @@
>>>>>> #include <linux/hwmon-sysfs.h>
>>>>>> #include <linux/hwmon.h>
>>>>>> #include <linux/types.h>
>>>>>> +#include <linux/units.h>
>>>>>> #include <drm/drm_managed.h>
>>>>>> #include "regs/xe_gt_regs.h"
>>>>>> @@ -20,6 +21,7 @@
>>>>>> #include "xe_pm.h"
>>>>>> enum xe_hwmon_reg {
>>>>>> + REG_TEMP,
>>>>> add to the end
>>>>>> REG_PKG_RAPL_LIMIT,
>>>>>> REG_PKG_POWER_SKU,
>>>>>> REG_PKG_POWER_SKU_UNIT,
>>>>>> @@ -39,6 +41,11 @@ enum xe_hwmon_channel {
>>>>>> CHANNEL_MAX,
>>>>>> };
>>>>>> +enum xe_hwmon_temp {
>>>>>> + TEMP_PKG,
>>>>>> + TEMP_VRAM,
>>>>>> +};
>>>>> Can't the existing channel enum be used here?
>>>> Nope, that'd break the indexes.
>>> @badal/@karthik Are multiple indexes for the same channel okay?
>>>
>>> In the current code, for dg2 only channel 1 is exposed for power and
>>> channel 0 skipped. Something like that needs to be done here too?
>>
>> Thanks for looping me in on this. Yes, channel 0 represents card-specific
>> attributes and channel 1 represents package-specific attributes. That
>> convention should be followed here.
>> With that, BMG_PACKAGE_TEMPERATURE should go under CHANNEL_PKG. For
>> BMG_VRAM_TEMPERATURE, a new channel (channel 3) should be added to enum
>> xe_hwmon_channel.
> And how does that work with hwmon_channel_info?
Check the curr_crit implementation:
HWMON_CHANNEL_INFO(curr, HWMON_C_LABEL, HWMON_C_CRIT | HWMON_C_LABEL)
Regards,
Badal
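
To make the curr_crit hint concrete: as far as I understand, HWMON_CHANNEL_INFO() builds a zero-terminated array of per-channel attribute masks indexed by channel number, so a channel that exposes no readable value still needs a non-zero placeholder (HWMON_C_LABEL for channel 0 in the line above) and is then presumably hidden by the is_visible callback. Below is a minimal userspace sketch of that layout applied to temperature. Every sketch_/SKETCH_ name is hypothetical, the channel indexes are illustrative rather than the driver's actual enum values, and this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's HWMON_T_* attribute bits. */
#define SKETCH_T_INPUT (1u << 0)
#define SKETCH_T_LABEL (1u << 1)

/* Illustrative channel layout: card, package, plus a new VRAM channel. */
enum sketch_channel {
	SKETCH_CHANNEL_CARD,
	SKETCH_CHANNEL_PKG,
	SKETCH_CHANNEL_VRAM,
	SKETCH_CHANNEL_MAX,
};

/*
 * One mask per channel, indexed by channel number, ending in a zero entry.
 * The card channel carries only a label bit so that it does not terminate
 * the list early; it offers no temperature input.
 */
static const uint32_t sketch_temp_config[] = {
	[SKETCH_CHANNEL_CARD] = SKETCH_T_LABEL,
	[SKETCH_CHANNEL_PKG]  = SKETCH_T_INPUT | SKETCH_T_LABEL,
	[SKETCH_CHANNEL_VRAM] = SKETCH_T_INPUT | SKETCH_T_LABEL,
	[SKETCH_CHANNEL_MAX]  = 0, /* terminator */
};

/* Return true if the given channel advertises the given attribute bit. */
static bool sketch_channel_has(enum sketch_channel ch, uint32_t attr)
{
	assert(ch < SKETCH_CHANNEL_MAX);
	return (sketch_temp_config[ch] & attr) != 0;
}

/* Count channels the way a zero-terminated walk would see them. */
static int sketch_count_channels(void)
{
	int n = 0;

	while (sketch_temp_config[n])
		n++;
	return n;
}
```

With this layout the package sensor keeps channel index 1, matching the existing power and current attributes, and VRAM gets its own index instead of a separate temperature-only enum.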
>
> Raag
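
For reference, the unit conversion implied by the commit message can be sketched as follows. This assumes the temperature field occupies bits 7:0 of the register (as TEMP_MASK suggests) and holds whole degrees Celsius; the second part is an assumption, since the xe_hwmon.c read path is not quoted in this thread. The sketch_/SKETCH_ names are hypothetical userspace stand-ins, not kernel helpers.

```c
#include <stdint.h>

/* Bits 7:0, the same span as REG_GENMASK(7, 0) in the quoted patch. */
#define SKETCH_TEMP_MASK 0xffu
/* Same value as MILLIDEGREE_PER_DEGREE in <linux/units.h>. */
#define SKETCH_MILLIDEGREE_PER_DEGREE 1000L

/*
 * Convert a raw register value to millidegree Celsius, assuming the
 * masked field is an unsigned temperature in whole degrees Celsius.
 */
static long sketch_reg_to_millicelsius(uint32_t reg_val)
{
	return (long)(reg_val & SKETCH_TEMP_MASK) * SKETCH_MILLIDEGREE_PER_DEGREE;
}
```

Under that assumption a raw field value of 0x2d (45) would be reported through temp1_input as 45000, which is the scale lm-sensors expects.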