[PATCH 1/4] drm/xe/pmu: Enable PMU interface

Umesh Nerlige Ramappa umesh.nerlige.ramappa at intel.com
Wed Nov 27 18:47:28 UTC 2024


On Wed, Nov 27, 2024 at 11:13:24AM -0600, Lucas De Marchi wrote:
>On Wed, Nov 27, 2024 at 12:42:14PM +0530, Riana Tauro wrote:
>>Hi Lucas
>>
>>On 11/26/2024 12:37 AM, Umesh Nerlige Ramappa wrote:
>>>On Mon, Sep 02, 2024 at 10:01:11PM -0500, Lucas De Marchi wrote:
>>>>On Mon, Sep 02, 2024 at 11:29:05AM GMT, Aravind Iddamsetty wrote:
>>>>>
>>>>>On 30/08/24 03:40, Belgaumkar, Vinay wrote:
>>>>>>
>>>>>>On 8/28/2024 12:33 PM, Lucas De Marchi wrote:
>>>>>>>On Tue, Aug 27, 2024 at 09:41:04AM GMT, Vinay Belgaumkar wrote:
>>>>>>>>diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>>>>>>>>index b6fbe4988f2e..de6f39db618c 100644
>>>>>>>>--- a/include/uapi/drm/xe_drm.h
>>>>>>>>+++ b/include/uapi/drm/xe_drm.h
>>>>>>>>@@ -1389,6 +1389,40 @@ struct drm_xe_wait_user_fence {
>>>>>>>>    __u64 reserved[2];
>>>>>>>>};
>>>>>>>>
>>>>>>>>+/**
>>>>>>>>+ * DOC: XE PMU event config IDs
>>>>>>>>+ *
>>>>>>>>+ * Check 'man perf_event_open' to use the ID's 
>>>>>>>>XE_PMU_XXXX listed in xe_drm.h
>>>>>>>>+ * in 'struct perf_event_attr' as part of 
>>>>>>>>perf_event_open syscall to read a
>>>>>>>>+ * particular event.
>>>>>>>>+ *
>>>>>>>>+ * For example to open the XE_PMU_RENDER_GROUP_BUSY(0):
>>>>>>>>+ *
>>>>>>>>+ * .. code-block:: C
>>>>>>>>+ *
>>>>>>>>+ *    struct perf_event_attr attr;
>>>>>>>>+ *    long long count;
>>>>>>>>+ *    int cpu = 0;
>>>>>>>>+ *    int fd;
>>>>>>>>+ *
>>>>>>>>+ *    memset(&attr, 0, sizeof(struct perf_event_attr));
>>>>>>>>+ *    attr.type = type; // eg: 
>>>>>>>>/sys/bus/event_source/devices/ xe_0000_56_00.0/type
>>>>>>>>+ *    attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED;
>>>>>>>>+ *    attr.use_clockid = 1;
>>>>>>>>+ *    attr.clockid = CLOCK_MONOTONIC;
>>>>>>>>+ *    attr.config = XE_PMU_RENDER_GROUP_BUSY(0);
>>>>>>>>+ *
>>>>>>>>+ *    fd = syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
>>>>>>>>+ */
>>>>>>>>+
>>>>>>>>+/*
>>>>>>>>+ * Top bits of every counter are GT id.
>>>>>>>>+ */
>>>>>>>>+#define __XE_PMU_GT_SHIFT (56)
>>>>>>>>+
>>>>>>>>+#define ___XE_PMU_OTHER(gt, x) \
>>>>>>>>+    (((__u64)(x)) | ((__u64)(gt) << __XE_PMU_GT_SHIFT))
>>>>>>>>+
>>>>>>>
>>>>>>>The perf uapi is self-describing and users should look up 
>>>>>>>on sysfs what
>>>>>>>to use. Example for i915 since it's what I'm currently working on:
>>>>>>>
>>>>>>>    $ cat /sys/bus/event_source/devices/i915/events/actual-frequency
>>>>>>>    config=0x100000
>>>>>>>    $ cat 
>>>>>>>/sys/bus/event_source/devices/i915/events/actual- 
>>>>>>>frequency.unit
>>>>>>>    M
>>>>>>>
>>>>>>>`perf list` works fine and doesn't know anything about this xe-only
>>>>>>>header. Why would we add anything here rather than 
>>>>>>>encourage other users
>>>>>>>to read from the generic interface?
>>>>>>
>>>>>>Agree. perf list | grep rc6 is sufficient.
>>>>>
>>>>>This was previously asked by Rodrigo https:// 
>>>>>patchwork.freedesktop.org/patch/555013/?series=119504&rev=5
>>>>>
>>>>>so what changed from then to now.
>>>>
>>>>2 different things. That rev you are pointing out had:
>>>>
>>>>include/uapi/drm/xe_drm.h            |  16 +
>>>>
>>>>so the uapi was being changed without proper doc. Rodrigo asked for the
>>>>documentation to be added, because that is the proper way to add things
>>>>to the uapi header.
>>>>
>>>>In this review what I noticed though is that the change shouldn't be
>>>>there in the first place. We shouldn't change anything in the drm/xe
>>>>UAPI header because this is actually an UAPI (via sysfs) coming from
>>>>perf.
>>>
>>>@Lucas,
>>>
>>>Agree. This is carried over from i915, where other constraints may 
>>>have led to adding some header bits for gt.
>>>
>>>I want to mention that now there are 2 events per engine - ticks 
>>>(actual engine run ticks) and total_ticks (total ticks that the 
>>>engine was alloted) which enable us to take a ratio of deltas to 
>>>calculate utilization at the engine level. If we add the config 
>>>for each possible combination of gt and vf_id, we would have
>>>
>>>num_engines * 2 * num_gts * num_vfs
>
>not sure what you're trying to do here with that * 2...?
>
>what are you trying to read from the userspace side?  2 separate
>counters or just 1 with the embedded info?

2 separate counters per engine. Both are required to calculate % 
busyness. This is similar to PCEU where we read 2 counters and then 
determine the % utilization.

>
>Also, why is this considering num_vfs? Is this to monitor from the host
>all the VFs being used?

Yes. Currently VF specific utilization can only be accessed from host.

>If so, then we need to be careful what we are
>trying to support: there's no dynamic event creation in perf - we'd
>need to unregister and register the pmu again like it's done in the
>uncore pmu

Yes, initially I was suggesting that only supported events should show 
up in sysfs, but later Riana pointed out that we cannot do it, so I was 
hoping we could just register events for max_vfs (which is known at 
driver load time). Although, that's not a clean way to do it.

Ideally VF KMD should be responsible for it's own instance of PMU (like 
below), but reading the counters from VF is not supported yet,

/sys/bus/event_source/devices/<host_device>/events
/sys/bus/event_source/devices/<vf1>/events
/sys/bus/event_source/devices/<vf2>/events
...

What we have here is a special case where VF events are being accessed 
from host, so I am unclear on what the best interface would be.

Maybe we just create new PMU instances in host for each provisioned VF 
and then list those events under host like below:

/sys/bus/event_source/devices/<host_device>/events/vf1
/sys/bus/event_source/devices/<host_device>/events/vf2
...

These sysfs directories will get created dynamically, but will be 
associated with separate PMU instance.

Thoughts?

Thanks,
Umesh

>
>>>
>>>where num_vfs should (ideally) be the number of vfs provisioned.
>>
>>Here it will be max_vfs instead of provisioned num_vfs. Pmu event 
>>attributes are added on pmu_register.
>>
>>num_engines * 2 * num_gts * max_vfs
>>
>>Did not find a way to add/remove events dynamically.
>
>there isn't. other PMUs unregister and register it again... but that is
>only done on early probe, not something really dynamic. If we want to
>support something dynamic like that then we really need to wait the
>perf_pmu_unregister() to be fixed: not really possible without that.
>
>
>Lucas De Marchi
>
>>
>>Thanks
>>Riana
>>
>>>
>>>Do you think that's fine?
>
>you have config, config1 and config2, each with 64bits
>
>I'm not really sure what that *2 mean and we'd better have some room
>to grow it for ABI compatibility (even if a proper tool would read the
>sysfs to check what is the meaning of each bit).
>
>Lucas De Marchi
>
>>>
>>>Thanks,
>>>Umesh
>>>
>>>>
>>>>Lucas De Marchi
>>>>
>>>>
>>>>>
>>>>>Thanks,
>>>>>Aravind.
>>>>>
>>>>>>
>>>>>>Thanks,
>>>>>>
>>>>>>Vinay.
>>>>>>
>>>>>>>
>>>>>>>Lucas De Marchi
>>


More information about the Intel-xe mailing list