[Intel-xe] [PATCH v2 2/2] drm/xe/pmu: Enable PMU interface
Iddamsetty, Aravind
aravind.iddamsetty at intel.com
Mon Jul 24 09:38:58 UTC 2023
On 22-07-2023 11:34, Dixit, Ashutosh wrote:
> On Fri, 21 Jul 2023 16:36:02 -0700, Dixit, Ashutosh wrote:
>>
>> On Fri, 21 Jul 2023 04:51:09 -0700, Iddamsetty, Aravind wrote:
>>>
>> Hi Aravind,
>>
>>> On 21-07-2023 06:32, Dixit, Ashutosh wrote:
>>>> On Tue, 27 Jun 2023 05:21:13 -0700, Aravind Iddamsetty wrote:
>>>>>
>>>> More stuff to mull over. You can ignore comments starting with "OK"; those
>>>> are just notes to myself.
>>>>
>>>> Also, maybe some time we can add a basic IGT which reads these exposed
>>>> counters and verifies that we can read them and they are monotonically
>>>> increasing?
>>>
>>> This is the IGT series, posted by Venkat, that uses these counters:
>>> https://patchwork.freedesktop.org/series/119936/
>>>
>>>>
>>>>> There is a set of engine group busyness counters provided by HW which are a
>>>>> perfect fit to be exposed via PMU perf events.
>>>>>
>>>>> BSPEC: 46559, 46560, 46722, 46729
>>>>
>>>> Also add these Bspec entries: 71028, 52071
>>>
>>> OK.
>>>
>>>>
>>>>>
>>>>> Events can be listed using:
>>>>> perf list
>>>>> xe_0000_03_00.0/any-engine-group-busy-gt0/ [Kernel PMU event]
>>>>> xe_0000_03_00.0/copy-group-busy-gt0/ [Kernel PMU event]
>>>>> xe_0000_03_00.0/interrupts/ [Kernel PMU event]
>>>>> xe_0000_03_00.0/media-group-busy-gt0/ [Kernel PMU event]
>>>>> xe_0000_03_00.0/render-group-busy-gt0/ [Kernel PMU event]
>>>>>
>>>>> and can be read using:
>>>>>
>>>>> perf stat -e "xe_0000_8c_00.0/render-group-busy-gt0/" -I 1000
>>>>>         time  counts unit events
>>>>>  1.001139062       0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>  2.003294678       0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>  3.005199582       0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>  4.007076497       0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>  5.008553068       0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>  6.010531563   43520 ns   xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>  7.012468029   44800 ns   xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>  8.013463515       0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>  9.015300183       0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
>>>>> 10.017233010       0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
>>>>> 10.971934120       0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
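With -I 1000, perf prints the busy time accumulated during each one-second interval rather than a running total, which is why the column returns to 0 once the render group goes idle. Below is a minimal userspace sketch of that bookkeeping, assuming the event exposes a cumulative busy-time counter in ns. busy_deltas() and its inputs are hypothetical; the same "never goes backwards" check is what a basic IGT could assert on successive reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert n successive reads of a cumulative
 * busyness counter (ns) into n - 1 per-interval deltas, similar to what
 * "perf stat -I" reports. Returns 0 on success, or -1 if the counter
 * ever decreases (a monotonicity violation). Wraparound is not handled.
 */
static int busy_deltas(const uint64_t *reads, size_t n, uint64_t *deltas)
{
	assert(reads != NULL && deltas != NULL);

	for (size_t i = 1; i < n; i++) {
		if (reads[i] < reads[i - 1])
			return -1; /* counter went backwards */
		deltas[i - 1] = reads[i] - reads[i - 1];
	}
	return 0;
}
```

For example, hypothetical cumulative reads of 0, 0, 43520, 88320 and 88320 ns yield deltas of 0, 43520, 44800 and 0 ns, the same shape as the 5 s to 8 s rows above.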
>>>>>
>>>>> The PMU base implementation is taken from i915.
>>>>>
>>>>> v2:
>>>>> Store the last known value while the device is awake, return it while the
>>>>> GT is suspended, and update the driver copy when the counter is read while
>>>>> the device is awake.
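The v2 behaviour can be modelled in a few lines. This is a userspace sketch, not the driver code: struct busy_counter, busy_store() and busy_read() are hypothetical, and a plain awake flag stands in for the GT power state. It assumes the busyness registers may only be read while the GT is awake.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one busyness counter plus its driver copy. */
struct busy_counter {
	bool awake;        /* GT awake, so register reads are allowed */
	uint64_t hw_value; /* stand-in for the hardware register */
	uint64_t stored;   /* last value read while the GT was awake */
};

/* Called before the GT suspends: snapshot the current hardware value. */
static void busy_store(struct busy_counter *c)
{
	assert(c->awake);
	c->stored = c->hw_value;
}

/*
 * Read path: while awake, read the hardware and refresh the driver copy;
 * while suspended, return the last known value without touching hardware.
 */
static uint64_t busy_read(struct busy_counter *c)
{
	if (c->awake)
		c->stored = c->hw_value;
	return c->stored;
}
```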
>>>>>
>>>>> Co-developed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>> Co-developed-by: Bommu Krishnaiah <krishnaiah.bommu at intel.com>
>>>>> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty at intel.com>
>>>>> ---
>>>>> drivers/gpu/drm/xe/Makefile | 2 +
>>>>> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 5 +
>>>>> drivers/gpu/drm/xe/xe_device.c | 2 +
>>>>> drivers/gpu/drm/xe/xe_device_types.h | 4 +
>>>>> drivers/gpu/drm/xe/xe_gt.c | 2 +
>>>>> drivers/gpu/drm/xe/xe_irq.c | 22 +
>>>>> drivers/gpu/drm/xe/xe_module.c | 5 +
>>>>> drivers/gpu/drm/xe/xe_pmu.c | 739 +++++++++++++++++++++++++++
>>>>> drivers/gpu/drm/xe/xe_pmu.h | 25 +
>>>>> drivers/gpu/drm/xe/xe_pmu_types.h | 80 +++
>>>>> include/uapi/drm/xe_drm.h | 16 +
>>>>> 11 files changed, 902 insertions(+)
>>>>> create mode 100644 drivers/gpu/drm/xe/xe_pmu.c
>>>>> create mode 100644 drivers/gpu/drm/xe/xe_pmu.h
>>>>> create mode 100644 drivers/gpu/drm/xe/xe_pmu_types.h
>>>>>
<snip>
>>>>> +
>>>>> +void engine_group_busyness_store(struct xe_gt *gt)
>>>>> +{
>>>>> + struct xe_pmu *pmu = &gt->tile->xe->pmu;
>>>>> + unsigned int gt_id = gt->info.id;
>>>>> + unsigned long flags;
>>>>> +
>>>>> + spin_lock_irqsave(&pmu->lock, flags);
>>>>> +
>>>>> + store_sample(pmu, gt_id, __XE_SAMPLE_RENDER_GROUP_BUSY,
>>>>> + __engine_group_busyness_read(gt, XE_PMU_RENDER_GROUP_BUSY(0)));
>>>>> + store_sample(pmu, gt_id, __XE_SAMPLE_COPY_GROUP_BUSY,
>>>>> + __engine_group_busyness_read(gt, XE_PMU_COPY_GROUP_BUSY(0)));
>>>>> + store_sample(pmu, gt_id, __XE_SAMPLE_MEDIA_GROUP_BUSY,
>>>>> + __engine_group_busyness_read(gt, XE_PMU_MEDIA_GROUP_BUSY(0)));
>>>>> + store_sample(pmu, gt_id, __XE_SAMPLE_ANY_ENGINE_GROUP_BUSY,
>>>>> + __engine_group_busyness_read(gt, XE_PMU_ANY_ENGINE_GROUP_BUSY(0)));
>
> Why should we store everything here? Shouldn't we store only those events
> which are enabled?
>
> Also, it would be good if the above could be done in a loop somehow. 4 is
> fine, but if we add events later, a loop will be nice, if possible.
I got your point. I could do something like this:

	for (i = __XE_SAMPLE_RENDER_GROUP_BUSY; i < __XE_NUM_PMU_SAMPLERS; i++) {
		val = __engine_group_busyness_read(gt, i);
		pmu->sample[gt_id][i] = val;
	}
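A minimal userspace sketch combining both review points: one loop over the sample enum that skips events not set in an enable mask. All names here (struct pmu_model, enable_mask, read_busyness(), busyness_store()) are hypothetical stand-ins. One caveat from the quoted patch: the original calls pass XE_PMU_*_GROUP_BUSY(0) config values to __engine_group_busyness_read(), not __XE_SAMPLE_* indices, so passing the loop index directly is only correct if the two numberings coincide; otherwise a small index-to-config table is needed. In-kernel, for_each_set_bit() over the enable mask would be the idiomatic form of the loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sample indices, mirroring the __XE_SAMPLE_* enum. */
enum {
	SAMPLE_RENDER_GROUP_BUSY,
	SAMPLE_COPY_GROUP_BUSY,
	SAMPLE_MEDIA_GROUP_BUSY,
	SAMPLE_ANY_ENGINE_GROUP_BUSY,
	NUM_SAMPLERS
};

#define MAX_GT 2 /* hypothetical number of GTs */

struct pmu_model {
	uint32_t enable_mask;                  /* bit i set: sample i enabled */
	uint64_t sample[MAX_GT][NUM_SAMPLERS]; /* stored values per GT */
};

/* Stand-in for __engine_group_busyness_read(); returns a fake value. */
static uint64_t read_busyness(unsigned int gt_id, unsigned int sample)
{
	return 1000u * (gt_id + 1) + sample;
}

/* Store only enabled samples; a new event only needs a new enum entry. */
static void busyness_store(struct pmu_model *pmu, unsigned int gt_id)
{
	assert(gt_id < MAX_GT);

	for (unsigned int i = 0; i < NUM_SAMPLERS; i++) {
		if (!(pmu->enable_mask & (1u << i)))
			continue; /* event not enabled: skip the register read */
		pmu->sample[gt_id][i] = read_busyness(gt_id, i);
	}
}
```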
Thanks,
Aravind.