[Intel-xe] [PATCH v2 2/2] drm/xe/pmu: Enable PMU interface

Iddamsetty, Aravind aravind.iddamsetty at intel.com
Mon Jul 24 09:38:58 UTC 2023



On 22-07-2023 11:34, Dixit, Ashutosh wrote:
> On Fri, 21 Jul 2023 16:36:02 -0700, Dixit, Ashutosh wrote:
>>
>> On Fri, 21 Jul 2023 04:51:09 -0700, Iddamsetty, Aravind wrote:
>>>
>> Hi Aravind,
>>
>>> On 21-07-2023 06:32, Dixit, Ashutosh wrote:
>>>> On Tue, 27 Jun 2023 05:21:13 -0700, Aravind Iddamsetty wrote:
>>>>>
>>>> More stuff to mull over. You can ignore comments starting with "OK", those
>>>> are just notes to myself.
>>>>
>>>> Also, maybe some time we can add a basic IGT which reads these exposed
>>>> counters and verifies that we can read them and they are monotonically
>>>> increasing?
>>>
>>> This is the IGT series using these counters, posted by Venkat:
>>> https://patchwork.freedesktop.org/series/119936/
>>>
>>>>
>>>>> There is a set of engine-group busyness counters provided by HW which
>>>>> are a perfect fit to be exposed via PMU perf events.
>>>>>
>>>>> BSPEC: 46559, 46560, 46722, 46729
>>>>
>>>> Also add these Bspec entries: 71028, 52071
>>>
>>> OK.
>>>
>>>>
>>>>>
>>>>> events can be listed using:
>>>>> perf list
>>>>>   xe_0000_03_00.0/any-engine-group-busy-gt0/         [Kernel PMU event]
>>>>>   xe_0000_03_00.0/copy-group-busy-gt0/               [Kernel PMU event]
>>>>>   xe_0000_03_00.0/interrupts/                        [Kernel PMU event]
>>>>>   xe_0000_03_00.0/media-group-busy-gt0/              [Kernel PMU event]
>>>>>   xe_0000_03_00.0/render-group-busy-gt0/             [Kernel PMU event]
>>>>>
>>>>> and can be read using:
>>>>>
>>>>> perf stat -e "xe_0000_8c_00.0/render-group-busy-gt0/" -I 1000
>>>>>            time             counts unit events
>>>>>      1.001139062                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>      2.003294678                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>      3.005199582                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>      4.007076497                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>      5.008553068                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>      6.010531563              43520 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>      7.012468029              44800 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>      8.013463515                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>      9.015300183                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>     10.017233010                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>     10.971934120                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>>>>
>>>>> The pmu base implementation is taken from i915.
>>>>>
>>>>> v2:
>>>>> Store the last known value while the device is awake, return that while
>>>>> the GT is suspended, and update the driver copy when read while awake.
>>>>>
>>>>> Co-developed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>> Co-developed-by: Bommu Krishnaiah <krishnaiah.bommu at intel.com>
>>>>> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty at intel.com>
>>>>> ---
>>>>>  drivers/gpu/drm/xe/Makefile          |   2 +
>>>>>  drivers/gpu/drm/xe/regs/xe_gt_regs.h |   5 +
>>>>>  drivers/gpu/drm/xe/xe_device.c       |   2 +
>>>>>  drivers/gpu/drm/xe/xe_device_types.h |   4 +
>>>>>  drivers/gpu/drm/xe/xe_gt.c           |   2 +
>>>>>  drivers/gpu/drm/xe/xe_irq.c          |  22 +
>>>>>  drivers/gpu/drm/xe/xe_module.c       |   5 +
>>>>>  drivers/gpu/drm/xe/xe_pmu.c          | 739 +++++++++++++++++++++++++++
>>>>>  drivers/gpu/drm/xe/xe_pmu.h          |  25 +
>>>>>  drivers/gpu/drm/xe/xe_pmu_types.h    |  80 +++
>>>>>  include/uapi/drm/xe_drm.h            |  16 +
>>>>>  11 files changed, 902 insertions(+)
>>>>>  create mode 100644 drivers/gpu/drm/xe/xe_pmu.c
>>>>>  create mode 100644 drivers/gpu/drm/xe/xe_pmu.h
>>>>>  create mode 100644 drivers/gpu/drm/xe/xe_pmu_types.h
>>>>>
<snip>
>>>>> +
>>>>> +void engine_group_busyness_store(struct xe_gt *gt)
>>>>> +{
>>>>> +	struct xe_pmu *pmu = &gt->tile->xe->pmu;
>>>>> +	unsigned int gt_id = gt->info.id;
>>>>> +	unsigned long flags;
>>>>> +
>>>>> +	spin_lock_irqsave(&pmu->lock, flags);
>>>>> +
>>>>> +	store_sample(pmu, gt_id, __XE_SAMPLE_RENDER_GROUP_BUSY,
>>>>> +		     __engine_group_busyness_read(gt, XE_PMU_RENDER_GROUP_BUSY(0)));
>>>>> +	store_sample(pmu, gt_id, __XE_SAMPLE_COPY_GROUP_BUSY,
>>>>> +		     __engine_group_busyness_read(gt, XE_PMU_COPY_GROUP_BUSY(0)));
>>>>> +	store_sample(pmu, gt_id, __XE_SAMPLE_MEDIA_GROUP_BUSY,
>>>>> +		     __engine_group_busyness_read(gt, XE_PMU_MEDIA_GROUP_BUSY(0)));
>>>>> +	store_sample(pmu, gt_id, __XE_SAMPLE_ANY_ENGINE_GROUP_BUSY,
>>>>> +		     __engine_group_busyness_read(gt, XE_PMU_ANY_ENGINE_GROUP_BUSY(0)));
> 
> Why should we store everything here? We should only store those events
> which are enabled.
> 
> Also, it would be good if the above could be done in a loop somehow. 4 is
> fine, but if we add more events later a loop would be nice, if possible.

I got your point. I could do something like this:

for (i = __XE_SAMPLE_RENDER_GROUP_BUSY; i < __XE_NUM_PMU_SAMPLERS; i++) {
	val = __engine_group_busyness_read(gt, i);
	pmu->sample[gt_id][i] = val;
}

Thanks,
Aravind.

