[PATCH 0/7] drm/xe: Per client usage

Tvrtko Ursulin tursulin at ursulin.net
Tue Apr 16 08:37:44 UTC 2024


On 16/04/2024 04:04, Lucas De Marchi wrote:
> Add per-client usage statistics to xe. This ports xe to use the common
> method in drm to export the usage to userspace per client (where 1
> client == 1 drm fd open).
> 
> However instead of using the current format, this creates a new one with
> the unit "ticks". The intention here is not to mix the GPU clock domain
> with the CPU clock. It allows covering a few more use cases without
> extra complications.
> 
> Last patch is a quick implementation of a gputop-like tool in python.
> I ended up doing it to cross-check the gputop implementation. It's not
> really meant to be applied here.
> 
> I tested this on DG2 and TGL with kmscube (console-only) and vkcube
> (in a gnome session), but it would be good to soak this under more
> tests. The biggest goal for this patch series right now is to get
> consensus on the new UAPI.
> 
> TODO: Add documentation on top with the new interface.

Yeah a drm-usage-stats.rst patch would be nice to have in the RFC so one 
does not have to look into the driver implementation to discuss the 
proposed uapi.

Nevertheless I understand the proposal is to add this:

  drm-engine-<class>: <GPU_TIMESTAMP> <RUNTIME> ticks

That's two values per key. I guess "one key value pair per one line of 
text" does not get strictly broken, and you propose a parsing heuristic 
to detect that the <RUNTIME> cannot be mis-interpreted as the unit?

Not sure it is a good idea though. If you instead added a new key for 
the gpu time what would be the downside in your view? Like:

drm-engine-<class>: <uint> ticks
drm-ticks-<class>: <uint>
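A parser for this two-key variant stays trivial, since every line remains a plain key/value pair. A minimal sketch (the `drm-ticks-<class>` key is the proposal above, not an existing uAPI, and the engine class names are illustrative):

```python
def parse_fdinfo(text):
    """Return {class: (busy_ticks, total_ticks)} from fdinfo text.

    Parses the hypothetical two-key format proposed above:
        drm-engine-<class>: <uint> ticks
        drm-ticks-<class>:  <uint>
    """
    busy, total = {}, {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields = value.split()  # e.g. ["12345", "ticks"]
        if not fields:
            continue
        if key.startswith("drm-engine-"):
            busy[key[len("drm-engine-"):]] = int(fields[0])
        elif key.startswith("drm-ticks-"):
            total[key[len("drm-ticks-"):]] = int(fields[0])
    # Only classes reporting both values are usable for a busy ratio.
    return {c: (busy[c], total[c]) for c in busy if c in total}
```

Utilization over a sampling interval is then just the delta of busy ticks divided by the delta of total ticks, with no CPU clock involved.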

Or maybe even obfuscate/generalise as:

drm-engine-<class>: <uint> gpu-time
drm-gpu-time-<class>: <uint>

Potentially we could also add a key saying how much wall time one unit 
of GPU time corresponds to.

Or.. would even the existing drm-cycles, plus abuse of drm-maxfreq, 
work? Ticks == cycles, maxfreq == ticks per wall second.
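With that reuse, the math a gputop-style tool would do between two samples is roughly the following sketch (`drm-cycles-<class>` and `drm-maxfreq-<class>` are the keys already documented in drm-usage-stats.rst; treating maxfreq as "ticks per wall second" is the abuse suggested above, not current documented semantics):

```python
def busy_percent(cycles_prev, cycles_now, maxfreq_hz, wall_dt_s):
    """Estimate engine busyness between two fdinfo samples.

    (cycles delta) / (maxfreq * wall time) approximates the fraction
    of the sampling period the engine was executing, assuming the
    tick rate is constant and equal to the reported drm-maxfreq.
    """
    max_cycles = maxfreq_hz * wall_dt_s
    if max_cycles <= 0:
        return 0.0
    return min(100.0, 100.0 * (cycles_now - cycles_prev) / max_cycles)
```

The obvious weakness, and presumably why the series avoids it, is that a variable tick frequency makes the wall-time denominator wrong.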

Secondly, the wrap behaviour every 25-30 seconds which patch 6/7 
describes definitely breaks the format spec and in my view should be 
worked around in the driver:

"""
Values are not required to be constantly monotonic if it makes the driver
implementation easier, but are required to catch up with the previously 
reported
larger value within a reasonable period. Upon observing a value lower 
than what
was previously read, userspace is expected to stay with that larger previous
value until a monotonic update is seen.
"""

Regards,

Tvrtko

> 
> Test-with: https://lore.kernel.org/igt-dev/20240405060056.59379-1-lucas.demarchi@intel.com/
> 
> Lucas De Marchi (5):
>    drm/xe: Promote xe_hw_engine_class_to_str()
>    drm/xe: Add XE_ENGINE_CLASS_OTHER to str conversion
>    drm/xe: Add helper to capture engine timestamp
>    drm/xe/client: Print runtime to fdinfo
>    HACK: simple gputop-like impl in python
> 
> Umesh Nerlige Ramappa (2):
>    drm/xe/lrc: Add helper to capture context timestamp
>    drm/xe: Add helper to capture context runtime
> 
>   drivers/gpu/drm/xe/regs/xe_lrc_layout.h       |   1 +
>   drivers/gpu/drm/xe/xe_device_types.h          |   9 ++
>   drivers/gpu/drm/xe/xe_drm_client.c            |  81 ++++++++++++-
>   drivers/gpu/drm/xe/xe_exec_queue.c            |  37 ++++++
>   drivers/gpu/drm/xe/xe_exec_queue.h            |   1 +
>   drivers/gpu/drm/xe/xe_hw_engine.c             |  29 +++++
>   drivers/gpu/drm/xe/xe_hw_engine.h             |   4 +
>   drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c |  18 ---
>   drivers/gpu/drm/xe/xe_lrc.c                   |  11 ++
>   drivers/gpu/drm/xe/xe_lrc.h                   |   2 +
>   drivers/gpu/drm/xe/xe_lrc_types.h             |   3 +
>   drivers/gpu/drm/xe/xe_sched_job.c             |   2 +
>   pyfdinfo                                      | 113 ++++++++++++++++++
>   13 files changed, 292 insertions(+), 19 deletions(-)
>   create mode 100755 pyfdinfo
> 


More information about the Intel-xe mailing list