[PATCH 0/3] drm/panfrost: Expose HW counters to userspace

Fri Apr 5 15:20:45 UTC 2019

On 04/04/2019 16:20, Boris Brezillon wrote:
> Hello,
> 
> This patch series adds new ioctls to expose GPU counters to userspace.
> These will be used by the mesa driver (should be posted soon).
> 
> A few words about the implementation: I followed the VC4/Etnaviv model
> where perf counters are retrieved on a per-job basis. This allows one
> to get accurate results even when several users are using the GPU
> concurrently.
> AFAICT, mali_kbase uses a different approach where several users can
> register a performance monitor, but with no way to have fine-grained
> control over which job/GPU context to track.

mali_kbase submits overlapping jobs. The jobs on slot 0 and slot 1 can
be from different contexts (address spaces), and mali_kbase also fully
uses the _NEXT registers. So there can be a job from one context
executing on slot 0 and a job from a different context waiting in the
_NEXT registers. (And the same goes for slot 1.) This means that there's
no (visible) gap between the first job finishing and the second job
starting. Early versions of the driver even had a throttle to avoid
interrupt storms (see JOB_IRQ_THROTTLE), which would further delay the
IRQ, but thankfully that's gone.

The upshot is that it's basically impossible to measure "per-job"
counters when running at full speed, because multiple jobs are running
at once and the driver doesn't actually know when one ends and the next
starts.

Since one of the primary use cases is to draw pretty graphs of the
system load [1], this "per-job" information isn't all that relevant (and
minimal performance overhead is important). And if you want to monitor
just one application, it is usually easiest to ensure that it is the
only thing running.

[1]
https://developer.arm.com/tools-and-software/embedded/arm-development-studio/components/streamline-performance-analyzer

> This design choice comes at a cost: every time the perfmon context
> changes (the perfmon context is the list of currently active
> perfmons), the driver has to add a fence to prevent new jobs from
> corrupting counters that will be dumped by previous jobs.
> 
> Let me know if that's an issue and if you think we should approach
> things differently.

It depends what you expect to do with the counters. Per-job counters are
certainly useful sometimes. But serialising all jobs can mess up the
thing you are trying to measure the performance of.

Steve

> Regards,
> 
> Boris
> 
> Boris Brezillon (3):
>   drm/panfrost: Move gpu_{write,read}() macros to panfrost_regs.h
>   drm/panfrost: Expose HW counters to userspace
>   panfrost/drm: Define T860 perf counters
> 
>  drivers/gpu/drm/panfrost/Makefile           |   3 +-
>  drivers/gpu/drm/panfrost/panfrost_device.c  |   8 +
>  drivers/gpu/drm/panfrost/panfrost_device.h  |  11 +
>  drivers/gpu/drm/panfrost/panfrost_drv.c     |  22 +-
>  drivers/gpu/drm/panfrost/panfrost_gpu.c     |  46 +-
>  drivers/gpu/drm/panfrost/panfrost_job.c     |  24 +
>  drivers/gpu/drm/panfrost/panfrost_job.h     |   4 +
>  drivers/gpu/drm/panfrost/panfrost_perfcnt.c | 954 ++++++++++++++++++++
>  drivers/gpu/drm/panfrost/panfrost_perfcnt.h |  59 ++
>  drivers/gpu/drm/panfrost/panfrost_regs.h    |  22 +
>  include/uapi/drm/panfrost_drm.h             | 122 +++
>  11 files changed, 1268 insertions(+), 7 deletions(-)
>  create mode 100644 drivers/gpu/drm/panfrost/panfrost_perfcnt.c
>  create mode 100644 drivers/gpu/drm/panfrost/panfrost_perfcnt.h
>