Robert Bragg robert at sixbynine.org
Tue May 5 17:53:48 PDT 2015

As we've learned more about the observability capabilities of Gen
graphics, we've found that configuring the OA unit from userspace
alone, without dedicated support from the kernel, is not enough.

As it stands, the i965 backends for both AMD_performance_monitor and
INTEL_performance_query can't report normalized metrics useful to
application developers, because of the limitations of configuring the
OA unit from userspace via LRIs (MI_LOAD_REGISTER_IMM commands).

More recently we've developed a perf PMU (performance monitoring unit)
driver within the drm i915 driver ("i915_oa") that lets userspace
configure and open an event fd via the perf_event_open syscall, which
gives us a more complete interface for configuring the Gen graphics
OA unit.
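
A minimal sketch of how userspace might open an event on a dynamically
registered PMU such as this one. It relies on the usual perf convention
that a PMU's type id can be read from
/sys/bus/event_source/devices/<pmu>/type. The helper names
(read_pmu_type, init_pmu_attr, open_pmu_event) are made up for
illustration, and the config value and sample_type that i915_oa
actually expects are assumptions here, not taken from the driver:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Read a dynamic PMU type id from a sysfs "type" file, e.g.
 * /sys/bus/event_source/devices/<pmu>/type.
 * Returns -1 if the file is missing or unparsable. */
int read_pmu_type(const char *type_path)
{
    int type = -1;
    FILE *f;

    assert(type_path != NULL);
    f = fopen(type_path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%d", &type) != 1)
        type = -1;
    fclose(f);
    return type;
}

/* Fill in a perf_event_attr for a device PMU.
 * 'config' is PMU specific; what i915_oa expects there is NOT shown
 * here (assumption: the caller knows the driver's encoding).
 * PERF_SAMPLE_RAW is also an assumption about how reports arrive. */
void init_pmu_attr(struct perf_event_attr *attr, int pmu_type,
                   uint64_t config, uint64_t sample_period)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = (uint32_t)pmu_type;
    attr->config = config;
    attr->sample_period = sample_period;
    attr->sample_type = PERF_SAMPLE_RAW;
    attr->disabled = 1; /* enable later via ioctl(PERF_EVENT_IOC_ENABLE) */
}

/* glibc has no perf_event_open() wrapper, so go through syscall(2).
 * pid == -1 with cpu >= 0 asks for a system-wide event on that CPU.
 * Returns the event fd, or -1 with errno set. */
int open_pmu_event(struct perf_event_attr *attr, int cpu)
{
    return (int)syscall(SYS_perf_event_open, attr,
                        -1 /* pid */, cpu, -1 /* group_fd */, 0UL);
}
```

Opening a system-wide event like this normally needs elevated
privileges (see perf_event_paranoid in the man page), so expect
open_pmu_event() to fail with EACCES for an unprivileged user.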

With help from the kernel we can support periodic sampling (where the
hardware writes reports into a GPU-mapped circular buffer that we can
forward as perf samples), we can deal with the clock gating + PM
limitations imposed by the observability hardware, and we can also
manage + maintain the selection of performance counters.
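
To make the circular buffer part concrete, here is a generic sketch of
draining fixed-size reports from a ring buffer with wrap-around. It is
not the i915_oa implementation: ring_copy and drain_reports are
hypothetical helpers, and the free-running head/tail offsets and
power-of-two buffer size are assumptions borrowed from how the perf
mmap buffer (data_head/data_tail) behaves:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy 'len' bytes starting at free-running offset 'tail' out of a
 * circular buffer of 'size' bytes (must be a power of two), splitting
 * the copy in two when the span wraps past the end of the buffer. */
void ring_copy(const uint8_t *ring, size_t size, uint64_t tail,
               uint8_t *dst, size_t len)
{
    size_t off, first;

    assert(size != 0 && (size & (size - 1)) == 0);
    assert(len <= size);

    off = (size_t)(tail & (size - 1));
    first = size - off;          /* bytes available before the wrap */
    if (first > len)
        first = len;
    memcpy(dst, ring + off, first);
    memcpy(dst + first, ring, len - first);
}

/* Copy every complete report between *tail and head into 'out'
 * (capacity 'out_cap' bytes), advancing *tail past each one.
 * Returns the number of reports copied. */
size_t drain_reports(const uint8_t *ring, size_t size,
                     uint64_t head, uint64_t *tail,
                     size_t report_size,
                     uint8_t *out, size_t out_cap)
{
    size_t n = 0;

    assert(report_size > 0);
    while (head - *tail >= report_size &&
           (n + 1) * report_size <= out_cap) {
        ring_copy(ring, size, *tail, out + n * report_size, report_size);
        *tail += report_size;
        n++;
    }
    return n;
}
```

A real consumer would also need a read barrier after loading the head
offset and would have to write the new tail back to the producer; both
are left out of this sketch.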

The perf_event_open(2) man page is a good starting point for anyone
wanting to learn about the Linux perf interface. One thing to be aware
of is that there's currently no precedent upstream for exposing device
metrics via a perf PMU. Although early feedback was sought for this
work, some of it may be subject to change based on feedback from the
core perf maintainers as well as the i915 drm driver maintainers.

This PRM is a good starting point for anyone wanting to learn about the
Gen graphics Observability hardware. Some important information is
currently missing and this should be updated soon, but that's more
directly related to the i915_oa perf driver. Notably, though, the report
formats described here need to be understood by Mesa, since the perf
samples simply forward the raw reports from the OA hardware.


This series re-works the i965 driver's support for exposing performance
counters, taking advantage of this i915_oa perf event interface.

A corresponding kernel branch with an initial i915_oa driver for Haswell
can be found here:

https://github.com/rib/linux  wip/rib/oa-hsw-4.0.0

A corresponding libdrm branch can be found here:

https://github.com/rib/drm  wip/rib/oa-hsw-4.0.0

In case it's helpful to see another example using the i915_oa perf
interface, I've also been developing a 'gputop' tool that both lets me
test the INTEL_performance_query interface to collect per-context
metrics from Mesa and can also visualize system-wide metrics (i.e.
across all GPU contexts) using perf directly:


Although I haven't updated the branches in a while, I could share some
initial code adding support for Broadwell if anyone's interested in
getting a sense of what's involved in supporting later hardware
generations.

I still anticipate some (hopefully relatively minor) tweaking of
implementation details based on review feedback for the i915_oa driver,
but I hope that this is a good point to ask for some feedback on the
Mesa changes.

If it's more convenient, these patches can also be fetched from here:

https://github.com/rib/mesa  wip/rib/oa-hsw-4.0.0

- Robert

Robert Bragg (6):
  i965: Remove perf monitor/query backend
  Separate INTEL_performance_query frontend
  Model INTEL perf query backend after query object BE
  i965: Implement INTEL_performance_query extension
  i965: Expose OA counters via INTEL_performance_query
  i965: Adds further support for "3D" OA counters

 src/mapi/glapi/gen/gl_genexec.py                   |    1 +
 src/mesa/Makefile.sources                          |    2 +
 src/mesa/drivers/dri/i965/Makefile.sources         |    2 +-
 src/mesa/drivers/dri/i965/brw_context.c            |    5 +-
 src/mesa/drivers/dri/i965/brw_context.h            |  101 +-
 .../drivers/dri/i965/brw_performance_monitor.c     | 1472 ------------
 src/mesa/drivers/dri/i965/brw_performance_query.c  | 2356 ++++++++++++++++++++
 src/mesa/drivers/dri/i965/intel_batchbuffer.c      |   10 +-
 src/mesa/drivers/dri/i965/intel_extensions.c       |   69 +-
 src/mesa/main/context.c                            |    3 +
 src/mesa/main/dd.h                                 |   39 +
 src/mesa/main/mtypes.h                             |   28 +
 src/mesa/main/performance_monitor.c                |  579 -----
 src/mesa/main/performance_monitor.h                |   39 -
 src/mesa/main/performance_query.c                  |  608 +++++
 src/mesa/main/performance_query.h                  |   80 +
 16 files changed, 3197 insertions(+), 2197 deletions(-)
 delete mode 100644 src/mesa/drivers/dri/i965/brw_performance_monitor.c
 create mode 100644 src/mesa/drivers/dri/i965/brw_performance_query.c
 create mode 100644 src/mesa/main/performance_query.c
 create mode 100644 src/mesa/main/performance_query.h


