[Intel-gfx] [PATCH 00/11] Framework to collect gpu metrics using i915 perf infrastructure

Tue Feb 16 05:27:08 UTC 2016

From: Sourab Gupta <sourab.gupta at intel.com>

This series adds a framework for collecting gpu performance metrics
associated with the command stream of a particular engine. These metrics
include OA reports, timestamps, mmio metrics, etc., and are collected
around batchbuffer boundaries.

This work utilizes the underlying infrastructure introduced in Robert Bragg's
patches for collecting periodic OA counter snapshots (based on Haswell):
https://lists.freedesktop.org/archives/intel-gfx/2016-February/086909.html

This patch set is based on the Gen8+ version of Robert's patch series, which
can be found here: https://github.com/rib/linux/tree/wip/rib/oa-next
Those patches have not yet been posted individually to the mailing list; I
hope this doesn't make the work proposed in this series significantly harder
to review.

Compared to the previous series, this one is based on the drm i915 ioctl
based implementation (see Robert's work for reference). As such, the design
has been changed (and simplified), since some earlier core perf assumptions
no longer apply.

A few salient features are listed below:
* Ability to collect command stream based OA reports on the render engine, in
  conjunction with the periodic reports generated by the OA unit. These
  are collected in separate buffers and forwarded to userspace in
  timestamp order. Userspace tells the samples apart by the value of the
  OA sample source field.

* Ability to collect timestamps and mmio metrics associated with the command
  stream of any particular gpu engine. The sample metrics to be collected are
  requested by the userspace client in the properties of the stream being
  opened. The samples generated depend on the sample flags requested in
  those stream properties.

* Ability to collect metadata associated with the samples, such as pid, tags,
  etc. These are collected at the time of inserting the commands into the
  command stream of the particular gpu engine, and forwarded along with the
  samples.
* Multiple streams belonging to different engines can be opened concurrently
  (with only one open stream instance allowed per engine). This allows
  samples from several gpu engines to be collected simultaneously.

* The different stages of a single workload (belonging to a single context) can
  be delimited using the 'execbuffer tagging' mechanism introduced here.
  For example, in the media pipeline, the CodecHAL encoding stage has a single
  context and involves multiple stages such as Scaling, ME, MBEnc and PAK, for
  which there are separate execbuffer calls. The generated samples need to
  carry this information so that they can be associated with the particular
  workload stage. A tag sample_type, passed in by userspace during the
  execbuffer ioctl, fulfills this requirement.
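
To illustrate the first feature above (forwarding periodic and command
stream samples in timestamp order), here is a minimal userspace-style sketch,
not code from the patches. The struct, the enum values and the function name
are hypothetical, and the timestamps are assumed to be already expressed in a
common, non-wrapping unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical source tags; the names in the actual uapi may differ. */
enum sample_source { SOURCE_PERIODIC = 0, SOURCE_CS = 1 };

struct sample {
	uint64_t ts;               /* timestamp, assumed common unit, no wrap */
	enum sample_source source; /* which buffer the sample came from */
};

/*
 * Merge two timestamp-sorted arrays into out[], which must have room for
 * na + nb entries. On equal timestamps the periodic sample is emitted
 * first (an arbitrary choice made for this sketch).
 * Returns the number of samples written.
 */
static size_t merge_samples(const struct sample *periodic, size_t na,
			    const struct sample *cs, size_t nb,
			    struct sample *out)
{
	size_t i = 0, j = 0, n = 0;

	assert(out != NULL);

	while (i < na && j < nb) {
		if (periodic[i].ts <= cs[j].ts)
			out[n++] = periodic[i++];
		else
			out[n++] = cs[j++];
	}
	/* Drain whichever buffer still has samples left. */
	while (i < na)
		out[n++] = periodic[i++];
	while (j < nb)
		out[n++] = cs[j++];

	return n;
}
```

A consumer would then look at the source field of each merged sample to tell
periodic reports from command stream reports.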

I am looking for feedback on the design proposed here, particularly on the
mechanics of metrics collection through insertion of commands into the
command stream of the associated gpu engines, sample generation according to
the sample flags requested in the stream properties, concurrent operation of
different streams to collect samples from multiple gpu engines, and any
other design/implementation aspects.

A few open issues I'm working on:
* If both the timestamp and OA sample types are requested for the render
  engine, the ts information should be derivable from the OA report alone,
  and we should not need to insert separate commands for dumping timestamps.
  We do, however, need to apply the relevant timestamp base conversion to
  convert OA timestamps into ns.

* Sample consistency has to be maintained between the periodic OA reports
  and the ones generated by the command stream. This implies, for example,
  that if the pid sample_type is requested, the most recent pid collected in
  the CS samples should be used to populate the relevant field in the
  periodic samples. Likewise, the 'ctx_id' field needs to be deduced from the
  periodic OA reports and mapped to 'intel_context::global_id'.
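
The timestamp base conversion mentioned in the first open issue could look
roughly like the sketch below. This is only an illustration: the function
name is hypothetical, and the tick frequency is passed in as a parameter
because the actual OA timestamp frequency is hardware specific and is not
stated in this cover letter.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Convert a tick count into nanoseconds for a counter running at freq_hz.
 * Whole seconds and the remainder are converted separately so that the
 * intermediate product (rem * NSEC_PER_SEC) always fits in 64 bits.
 * Counter wraparound is not handled here; the caller is assumed to pass
 * an already extended tick count.
 */
static uint64_t oa_ticks_to_ns(uint64_t ticks, uint32_t freq_hz)
{
	uint64_t secs, rem;

	assert(freq_hz != 0);

	secs = ticks / freq_hz;
	rem = ticks % freq_hz;

	return secs * NSEC_PER_SEC + (rem * NSEC_PER_SEC) / freq_hz;
}
```

For instance, with an assumed example frequency of 12.5 MHz, one tick maps
to 80 ns.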

These open issues shouldn't distract us too much from reviewing the general
mechanism proposed here; they can be ironed out subsequently if there's
general agreement on the design.

Also, one of the prerequisites for this work is the presence of a globally
unique id associated with each context. The present context id is specific to
a drm fd, and as such it can't be used to globally associate the generated
reports with the corresponding context scheduled from userspace.
The first few patches in the series introduce the globally unique context id,
and the subsequent ones introduce the framework for collection of metrics.
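
As a rough illustration of a globally unique context id that stays within
20 bits (the width named in the first patch title below), consider the
following sketch. It is not the allocation scheme used in the patches: the
allocator struct and function are hypothetical, and for brevity ids are
never recycled, which a real driver would need to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GLOBAL_ID_BITS 20
#define GLOBAL_ID_MAX  ((1u << GLOBAL_ID_BITS) - 1)

/* Hypothetical device-wide allocator state, shared by all drm fds. */
struct id_allocator {
	uint32_t next; /* next id to hand out */
};

/*
 * Hand out the next unused id. Returns 0 on success and stores the id
 * in *id, or -1 once every 20-bit value has been used.
 */
static int alloc_global_id(struct id_allocator *a, uint32_t *id)
{
	assert(a != NULL && id != NULL);

	if (a->next > GLOBAL_ID_MAX)
		return -1;

	*id = a->next++;
	return 0;
}
```

Because the state is shared across the whole device rather than kept per
drm fd, two contexts created through different fds never receive the same
id.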

Robert Bragg (2):
  drm/i915: Constrain intel_context::global_id to 20 bits
  drm/i915: return ctx->global_id from intel_execlists_ctx_id()

Sourab Gupta (9):
  drm/i915: Introduce global id for contexts
  drm/i915: Add ctx getparam ioctl parameter to retrieve ctx global id
  drm/i915: Expose OA sample source to userspace
  drm/i915: Framework for capturing command stream based OA reports
  drm/i915: Add support for having pid output with OA report
  drm/i915: Add support to add execbuffer tags to OA counter reports
  drm/i915: Extend i915 perf framework for collecting timestamps on all
    gpu engines
  drm/i915: Support opening multiple concurrent perf streams
  drm/i915: Support for capturing MMIO register values

 drivers/gpu/drm/i915/i915_debugfs.c        |    7 +-
 drivers/gpu/drm/i915/i915_drv.h            |   68 +-
 drivers/gpu/drm/i915/i915_gem_context.c    |   23 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    5 +
 drivers/gpu/drm/i915/i915_perf.c           | 1300 +++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_reg.h            |    2 +
 drivers/gpu/drm/i915/intel_lrc.c           |   26 +-
 drivers/gpu/drm/i915/intel_lrc.h           |    2 +-
 include/uapi/drm/i915_drm.h                |   72 ++
 9 files changed, 1349 insertions(+), 156 deletions(-)

-- 
1.9.1