[Intel-gfx] [PATCH 00/11] Framework to collect gpu metrics using i915 perf infrastructure
sourab.gupta at intel.com
Tue Feb 16 05:27:08 UTC 2016
From: Sourab Gupta <sourab.gupta at intel.com>
This series adds a framework for collecting gpu performance metrics
associated with the command stream of a particular engine. These metrics
include OA reports, timestamps, mmio metrics, etc., and are collected
around batchbuffer boundaries.
This work utilizes the underlying infrastructure introduced in Robert Bragg's
patches for collecting periodic OA counter snapshots (based on Haswell):
This patch set is based on the Gen8+ version of Robert's patch series, which
can be found here: https://github.com/rib/linux/tree/wip/rib/oa-next
Those patches have not yet been posted individually to the mailing list, which
I hope doesn't cost too much clarity when reviewing the work proposed in this
patch series.
Compared to the previous series, this one is based on the drm i915 ioctl
based implementation (see Robert's work for reference). As such, the design
has been changed (and simplified), since some earlier core perf assumptions
no longer apply.
A few salient features are listed below:
* Ability to collect command stream based OA reports on the render engine, in
conjunction with the periodic reports generated by the OA unit. These are
collected in separate buffers and forwarded to userspace in timestamp
order. Userspace tells the two kinds of sample apart by the value of the
OA sample source field.
* Ability to collect timestamps and mmio metrics associated with the command
stream of any particular gpu engine. The sample metrics to be collected are
requested by the userspace client in the properties of the stream being
opened. The samples generated depend on the sample flags requested in those
stream properties.
* Ability to collect metadata associated with the samples, such as pid, tags,
etc. These are collected when the commands are inserted into the command
stream of the particular gpu engine, and forwarded along with the samples.
* Multiple streams belonging to different engines can be opened concurrently
(with at most one open stream per engine). This allows samples from several
gpu engines to be collected at the same time.
* The different stages of a single workload (belonging to a single context) can
be delimited by using the 'execbuffer tagging' mechanism introduced here.
For example, in the media pipeline, the CodecHAL encoding stage has a single
context and involves multiple stages such as Scaling, ME, MBEnc and PAK, for
which there are separate execbuffer calls. The generated samples need to
carry this information so that they can be associated with the particular
workload stage. A tag sample_type, passed in by userspace during the
execbuffer ioctl, fulfills this requirement.
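The timestamp-ordered forwarding of periodic and command stream samples
described in the first bullet can be pictured as a two-way merge. The following
is only a userspace-style sketch, not the code from this series: the struct,
the enum values and merge_samples() are hypothetical names, and it assumes both
buffers are already sorted and that timestamps share a common, non-wrapping
time base.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample source values; the names are illustrative only. */
enum sample_source { SOURCE_PERIODIC = 0, SOURCE_CS = 1 };

struct sample {
	uint64_t ts;               /* timestamp in a common time base (assumed) */
	enum sample_source source; /* which buffer the sample came from */
};

/*
 * Merge two buffers, each already sorted by timestamp, into 'out' in
 * timestamp order. 'out' must have room for n_periodic + n_cs entries.
 * On equal timestamps the periodic sample goes first (an arbitrary choice).
 * Returns the number of samples written.
 */
static size_t merge_samples(const struct sample *periodic, size_t n_periodic,
			    const struct sample *cs, size_t n_cs,
			    struct sample *out)
{
	size_t i = 0, j = 0, k = 0;

	while (i < n_periodic && j < n_cs) {
		if (periodic[i].ts <= cs[j].ts)
			out[k++] = periodic[i++];
		else
			out[k++] = cs[j++];
	}
	/* Drain whichever buffer still has samples left. */
	while (i < n_periodic)
		out[k++] = periodic[i++];
	while (j < n_cs)
		out[k++] = cs[j++];

	return k;
}
```

A reader of the merged stream would then look at the source field of each
sample to tell periodic reports from command stream ones.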
I am looking for feedback on the design proposed here, particularly pertaining
to the mechanics of metrics collection through insertion of commands in the
command stream of associated gpu engines, sample generation according to the
requested sample flags in stream properties, concurrent operation of different
streams to collect the samples from multiple gpu engines, and any such
design/implementation aspects per se.
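To make "sample generation according to the requested sample flags" concrete,
here is a rough sketch of how the payload of one sample could be sized from the
flags. The flag names, field widths and sample_payload_size() are assumptions
made up for illustration; they are not the uapi proposed in this series, and
the OA report size is passed in because it depends on the report format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample flags; not the names used in i915_drm.h. */
#define SAMPLE_OA_REPORT (1u << 0)
#define SAMPLE_SOURCE    (1u << 1)
#define SAMPLE_CTX_ID    (1u << 2)
#define SAMPLE_PID       (1u << 3)
#define SAMPLE_TAG       (1u << 4)
#define SAMPLE_TS        (1u << 5)

/*
 * Return the payload size in bytes of one sample, given the flags the
 * client requested when opening the stream. Field widths are assumed:
 * 32 bits for source, ctx_id, pid and tag, 64 bits for the timestamp.
 */
static size_t sample_payload_size(uint32_t flags, size_t oa_report_size)
{
	size_t size = 0;

	if (flags & SAMPLE_SOURCE)
		size += sizeof(uint32_t);
	if (flags & SAMPLE_CTX_ID)
		size += sizeof(uint32_t);
	if (flags & SAMPLE_PID)
		size += sizeof(uint32_t);
	if (flags & SAMPLE_TAG)
		size += sizeof(uint32_t);
	if (flags & SAMPLE_TS)
		size += sizeof(uint64_t);
	if (flags & SAMPLE_OA_REPORT)
		size += oa_report_size;

	return size;
}
```

With these assumed widths, a stream opened with only the pid and timestamp
flags would produce 12-byte payloads, regardless of the OA report size.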
A few open issues which I'm working on include:
* In case both timestamp and OA sample types are requested for the render
engine, the ts information should be derivable from the OA report alone, and
we should not need to insert separate commands for dumping timestamps. We
do, however, need to apply the relevant timestamp base conversion to convert
OA timestamps into ns.
* Sample consistency has to be maintained between the periodic OA reports
and the ones generated by the command stream. This implies, for example, that
if the pid sample_type is requested, the most recent pid collected in the CS
samples should be used to populate the relevant field in the periodic samples.
Likewise, for periodic OA reports the field 'ctx_id' needs to be deduced from
the report itself and mapped to 'intel_context::global_id'.
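The timestamp base conversion mentioned in the first open issue amounts to
scaling a tick count by the timestamp frequency. A minimal sketch, assuming the
frequency is known to the caller; ticks_to_ns() is a hypothetical helper and no
particular hardware frequency is implied:

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Convert a tick count to nanoseconds for a counter running at freq_hz.
 * Splitting into whole seconds and a remainder keeps the intermediate
 * product below 2^64 as long as freq_hz is below roughly 1.8e10.
 * freq_hz must be non-zero.
 */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
	uint64_t secs = ticks / freq_hz;
	uint64_t rem = ticks % freq_hz;

	return secs * NSEC_PER_SEC + (rem * NSEC_PER_SEC) / freq_hz;
}
```

For instance, with an example frequency of 12.5 MHz (chosen only for round
numbers) one tick corresponds to 80 ns.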
These open issues, though, shouldn't distract us too much from reviewing
the general mechanism proposed here; they can be ironed out subsequently
if there is general agreement on the design.
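For the pid consistency issue above, one possible approach is a single pass
over the timestamp-ordered samples that carries the most recent command stream
pid forward into the periodic samples. This is a sketch under assumptions, not
the planned implementation: the types and fill_periodic_pids() are hypothetical,
and pid 0 stands for "not yet known".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical source values and sample layout, for illustration only. */
enum pid_sample_source { PID_SRC_PERIODIC = 0, PID_SRC_CS = 1 };

struct pid_sample {
	uint64_t ts;
	enum pid_sample_source source;
	uint32_t pid; /* valid on entry only for PID_SRC_CS samples */
};

/*
 * Walk samples in timestamp order and copy the pid of the most recent
 * command stream sample into every following periodic sample. Periodic
 * samples seen before any command stream sample get pid 0 ("unknown").
 */
static void fill_periodic_pids(struct pid_sample *s, size_t n)
{
	uint32_t last_pid = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (s[i].source == PID_SRC_CS)
			last_pid = s[i].pid;
		else
			s[i].pid = last_pid;
	}
}
```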
Also, one of the prerequisites for this work is a globally unique id
associated with each context. The present context id is specific to a drm fd,
and as such it can't be used to associate the generated reports with the
corresponding context scheduled from userspace in a globally unique way.
The first few patches in the series introduce the globally unique context id,
and subsequent ones introduce the framework for collection of metrics.
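As a rough picture of what a globally unique, 20-bit context id implies, here
is a small userspace-style bitmap allocator. It is only an illustration: the
names are hypothetical, and the actual patches may well use a different
mechanism (for example an existing kernel id allocator).

```c
#include <stdint.h>

#define GLOBAL_ID_BITS 20
#define GLOBAL_ID_MAX  (1u << GLOBAL_ID_BITS)

/* One bit per possible id; a set bit means the id is in use. */
static uint64_t id_bitmap[GLOBAL_ID_MAX / 64];

/* Return the lowest free id in [0, GLOBAL_ID_MAX), or -1 if none is left. */
static int32_t global_id_alloc(void)
{
	uint32_t w, b;

	for (w = 0; w < GLOBAL_ID_MAX / 64; w++) {
		if (id_bitmap[w] == UINT64_MAX)
			continue; /* all 64 ids in this word are taken */
		for (b = 0; b < 64; b++) {
			if (!(id_bitmap[w] & (1ULL << b))) {
				id_bitmap[w] |= 1ULL << b;
				return (int32_t)(w * 64 + b);
			}
		}
	}
	return -1;
}

/* Release an id so that it can be handed out again; bad ids are ignored. */
static void global_id_free(int32_t id)
{
	if (id < 0 || (uint32_t)id >= GLOBAL_ID_MAX)
		return;
	id_bitmap[id / 64] &= ~(1ULL << (id % 64));
}
```

Because the id space is shared by all drm fds in this sketch, two contexts
created through different fds can never end up with the same id while both are
alive.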
Robert Bragg (2):
drm/i915: Constrain intel_context::global_id to 20 bits
drm/i915: return ctx->global_id from intel_execlists_ctx_id()
Sourab Gupta (9):
drm/i915: Introduce global id for contexts
drm/i915: Add ctx getparam ioctl parameter to retrieve ctx global id
drm/i915: Expose OA sample source to userspace
drm/i915: Framework for capturing command stream based OA reports
drm/i915: Add support for having pid output with OA report
drm/i915: Add support to add execbuffer tags to OA counter reports
drm/i915: Extend i915 perf framework for collecting timestamps on all
drm/i915: Support opening multiple concurrent perf streams
drm/i915: Support for capturing MMIO register values
drivers/gpu/drm/i915/i915_debugfs.c | 7 +-
drivers/gpu/drm/i915/i915_drv.h | 68 +-
drivers/gpu/drm/i915/i915_gem_context.c | 23 +
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +
drivers/gpu/drm/i915/i915_perf.c | 1300 +++++++++++++++++++++++++---
drivers/gpu/drm/i915/i915_reg.h | 2 +
drivers/gpu/drm/i915/intel_lrc.c | 26 +-
drivers/gpu/drm/i915/intel_lrc.h | 2 +-
include/uapi/drm/i915_drm.h | 72 ++
9 files changed, 1349 insertions(+), 156 deletions(-)