[Intel-gfx] [PATCH 00/16] Framework to collect command stream gpu metrics using i915 perf

sourab.gupta at intel.com sourab.gupta at intel.com
Fri Apr 22 11:33:49 UTC 2016


From: Sourab Gupta <sourab.gupta at intel.com>

This series adds a framework for collecting gpu performance metrics
associated with the command stream of a particular engine. These metrics
include OA reports, timestamps, mmio metrics, etc., and are collected
around batchbuffer boundaries.

This work utilizes the underlying infrastructure introduced in Robert Bragg's
patches for collecting periodic OA counter snapshots (based on Haswell):
https://lists.freedesktop.org/archives/intel-gfx/2016-April/093206.html

This patch set is based on the Gen8+ version of Robert's patches, which can be
found here: https://github.com/rib/linux/commits/wip/rib/oa-2016-04-18-nightly
These have not yet been posted individually to the mailing list; I hope this
doesn't significantly hinder review of the work proposed in this patch
series.

Compared to the series I floated earlier
(https://lists.freedesktop.org/archives/intel-gfx/2016-February/087686.html),
this series incorporates the following changes/fixes, besides rebasing
on Robert's latest work:

* A few refinements to the flushing of periodic OA samples: they are flushed
  when there are no pending CS samples, but not while CS samples are pending
  (queued, but with requests not yet completed).

* When the command stream buf overflows, we can choose to either overwrite old
  entries or stop collecting more samples. This is currently controlled via a
  compile-time macro. We can settle on either of these behaviors going forward.

* The sample consistency is maintained between the periodic OA reports
  and command stream ones. This implies, e.g., that if the ctx_id/pid
  sample type is requested, the most recent pid collected in the CS samples
  is used to populate the relevant field in the periodic samples.

* In case both timestamp and OA sample types are requested for the render
  engine, the raw gpu timestamps are extracted from the OA report only, and we
  don't need to insert separate commands for retrieving timestamps.

* Introduction of a new property to request inclusion of CLOCK_MONOTONIC time
  in the samples. Being able to correlate gpu events/samples with
  CLOCK_MONOTONIC is of practical use to userspace, for use cases involving
  correlation of gpu events with system time. This may, e.g., involve
  plotting gpu and system events on the same timeline (such as vblank events,
  or timestamps for when work was submitted to the kernel, etc.). The patch
  introduces a sync mechanism in order to correlate the gpu timestamps with
  CLOCK_MONOTONIC time to begin with. This can further be extended for other
  clock domains. Sync is needed because the actual gpu timestamp clock
  frequency may differ from the published one, leading to clock drift. The sync
  mechanism is crude right now and can be improved upon going forward, but this
  is the general thinking behind its introduction.

* The gpu raw timestamp can also be forwarded in conjunction with
  CLOCK_MONOTONIC time, since userspace may have a need for both. E.g. the raw
  timestamps are exposed to userspace if it uses PIPE_CONTROL +
  post_sync_op='write timestamp' and userspace may want to correlate these with
  perf metrics.
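
To illustrate the two overflow behaviors mentioned in the command stream buf
bullet above, here is a minimal userspace sketch. It is not the code from the
patches: the macro name CS_BUF_OVERWRITE_OLD, the struct cs_buf layout, and the
helper cs_buf_push() are all hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compile-time switch (not the macro used in the series):
 * 1 = overwrite the oldest entry on overflow, 0 = stop collecting. */
#ifndef CS_BUF_OVERWRITE_OLD
#define CS_BUF_OVERWRITE_OLD 1
#endif

#define CS_BUF_SLOTS 4

/* Hypothetical fixed-size ring of samples. */
struct cs_buf {
	uint32_t slot[CS_BUF_SLOTS];
	size_t head;      /* index of the oldest valid entry */
	size_t count;     /* number of valid entries */
	bool overflowed;  /* set once an overflow has been seen */
};

/* Returns true if the sample was stored, false if it was dropped. */
static bool cs_buf_push(struct cs_buf *b, uint32_t sample)
{
	assert(b != NULL);

	if (b->count == CS_BUF_SLOTS) {
		b->overflowed = true;
#if CS_BUF_OVERWRITE_OLD
		/* Discard the oldest entry to make room for the new one. */
		b->head = (b->head + 1) % CS_BUF_SLOTS;
		b->count--;
#else
		/* Keep the old entries and drop the new sample. */
		return false;
#endif
	}

	b->slot[(b->head + b->count) % CS_BUF_SLOTS] = sample;
	b->count++;
	return true;
}
```

In either mode the overflowed flag lets the reader of the buffer know that
some samples were lost; only the choice of which samples are lost differs.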
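
To make the CLOCK_MONOTONIC correlation bullet above more concrete, the general
idea can be sketched as follows. This is only an illustration, not the
mechanism implemented in the patches: struct clock_sync, GPU_TICK_NS, and
gpu_ticks_to_mono_ns() are hypothetical names, and the 80 ns tick period is an
assumed nominal value. A sync point pairs a raw gpu timestamp with the
CLOCK_MONOTONIC time read at (nearly) the same moment; later samples are
converted relative to that pair, and the pair is refreshed periodically so
that any error in the nominal period does not accumulate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed nominal gpu timestamp period in nanoseconds (illustrative only). */
#define GPU_TICK_NS 80u

/* Hypothetical sync point: a raw gpu timestamp and the CLOCK_MONOTONIC
 * time (in ns) captured together with it. */
struct clock_sync {
	uint32_t gpu_ticks;
	uint64_t mono_ns;
};

/*
 * Convert a raw 32-bit gpu timestamp taken at or after the sync point into
 * CLOCK_MONOTONIC nanoseconds. The unsigned subtraction is modulo 2^32, so
 * the result stays correct across a single counter wrap, provided the sample
 * is less than one full wrap newer than the sync point.
 */
static uint64_t gpu_ticks_to_mono_ns(const struct clock_sync *s,
				     uint32_t sample_ticks)
{
	uint32_t delta;

	assert(s != NULL);
	delta = sample_ticks - s->gpu_ticks;
	return s->mono_ns + (uint64_t)delta * GPU_TICK_NS;
}
```

The further a sample is from its sync point, the more any mismatch between
the nominal and the real tick period shows up as drift, which is why the sync
point would need to be refreshed at some interval.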

For reference, the patches can be fetched from here:
https://github.com/sourabgu/linux/tree/perf-2016-04-19

Robert Bragg (2):
  drm/i915: Constrain intel_context::global_id to 20 bits
  drm/i915: return ctx->global_id from intel_execlists_ctx_id()

Sourab Gupta (14):
  drm/i915: Introduce global id for contexts
  drm/i915: Add ctx getparam ioctl parameter to retrieve ctx global id
  drm/i915: Expose OA sample source to userspace
  drm/i915: Framework for capturing command stream based OA reports
  drm/i915: flush periodic samples, in case of no pending CS sample
    requests
  drm/i915: Handle the overflow condition for command stream buf
  drm/i915: Populate ctx ID for periodic OA reports
  drm/i915: Add support for having pid output with OA report
  drm/i915: Add support for emitting execbuffer tags through OA counter
    reports
  drm/i915: Extend i915 perf framework for collecting timestamps on all
    gpu engines
  drm/i915: Extract raw GPU timestamps from OA reports to forward in
    perf samples
  drm/i915: Support opening multiple concurrent perf streams
  drm/i915: Mechanism to forward clock monotonic time in perf samples
  drm/i915: Support for capturing MMIO register values

 drivers/gpu/drm/i915/i915_debugfs.c        |    4 +-
 drivers/gpu/drm/i915/i915_drv.h            |   97 +-
 drivers/gpu/drm/i915/i915_gem_context.c    |   23 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    5 +
 drivers/gpu/drm/i915/i915_perf.c           | 1842 +++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_reg.h            |   16 +
 drivers/gpu/drm/i915/intel_lrc.c           |   32 +-
 drivers/gpu/drm/i915/intel_lrc.h           |    3 +-
 include/uapi/drm/i915_drm.h                |   79 ++
 9 files changed, 1907 insertions(+), 194 deletions(-)

-- 
1.9.1


