[PATCH 00/14] i915 perf support for command stream based OA, GPU and workload metrics capture

Sagar Arun Kamble sagar.a.kamble at intel.com
Wed Jul 12 09:56:48 UTC 2017


This series is derived from the following two series posted by Sourab in March:
1. https://patchwork.freedesktop.org/series/21351/ - Collect command stream
   based OA reports using i915 perf
2. https://patchwork.freedesktop.org/series/21352/ - Collect command stream
   based GPU metrics for all engines using i915 perf

This series addresses most of the review comments on those two series. The major
change is moving the stream structure and information from dev_priv to
per-engine structures. The intent of the series is restated below, based on the
cover letters of the earlier series.

This series adds a framework for:
1. Collection of OA reports associated with the render command stream, which
are collected around batchbuffer boundaries.
2. Collection of other metadata such as ctx_id, pid, tag etc. with the samples,
so that the samples collected can be associated with the corresponding
process/workload.
3. Collection of GPU performance metrics associated with the command stream of
a particular engine. These metrics include timestamps of work submission and
completion on engines, mmio metrics, etc. These metrics are collected
around batchbuffer boundaries.
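
As a rough illustration of the association described in item 2, the sketch
below shows how a userspace consumer might attribute samples to a process once
each sample carries ctx_id, pid and tag. The struct layout and the function
name are hypothetical and chosen only for this example; the actual sample
format is defined by the uAPI changes in this series and is not reproduced
here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-sample metadata, for illustration only. The real
 * layout comes from the series' changes to include/uapi/drm/i915_drm.h.
 */
struct cs_sample_meta {
	uint64_t ctx_id;	/* context the workload was submitted on */
	uint32_t pid;		/* process that submitted the workload */
	uint32_t tag;		/* tag supplied with the execbuffer */
};

/* Count the samples in an array that were submitted by a given pid. */
size_t count_samples_for_pid(const struct cs_sample_meta *samples,
			     size_t n, uint32_t pid)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (samples[i].pid == pid)
			count++;
	}
	return count;
}
```

The same pattern would apply to filtering on ctx_id or tag when a finer
per-workload breakdown is wanted.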

There are a couple of patches which add support for using the cross-timestamp
framework to retrieve tightly coupled device/system timestamps.
In our case, this framework gives us correlated pairs of gpu+system
time, which can be used over a period of time to correct the frequency of the
timestamp clock and thus enable us to accurately report system time (_MONO_RAW)
to userspace as requested. The results are generally observed to be noticeably
better with the use of cross timestamps, and the frequency delta gradually
tapers down to 0 with increasing correction periods.
Using the cross-timestamp framework, though, requires a
clockcounter/timecounter abstraction for the timestamp clocksource, and
further requires a few changes in the kernel timekeeping/clocksource code.
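
The frequency correction idea can be sketched in plain C as below. This is a
simplified userspace illustration under stated assumptions, not the code from
the patches: the struct and function names are hypothetical, floating point is
used for brevity (kernel code would use fixed-point arithmetic), and only two
correlated pairs are used rather than a running correction.

```c
#include <stdint.h>

/*
 * One correlated reading: a GPU timestamp and the system monotonic-raw
 * time captured together. Hypothetical layout, for illustration only.
 */
struct xtstamp_pair {
	uint64_t gpu_ticks;
	uint64_t sys_ns;
};

/*
 * Estimate the effective GPU timestamp frequency in Hz from two
 * correlated pairs. Returns 0.0 if no system time elapsed between them.
 */
double estimate_gpu_freq_hz(const struct xtstamp_pair *a,
			    const struct xtstamp_pair *b)
{
	uint64_t dticks = b->gpu_ticks - a->gpu_ticks;
	uint64_t dns = b->sys_ns - a->sys_ns;

	if (dns == 0)
		return 0.0;
	return (double)dticks * 1e9 / (double)dns;
}

/*
 * Convert a later GPU timestamp to system time, using a reference pair
 * and the estimated frequency. Falls back to the reference system time
 * if the frequency is not usable.
 */
uint64_t gpu_ticks_to_sys_ns(const struct xtstamp_pair *ref,
			     double freq_hz, uint64_t gpu_ticks)
{
	double delta_ns;

	if (freq_hz <= 0.0)
		return ref->sys_ns;
	delta_ns = (double)(gpu_ticks - ref->gpu_ticks) * 1e9 / freq_hz;
	return ref->sys_ns + (uint64_t)(delta_ns + 0.5); /* round to nearest */
}
```

The longer the interval between the two pairs, the smaller the relative error
in the estimated frequency, which is consistent with the delta tapering off
over increasing correction periods.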

Pending issues to be addressed in this series:
1. The cross-timestamp sync patches need to be reworked as requested by kernel
   maintainers.
2. Some of the data collected through these patches could instead be gathered
   in userspace; which data that is has yet to be finalized.
3. Support needs to be added in the perf IGT tests for verifying CS based perf
   functionality.

Cc: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
Cc: Matthew Auld <matthew.auld at intel.com>
Cc: Chris Wilson <chris at chris-wilson.co.uk>

Sourab Gupta (14):
  drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id
  drm/i915: Expose OA sample source to userspace
  drm/i915: Framework for capturing command stream based OA reports and
    ctx id info.
  drm/i915: Flush periodic samples, in case of no pending CS sample
    requests
  drm/i915: Inform userspace about command stream OA buf overflow
  drm/i915: Populate ctx ID for periodic OA reports
  drm/i915: Add support for having pid output with OA report
  drm/i915: Add support for emitting execbuffer tags through OA counter
    reports
  drm/i915: Add support for collecting timestamps on all gpu engines
  drm/i915: Extract raw GPU timestamps from OA reports to forward in
    perf samples
  drm/i915: Async check for streams data availability with hrtimer
    rescheduling
  time: Expose current clocksource in use by timekeeping framework
  drm/i915: Mechanism to forward clock monotonic raw time in perf
    samples
  drm/i915: Support for capturing MMIO register values

 drivers/gpu/drm/i915/i915_drv.c            |   15 +
 drivers/gpu/drm/i915/i915_drv.h            |  199 ++-
 drivers/gpu/drm/i915/i915_gem_context.c    |    3 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   11 +
 drivers/gpu/drm/i915/i915_perf.c           | 2011 ++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_reg.h            |    6 +
 drivers/gpu/drm/i915/intel_engine_cs.c     |    4 +
 drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h    |    8 +
 include/linux/timekeeping.h                |    5 +
 include/uapi/drm/i915_drm.h                |   76 ++
 kernel/time/timekeeping.c                  |   12 +
 12 files changed, 2105 insertions(+), 247 deletions(-)

-- 
1.9.1


