[PATCH 00/14] i915 perf support for command stream based OA, GPU and workload metrics capture

Lionel Landwerlin lionel.g.landwerlin at intel.com
Mon Jul 24 14:16:29 UTC 2017


There is probably a need to align the uapi structures.
If you look at the last commit, the uapi looks like this:

struct {
     struct drm_i915_perf_record_header header;

     { u32 source; } && DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE
     { u32 ctx_id; } && DRM_I915_PERF_PROP_SAMPLE_CTX_ID
     { u32 pid; } && DRM_I915_PERF_PROP_SAMPLE_PID
     { u32 tag; } && DRM_I915_PERF_PROP_SAMPLE_TAG
     { u64 gpu_ts; } && DRM_I915_PERF_PROP_SAMPLE_TS
     { u64 clk_mono; } && DRM_I915_PERF_PROP_SAMPLE_CLOCK_MONOTONIC
     { u32 mmio[]; } && DRM_I915_PERF_PROP_SAMPLE_MMIO
     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
};


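This means that if you enable both DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE and
DRM_I915_PERF_PROP_SAMPLE_TS, the u64 gpu_ts follows a single u32 and won't
be aligned to 8 bytes. The same happens whenever an odd number of u32 fields
precedes a u64 field.
You should probably consider padding all of those structures to 8 bytes.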
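
To make the problem concrete, here is a minimal sketch of the offset
arithmetic. It is not code from the series: the SAMPLE_* bits, struct
sample_header and both helpers are hypothetical names made up for
illustration, and the header is assumed to be 8 bytes (u32 type, u16 pad,
u16 size), like drm_i915_perf_record_header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 8-byte record header (assumption). */
struct sample_header {
	uint32_t type;
	uint16_t pad;
	uint16_t size;
};

/* Hypothetical property bits, in the order the fields appear in a sample. */
enum {
	SAMPLE_OA_SOURCE = 1u << 0,
	SAMPLE_CTX_ID    = 1u << 1,
	SAMPLE_PID       = 1u << 2,
	SAMPLE_TAG       = 1u << 3,
	SAMPLE_TS        = 1u << 4,
};

/* Round off up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t off, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (off + align - 1) & ~(align - 1);
}

/*
 * Byte offset of gpu_ts inside a sample record. Every enabled u32 field
 * before it takes field_size bytes: 4 with the layout as posted, 8 if each
 * field is padded out to 8 bytes.
 */
static size_t gpu_ts_offset(unsigned int props, size_t field_size)
{
	size_t off = sizeof(struct sample_header);
	unsigned int bit;

	for (bit = SAMPLE_OA_SOURCE; bit < SAMPLE_TS; bit <<= 1)
		if (props & bit)
			off += field_size;

	return off;
}
```

With only OA_SOURCE and TS enabled, gpu_ts_offset(props, 4) comes out at
12, which is not a multiple of 8, while gpu_ts_offset(props, 8) gives 16.
Padding every field costs a few bytes per sample but keeps every u64
naturally aligned whatever combination of properties userspace asks for.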


On 14/07/17 19:51, Sagar Arun Kamble wrote:
> This series is prepared from below two series posted by Sourab in March.
> 1. https://patchwork.freedesktop.org/series/21351/ - Collect command stream
>     based OA reports using i915 perf
> 2. https://patchwork.freedesktop.org/series/21352/ - Collect command stream
>     based GPU metrics for all engines using i915 perf
>
> This series addresses most of the review comments from above two. Major change
> is moving the stream structure and information from dev_priv to per-engine
> structures. Reframing below the intent of this series from cover letter of
> earlier series.
>
> This series adds framework for
> 1. Collection of OA reports associated with the render command stream, which
> are collected around batchbuffer boundaries.
> 2. Collect other metadata such as ctx_id, pid, tag etc. with the samples,
> and thus we can establish the association of samples collected with the
> corresponding process/workload.
> 3. Collection of GPU performance metrics associated with the command stream of
> a particular engine. These metrics include timestamps of work submission and
> completion on engines, mmio metrics, etc. These metrics are collected
> around batchbuffer boundaries.
>
> There are a couple of patches which add support for using the cross-timestamp
> framework for retrieving tightly coupled device/system timestamps.
> In our case, this framework enables us to have correlated pairs of gpu+system
> time which can be used over a period of time to correct the frequency of
> timestamp clock, and thus enable us to accurately send system time (_MONO_RAW)
> to userspace as requested. The results are generally observed to be quite
> better with the use of cross timestamps, and the frequency delta gradually
> tapers down to 0 with increasing correction periods.
> The use of cross timestamp framework though requires us to have
> clockcounter/timecounter abstraction for the timestamp clocksource, and
> further requires a few changes in the kernel timekeeping/clocksource code.
>
> Pending issues to be addressed in this series:
> 1. cross-timestamp sync patches need to be reworked as requested by kernel
>     maintainers.
> 2. Some of the data types being collected through these patches can be done in
>     the userspace and that is yet to be finalized.
> 3. Add support in the perf IGT tests for verifying CS based perf functionality.
>
> Cc: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
> Cc: Matthew Auld <matthew.auld at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>
> Sourab Gupta (14):
>    drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id
>    drm/i915: Expose OA sample source to userspace
>    drm/i915: Framework for capturing command stream based OA reports and
>      ctx id info.
>    drm/i915: Flush periodic samples, in case of no pending CS sample
>      requests
>    drm/i915: Inform userspace about command stream OA buf overflow
>    drm/i915: Populate ctx ID for periodic OA reports
>    drm/i915: Add support for having pid output with OA report
>    drm/i915: Add support for emitting execbuffer tags through OA counter
>      reports
>    drm/i915: Add support for collecting timestamps on all gpu engines
>    drm/i915: Extract raw GPU timestamps from OA reports to forward in
>      perf samples
>    drm/i915: Async check for streams data availability with hrtimer
>      rescheduling
>    time: Expose current clocksource in use by timekeeping framework
>    drm/i915: Mechanism to forward clock monotonic raw time in perf
>      samples
>    drm/i915: Support for capturing MMIO register values
>
>   drivers/gpu/drm/i915/i915_drv.c            |   15 +
>   drivers/gpu/drm/i915/i915_drv.h            |  194 ++-
>   drivers/gpu/drm/i915/i915_gem.c            |    1 +
>   drivers/gpu/drm/i915/i915_gem_context.c    |    3 +
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |   11 +
>   drivers/gpu/drm/i915/i915_perf.c           | 2022 ++++++++++++++++++++++++----
>   drivers/gpu/drm/i915/i915_reg.h            |    6 +
>   drivers/gpu/drm/i915/intel_engine_cs.c     |    4 +
>   drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h    |    8 +
>   include/linux/timekeeping.h                |    5 +
>   include/uapi/drm/i915_drm.h                |   76 ++
>   kernel/time/timekeeping.c                  |   12 +
>   13 files changed, 2110 insertions(+), 249 deletions(-)
>


