[PATCH 00/14] i915 perf support for command stream based OA, GPU and workload metrics capture
Lionel Landwerlin
lionel.g.landwerlin at intel.com
Mon Jul 24 14:16:29 UTC 2017
There is probably a need to align the uapi structures.
If you look at the last commit, the uapi looks like this:
struct {
struct drm_i915_perf_record_header header;
{ u32 source; } && DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE
{ u32 ctx_id; } && DRM_I915_PERF_PROP_SAMPLE_CTX_ID
{ u32 pid; } && DRM_I915_PERF_PROP_SAMPLE_PID
{ u32 tag; } && DRM_I915_PERF_PROP_SAMPLE_TAG
{ u64 gpu_ts; } && DRM_I915_PERF_PROP_SAMPLE_TS
{ u64 clk_mono; } && DRM_I915_PERF_PROP_SAMPLE_CLOCK_MONOTONIC
{ u32 mmio[]; } && DRM_I915_PERF_PROP_SAMPLE_MMIO
{ u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
};
Which means if you have DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE &
DRM_I915_PERF_PROP_SAMPLE_TS, |gpu_ts| won't be aligned to 8 bytes.
You should probably consider padding all of those structures to 8 bytes.
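
To make the misalignment concrete, here is a small sketch in C that computes
where a field would land in the record for a given set of enabled properties,
with and without padding every field to 8 bytes. It is only an illustration:
the flag names, the field_size table and sample_field_offset() are made up for
this example and are not the uapi property values or anything from the series.
It assumes the fixed-size fields have the widths shown in the layout above and
that struct drm_i915_perf_record_header is 8 bytes (u32 type, u16 pad,
u16 size).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit flags, one per fixed-size sample field, in record order.
 * These are NOT the real DRM_I915_PERF_PROP_* values. */
enum sample_field {
    SAMPLE_OA_SOURCE = 1u << 0,
    SAMPLE_CTX_ID    = 1u << 1,
    SAMPLE_PID       = 1u << 2,
    SAMPLE_TAG       = 1u << 3,
    SAMPLE_TS        = 1u << 4,
    SAMPLE_CLK_MONO  = 1u << 5,
};

#define NUM_FIELDS 6u
#define RECORD_HEADER_SIZE 8u /* assumed: u32 type + u16 pad + u16 size */

/* Natural size in bytes of each field, indexed by bit position. */
static const size_t field_size[NUM_FIELDS] = { 4, 4, 4, 4, 8, 8 };

/* Round n up to the next multiple of 8. */
static size_t round_up8(size_t n)
{
    return (n + 7u) & ~(size_t)7u;
}

/* Byte offset of `field` inside a record where the fields in `enabled` are
 * present. If pad_to_8 is nonzero, each field occupies a multiple of 8 bytes. */
static size_t sample_field_offset(unsigned enabled, enum sample_field field,
                                  int pad_to_8)
{
    size_t off = RECORD_HEADER_SIZE;
    unsigned bit;

    /* Add up the sizes of all enabled fields that precede `field`. */
    for (bit = 0; bit < NUM_FIELDS && (1u << bit) != (unsigned)field; bit++) {
        if (enabled & (1u << bit))
            off += pad_to_8 ? round_up8(field_size[bit]) : field_size[bit];
    }
    assert(bit < NUM_FIELDS); /* `field` must be exactly one known flag */
    return off;
}
```

With SAMPLE_OA_SOURCE | SAMPLE_TS enabled, sample_field_offset(..., SAMPLE_TS, 0)
gives 12 (8-byte header plus the 4-byte source), which is not a multiple of 8.
With pad_to_8 set it gives 16.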
On 14/07/17 19:51, Sagar Arun Kamble wrote:
> This series is prepared from below two series posted by Sourab in March.
> 1. https://patchwork.freedesktop.org/series/21351/ - Collect command stream
> based OA reports using i915 perf
> 2. https://patchwork.freedesktop.org/series/21352/ - Collect command stream
> based GPU metrics for all engines using i915 perf
>
> This series addresses most of the review comments from the above two. The major
> change is moving the stream structure and information from dev_priv to
> per-engine structures. The intent of this series, restated from the cover
> letter of the earlier series, is given below.
>
> This series adds a framework for:
> 1. Collection of OA reports associated with the render command stream, which
> are collected around batchbuffer boundaries.
> 2. Collection of other metadata such as ctx_id, pid, tag etc. with the samples,
> so that we can associate the collected samples with the corresponding
> process/workload.
> 3. Collection of GPU performance metrics associated with the command stream of
> a particular engine. These metrics include timestamps of work submission and
> completion on engines, mmio metrics, etc. These metrics are collected
> around batchbuffer boundaries.
>
> There are a couple of patches which add support for using the cross-timestamp
> framework for retrieving tightly coupled device/system timestamps.
> In our case, this framework enables us to have correlated pairs of gpu+system
> time which can be used over a period of time to correct the frequency of the
> timestamp clock, and thus enable us to accurately send system time (_MONO_RAW)
> to userspace as requested. The results are generally observed to be quite a bit
> better with the use of cross timestamps, and the frequency delta gradually
> tapers down to 0 with increasing correction periods.
> The use of the cross-timestamp framework, though, requires us to have a
> clockcounter/timecounter abstraction for the timestamp clocksource, and
> further requires a few changes in the kernel timekeeping/clocksource code.
>
> Pending issues to be addressed in this series:
> 1. cross-timestamp sync patches need to be reworked as requested by kernel
> maintainers.
> 2. Some of the data types collected through these patches could instead be
> collected in userspace; that is yet to be finalized.
> 3. Add support in the perf IGT tests for verifying CS based perf functionality.
>
> Cc: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
> Cc: Matthew Auld <matthew.auld at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>
> Sourab Gupta (14):
> drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id
> drm/i915: Expose OA sample source to userspace
> drm/i915: Framework for capturing command stream based OA reports and
> ctx id info.
> drm/i915: Flush periodic samples, in case of no pending CS sample
> requests
> drm/i915: Inform userspace about command stream OA buf overflow
> drm/i915: Populate ctx ID for periodic OA reports
> drm/i915: Add support for having pid output with OA report
> drm/i915: Add support for emitting execbuffer tags through OA counter
> reports
> drm/i915: Add support for collecting timestamps on all gpu engines
> drm/i915: Extract raw GPU timestamps from OA reports to forward in
> perf samples
> drm/i915: Async check for streams data availability with hrtimer
> rescheduling
> time: Expose current clocksource in use by timekeeping framework
> drm/i915: Mechanism to forward clock monotonic raw time in perf
> samples
> drm/i915: Support for capturing MMIO register values
>
> drivers/gpu/drm/i915/i915_drv.c | 15 +
> drivers/gpu/drm/i915/i915_drv.h | 194 ++-
> drivers/gpu/drm/i915/i915_gem.c | 1 +
> drivers/gpu/drm/i915/i915_gem_context.c | 3 +
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +
> drivers/gpu/drm/i915/i915_perf.c | 2022 ++++++++++++++++++++++++----
> drivers/gpu/drm/i915/i915_reg.h | 6 +
> drivers/gpu/drm/i915/intel_engine_cs.c | 4 +
> drivers/gpu/drm/i915/intel_ringbuffer.c | 2 +
> drivers/gpu/drm/i915/intel_ringbuffer.h | 8 +
> include/linux/timekeeping.h | 5 +
> include/uapi/drm/i915_drm.h | 76 ++
> kernel/time/timekeeping.c | 12 +
> 13 files changed, 2110 insertions(+), 249 deletions(-)
>