[Intel-gfx] [RFC 4/4] drm/i915/perf: Send system clock monotonic time in perf samples

Lionel Landwerlin lionel.g.landwerlin at intel.com
Tue Dec 5 14:22:14 UTC 2017


On 15/11/17 12:13, Sagar Arun Kamble wrote:
> From: Sourab Gupta <sourab.gupta at intel.com>
>
> Currently, we have the ability to only forward the GPU timestamps in the
> samples (which are generated via OA reports). This limits the ability to
> correlate these samples with the system events.
>
> An ability is therefore needed to report timestamps in different clock
> domains, such as CLOCK_MONOTONIC, in the perf samples to be of more
> practical use to the userspace. This ability becomes important
> when we want to correlate/plot GPU events/samples with other system events
> on the same timeline (e.g. vblank events, or timestamps when work was
> submitted to kernel, etc.)
>
> The patch here proposes a mechanism to achieve this. The correlation
> between gpu time and system time is established using the timestamp clock
> associated with the command stream, abstracted as timecounter/cyclecounter
> to retrieve gpu/system time correlated values.
>
> v2: Added i915_driver_init_late() function to capture the new late init
> phase for perf (Chris)
>
> v3: Removed cross-timestamp changes.
>
> Signed-off-by: Sourab Gupta <sourab.gupta at intel.com>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble at intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Sourab Gupta <sourab.gupta at intel.com>
> Cc: Matthew Auld <matthew.auld at intel.com>
> ---
>   drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++
>   include/uapi/drm/i915_drm.h      |  7 +++++++
>   2 files changed, 34 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 3b721d7..94ee924 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -336,6 +336,7 @@
>   
>   #define SAMPLE_OA_REPORT	BIT(0)
>   #define SAMPLE_GPU_TS		BIT(1)
> +#define SAMPLE_SYSTEM_TS	BIT(2)
>   
>   /**
>    * struct perf_open_properties - for validated properties given to open a stream
> @@ -622,6 +623,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   	struct drm_i915_perf_record_header header;
>   	u32 sample_flags = stream->sample_flags;
>   	u64 gpu_ts = 0;
> +	u64 system_ts = 0;
>   
>   	header.type = DRM_I915_PERF_RECORD_SAMPLE;
>   	header.pad = 0;
> @@ -647,6 +649,23 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   
>   		if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
>   			return -EFAULT;
> +		buf += I915_PERF_TS_SAMPLE_SIZE;

This is a ridiculous nit, but I think using sizeof(u64) would be more 
readable than this I915_PERF_TS_SAMPLE_SIZE define.

> +	}
> +
> +	if (sample_flags & SAMPLE_SYSTEM_TS) {
> +		gpu_ts = get_gpu_ts_from_oa_report(stream, report);
> +		/*
> +		 * XXX: timecounter_cyc2time considers time backwards if delta
> +		 * timestamp is more than half the max ns time covered by
> +		 * counter. It will be ~35min for 36 bit counter. If this much
> +		 * sampling duration is needed we will have to update tc->nsec
> +		 * by explicitly reading the timecounter (timecounter_read)
> +		 * before this duration.
> +		 */
> +		system_ts = timecounter_cyc2time(&stream->tc, gpu_ts);
> +
> +		if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE))
> +			return -EFAULT;
>   	}
>   
>   	(*offset) += header.size;
> @@ -2137,6 +2156,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>   		stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
>   	}
>   
> +	if (props->sample_flags & SAMPLE_SYSTEM_TS) {
> +		stream->sample_flags |= SAMPLE_SYSTEM_TS;
> +		stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
> +	}
> +
>   	dev_priv->perf.oa.oa_buffer.format_size = format_size;
>   	if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
>   		return -EINVAL;
> @@ -2857,6 +2881,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
>   		case DRM_I915_PERF_PROP_SAMPLE_GPU_TS:
>   			props->sample_flags |= SAMPLE_GPU_TS;
>   			break;
> +		case DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS:
> +			props->sample_flags |= SAMPLE_SYSTEM_TS;
> +			break;
>   		case DRM_I915_PERF_PROP_OA_METRICS_SET:
>   			if (value == 0) {
>   				DRM_DEBUG("Unknown OA metric set ID\n");
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 0b9249e..283859c 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1453,6 +1453,12 @@ enum drm_i915_perf_property_id {
>   	DRM_I915_PERF_PROP_SAMPLE_GPU_TS,
>   
>   	/**
> +	 * This property requests inclusion of CLOCK_MONOTONIC system time in
> +	 * the perf sample data.
> +	 */
> +	DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS,
> +
> +	/**
>   	 * The value specifies which set of OA unit metrics should be
>   	 * be configured, defining the contents of any OA unit reports.
>   	 */
> @@ -1539,6 +1545,7 @@ enum drm_i915_perf_record_type {
>   	 *
>   	 *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
>   	 *     { u64 gpu_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_GPU_TS
> +	 *     { u64 system_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS

I would just add those u64 fields before oa_report.
Since the report sizes are dictated by the hardware, I'm afraid that one 
day someone might come up with a non 64bit aligned format (however 
unlikely).
And since the new properties mean you need to be aware of the potential 
new offsets, it's not breaking existing userspace.

>   	 * };
>   	 */
>   	DRM_I915_PERF_RECORD_SAMPLE = 1,




More information about the Intel-gfx mailing list