[Intel-gfx] [RFC 4/4] drm/i915/perf: Send system clock monotonic time in perf samples
Sagar Arun Kamble
sagar.a.kamble at intel.com
Wed Dec 6 08:31:01 UTC 2017
On 12/5/2017 7:52 PM, Lionel Landwerlin wrote:
> On 15/11/17 12:13, Sagar Arun Kamble wrote:
>> From: Sourab Gupta <sourab.gupta at intel.com>
>>
>> Currently, we have the ability to only forward the GPU timestamps in the
>> samples (which are generated via OA reports). This limits the ability to
>> correlate these samples with the system events.
>>
>> An ability is therefore needed to report timestamps in different clock
>> domains, such as CLOCK_MONOTONIC, in the perf samples to be of more
>> practical use to the userspace. This ability becomes important
>> when we want to correlate/plot GPU events/samples with other system
>> events
>> on the same timeline (e.g. vblank events, or timestamps when work was
>> submitted to kernel, etc.)
>>
>> The patch here proposes a mechanism to achieve this. The correlation
>> between gpu time and system time is established using the timestamp
>> clock
>> associated with the command stream, abstracted as
>> timecounter/cyclecounter
>> to retrieve gpu/system time correlated values.
>>
>> v2: Added i915_driver_init_late() function to capture the new late init
>> phase for perf (Chris)
>>
>> v3: Removed cross-timestamp changes.
>>
>> Signed-off-by: Sourab Gupta <sourab.gupta at intel.com>
>> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble at intel.com>
>> Cc: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: Sourab Gupta <sourab.gupta at intel.com>
>> Cc: Matthew Auld <matthew.auld at intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++
>> include/uapi/drm/i915_drm.h | 7 +++++++
>> 2 files changed, 34 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c
>> b/drivers/gpu/drm/i915/i915_perf.c
>> index 3b721d7..94ee924 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -336,6 +336,7 @@
>> #define SAMPLE_OA_REPORT BIT(0)
>> #define SAMPLE_GPU_TS BIT(1)
>> +#define SAMPLE_SYSTEM_TS BIT(2)
>> /**
>> * struct perf_open_properties - for validated properties given to
>> open a stream
>> @@ -622,6 +623,7 @@ static int append_oa_sample(struct
>> i915_perf_stream *stream,
>> struct drm_i915_perf_record_header header;
>> u32 sample_flags = stream->sample_flags;
>> u64 gpu_ts = 0;
>> + u64 system_ts = 0;
>> header.type = DRM_I915_PERF_RECORD_SAMPLE;
>> header.pad = 0;
>> @@ -647,6 +649,23 @@ static int append_oa_sample(struct
>> i915_perf_stream *stream,
>> if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
>> return -EFAULT;
>> + buf += I915_PERF_TS_SAMPLE_SIZE;
>
> This is a ridiculous nit, but I think using sizeof(u64) would be more
> readable than this I915_PERF_TS_SAMPLE_SIZE define.
Will update. Thanks.
>
>> + }
>> +
>> + if (sample_flags & SAMPLE_SYSTEM_TS) {
>> + gpu_ts = get_gpu_ts_from_oa_report(stream, report);
>> + /*
>> + * XXX: timecounter_cyc2time considers time backwards if delta
>> + * timestamp is more than half the max ns time covered by
>> + * counter. It will be ~35min for 36 bit counter. If this much
>> + * sampling duration is needed we will have to update tc->nsec
>> + * by explicitly reading the timecounter (timecounter_read)
>> + * before this duration.
>> + */
>> + system_ts = timecounter_cyc2time(&stream->tc, gpu_ts);
>> +
>> + if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE))
>> + return -EFAULT;
>> }
>> (*offset) += header.size;
>> @@ -2137,6 +2156,11 @@ static int i915_oa_stream_init(struct
>> i915_perf_stream *stream,
>> stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
>> }
>> + if (props->sample_flags & SAMPLE_SYSTEM_TS) {
>> + stream->sample_flags |= SAMPLE_SYSTEM_TS;
>> + stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
>> + }
>> +
>> dev_priv->perf.oa.oa_buffer.format_size = format_size;
>> if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
>> return -EINVAL;
>> @@ -2857,6 +2881,9 @@ static int read_properties_unlocked(struct
>> drm_i915_private *dev_priv,
>> case DRM_I915_PERF_PROP_SAMPLE_GPU_TS:
>> props->sample_flags |= SAMPLE_GPU_TS;
>> break;
>> + case DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS:
>> + props->sample_flags |= SAMPLE_SYSTEM_TS;
>> + break;
>> case DRM_I915_PERF_PROP_OA_METRICS_SET:
>> if (value == 0) {
>> DRM_DEBUG("Unknown OA metric set ID\n");
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index 0b9249e..283859c 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -1453,6 +1453,12 @@ enum drm_i915_perf_property_id {
>> DRM_I915_PERF_PROP_SAMPLE_GPU_TS,
>> /**
>> + * This property requests inclusion of CLOCK_MONOTONIC system
>> time in
>> + * the perf sample data.
>> + */
>> + DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS,
>> +
>> + /**
>> * The value specifies which set of OA unit metrics should be
>> * be configured, defining the contents of any OA unit reports.
>> */
>> @@ -1539,6 +1545,7 @@ enum drm_i915_perf_record_type {
>> *
>> * { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
>> * { u64 gpu_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_GPU_TS
>> + * { u64 system_timestamp; } &&
>> DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS
>
> I would just add those u64 fields before oa_report.
> Since the report sizes are dictated by the hardware, I'm afraid that
> one day someone might come up with a non 64bit aligned format (however
> unlikely).
> And since the new properties mean you need to be aware of the
> potential new offsets, it's not breaking existing userspace.
Sure. Will update.
>
>> * };
>> */
>> DRM_I915_PERF_RECORD_SAMPLE = 1,
>
>
More information about the Intel-gfx
mailing list