[Intel-gfx] [RFC 4/4] drm/i915/perf: Send system clock monotonic time in perf samples

Sagar Arun Kamble sagar.a.kamble at intel.com
Wed Dec 6 08:31:01 UTC 2017



On 12/5/2017 7:52 PM, Lionel Landwerlin wrote:
> On 15/11/17 12:13, Sagar Arun Kamble wrote:
>> From: Sourab Gupta <sourab.gupta at intel.com>
>>
>> Currently, we have the ability to only forward the GPU timestamps in the
>> samples (which are generated via OA reports). This limits the ability to
>> correlate these samples with the system events.
>>
>> An ability is therefore needed to report timestamps in different clock
>> domains, such as CLOCK_MONOTONIC, in the perf samples to be of more
>> practical use to the userspace. This ability becomes important
>> when we want to correlate/plot GPU events/samples with other system 
>> events
>> on the same timeline (e.g. vblank events, or timestamps when work was
>> submitted to kernel, etc.)
>>
>> The patch here proposes a mechanism to achieve this. The correlation
>> between gpu time and system time is established using the timestamp 
>> clock
>> associated with the command stream, abstracted as 
>> timecounter/cyclecounter
>> to retrieve gpu/system time correlated values.
>>
>> v2: Added i915_driver_init_late() function to capture the new late init
>> phase for perf (Chris)
>>
>> v3: Removed cross-timestamp changes.
>>
>> Signed-off-by: Sourab Gupta <sourab.gupta at intel.com>
>> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble at intel.com>
>> Cc: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: Sourab Gupta <sourab.gupta at intel.com>
>> Cc: Matthew Auld <matthew.auld at intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++
>>   include/uapi/drm/i915_drm.h      |  7 +++++++
>>   2 files changed, 34 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>> b/drivers/gpu/drm/i915/i915_perf.c
>> index 3b721d7..94ee924 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -336,6 +336,7 @@
>>     #define SAMPLE_OA_REPORT    BIT(0)
>>   #define SAMPLE_GPU_TS        BIT(1)
>> +#define SAMPLE_SYSTEM_TS    BIT(2)
>>     /**
>>    * struct perf_open_properties - for validated properties given to 
>> open a stream
>> @@ -622,6 +623,7 @@ static int append_oa_sample(struct 
>> i915_perf_stream *stream,
>>       struct drm_i915_perf_record_header header;
>>       u32 sample_flags = stream->sample_flags;
>>       u64 gpu_ts = 0;
>> +    u64 system_ts = 0;
>>         header.type = DRM_I915_PERF_RECORD_SAMPLE;
>>       header.pad = 0;
>> @@ -647,6 +649,23 @@ static int append_oa_sample(struct 
>> i915_perf_stream *stream,
>>             if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
>>               return -EFAULT;
>> +        buf += I915_PERF_TS_SAMPLE_SIZE;
>
> This is a ridiculous nit, but I think using sizeof(u64) would be more 
> readable than this I915_PERF_TS_SAMPLE_SIZE define.

Will update. Thanks.

>
>> +    }
>> +
>> +    if (sample_flags & SAMPLE_SYSTEM_TS) {
>> +        gpu_ts = get_gpu_ts_from_oa_report(stream, report);
>> +        /*
>> +         * XXX: timecounter_cyc2time considers time backwards if delta
>> +         * timestamp is more than half the max ns time covered by
>> +         * counter. It will be ~35min for 36 bit counter. If this much
>> +         * sampling duration is needed we will have to update tc->nsec
>> +         * by explicitly reading the timecounter (timecounter_read)
>> +         * before this duration.
>> +         */
>> +        system_ts = timecounter_cyc2time(&stream->tc, gpu_ts);
>> +
>> +        if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE))
>> +            return -EFAULT;
>>       }
>>         (*offset) += header.size;
>> @@ -2137,6 +2156,11 @@ static int i915_oa_stream_init(struct 
>> i915_perf_stream *stream,
>>           stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
>>       }
>>   +    if (props->sample_flags & SAMPLE_SYSTEM_TS) {
>> +        stream->sample_flags |= SAMPLE_SYSTEM_TS;
>> +        stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
>> +    }
>> +
>>       dev_priv->perf.oa.oa_buffer.format_size = format_size;
>>       if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
>>           return -EINVAL;
>> @@ -2857,6 +2881,9 @@ static int read_properties_unlocked(struct 
>> drm_i915_private *dev_priv,
>>           case DRM_I915_PERF_PROP_SAMPLE_GPU_TS:
>>               props->sample_flags |= SAMPLE_GPU_TS;
>>               break;
>> +        case DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS:
>> +            props->sample_flags |= SAMPLE_SYSTEM_TS;
>> +            break;
>>           case DRM_I915_PERF_PROP_OA_METRICS_SET:
>>               if (value == 0) {
>>                   DRM_DEBUG("Unknown OA metric set ID\n");
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index 0b9249e..283859c 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -1453,6 +1453,12 @@ enum drm_i915_perf_property_id {
>>       DRM_I915_PERF_PROP_SAMPLE_GPU_TS,
>>         /**
>> +     * This property requests inclusion of CLOCK_MONOTONIC system 
>> time in
>> +     * the perf sample data.
>> +     */
>> +    DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS,
>> +
>> +    /**
>>        * The value specifies which set of OA unit metrics should be
>>        * be configured, defining the contents of any OA unit reports.
>>        */
>> @@ -1539,6 +1545,7 @@ enum drm_i915_perf_record_type {
>>        *
>>        *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
>>        *     { u64 gpu_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_GPU_TS
>> +     *     { u64 system_timestamp; } && 
>> DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS
>
> I would just add those u64 fields before oa_report.
> Since the report sizes are dictated by the hardware, I'm afraid that 
> one day someone might come up with a non 64bit aligned format (however 
> unlikely).
> And since the new properties mean you need to be aware of the 
> potential new offsets, it's not breaking existing userspace.

Sure. Will update.

>
>>        * };
>>        */
>>       DRM_I915_PERF_RECORD_SAMPLE = 1,
>
>



More information about the Intel-gfx mailing list