[Intel-gfx] [PATCH] i915/query: Correlate engine and cpu timestamps with better accuracy

Lionel Landwerlin lionel.g.landwerlin at intel.com
Thu Mar 4 08:28:59 UTC 2021


On 04/03/2021 02:09, Chris Wilson wrote:
> Quoting Umesh Nerlige Ramappa (2021-03-03 21:28:00)
>> Perf measurements rely on CPU and engine timestamps to correlate
>> events of interest across these time domains. Current mechanisms get
>> these timestamps separately and the calculated delta between these
>> timestamps lack enough accuracy.
>>
>> To improve the accuracy of these time measurements to within a few us,
>> add a query that returns the engine and cpu timestamps captured as
>> close to each other as possible.
>>
>> v2: (Tvrtko)
>> - document clock reference used
>> - return cpu timestamp always
>> - capture cpu time just before lower dword of cs timestamp
>>
>> v3: (Chris)
>> - use uncore-rpm
>> - use __query_cs_timestamp helper
>>
>> v4: (Lionel)
>> - Kernel perf subsytem allows users to specify the clock id to be used
>>    in perf_event_open. This clock id is used by the perf subsystem to
>>    return the appropriate cpu timestamp in perf events. Similarly, let
>>    the user pass the clockid to this query so that cpu timestamp
>>    corresponds to the clock id requested.
>>
>> v5: (Tvrtko)
>> - Use normal ktime accessors instead of fast versions
>> - Add more uApi documentation
>>
>> v6: (Lionel)
>> - Move switch out of spinlock
>>
>> v7: (Chris)
>> - cs_timestamp is a misnomer, use cs_cycles instead
>> - return the cs cycle frequency as well in the query
>>
>> v8:
>> - Add platform and engine specific checks
>>
>> v9: (Lionel)
>> - Return 2 cpu timestamps in the query - captured before and after the
>>    register read
>>
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_query.c | 144 ++++++++++++++++++++++++++++++
>>   include/uapi/drm/i915_drm.h       |  47 ++++++++++
>>   2 files changed, 191 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
>> index fed337ad7b68..acca22ee6014 100644
>> --- a/drivers/gpu/drm/i915/i915_query.c
>> +++ b/drivers/gpu/drm/i915/i915_query.c
>> @@ -6,6 +6,8 @@
>>   
>>   #include <linux/nospec.h>
>>   
>> +#include "gt/intel_engine_pm.h"
>> +#include "gt/intel_engine_user.h"
>>   #include "i915_drv.h"
>>   #include "i915_perf.h"
>>   #include "i915_query.h"
>> @@ -90,6 +92,147 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
>>          return total_length;
>>   }
>>   
>> +typedef u64 (*__ktime_func_t)(void);
>> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
>> +{
>> +       /*
>> +        * Use logic same as the perf subsystem to allow user to select the
>> +        * reference clock id to be used for timestamps.
>> +        */
>> +       switch (clk_id) {
>> +       case CLOCK_MONOTONIC:
>> +               return &ktime_get_ns;
>> +       case CLOCK_MONOTONIC_RAW:
>> +               return &ktime_get_raw_ns;
>> +       case CLOCK_REALTIME:
>> +               return &ktime_get_real_ns;
>> +       case CLOCK_BOOTTIME:
>> +               return &ktime_get_boottime_ns;
>> +       case CLOCK_TAI:
>> +               return &ktime_get_clocktai_ns;
>> +       default:
>> +               return NULL;
>> +       }
>> +}
>> +
>> +static inline int
>> +__read_timestamps(struct intel_uncore *uncore,
>> +                 i915_reg_t lower_reg,
>> +                 i915_reg_t upper_reg,
>> +                 u64 *cs_ts,
>> +                 u64 *cpu_ts,
>> +                 __ktime_func_t cpu_clock)
>> +{
>> +       u32 upper, lower, old_upper, loop = 0;
>> +
>> +       upper = intel_uncore_read_fw(uncore, upper_reg);
>> +       do {
>> +               cpu_ts[0] = cpu_clock();
>> +               lower = intel_uncore_read_fw(uncore, lower_reg);
>> +               cpu_ts[1] = cpu_clock();
>> +               old_upper = upper;
>> +               upper = intel_uncore_read_fw(uncore, upper_reg);
> Both register reads comprise the timestamp returned to userspace, so
> presumably you want cpu_ts[] to wrap both.
>
>         do {
>                 old_upper = upper;
>
>                 cpu_ts[0] = cpu_clock();
>                 lower = intel_uncore_read_fw(uncore, lower_reg);
>                 upper = intel_uncore_read_fw(uncore, upper_reg);
>                 cpu_ts[1] = cpu_clock();
>         } while (upper != old_upper && loop++ < 2);

Actually if we want the best accuracy we can just deal with the lower dword.

We can check the upper one hasn't changed outside of the 2 cpu_clock() 
calls.


-Lionel




More information about the Intel-gfx mailing list