[Intel-gfx] [PATCH] drm/i915/pmu: Measure sampler intervals

Chris Wilson chris at chris-wilson.co.uk
Wed May 30 11:07:49 UTC 2018


Quoting Tvrtko Ursulin (2018-05-30 11:57:39)
> 
> On 25/05/2018 18:45, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-05-25 18:31:35)
> >>
> >> On 25/05/2018 18:11, Chris Wilson wrote:
> >>> hrtimer is not reliable enough to assume fixed intervals, and so even
> >>> coarse accuracy (in the face of kasan and similar heavy debugging) we
> >>> need to measure the actual interval between sample.
> >>
> >> It doesn't even average out to something acceptable under such Kconfigs?
> >> Horror.. precise but inaccurate. /O\
> >>
> >>> While using a single timestamp to compute the interval does not allow
> >>> very fine accuracy (consider the impact of a slow forcewake between
> >>> different samples after the timestamp is read) is much better than
> >>> assuming the interval.
> >>>
> >>> Testcase: igt/perf_pmu #ivb
> >>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/i915_pmu.c | 20 +++++++++++++-------
> >>>    drivers/gpu/drm/i915/i915_pmu.h |  4 ++++
> >>>    2 files changed, 17 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> >>> index dc87797db500..f5087515eb43 100644
> >>> --- a/drivers/gpu/drm/i915/i915_pmu.c
> >>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> >>> @@ -127,6 +127,7 @@ static void __i915_pmu_maybe_start_timer(struct drm_i915_private *i915)
> >>>    {
> >>>        if (!i915->pmu.timer_enabled && pmu_needs_timer(i915, true)) {
> >>>                i915->pmu.timer_enabled = true;
> >>> +             i915->pmu.timestamp = ktime_get();
> >>>                hrtimer_start_range_ns(&i915->pmu.timer,
> >>>                                       ns_to_ktime(PERIOD), 0,
> >>>                                       HRTIMER_MODE_REL_PINNED);
> >>> @@ -160,7 +161,7 @@ update_sample(struct i915_pmu_sample *sample, u32 unit, u32 val)
> >>>        sample->cur += mul_u32_u32(val, unit);
> >>>    }
> >>>    
> >>> -static void engines_sample(struct drm_i915_private *dev_priv)
> >>> +static void engines_sample(struct drm_i915_private *dev_priv, u64 period)
> >>>    {
> >>>        struct intel_engine_cs *engine;
> >>>        enum intel_engine_id id;
> >>> @@ -183,7 +184,7 @@ static void engines_sample(struct drm_i915_private *dev_priv)
> >>>                val = !i915_seqno_passed(current_seqno, last_seqno);
> >>>    
> >>>                update_sample(&engine->pmu.sample[I915_SAMPLE_BUSY],
> >>> -                           PERIOD, val);
> >>> +                           period, val);
> >>>    
> >>>                if (val && (engine->pmu.enable &
> >>>                    (BIT(I915_SAMPLE_WAIT) | BIT(I915_SAMPLE_SEMA)))) {
> >>> @@ -195,10 +196,10 @@ static void engines_sample(struct drm_i915_private *dev_priv)
> >>>                }
> >>>    
> >>>                update_sample(&engine->pmu.sample[I915_SAMPLE_WAIT],
> >>> -                           PERIOD, !!(val & RING_WAIT));
> >>> +                           period, !!(val & RING_WAIT));
> >>>    
> >>>                update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
> >>> -                           PERIOD, !!(val & RING_WAIT_SEMAPHORE));
> >>> +                           period, !!(val & RING_WAIT_SEMAPHORE));
> >>>        }
> >>>    
> >>>        if (fw)
> >>> @@ -207,7 +208,7 @@ static void engines_sample(struct drm_i915_private *dev_priv)
> >>>        intel_runtime_pm_put(dev_priv);
> >>>    }
> >>>    
> >>> -static void frequency_sample(struct drm_i915_private *dev_priv)
> >>> +static void frequency_sample(struct drm_i915_private *dev_priv, u64 period)
> >>
> >> Period is unused in this function.
> >>
> >> But more importantly that leads to a problem. When reading the counter
> >> the frequencies accumulator is divided by FREQUENCY define, which is
> >> inverse of PERIOD. If the error is big enough to mess up the engines
> >> sampling, is it big enough to affect the frequencies as well?
> > 
> > Yes, but fixing up frequencies I left for another patch, because it's
> > going to be more involved (having to choose the divider more carefully)
> > and would you believe it, but CI only complains about busy sampling ;)
> > 
> > I passed in the period as a reminder.
> 
> It might only remind someone to remove the unused variable so I am not 
> sure it is worth it.
> 
> >> Improving that would need average frequency between two counter reads.
> >> Which looks tricky to shoehorn into the pmu api. Maybe primitive running
> >> average would do.
> > 
> > My plan was to expose cycles (Frequency x time) to the user, and then
> > they calculate frequency by the comparing their own samples. (Since we
> > give them (time, samples) for each pmu read).
> 
> Frequency times which time? On read? Don't exactly follow. :(

On sample: sample[FREQ] += period * instantaneous_measurement

The (perf_event_read) caller has to compute freq by d_cycles / d_time.
i.e. we don't have a frequency sampler, but a cycles sampler. I think
this is a big enough change that we'd have to declare new ABI (deprecate
the old samplers, add new).

[snip]

> I am also thinking that with this approach we could start allowing timer 
> slack, since we are measuring it anyway. It would release back some of 
> the added cost of time queries. Well, not that they are significant. 
> Digression anyway.

Interesting, yeah.
 
> I want to understand why this is a problem on IVB and what is the 
> solution for frequency counters.

I can only say that from the looks of it, ivb has very poor hrtimer
resolution (I wonder if tsc/hpet are being used!). I think ivb was
before art? I too do not know quite why ivb is so special, but it is the
only one to have consistency failed in all drm-tip shard runs.
-Chris


More information about the Intel-gfx mailing list