[Intel-gfx] [PATCH] drm/i915/pmu: Measure sampler intervals

Chris Wilson chris at chris-wilson.co.uk
Fri May 25 17:45:20 UTC 2018


Quoting Tvrtko Ursulin (2018-05-25 18:31:35)
> 
> On 25/05/2018 18:11, Chris Wilson wrote:
> > hrtimer is not reliable enough to assume fixed intervals, and so even
> > coarse accuracy (in the face of kasan and similar heavy debugging) we
> > need to measure the actual interval between sample.
> 
> It doesn't even average out to something acceptable under such Kconfigs? 
> Horror.. precise but inaccurate. /O\
> 
> > While using a single timestamp to compute the interval does not allow
> > very fine accuracy (consider the impact of a slow forcewake between
> > different samples after the timestamp is read) is much better than
> > assuming the interval.
> > 
> > Testcase: igt/perf_pmu #ivb
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_pmu.c | 20 +++++++++++++-------
> >   drivers/gpu/drm/i915/i915_pmu.h |  4 ++++
> >   2 files changed, 17 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> > index dc87797db500..f5087515eb43 100644
> > --- a/drivers/gpu/drm/i915/i915_pmu.c
> > +++ b/drivers/gpu/drm/i915/i915_pmu.c
> > @@ -127,6 +127,7 @@ static void __i915_pmu_maybe_start_timer(struct drm_i915_private *i915)
> >   {
> >       if (!i915->pmu.timer_enabled && pmu_needs_timer(i915, true)) {
> >               i915->pmu.timer_enabled = true;
> > +             i915->pmu.timestamp = ktime_get();
> >               hrtimer_start_range_ns(&i915->pmu.timer,
> >                                      ns_to_ktime(PERIOD), 0,
> >                                      HRTIMER_MODE_REL_PINNED);
> > @@ -160,7 +161,7 @@ update_sample(struct i915_pmu_sample *sample, u32 unit, u32 val)
> >       sample->cur += mul_u32_u32(val, unit);
> >   }
> >   
> > -static void engines_sample(struct drm_i915_private *dev_priv)
> > +static void engines_sample(struct drm_i915_private *dev_priv, u64 period)
> >   {
> >       struct intel_engine_cs *engine;
> >       enum intel_engine_id id;
> > @@ -183,7 +184,7 @@ static void engines_sample(struct drm_i915_private *dev_priv)
> >               val = !i915_seqno_passed(current_seqno, last_seqno);
> >   
> >               update_sample(&engine->pmu.sample[I915_SAMPLE_BUSY],
> > -                           PERIOD, val);
> > +                           period, val);
> >   
> >               if (val && (engine->pmu.enable &
> >                   (BIT(I915_SAMPLE_WAIT) | BIT(I915_SAMPLE_SEMA)))) {
> > @@ -195,10 +196,10 @@ static void engines_sample(struct drm_i915_private *dev_priv)
> >               }
> >   
> >               update_sample(&engine->pmu.sample[I915_SAMPLE_WAIT],
> > -                           PERIOD, !!(val & RING_WAIT));
> > +                           period, !!(val & RING_WAIT));
> >   
> >               update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
> > -                           PERIOD, !!(val & RING_WAIT_SEMAPHORE));
> > +                           period, !!(val & RING_WAIT_SEMAPHORE));
> >       }
> >   
> >       if (fw)
> > @@ -207,7 +208,7 @@ static void engines_sample(struct drm_i915_private *dev_priv)
> >       intel_runtime_pm_put(dev_priv);
> >   }
> >   
> > -static void frequency_sample(struct drm_i915_private *dev_priv)
> > +static void frequency_sample(struct drm_i915_private *dev_priv, u64 period)
> 
> Period is unused in this function.
> 
> But more importantly that leads to a problem. When reading the counter 
> the frequencies accumulator is divided by FREQUENCY define, which is 
> inverse of PERIOD. If the error is big enough to mess up the engines 
> sampling, is it big enough to affect the frequencies as well?

Yes, but fixing up frequencies I left for another patch, because it's
going to be more involved (having to choose the divider more carefully)
and would you believe it, but CI only complains about busy sampling ;)

I passed in the period as a reminder.
 
> Improving that would need average frequency between two counter reads. 
> Which looks tricky to shoehorn into the pmu api. Maybe primitive running 
> average would do.

My plan was to expose cycles (Frequency x time) to the user, and then
they calculate frequency by the comparing their own samples. (Since we
give them (time, samples) for each pmu read).

> >   {
> >       if (dev_priv->pmu.enable &
> >           config_enabled_mask(I915_PMU_ACTUAL_FREQUENCY)) {
> > @@ -237,12 +238,17 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
> >   {
> >       struct drm_i915_private *i915 =
> >               container_of(hrtimer, struct drm_i915_private, pmu.timer);
> > +     ktime_t now, period;
> >   
> >       if (!READ_ONCE(i915->pmu.timer_enabled))
> >               return HRTIMER_NORESTART;
> >   
> > -     engines_sample(i915);
> > -     frequency_sample(i915);
> > +     now = ktime_get();
> > +     period = ktime_sub(now, i915->pmu.timestamp);
> > +     i915->pmu.timestamp = now;
> > +
> > +     engines_sample(i915, period);
> > +     frequency_sample(i915, period);
> >   
> >       hrtimer_forward_now(hrtimer, ns_to_ktime(PERIOD));
> >       return HRTIMER_RESTART;
> > diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
> > index 2ba735299f7c..0f1e4642077e 100644
> > --- a/drivers/gpu/drm/i915/i915_pmu.h
> > +++ b/drivers/gpu/drm/i915/i915_pmu.h
> > @@ -52,6 +52,10 @@ struct i915_pmu {
> >        * @timer: Timer for internal i915 PMU sampling.
> >        */
> >       struct hrtimer timer;
> > +     /**
> > +      * @timestamp: Timestamp of last internal i915 PMU sampling.
> > +      */
> > +     ktime_t timestamp;
> >       /**
> >        * @enable: Bitmask of all currently enabled events.
> >        *
> > 
> 
> Patch looks okay, just loses the some of the optimisation potential so I 
> am guessing we won't be thinking about replacing multiplies and divides 
> with shift any more. :)
> 
> But the question of frequency counters is now bothering me.
> 
> And if this problem is limited to Kasan then how much we want to 
> complicate things to make that work?

Not just kasan, but ivb really. That's the only to have never worked
whatever the config. kasan affects more CI machines.
-Chris


More information about the Intel-gfx mailing list