[Intel-gfx] [PATCH] drm/i915/pmu: Do not assume fixed hrtimer period

Mon Jun 4 15:11:04 UTC 2018

Quoting Tvrtko Ursulin (2018-06-04 16:01:26)
> 
> On 04/06/2018 14:03, Chris Wilson wrote:
> > Quoting Chris Wilson (2018-06-04 13:59:12)
> >> Quoting Tvrtko Ursulin (2018-06-04 13:52:39)
> >>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>>
> >>> As Chris has discovered on his Ivybridge, and later automated test runs
> >>> have confirmed, on most of our platforms hrtimer faced with heavy GPU load
> >>> ca occasionally become sufficiently imprecise to affect PMU sampling
> >>
> >> s/ca/can/
> >>
> >>> calculations.
> >>>
> >>> This means we cannot assume sampling frequency is what we asked for, but
> >>> we need to measure the interval ourselves.
> >>>
> >>> This patch is similar to Chris' original proposal for per-engine counters,
> >>> but instead of introducing a new set to work around the problem with
> >>> frequency sampling, it swaps around the way internal frequency accounting
> >>> is done. Instead of accumulating current frequency and dividing by
> >>> sampling frequency on readout, it accumulates frequency scaled by each
> >>> period.
> >>
> >> My ulterior motive failed to win favour ;)
> >>   
> >>> Testcase: igt/perf_pmu/*busy* # snb, ivb, hsw
> >>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>> Suggested-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> >> Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
> > 
> > I should point out that even with this fix (or rather my patch), we can
> 
> I did not mean to steal it,

Don't mind, I view such patches as "this here is a problem, and what I
think can be done". I'm quite happy to be told "nope, the problem
is..."; the end result is we fix and move on.

> just that you seemed very uncompromising on 
> the new counters approach. If you prefer you can respin your patch with 
> this approach for frequency counters, it is fine by me.

I'm just not convinced yet by the frequency sampler, and I still feel
that we would be better off with cycles. I just haven't found a
convincing argument ;)

> > still see errors of 25-30us, enough to fail the test. However, without
> > the fix our errors can be orders of magnitude worse (e.g. reporting 80us
> > busy instead of 500us).
> 
> Yeah I guess if the timer is delayed it is delayed and all samples just 
> won't be there. I don't see a way to fix that elegantly. Even practically.

Right, but it's not the case that we aren't sampling. If the timer was
delayed entirely, then we would see 0 (start_val == end_val). I'm not
sure exactly what goes wrong, but it probably is related to the timer
being unreliable. (It's just that in this case we are sampling a square
wave, so really hard to mess up!)
-Chris