[Intel-gfx] [RFC] drm/i915/pmu: Micro-optimize sampling loop
Chris Wilson
chris at chris-wilson.co.uk
Wed Jan 31 16:07:02 UTC 2018
Quoting Tvrtko Ursulin (2018-01-31 16:02:38)
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>
> By carefully choosing slightly unsynchronized values for timer frequency
> and period, we can remove the multiplications from the sampling loop and
> replace them with shifts only.
>
> The downside is that the counter read callback now has to divide by a
> non-power-of-two, but the rationale is that the sampling loop, which runs
> at ~200 Hz multiplied by the number of engines, is more important than the
> less frequent counter read-out. Furthermore, the compiler seems to
> optimize the divide in counter read-out into some shifts and a multiply.
>
> We can only do all this at the expense of introducing a systematic error
> to the sampling counters to the amount of 0.18%. At the same time we are
> increasing the sampling rate from 200 to 238 Hz which may cancel some of
> this out.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> ---
> Motivated by Chris' insistence on using a power-of-two scale factor in the
> engine queue depth PMU series. I was actually reluctant to give in there
> so this is even more questionable. :)
>
> I have no idea if it is cheaper to run the sampling timer at 200Hz with
> some multiplies, or run it at 238Hz with only shifts.
I can't say I have any insight into the timer wheel or hrtimer either.
Let's watch.
> FWIW 0.18% systematic error at least doesn't sound like a big deal when
> sampling counters are concerned.
Yeah, since even pinning it down to a 5% systematic error is hard, but
we'll see if this 0.18% is the straw that breaks CI's back!
-Chris
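
For readers following the arithmetic in the quoted commit message, here is
a minimal C sketch of the trade-off. It assumes a sampling period of 2^22 ns
and a nominal frequency of 238 Hz, which is one combination consistent with
the quoted 238 Hz and 0.18% figures; the patch's actual constants are not
shown in this message, and every name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants, not taken from the patch itself. */
#define NSEC_PER_SEC  UINT64_C(1000000000)
#define PERIOD_SHIFT  22                             /* hypothetical */
#define PERIOD_NS     (UINT64_C(1) << PERIOD_SHIFT)  /* 4194304 ns */
#define NOMINAL_HZ    UINT64_C(238)                  /* 1e9 / PERIOD_NS, rounded down */

/* The nominal frequency is the real tick rate with the fraction dropped. */
static_assert(NOMINAL_HZ == NSEC_PER_SEC / PERIOD_NS,
              "nominal frequency must match the truncated tick rate");

/* Sampling step with an arbitrary period: needs a 64-bit multiply. */
static uint64_t accumulate_mul(uint64_t acc, uint32_t val, uint64_t period_ns)
{
        return acc + (uint64_t)val * period_ns;
}

/* Sampling step with a power-of-two period: a shift replaces the multiply. */
static uint64_t accumulate_shift(uint64_t acc, uint32_t val)
{
        return acc + ((uint64_t)val << PERIOD_SHIFT);
}

/*
 * NOMINAL_HZ ticks of PERIOD_NS each fall slightly short of one second.
 * Return that shortfall in parts per million (truncated).
 */
static uint64_t systematic_error_ppm(void)
{
        uint64_t covered_ns = NOMINAL_HZ * PERIOD_NS;

        return (NSEC_PER_SEC - covered_ns) * UINT64_C(1000000) / NSEC_PER_SEC;
}
```

Under these assumptions the shift and the multiply accumulate identical
values, and the shortfall comes to roughly 0.18% of a second, in line with
the figure quoted above. Whether the shift is measurably cheaper than the
multiply at this sampling rate is exactly the open question in the thread.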