[Intel-gfx] [RFC] drm/i915/pmu: Micro-optimize sampling loop

Chris Wilson chris at chris-wilson.co.uk
Wed Jan 31 16:07:02 UTC 2018


Quoting Tvrtko Ursulin (2018-01-31 16:02:38)
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> 
> By carefully choosing slightly unsynchronized values for timer frequency
> and period, we can remove the multiplications from the sampling loop and
> replace them with shifts only.
> 
> The downside is that the counter read callback now has to divide by a
> non-power-of-two, but the rationale is that the sampling loop, which runs
> at ~200 Hz multiplied by the number of engines, matters more than the less
> frequent counter read-out. Furthermore, the compiler seems to optimize the
> divide in counter read-out into some shifts and a multiply.
> 
> We can only do all this at the expense of introducing a systematic error
> to the sampling counters to the amount of 0.18%. At the same time we are
> increasing the sampling rate from 200 to 238 Hz which may cancel some of
> this out.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> ---
> Motivated by Chris' insistence on using a power-of-two scale factor in the
> engine queue depth PMU series. I was actually reluctant to give in there
> so this is even more questionable. :)
> 
> I have no idea if it is cheaper to run the sampling timer at 200Hz with
> some multiplies, or run it at 238Hz with only shifts.

I can't say I have any insight into the timer wheel or hrtimer either.
Let's watch.

> FWIW 0.18% systematic error at least doesn't sound like a big deal when
> sampling counters are concerned.

Yeah, since even pinning it down to a 5% systematic error is hard, but
we'll see if this 0.18% is the straw that breaks CI's back!
-Chris
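
For readers following the numbers: the 238 Hz and 0.18% figures quoted above
are consistent with a sampling period of 2^22 ns paired with a nominal
read-out divisor of 238. That pairing is an assumption here, since the thread
itself does not show the constants, and every name in the sketch below is
hypothetical rather than taken from the patch. It only illustrates how a
multiply in the sampling path can become a shift, and where a systematic
error of that size could come from.

```c
#include <stdint.h>

/* Assumed constants; the thread states only 200 Hz, 238 Hz and 0.18%. */
#define NSEC_PER_SEC    1000000000ULL
#define OLD_PERIOD_NS   (NSEC_PER_SEC / 200)     /* 5000000 ns at 200 Hz */
#define PERIOD_SHIFT    22                       /* hypothetical choice */
#define PERIOD_NS       (1ULL << PERIOD_SHIFT)   /* 4194304 ns */
#define NOMINAL_FREQ    238ULL                   /* hypothetical read-out divisor */

/* Sampling path before: weight each sample by the period with a multiply. */
static inline uint64_t accumulate_mul(uint64_t total, uint32_t val)
{
    return total + (uint64_t)val * OLD_PERIOD_NS;
}

/* Sampling path after: the period is a power of two, so a shift suffices. */
static inline uint64_t accumulate_shift(uint64_t total, uint32_t val)
{
    return total + ((uint64_t)val << PERIOD_SHIFT);
}

/* Whole number of timer ticks per second implied by the period. */
static inline uint64_t actual_rate_hz(void)
{
    return NSEC_PER_SEC / PERIOD_NS;
}

/*
 * Mismatch between "NOMINAL_FREQ periods" and one real second, in parts
 * per million (truncated). This is the systematic error that arises when
 * read-out divides by NOMINAL_FREQ although the period is not exactly
 * NSEC_PER_SEC / NOMINAL_FREQ.
 */
static inline uint64_t systematic_error_ppm(void)
{
    uint64_t covered_ns = PERIOD_NS * NOMINAL_FREQ;  /* 998244352 ns */

    return (NSEC_PER_SEC - covered_ns) * 1000000ULL / NSEC_PER_SEC;
}
```

Under these assumptions, 238 periods of 2^22 ns cover 998,244,352 ns, which
falls about 0.18% short of one second, and the timer fires roughly 238 times
per second instead of 200.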