[Intel-gfx] [RFC] drm/i915/pmu: Micro-optimize sampling loop

Tvrtko Ursulin tursulin at ursulin.net
Wed Jan 31 16:02:38 UTC 2018


From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

By carefully choosing slightly unsynchronized values for timer frequency
and period, we can remove the multiplications from the sampling loop and
replace them with shifts only.

The downside is that the counter read callback now has to divide by a
non-power-of-two, but the rationale is that the sampling loop, which runs
at ~200 Hz multiplied by the number of engines, is more important than the
less frequent counter read-out. Furthermore, the compiler appears to
optimize the divide in counter read-out into some shifts and a multiply.

We can only do all this at the expense of introducing a systematic error
to the sampling counters to the amount of 0.18%. At the same time we are
increasing the sampling rate from 200 to 238 Hz which may cancel some of
this out.
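
As a rough illustration of the trade-off described above, here is a minimal
userspace sketch. It is not the driver code: struct sample, accumulate_mul(),
accumulate_shift() and read_out() are hypothetical names made up for this
example, and only the two constants come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PERIOD_SHIFT 22                              /* 0x400000 == 1 << 22 */
#define PERIOD_NS    (UINT64_C(1) << PERIOD_SHIFT)   /* sampling period in ns */
#define FREQUENCY_HZ 238                             /* 1e9 / 0x400000, truncated */

static_assert(PERIOD_NS == 0x400000, "period must be the power of two from the patch");

/* Hypothetical accumulator; not the driver's actual sample structure. */
struct sample {
	uint64_t cur;
};

/* Per-tick accumulation written as a multiply by the period. */
static void accumulate_mul(struct sample *s, uint32_t val)
{
	s->cur += (uint64_t)val * PERIOD_NS;
}

/* The same accumulation written as a shift, valid because the period is 2^22. */
static void accumulate_shift(struct sample *s, uint32_t val)
{
	s->cur += (uint64_t)val << PERIOD_SHIFT;
}

/* Read-out side: scale a per-tick sum by the non-power-of-two frequency. */
static uint64_t read_out(uint64_t summed)
{
	return summed / FREQUENCY_HZ;
}
```

Both accumulate variants produce identical sums for any 32-bit input; with a
constant power-of-two period a compiler would normally emit the shift form on
its own, so the sketch only spells out what the change of constants permits.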

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Cc: Chris Wilson <chris at chris-wilson.co.uk>
---
Motivated by Chris' insistence on using a power-of-two scale factor in the
engine queue depth PMU series. I was actually reluctant to give in there
so this is even more questionable. :)

I have no idea if it is cheaper to run the sampling timer at 200Hz with
some multiplies, or run it at 238Hz with only shifts.

FWIW 0.18% systematic error at least doesn't sound like a big deal when
sampling counters are concerned.
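
For reference, the 0.18% figure can be reproduced with a small standalone
sketch like the one below. The helper names exact_frequency_hz() and
relative_error() are invented for this illustration; only the constants are
taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC UINT64_C(1000000000)
#define PERIOD_NS    UINT64_C(0x400000)   /* 2^22 ns, about 4.19 ms */
#define FREQUENCY_HZ 238                  /* truncated value used as divisor */

static_assert(PERIOD_NS == (UINT64_C(1) << 22), "period is 2^22 ns");

/* Frequency that exactly matches the power-of-two period. */
static double exact_frequency_hz(void)
{
	return (double)NSEC_PER_SEC / (double)PERIOD_NS;
}

/* Relative error from using 238 instead of the exact frequency. */
static double relative_error(void)
{
	double exact = exact_frequency_hz();

	return (exact - FREQUENCY_HZ) / exact;
}
```

Here exact_frequency_hz() evaluates to about 238.4186 and relative_error() to
about 0.00176, which rounds to the quoted 0.18%.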
---
 drivers/gpu/drm/i915/i915_pmu.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 3e1edc26f252..af732c086b86 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -29,9 +29,19 @@
 #include "i915_pmu.h"
 #include "intel_ringbuffer.h"
 
-/* Frequency for the sampling timer for events which need it. */
-#define FREQUENCY 200
-#define PERIOD max_t(u64, 10000, NSEC_PER_SEC / FREQUENCY)
+/*
+ * Frequency for the sampling timer for events which need it.
+ *
+ * The relationship between frequency and period is:
+ *   PERIOD(ns) = 1e9 / FREQUENCY(Hz)
+ *
+ * We set the period to a power-of-two value 0x400000 for simpler computations
+ * inside the sampling timer, at the expense of introducing a systematic error
+ * of 0.18%. This is caused by frequency (used as the divisor for reading out
+ * certain counters) being rounded to 238, while 1e9 / 0x400000 = 238.418579102.
+ */
+#define FREQUENCY 238
+#define PERIOD 0x400000
 
 #define ENGINE_SAMPLE_MASK \
 	(BIT(I915_SAMPLE_BUSY) | \
-- 
2.14.1


