[Intel-gfx] [PATCH] drm/i915: Do not use iowait while waiting for the GPU

Mon Jul 30 12:56:30 UTC 2018

Quoting Francisco Jerez (2018-07-29 20:29:42)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > Quoting Francisco Jerez (2018-07-28 21:18:50)
> >> Chris Wilson <chris at chris-wilson.co.uk> writes:
> >> 
> >> > Quoting Francisco Jerez (2018-07-28 06:20:12)
> >> >> Chris Wilson <chris at chris-wilson.co.uk> writes:
> >> >> 
> >> >> > A recent trend for cpufreq is to boost the CPU frequencies for
> >> >> > iowaiters, in particular to benefit high frequency I/O. We do the same
> >> >> > and boost the GPU clocks to try and minimise time spent waiting for the
> >> >> > GPU. However, as the igfx and CPU share the same TDP, boosting the CPU
> >> >> > frequency will result in the GPU being throttled and its frequency being
> >> >> > reduced. Thus declaring iowait negatively impacts on GPU throughput.
> >> >> >
> >> >> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
> >> >> > References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")
> >> >> 
> >> >> This patch causes up to ~13% performance regressions (with significance
> >> >> 5%) on several latency-sensitive tests on my BXT:
> >> >> 
> >> >>  jxrendermark/rendering-test=Linear Gradient Blend/rendering-size=128x128:     XXX ±35.69% x53 -> XXX ±32.57% x61 d=-13.52% ±31.88% p=2.58%
> >> >
> >> 
> >> The jxrendermark Linear Gradient Blend test-case had probably the
> >> smallest effect size of all the regressions I noticed...  Can you take a
> >> look at any of the other ones instead?
> >
> > It was the biggest in the list, was it not? I didn't observe anything of
> > note in a quick look at x11perf, but didn't let it run for a good sample
> > size. They didn't seem to be as relevant as jxrendermark so I went and
> > dug that out.
> >
> 
> That was the biggest regression in absolute value, but the smallest in
> effect size (roughly 0.4 standard deviations).

d=-13.52% wasn't the delta between the two runs?

Sorry, but it appears to be redacted beyond my comprehension.
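
One possible reading of those figures, as a rough sketch: if d is the relative
delta between the two runs and the ± value attached to it is the spread (standard
deviation) in percent of the mean, then the "effect size" is simply their ratio.
Both of those interpretations are assumptions, since the output format of the
benchmarking tool is not described in this thread, and effect_size() is a
hypothetical helper name.

```python
def effect_size(delta_pct, spread_pct):
    """Express a relative delta in units of the spread.

    Both arguments are percentages of the mean. Treating the quoted
    +/- figure as a standard deviation is an assumption.
    """
    return delta_pct / spread_pct


# Figures taken from the jxrendermark line quoted earlier in the thread.
d = effect_size(-13.52, 31.88)
print(f"{d:.2f} standard deviations")
```

Under those assumptions the ratio comes out at a magnitude of about 0.42, which
would be consistent with the "roughly 0.4 standard deviations" quoted above.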

> >> > Curious, as this is just a bunch of composites and as with the others,
> >> > should never be latency sensitive (at least under bare X11).
> >> 
> >> They are largely latency-sensitive due to the poor pipelining they seem
> >> to achieve between their GPU rendering work and the X11 thread.
> >
> > Only the X11 thread is touching the GPU, and in the cases I looked at,
> > we were either waiting for the ring to drain or on throttling.
> > Synchronisation with the GPU was only for draining the queue on timing,
> > and the cpu was able to stay ahead during the benchmark.
> >
> 
> Apparently the CPU doesn't get ahead enough for the GPU to be
> consistently loaded, which prevents us from hiding the latency of the
> CPU computation even in those cases.

The curse of reproducibility. On my bxt, I don't see the issue, so we
have a significant difference in setup.
-Chris