[Intel-gfx] [PATCH] drm/i915: Do not use iowait while waiting for the GPU

Sat Jul 28 20:58:35 UTC 2018

Quoting Francisco Jerez (2018-07-28 21:18:50)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > Quoting Francisco Jerez (2018-07-28 06:20:12)
> >> Chris Wilson <chris at chris-wilson.co.uk> writes:
> >> 
> >> > A recent trend for cpufreq is to boost the CPU frequencies for
> >> > iowaiters, in particularly to benefit high frequency I/O. We do the same
> >> > and boost the GPU clocks to try and minimise time spent waiting for the
> >> > GPU. However, as the igfx and CPU share the same TDP, boosting the CPU
> >> > frequency will result in the GPU being throttled and its frequency being
> >> > reduced. Thus declaring iowait negatively impacts on GPU throughput.
> >> >
> >> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
> >> > References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")
> >> 
> >> This patch causes up to ~13% performance regressions (with significance
> >> 5%) on several latency-sensitive tests on my BXT:
> >> 
> >>  jxrendermark/rendering-test=Linear Gradient Blend/rendering-size=128x128:     XXX ±35.69% x53 -> XXX ±32.57% x61 d=-13.52% ±31.88% p=2.58%
> >
> 
> The jxrendermark Linear Gradient Blend test-case had probably the
> smallest effect size of all the regressions I noticed...  Can you take a
> look at any of the other ones instead?

It was the biggest in the list, was it not? I didn't observe anything of
note in a quick look at x11perf, but didn't let it run for a good sample
size. They didn't seem to be as relevant as jxrendermark so I went and
dug that out.

> > Curious, as this is just a bunch of composites and as with the others,
> > should never be latency sensitive (at least under bare X11).
> 
> They are largely latency-sensitive due to the poor pipelining they seem
> to achieve between their GPU rendering work and the X11 thread.

Only the X11 thread is touching the GPU, and in the cases I looked at
it, we were either waiting for the ring to drain or on throttling.
Synchronisation with the GPU was only for draining the queue on timing,
and the cpu was able to stay ahead during the benchmark.

Off the top of my head, for X to be latency sensitive you need to mix
client and Xserver rendering, along the lines of Paint; GetImage, in the
extreme becoming gem_sync. Adding a compositor is also interesting for
the context switching will prevent us merging requests (but that all
depends on the frequency of compositor updates ofc), and we would
need more CPU and require reasonably low latency (less than the next
request) to keep the GPU busy. However, that is driven directly off
interrupts, iowait isn't a factor -- but your hook could still be useful
to provide pm_qos.
-Chris