[Intel-gfx] [PATCH] drm/i915: Do not use iowait while waiting for the GPU

Sat Jul 28 16:27:46 UTC 2018

Quoting Francisco Jerez (2018-07-28 06:20:12)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > A recent trend for cpufreq is to boost the CPU frequencies for
> > iowaiters, in particular to benefit high frequency I/O. We do the same
> > and boost the GPU clocks to try and minimise time spent waiting for the
> > GPU. However, as the igfx and CPU share the same TDP, boosting the CPU
> > frequency will result in the GPU being throttled and its frequency being
> > reduced. Thus declaring iowait negatively impacts on GPU throughput.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
> > References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")
> 
> This patch causes up to ~13% performance regressions (with significance
> 5%) on several latency-sensitive tests on my BXT:
> 
>  jxrendermark/rendering-test=Linear Gradient Blend/rendering-size=128x128:     XXX ±35.69% x53 -> XXX ±32.57% x61 d=-13.52% ±31.88% p=2.58%
>  jxrendermark/rendering-test=Transformed Blit Bilinear/rendering-size=128x128: XXX ±3.51% x21 ->  XXX ±3.77% x21   d=-12.08% ±3.41% p=0.00%
>  gtkperf/gtk-test=GtkComboBox:                                                 XXX ±1.90% x19 ->  XXX ±1.59% x20    d=-4.74% ±1.71% p=0.00%
>  x11perf/test=500px Compositing From Pixmap To Window:                         XXX ±2.35% x21 ->  XXX ±1.73% x21    d=-2.69% ±2.04% p=0.01%
>  qgears2/render-backend=XRender Extension/test-mode=Text:                      XXX ±0.38% x21 ->  XXX ±0.40% x25    d=-2.20% ±0.38% p=0.00%
>  x11perf/test=500px Compositing From Pixmap To Window:                         XXX ±2.78% x53 ->  XXX ±2.27% x61    d=-1.77% ±2.50% p=0.03%
> 
> It's unsurprising to see latency-sensitive workloads relying on the
> lower latency offered by io_schedule_timeout(), since the CPUFREQ
> governor will have substantial downward bias without it, in response to
> the intermittent CPU usage pattern of those benchmarks.

Fwiw, I have a better example: gem_sync --run store-default. This test
waits on a short batch:

with io_schedule_timeout:
Completed 987136 cycles: 152.092 us

with schedule_timeout:
Completed 157696 cycles: 956.403 us
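
To put a number on that gap, here is a quick back-of-the-envelope sketch.
It only divides the figures quoted above; the variable names are mine, and
it assumes both runs covered the same wall-clock interval, so that the
cycle counts are directly comparable.

```python
# Figures copied from the two gem_sync --run store-default runs above.
# Names are illustrative only; nothing here comes from gem_sync itself.
io_wait = {"cycles": 987136, "latency_us": 152.092}     # io_schedule_timeout
plain_wait = {"cycles": 157696, "latency_us": 956.403}  # schedule_timeout

# How much longer each wait takes without iowait.
latency_ratio = plain_wait["latency_us"] / io_wait["latency_us"]

# How many more cycles complete with iowait (assumes equal run time).
cycle_ratio = io_wait["cycles"] / plain_wait["cycles"]

print(f"latency: {latency_ratio:.2f}x, cycles: {cycle_ratio:.2f}x")
# prints "latency: 6.29x, cycles: 6.26x"
```

So dropping iowait costs roughly 6x on this test, by either measure.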

Though note that for a no-op batch, we see no difference as the sleep is
short enough; both take 52us on average. But microbenchmarks be micro.
-Chris