[Intel-gfx] [PATCH] drm/i915: Do not use iowait while waiting for the GPU
Chris Wilson
chris at chris-wilson.co.uk
Sat Jul 28 16:27:46 UTC 2018
Quoting Francisco Jerez (2018-07-28 06:20:12)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
>
> > A recent trend for cpufreq is to boost the CPU frequencies for
> > iowaiters, in particular to benefit high frequency I/O. We do the same
> > and boost the GPU clocks to try and minimise time spent waiting for the
> > GPU. However, as the igfx and CPU share the same TDP, boosting the CPU
> > frequency will result in the GPU being throttled and its frequency being
> > reduced. Thus declaring iowait negatively impacts GPU throughput.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
> > References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")
>
> This patch causes up to ~13% performance regressions (with significance
> 5%) on several latency-sensitive tests on my BXT:
>
> jxrendermark/rendering-test=Linear Gradient Blend/rendering-size=128x128: XXX ±35.69% x53 -> XXX ±32.57% x61 d=-13.52% ±31.88% p=2.58%
> jxrendermark/rendering-test=Transformed Blit Bilinear/rendering-size=128x128: XXX ±3.51% x21 -> XXX ±3.77% x21 d=-12.08% ±3.41% p=0.00%
> gtkperf/gtk-test=GtkComboBox: XXX ±1.90% x19 -> XXX ±1.59% x20 d=-4.74% ±1.71% p=0.00%
> x11perf/test=500px Compositing From Pixmap To Window: XXX ±2.35% x21 -> XXX ±1.73% x21 d=-2.69% ±2.04% p=0.01%
> qgears2/render-backend=XRender Extension/test-mode=Text: XXX ±0.38% x21 -> XXX ±0.40% x25 d=-2.20% ±0.38% p=0.00%
> x11perf/test=500px Compositing From Pixmap To Window: XXX ±2.78% x53 -> XXX ±2.27% x61 d=-1.77% ±2.50% p=0.03%
>
> It's unsurprising to see latency-sensitive workloads relying on the
> lower latency offered by io_schedule_timeout(), since the CPUFREQ
> governor will have substantial downward bias without it, in response to
> the intermittent CPU usage pattern of those benchmarks.

Fwiw, I have a better example: gem_sync --run store-default. This test
waits on a short batch.

with io_schedule_timeout:
	Completed 987136 cycles: 152.092 us

with schedule_timeout:
	Completed 157696 cycles: 956.403 us

Though note that for a no-op batch we see no difference, as the sleep is
short enough: both take 52us on average. But microbenchmarks be micro.
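
For a rough sense of scale, the two gem_sync results quoted above can be
compared directly. This is only a back-of-the-envelope sketch in Python; the
figures are copied from the run, and the variable names are made up for
illustration.

```python
# Figures copied from the gem_sync --run store-default results above.
# All names here are hypothetical, chosen only for this comparison.
IO_SCHEDULE_CYCLES = 987136   # cycles completed with io_schedule_timeout
IO_SCHEDULE_US = 152.092      # average time per cycle, microseconds
SCHEDULE_CYCLES = 157696      # cycles completed with schedule_timeout
SCHEDULE_US = 956.403         # average time per cycle, microseconds

# How much longer each wait takes without the iowait hint.
latency_ratio = SCHEDULE_US / IO_SCHEDULE_US

# How many more cycles complete with the iowait hint.
cycle_ratio = IO_SCHEDULE_CYCLES / SCHEDULE_CYCLES

print(f"per-cycle latency ratio: {latency_ratio:.2f}x")
print(f"completed cycles ratio:  {cycle_ratio:.2f}x")
```

Both ratios come out a little over 6x, so on this test dropping the iowait
hint costs roughly a factor of six in both wait latency and completed cycles.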
-Chris