[Intel-gfx] [PATCH] drm/i915: Do not use iowait while waiting for the GPU

Francisco Jerez currojerez at riseup.net
Sun Jul 29 19:29:42 UTC 2018


Chris Wilson <chris at chris-wilson.co.uk> writes:

> Quoting Francisco Jerez (2018-07-28 21:18:50)
>> Chris Wilson <chris at chris-wilson.co.uk> writes:
>> 
>> > Quoting Francisco Jerez (2018-07-28 06:20:12)
>> >> Chris Wilson <chris at chris-wilson.co.uk> writes:
>> >> 
>> >> > A recent trend for cpufreq is to boost the CPU frequencies for
>> >> > iowaiters, in particular to benefit high frequency I/O. We do the same
>> >> > and boost the GPU clocks to try and minimise time spent waiting for the
>> >> > GPU. However, as the igfx and CPU share the same TDP, boosting the CPU
>> >> > frequency will result in the GPU being throttled and its frequency being
>> >> > reduced. Thus declaring iowait negatively impacts GPU throughput.
>> >> >
>> >> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
>> >> > References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")
>> >> 
>> >> This patch causes up to ~13% performance regressions (with significance
>> >> 5%) on several latency-sensitive tests on my BXT:
>> >> 
>> >>  jxrendermark/rendering-test=Linear Gradient Blend/rendering-size=128x128:     XXX ±35.69% x53 -> XXX ±32.57% x61 d=-13.52% ±31.88% p=2.58%
>> >
>> 
>> The jxrendermark Linear Gradient Blend test-case had probably the
>> smallest effect size of all the regressions I noticed...  Can you take a
>> look at any of the other ones instead?
>
> It was the biggest in the list, was it not? I didn't observe anything of
> note in a quick look at x11perf, but didn't let it run for a good sample
> size. They didn't seem to be as relevant as jxrendermark so I went and
> dug that out.
>

That was the biggest regression in absolute value, but the smallest in
effect size (roughly 0.4 standard deviations).

>> > Curious, as this is just a bunch of composites and as with the others,
>> > should never be latency sensitive (at least under bare X11).
>> 
>> They are largely latency-sensitive due to the poor pipelining they seem
>> to achieve between their GPU rendering work and the X11 thread.
>
> Only the X11 thread is touching the GPU, and in the cases I looked at
> it, we were either waiting for the ring to drain or on throttling.
> Synchronisation with the GPU was only for draining the queue on timing,
> and the cpu was able to stay ahead during the benchmark.
>

Apparently the CPU doesn't get far enough ahead for the GPU to stay
consistently loaded, which prevents us from hiding the latency of the
CPU computation even in those cases.

> Off the top of my head, for X to be latency sensitive you need to mix
> client and Xserver rendering, along the lines of Paint; GetImage, in the
> extreme becoming gem_sync. Adding a compositor is also interesting for
> the context switching will prevent us merging requests (but that all
> depends on the frequency of compositor updates ofc), and we would
> need more CPU and require reasonably low latency (less than the next
> request) to keep the GPU busy. However, that is driven directly off
> interrupts, iowait isn't a factor -- but your hook could still be useful
> to provide pm_qos.
> -Chris
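(For illustration only, kernel-side the idea being discussed would look something along these lines. This is a hedged sketch, not the actual i915 patch: the `gpu_wait_*` helper names and the 100 us bound are hypothetical, and it uses the pm_qos API as it existed around the 4.18 era.)

```c
#include <linux/pm_qos.h>
#include <linux/sched.h>

/* Sketch (not the actual patch): wait for the GPU without declaring
 * iowait, so cpufreq does not boost the CPU and steal shared TDP
 * headroom from the GPU; instead, bound wakeup latency explicitly
 * with a CPU latency QoS request. */
static struct pm_qos_request gpu_wait_qos;

static void gpu_wait_begin(void)
{
	/* Hypothetical bound: request <= 100 us wakeup latency in
	 * place of the implicit iowait frequency boost. */
	pm_qos_add_request(&gpu_wait_qos, PM_QOS_CPU_DMA_LATENCY, 100);
}

static long gpu_wait(long timeout)
{
	/* Plain interruptible sleep: schedule_timeout() does not mark
	 * the task as an iowaiter, unlike io_schedule_timeout(). */
	set_current_state(TASK_INTERRUPTIBLE);
	return schedule_timeout(timeout);
}

static void gpu_wait_end(void)
{
	pm_qos_remove_request(&gpu_wait_qos);
}
```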