[Intel-gfx] [PATCH 9/9] drm/i915/execlists: Report GPU rendering as IO activity to cpufreq.

Wed Mar 28 08:02:52 UTC 2018

Quoting Francisco Jerez (2018-03-28 07:38:45)
> This allows cpufreq governors to realize when the system becomes
> non-CPU-bound due to GPU rendering activity, which will cause the
> intel_pstate LP controller to behave more conservatively: CPU energy
> usage will be reduced when there isn't a good chance for system
> performance to scale with CPU frequency.  This leaves additional TDP
> budget available for the GPU to reach higher frequencies, which is
> translated into an improvement in graphics performance to the extent
> that the workload remains TDP-limited (Most non-trivial graphics
> benchmarks out there improve significantly in TDP-constrained
> platforms, see the cover letter for some numbers).  If the workload
> isn't (anymore) TDP-limited performance should stay roughly constant,
> but energy usage will be divided by a similar factor.

And that's what I thought IPS was already meant to be achieving:
intelligent distribution between different units...

> The intel_pstate LP controller is only enabled on BXT+ small-core
> platforms at this point, so this shouldn't have any effect on other
> systems.

Although that's probably only a feature for big core :)

> Signed-off-by: Francisco Jerez <currojerez at riseup.net>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 3a69b367e565..721f915115bd 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -132,6 +132,7 @@
>   *
>   */
>  #include <linux/interrupt.h>
> +#include <linux/cpufreq.h>
>  
>  #include <drm/drmP.h>
>  #include <drm/i915_drm.h>
> @@ -379,11 +380,13 @@ execlists_context_schedule_in(struct i915_request *rq)
>  {
>         execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
>         intel_engine_context_in(rq->engine);
> +       cpufreq_io_active_begin();

Since you only care about a binary value for GPU activity, we don't need
to do this on each context, just between submitting the first request
and the final completion, i.e. couple this to EXECLISTS_ACTIVE_USER.
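
To illustrate the edge-triggered approach, here is a rough userspace model,
not actual i915 code. cpufreq_io_active_begin/end are stubbed out as call
counters, and struct model_engine, model_submit() and model_complete() are
hypothetical names invented for this sketch. In the driver the two edges
would presumably sit wherever EXECLISTS_ACTIVE_USER is set and cleared.

```c
#include <assert.h>

/* Stubs standing in for the hooks from this series; they only count calls. */
unsigned int begin_calls, end_calls;
void cpufreq_io_active_begin(void) { begin_calls++; }
void cpufreq_io_active_end(void) { end_calls++; }

/* Hypothetical engine model: tracks only the number of requests in flight. */
struct model_engine {
	unsigned int inflight;
};

/* Called per request, but notifies cpufreq only on the idle -> busy edge. */
void model_submit(struct model_engine *e)
{
	if (e->inflight++ == 0)
		cpufreq_io_active_begin();
}

/* Called per completion, but notifies cpufreq only on the busy -> idle edge. */
void model_complete(struct model_engine *e)
{
	assert(e->inflight > 0);
	if (--e->inflight == 0)
		cpufreq_io_active_end();
}
```

With three requests submitted back to back and then completed, the stubs
are each hit once rather than three times, which keeps the per-context
schedule in/out path free of the extra calls.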

I haven't yet gone back to check how heavy cpufreq_io_active_begin/end
are, but I trust you appreciate that this code is particularly latency
sensitive.
-Chris