[Intel-gfx] [PATCH 2/3] drm/i915: Boost RPS frequency for CPU stalls

Wed Oct 2 23:57:41 CEST 2013

On Wed, Oct 02, 2013 at 11:26:47AM -0700, Jesse Barnes wrote:
> On Wed, 2 Oct 2013 01:33:24 +0100
> Chris Wilson <chris at chris-wilson.co.uk> wrote:
> 
> > On Tue, Oct 01, 2013 at 03:39:40PM -0700, Jesse Barnes wrote:
> > > On Tue, 1 Oct 2013 23:23:32 +0100
> > > Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > > 
> > > > On Tue, Oct 01, 2013 at 02:54:26PM -0700, Jesse Barnes wrote:
> > > > > On Wed, 25 Sep 2013 17:34:56 +0100
> > > > > Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > > > > 
> > > > > > +void gen6_rps_idle(struct drm_i915_private *dev_priv)
> > > > > > +{
> > > > > > +	mutex_lock(&dev_priv->rps.hw_lock);
> > > > > > +	if (dev_priv->info->is_valleyview)
> > > > > > +		valleyview_set_rps(dev_priv->dev, dev_priv->rps.min_delay);
> > > > > > +	else
> > > > > > +		gen6_set_rps(dev_priv->dev, dev_priv->rps.min_delay);
> > > > > > +	mutex_unlock(&dev_priv->rps.hw_lock);
> > > > > > +}
> > > > > 
> > > > > Looks pretty good, but I think these should be rpe_delay instead.  Not
> > > > > much point in going down to a less efficient frequency...
> > > > 
> > > > Less efficient for what? My concern here is only with power draw when
> > > > idle. As soon as we start to render again (well very shortly afterwards
> > > > with this particular iteration) we bump up to rpe and then beyond.
> > > > 
> > > > Correct me if I am wrong but rpe is an inflection point rather than a
> > > > minimum?
> > > 
> > > So yes, running at a lower than RPe freq will use less power, but it'll
> > > also be less efficient (perf/power) than doing the same rendering at
> > > RPe.
> > 
> > Right, so I think so long as autotuning works, we can run at low power,
> > low efficiency for as long as that is capable of sustaining the desired
> > throughput and latency. And the heuristics we have here are pretty good
> > at detecting when more power is required.
> 
> The downside is that we'll take longer to enter RC6 if we run at the
> slower speed.   And supposedly, running at RPe so we get to RC6 a
> little faster ends up saving more power than running slower, even
> though running slower uses less power while active.
> 
> Or have you taken that into account here?

No, that is a factor I had not considered. I had been concerned about
monitoring latency whilst minimising frequency and using the power gauge
as the ultimate measure of success. One thing that is apparent with ivb
is that the power gauge (at least) is dependent upon workload. That is,
you can set high frequencies, but if the ring is idle (but not in rc6)
then it consumes relatively little power. (That still may be 1W on the
big GPUs, but it does mean that at low frequencies there is not a lot of
difference between rc6 on/off, a few 10s of mW, and we are concerned
with a window of about 10ms.) I suspect the difference is likely to be
in the noise for an idle desktop workload and would only really show
itself in a synthetic benchmark.
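
To make the trade-off concrete, here is a rough sketch of the energy
accounting being argued over. It is not driver code: struct op_point,
window_energy_uj() and every number fed to it are hypothetical, and the
model is deliberately crude (constant power while busy, then straight
into rc6 for the rest of the window, ignoring the rc6 entry delay).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operating point: a GPU frequency and the power drawn
 * while rendering at that frequency. Not taken from any datasheet. */
struct op_point {
	uint32_t freq_mhz;
	uint32_t active_mw;
};

/*
 * Energy in microjoules (mW * ms) spent over one window, assuming the
 * GPU renders work_kcycles of work at point p and then sits in rc6
 * (drawing rc6_mw) until the window ends.
 *
 * kilocycles / MHz yields milliseconds, so no unit scaling is needed.
 */
static uint64_t window_energy_uj(struct op_point p, uint32_t work_kcycles,
				 uint32_t window_ms, uint32_t rc6_mw)
{
	uint64_t busy_ms;

	assert(p.freq_mhz != 0);

	/* Round up: a partial millisecond still counts as busy. */
	busy_ms = ((uint64_t)work_kcycles + p.freq_mhz - 1) / p.freq_mhz;

	/* If the work does not fit, the GPU is busy for the whole window. */
	if (busy_ms > window_ms)
		busy_ms = window_ms;

	return busy_ms * p.active_mw + (window_ms - busy_ms) * rc6_mw;
}
```

With made-up inputs of 3500 kcycles of work, a 20ms window and 50mW in
rc6, a 350MHz/700mW point costs 7500uJ while a 700MHz/1000mW point costs
5750uJ, so racing to idle wins. Drop the slow point to 300mW and it
costs 3500uJ, so running slowly wins instead. Which case applies on real
hardware is exactly what has to be measured.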

So we want to race-to-idle, except for when we don't want to
race-to-idle. Seems like a good reason to keep improving our measuring
tools.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre