[Intel-gfx] [PATCH] drm/i915: Revert RPS UP_EI value for SandyBridge and IvyBridge
Ben Widawsky
ben at bwidawsk.net
Fri Dec 21 02:57:04 CET 2012
On Thu, Dec 20, 2012 at 03:34:11PM -0800, Jesse Barnes wrote:
> On Wed, 24 Oct 2012 12:05:35 -0700
> Ben Widawsky <ben at bwidawsk.net> wrote:
>
> > On Sun, 21 Oct 2012 15:44:02 +0100
> > Chris Wilson <chris at chris-wilson.co.uk> wrote:
> >
> > > Even though we do not use the EI mode for determining when to change GPU
> > > frequencies for RPS, changing this value causes no up interrupts to be
> > > generated whilst an OpenGL client runs.
> > >
> > > Fixes regression from commit 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c
> > > Author: Daniel Vetter <daniel.vetter at ffwll.ch>
> > > Date: Wed Aug 15 10:41:45 2012 +0200
> > >
> > > drm/i915: use hsw rps tuning values everywhere on gen6+
> > >
> > > Reported-by: Eric Anholt <eric at anholt.net>
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > Cc: Eric Anholt <eric at anholt.net>
> > > Cc: Ben Widawsky <ben at bwidawsk.net>
> > > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > > Cc: stable at vger.kernel.org
> > > ---
> > > drivers/gpu/drm/i915/intel_pm.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > index 81e88c2..15b585e 100644
> > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > @@ -2493,7 +2493,7 @@ static void gen6_enable_rps(struct drm_device *dev)
> > >
> > > I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
> > > I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
> > > - I915_WRITE(GEN6_RP_UP_EI, 66000);
> > > + I915_WRITE(GEN6_RP_UP_EI, IS_HASWELL(dev) ? 66000 : 100000);
> > > I915_WRITE(GEN6_RP_DOWN_EI, 350000);
> > >
> > > I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
> >
> >
> >
> >
> > I've not verified this via interrupts, but simply by reading back the
> > current frequency through the sysfs interface.
> >
> > What I've seen running xonotic on Ivybridge is that we bump the
> > frequency initially, and then it stops. Nearing the end of the demo,
> > the frequency rises again. Note that on my system, both before and
> > after this patch, I am able to reach the max GPU frequency with the
> > xonotic demo.
> >
> > Specifically, my IVB has a range of 350->1100 MHz with an RP1 of
> > 650. I see the following (the demo is roughly 2 minutes):
> >
> > without patch:
> > Within a few seconds we cycle up to 750
> > Nothing for about 30 seconds
> > very slowly cycle up to 1100 (*just* before the demo ends)
> > demo ends; throttle down to 350 quickly
> >
> > with patch:
> > Within a few seconds we cycle up to 1000
> > Nothing for about 30 seconds
> > cycle up to 1100
> > demo ends; throttle down to 350 slowly
> >
> > I think if this fixes someone's critical issue, that's great, but
> > unfortunately I do not see the problem the patch claims to fix.
> > Furthermore, none of us can really make sense of why this has the effect
> > that it does, but I believe a lot of that is because the workloads we
> > run (in this case xonotic) are very much black boxes.
> >
> > Personally, on this IVB, I think the behavior before the patch is more
> > desirable because it stays near RP1 for a longer period of time, and
> > drops back to RPn (350 here) quickly (but it's definitely a matter of
> > opinion).
>
> It's a balance between power and performance. Running at the higher
> freqs is definitely less power efficient, but without this patch we
> definitely have a performance regression (running at 750 instead of
> 1000 MHz for most of the demo, plus the cases Chris saw).
>
> But we also don't want to prevent RC6 entry on vulnerable systems, so
> maybe we need two sets of values, or a broader set of changes that
> works well across a swath of workloads.
To me this sounds like an idea that is better on paper than in
practice. Unless we can generalize the various sets of values into a
few knobs for users to pick from, it will (I feel) be overwhelming, and
will probably produce more bad data than good. Fine-grained knobs are
already available via the registers, and truly persistent users can toy
with those if they want. Perhaps it's on us to add better comments to
i915_reg.h explaining how we think this works?
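
Something like the following is what I have in mind. To be clear, this
is my own reading of the tuning values, not anything straight out of
the docs, so treat the interpretation as an assumption:

  /*
   * Hypothetical i915_reg.h comment, my reading only:
   *
   * GEN6_RP_UP_EI / GEN6_RP_DOWN_EI set the evaluation interval over
   * which the hardware counts GPU-busy units, and GEN6_RP_UP_THRESHOLD /
   * GEN6_RP_DOWN_THRESHOLD set how many busy units within that interval
   * trigger an up/down RPS interrupt.  With the HSW tuning values,
   * 59400 / 66000 works out to ~90% busy to ask for an up-clock, and
   * 245000 / 350000 to ~70% busy to ask for a down-clock.
   */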
>
> Also, the initial patch to use the HSW values implies that the values
> are applicable across generations. They're not. They're very specific
> to a given generation, and potentially even differ between versions of
> a given generation, so sharing the values is probably a bad idea in
> general.
>
> --
> Jesse Barnes, Intel Open Source Technology Center
To set the record straight, I am pretty sure we can agree on a few
things:
1. We don't really know WTF is going on.
2. There will be no globally good set of values.
2a. There may even be no globally decent set of values.
3. One workload is insufficient to determine anything.
If a workload is obviously too GPU-render-heavy, I think a decent
solution is to just use the sysfs parameters to force a specific
minimum frequency (sketch below). We don't really need to be skittish
about setting this, as really *most* things are too GPU-render-heavy
for us at the moment. For the general case I'll assert the performance
'profile' is the right thing unless someone proves otherwise, and there
we just want a set of values which throttle up faster than they
throttle down.
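
For illustration, forcing the min really is just writing a MHz value.
A minimal sketch, assuming the gt_min_freq_mhz file from the recent
sysfs RPS work is present on card0, and using the 650 MHz RP1 from my
IVB above:

  /* force_min.c: pin the RPS minimum frequency via sysfs. */
  #include <stdio.h>

  int main(void)
  {
          FILE *f = fopen("/sys/class/drm/card0/gt_min_freq_mhz", "w");

          if (!f) {
                  perror("gt_min_freq_mhz");
                  return 1;
          }
          fprintf(f, "%d\n", 650);        /* value is in MHz */
          fclose(f);
          return 0;
  }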
I think there is still some value in the proof-of-concept work you were
hoping to do around a system-wide GPU power governor. Maybe we can just
set the RPS autothrottle values to be as dumb as possible, and then use
that governor to set the min/max frequencies as it sees fit (rough
skeleton below). Another similar option is to try SW RC6. If we had an
intern, I'd even suggest picking some decent ranges for all the
thresholds and having that intern run benchmarks with all the different
values.
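
To make the shape of that concrete, here is a rough skeleton. Purely
illustrative: it assumes the gt_cur/gt_min/gt_max_freq_mhz sysfs files
exist on card0, reuses the 350/650/1100 numbers from my IVB above, and
leaves the actual policy as a stub:

  /* governor.c: skeleton of the userspace GPU power governor idea. */
  #include <stdio.h>
  #include <unistd.h>

  #define SYSFS_DIR "/sys/class/drm/card0/"

  static int read_mhz(const char *name)
  {
          char path[128];
          FILE *f;
          int val = -1;

          snprintf(path, sizeof(path), SYSFS_DIR "%s", name);
          f = fopen(path, "r");
          if (!f)
                  return -1;
          if (fscanf(f, "%d", &val) != 1)
                  val = -1;
          fclose(f);
          return val;
  }

  static void write_mhz(const char *name, int val)
  {
          char path[128];
          FILE *f;

          snprintf(path, sizeof(path), SYSFS_DIR "%s", name);
          f = fopen(path, "w");
          if (f) {
                  fprintf(f, "%d\n", val);
                  fclose(f);
          }
  }

  int main(void)
  {
          for (;;) {
                  int cur = read_mhz("gt_cur_freq_mhz");

                  /* Placeholder policy: a real governor would weigh
                   * busyness, power budget, RC6 residency, etc. here
                   * before steering the min/max window. */
                  if (cur >= 0) {
                          write_mhz("gt_min_freq_mhz", 350);
                          write_mhz("gt_max_freq_mhz",
                                    cur >= 650 ? 1100 : 650);
                  }
                  sleep(1);
          }
          return 0;
  }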
Okay, done wasting your time now...
--
Ben Widawsky, Intel Open Source Technology Center