[Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.
Ville Syrjälä
ville.syrjala at linux.intel.com
Fri Oct 31 10:27:33 CET 2014
On Thu, Oct 30, 2014 at 12:57:04PM -0700, Kenneth Graunke wrote:
> On Thursday, October 30, 2014 09:26:01 PM Ville Syrjälä wrote:
> > On Thu, Oct 30, 2014 at 10:32:38AM -0700, Kenneth Graunke wrote:
> > > On Thursday, October 30, 2014 01:01:30 PM Ville Syrjälä wrote:
> > > > On Thu, Oct 30, 2014 at 02:32:40AM -0700, Kenneth Graunke wrote:
> > > > > On Thursday, October 30, 2014 11:00:51 AM Ville Syrjälä wrote:
> > > > > > On Thu, Oct 30, 2014 at 10:50:03AM +0200, Ville Syrjälä wrote:
> > > > > > > On Wed, Oct 29, 2014 at 03:12:43PM -0700, Kenneth Graunke wrote:
> > > > > > > > Haswell significantly improved the performance of sampler_c messages,
> > > > > > > > but the optimization appears to be off by default. Later platforms
> > > > > > > > remove this bit, and apparently always enable the optimization.
> > > > > > > >
> > > > > > > > Improves performance in "Counter Strike: Global Offensive" by 18%
> > > > > > > > at default settings on Iris Pro. No Piglit regressions.
> > > > > > >
> > > > > > > Nice. We need more bits like this ;)
> > > > > > >
> > > > > > > >
> > > > > > > > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> > > > > > > > ---
> > > > > > > > drivers/gpu/drm/i915/i915_reg.h | 1 +
> > > > > > > > drivers/gpu/drm/i915/intel_pm.c | 4 ++++
> > > > > > > > 2 files changed, 5 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > > > > > > > index 77fce96..340821a 100644
> > > > > > > > --- a/drivers/gpu/drm/i915/i915_reg.h
> > > > > > > > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > > > > > > > @@ -5952,6 +5952,7 @@ enum punit_power_well {
> > > > > > > > #define HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE (1 << 6)
> > > > > > > >
> > > > > > > > #define HALF_SLICE_CHICKEN3 0xe184
> > > > > > > > +#define HSW_SAMPLE_C_PERFORMANCE (1<<9)
> > > > > > > > #define GEN8_CENTROID_PIXEL_OPT_DIS (1<<8)
> > > > > > > > #define GEN8_SAMPLER_POWER_BYPASS_DIS (1<<1)
> > > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > > index 7a69eba..50c72a7 100644
> > > > > > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > > @@ -5736,6 +5736,10 @@ static void haswell_init_clock_gating(struct drm_device *dev)
> > > > > > > > I915_WRITE(GEN7_GT_MODE,
> > > > > > > > GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
> > > > > > > >
> > > > > > > > + /* Make sample_c messages faster. */
> > > > > > >
> > > > > > > I found a name for it in the w/a database.
> > > > > > >
> > > > > > > WaSampleCChickenBitEnable:hsw
> > > > > > >
> > > > > > > Reviewed-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > > > > >
> > > > > > Oh actually it says palette won't work when this bit is on. I'm assuming
> > > > > > that's the texture palette. Do we have any use of that anywhere?
> > > > >
> > > > > That's a good point. 3DSTATE_SAMPLER_PALETTE_LOAD and the A8P8/indexed
> > > > > formats aren't used by Mesa or xf86-video-intel, but it looks like they might
> > > > > be used by libva.
> > > > >
> > > > > Can someone confirm that libva does use the sampler palette?
> > > > >
> > > > > If they do, what do we do about it?
> > > >
> > > > I suppose the best option then would be to use an LRI from a batch,
> > > > which means the register would need to be added to the cmd parser
> > > > white list. This is one of the context saved registers so doing the
> > > > LRI just once per context should be enough.
> > >
> > > I don't like that solution. For one, it's impossible - you can't LRI from
> > > userspace batches, even if you add it to the kernel command parser's
> > > whitelist, because the hardware scanner is still enabled. Given that I've
> > > been waiting two years for this capability, I want to find a more immediate
> > > solution.
> >
> > Ah. I've somehow convinced myself the cmd parser might actually be doing
> > something besides just eating CPU cycles these days. But I guess not.
> >
> > >
> > > Another option is to have some sort of execbuf flag...maybe a 3D/Media "usage"
> > > flag. If set to 3D, write 0x6000200...if media, write 0x6000000. Or
> > > something specific. I do hate adding more junk to the execbuf path, though.
> > >
> > > Other ideas?
> >
> > Fast vs. slow flag? :)
> >
> > More seriously, one somewhat crappy option would be to initialize that
> > bit to 1 for all explicit contexts, and then have the kernel always turn
> > it off before executing something with the default context. It's not
> > unlike how we imagined the RS stuff would work since old userspace
> > doesn't know to turn RS off when using the default context.
>
> Interesting idea - that might work. We don't need mid-batch changes either.
>
> I don't think HALF_SLICE_CHICKEN3 is part of the logical context, FWIW.
Oh but it is. Lots of chickens like to nest in the context.
>
> Before we get too much further...we should check if libva is actually broken.
> I don't know if this means the sampler palette completely doesn't work, or if
> it just means sample_c doesn't work with the palette. If it's the latter,
> we're probably fine, because I doubt libva uses sample_c.
Yeah, if that wouldn't break any existing userspace I guess we could just
flip the switch in the kernel. If anyone later wants to start doing
something that no longer works, they'd have to deal with disabling the
bit using an LRI.
--
Ville Syrjälä
Intel OTC
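
The two approaches discussed in the thread (set the bit for everyone, or set it for explicit contexts and clear it for the default context) can be sketched as plain C. This is only an illustration, not driver code. It assumes HALF_SLICE_CHICKEN3 uses the masked-bit convention, where the upper 16 bits of the written value select which of the lower 16 bits get updated. The quoted patch does not show the actual register write, so that encoding is an assumption here. The helper names masked_write() and sample_c_value_for_context() are hypothetical, and the MASKED_BIT_* macros are local stand-ins.

```c
#include <stdint.h>

/* Bit position taken from the quoted patch: (1<<9) in HALF_SLICE_CHICKEN3. */
#define HSW_SAMPLE_C_PERFORMANCE (1u << 9)

/*
 * Assumed masked-bit encoding (an assumption, not shown in the patch):
 * the high 16 bits say which low bits to touch, the low 16 bits carry
 * the new values for those bits.
 */
#define MASKED_BIT_ENABLE(b)  ((uint32_t)(((b) << 16) | (b)))
#define MASKED_BIT_DISABLE(b) ((uint32_t)((b) << 16))

/* Hypothetical software model of what a masked register does on a write. */
static uint32_t masked_write(uint32_t reg, uint32_t val)
{
	uint32_t mask = val >> 16;      /* bits the write may change */
	uint32_t bits = val & 0xffffu;  /* requested values for those bits */

	/* Unselected bits keep their old value; only 16 bits are modelled. */
	return ((reg & ~mask) | (bits & mask)) & 0xffffu;
}

/*
 * Hypothetical helper for the "explicit contexts get the fast path,
 * default context gets it turned off" idea from the thread.
 */
static uint32_t sample_c_value_for_context(int is_default_ctx)
{
	return is_default_ctx ?
		MASKED_BIT_DISABLE(HSW_SAMPLE_C_PERFORMANCE) :
		MASKED_BIT_ENABLE(HSW_SAMPLE_C_PERFORMANCE);
}
```

Under this assumed encoding, a write only disturbs the selected bit, so other chicken bits in the same register are left alone without needing a read-modify-write.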