[Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.
Kenneth Graunke
kenneth at whitecape.org
Thu Oct 30 20:57:04 CET 2014
On Thursday, October 30, 2014 09:26:01 PM Ville Syrjälä wrote:
> On Thu, Oct 30, 2014 at 10:32:38AM -0700, Kenneth Graunke wrote:
> > On Thursday, October 30, 2014 01:01:30 PM Ville Syrjälä wrote:
> > > On Thu, Oct 30, 2014 at 02:32:40AM -0700, Kenneth Graunke wrote:
> > > > On Thursday, October 30, 2014 11:00:51 AM Ville Syrjälä wrote:
> > > > > On Thu, Oct 30, 2014 at 10:50:03AM +0200, Ville Syrjälä wrote:
> > > > > > On Wed, Oct 29, 2014 at 03:12:43PM -0700, Kenneth Graunke wrote:
> > > > > > > Haswell significantly improved the performance of sampler_c messages,
> > > > > > > but the optimization appears to be off by default. Later platforms
> > > > > > > remove this bit, and apparently always enable the optimization.
> > > > > > >
> > > > > > > Improves performance in "Counter Strike: Global Offensive" by 18%
> > > > > > > at default settings on Iris Pro. No Piglit regressions.
> > > > > >
> > > > > > Nice. We need more bits like this ;)
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> > > > > > > ---
> > > > > > > drivers/gpu/drm/i915/i915_reg.h | 1 +
> > > > > > > drivers/gpu/drm/i915/intel_pm.c | 4 ++++
> > > > > > > 2 files changed, 5 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > > > > > > index 77fce96..340821a 100644
> > > > > > > --- a/drivers/gpu/drm/i915/i915_reg.h
> > > > > > > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > > > > > > @@ -5952,6 +5952,7 @@ enum punit_power_well {
> > > > > > > #define HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE (1 << 6)
> > > > > > >
> > > > > > > #define HALF_SLICE_CHICKEN3 0xe184
> > > > > > > +#define HSW_SAMPLE_C_PERFORMANCE (1<<9)
> > > > > > > #define GEN8_CENTROID_PIXEL_OPT_DIS (1<<8)
> > > > > > > #define GEN8_SAMPLER_POWER_BYPASS_DIS (1<<1)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > index 7a69eba..50c72a7 100644
> > > > > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > @@ -5736,6 +5736,10 @@ static void haswell_init_clock_gating(struct drm_device *dev)
> > > > > > > I915_WRITE(GEN7_GT_MODE,
> > > > > > > GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
> > > > > > >
> > > > > > > + /* Make sample_c messages faster. */
> > > > > >
> > > > > > I found a name for it in the w/a database.
> > > > > >
> > > > > > WaSampleCChickenBitEnable:hsw
> > > > > >
> > > > > > Reviewed-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > > > >
> > > > > Oh actually it says palette won't work when this bit is on. I'm assuming
> > > > > that's the texture palette. Do we have any use of that anywhere?
> > > >
> > > > That's a good point. 3DSTATE_SAMPLER_PALETTE_LOAD and the A8P8/indexed
> > > > formats aren't used by Mesa or xf86-video-intel, but it looks like they
> > > > might be used by libva.
> > > >
> > > > Can someone confirm that libva does use the sampler palette?
> > > >
> > > > If they do, what do we do about it?
> > >
> > > I suppose the best option then would be to use an LRI from a batch,
> > > which means the register would need to be added to the cmd parser
> > > white list. This is one of the context saved registers so doing the
> > > LRI just once per context should be enough.
> >
> > I don't like that solution. For one, it's impossible - you can't LRI from
> > userspace batches, even if you add it to the kernel command parser's
> > whitelist, because the hardware scanner is still enabled. Given that I've
> > been waiting two years for this capability, I want to find a more immediate
> > solution.
>
> Ah. I've somehow convinced myself the cmd parser might actually be doing
> something besides just eating CPU cycles these days. But I guess not.
>
> >
> > Another option is to have some sort of execbuf flag...maybe a 3D/Media "usage"
> > flag. If set to 3D, write 0x6000200...if media, write 0x6000000. Or
> > something specific. I do hate adding more junk to the execbuf path, though.
> >
> > Other ideas?
>
> Fast vs. slow flag? :)
>
> More seriously, one somewhat crappy option would be to initialize that
> bit to 1 for all explicit contexts, and then have the kernel always turn
> it off before executing something with the default context. It's not
> unlike how we imagined the RS stuff would work since old userspace
> doesn't know to turn RS off when using the default context.

Interesting idea - that might work. We don't need mid-batch changes either.

I don't think HALF_SLICE_CHICKEN3 is part of the logical context, FWIW.

Before we get too much further...we should check if libva is actually broken.
I don't know if this means the sampler palette completely doesn't work, or if
it just means sample_c doesn't work with the palette. If it's the latter,
we're probably fine, because I doubt libva uses sample_c.

--Ken
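
For readers decoding the 0x6000200 / 0x6000000 values mentioned in the thread, here is a minimal sketch of how masked register writes are usually encoded. It assumes HALF_SLICE_CHICKEN3 is a masked register (upper 16 bits act as a write mask for the lower 16), which the quoted values suggest but the thread does not state outright. The helper names are hypothetical, modelled on the i915 _MASKED_BIT_ENABLE/_MASKED_BIT_DISABLE macros; this is not the patch's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Bit named by the patch in this thread. */
#define HSW_SAMPLE_C_PERFORMANCE (1u << 9)

/* Masked-write helpers (hypothetical names, modelled on i915's
 * _MASKED_BIT_ENABLE/_MASKED_BIT_DISABLE macros): the upper 16 bits
 * select which of the lower 16 bits the write is allowed to change. */
static inline uint32_t masked_bit_enable(uint32_t bit)  { return (bit << 16) | bit; }
static inline uint32_t masked_bit_disable(uint32_t bit) { return bit << 16; }

/* Split a masked-write value into its mask half and its data half. */
static inline uint32_t write_mask(uint32_t v) { return v >> 16; }
static inline uint32_t write_data(uint32_t v) { return v & 0xffffu; }

/* Sanity checks against the two values floated for the execbuf flag. */
static inline void check_thread_values(void)
{
    /* Both values unmask bits 9 and 10 ... */
    assert(write_mask(0x6000200u) == ((1u << 9) | (1u << 10)));
    assert(write_mask(0x6000000u) == ((1u << 9) | (1u << 10)));
    /* ... the "3D" value sets bit 9, the "media" value clears it;
     * bit 10 is written as 0 in both cases. */
    assert(write_data(0x6000200u) == HSW_SAMPLE_C_PERFORMANCE);
    assert(write_data(0x6000000u) == 0);
}
```

Under that assumption, enabling only the sample_c bit would be a write of masked_bit_enable(HSW_SAMPLE_C_PERFORMANCE), and disabling it a write of masked_bit_disable(HSW_SAMPLE_C_PERFORMANCE). The values quoted in the thread additionally unmask bit 10 and write it as 0; what bit 10 controls is not explained in this thread.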