[Mesa-dev] i965 SamplerUnits rework

Fri Aug 24 03:05:53 PDT 2012

Greetings!

This series reworks how i965 deals with sampler indirections, changing it
to use linker-assigned sampler variable IDs in SEND instructions rather 
than baking in the ID of the texture unit they happen to be bound to.

Instead, that mapping is now encoded in the binding table, sampler state
table, and sampler default color table, which are updated and re-emitted
at least once per batch anyway.
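
To make the indirection concrete, here is a minimal sketch of filling a
compact per-program table from the sampler-to-unit mapping.  The struct
names, the limits, and build_compact_table() are hypothetical and are not
taken from the patches; only the idea (entry s is read from texture unit
SamplerUnits[s]) comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAMPLERS  16   /* hypothetical limits for this sketch */
#define MAX_TEX_UNITS 32

/* Hypothetical stand-in for the state bound to one texture unit. */
struct tex_unit_state {
    int texture_id;
};

/* Hypothetical stand-in for one entry of a compact, per-program table. */
struct table_entry {
    int texture_id;
};

/*
 * For each sampler index s used by the program, fill table[s] from the
 * texture unit that sampler currently points at (sampler_units[s]).
 * The shader only ever refers to s, so rebinding a sampler changes the
 * table contents, not the compiled code.  Returns the number of entries
 * written, which is also the size of the compact table.
 */
static size_t
build_compact_table(const unsigned char *sampler_units, size_t num_samplers,
                    const struct tex_unit_state *units,
                    struct table_entry *table)
{
    assert(num_samplers <= MAX_SAMPLERS);

    for (size_t s = 0; s < num_samplers; s++) {
        unsigned unit = sampler_units[s];
        assert(unit < MAX_TEX_UNITS);
        table[s].texture_id = units[unit].texture_id;
    }
    return num_samplers;
}
```

With two samplers mapped to units 5 and 1, the table has exactly two
entries, regardless of how many texture units exist.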

This has several advantages:
- We no longer need to recompile fragment shaders every time an application
  calls glUniform1i to set its sampler uniforms' values.

- The game "Cogs" (from Humble Bundle 3) drops from 99% CPU usage (as it
  continually recompiles fragment shaders due to ping-ponging between
  texture units 0 and 1) to a mere 30%.  (It's still slow, but that's an
  unrelated issue.)

  This also fixes that issue for Gallium drivers, which may make it
  playable on Radeon and Nouveau.

- Our sampler state and sampler default color tables are now compact,
  only containing as many entries as necessary, rather than covering all
  texture units (sparsely, whether used or not).

- Without this change, fragment shader precompiles are basically useless:
  we compile assembly at glLinkProgram() time, before the application has
  a chance to call glUniform1i() to bind sampler variables to texture units.
  When it does, we get a ProgramStringNotify, which bumps the ID in the
  program key and makes our nice precompiled shader irrelevant for eternity.

- I think it'll be helpful for Haswell.
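
The precompile problem described above can be sketched as a toy model of
the cache key.  Everything here (struct program, struct cache_key, and the
two set_sampler_* helpers) is hypothetical and much simpler than the real
driver; it only assumes, as stated above, that the old path bumps the ID
that feeds the program key while the new path leaves it alone.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SAMPLERS 16   /* hypothetical limit for this sketch */

/* Hypothetical linked program: an ID plus its sampler-to-unit mapping. */
struct program {
    unsigned id;
    unsigned char sampler_units[MAX_SAMPLERS];
};

/* Hypothetical compiled-shader cache key; the real key holds more state. */
struct cache_key {
    unsigned program_id;
};

static struct cache_key
make_key(const struct program *p)
{
    struct cache_key key = { p->id };
    return key;
}

static bool
key_equal(struct cache_key a, struct cache_key b)
{
    return a.program_id == b.program_id;
}

/* Old behaviour: rebinding a sampler also bumps the ID used in the key. */
static void
set_sampler_old(struct program *p, unsigned sampler, unsigned char unit)
{
    assert(sampler < MAX_SAMPLERS);
    p->sampler_units[sampler] = unit;
    p->id++;
}

/* New behaviour: only the mapping changes; the key stays the same. */
static void
set_sampler_new(struct program *p, unsigned sampler, unsigned char unit)
{
    assert(sampler < MAX_SAMPLERS);
    p->sampler_units[sampler] = unit;
}
```

A key taken at link time still matches after set_sampler_new(), so the
precompiled shader is found; after set_sampler_old() it no longer matches.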

The only downside I see is that the tables now depend on the active
program, which means we may need to re-emit them more often.  The cost
of emitting state is much, much lower than recompiling, and we already
re-emit these at least once per batch anyway, so it shouldn't be too bad.

There are a few quirky aspects to think about:
- This doesn't eliminate all the texture-related NOS (non-orthogonal state):
  DEPTH_TEXTURE_MODE and EXT_texture_swizzle are properties of the currently
  bound textures, and require extra instructions to do the swizzling.

  We still listen to _NEW_TEXTURE and recompute the program keys.  However,
  unless the app has actually /changed/ the swizzling, it should be a cache
  hit, and we won't have to recompile.

- No changes are necessary in the old brw_vs_emit/brw_wm_emit backends:
  ARB_fragment_program uses texture unit numbers directly, so SamplerUnits
  is actually the identity mapping.  Fixed function fragment shading uses
  the new brw_fs backend.  Pre-GLSL, vertex processing couldn't use textures.
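
As a rough illustration of the first point, the sketch below recomputes a
texture-dependent key by looking through the sampler-to-unit mapping.  The
struct tex_key layout, the swizzle encoding, and populate_tex_key() are
assumptions made for this example, not the driver's actual key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SAMPLERS 16   /* hypothetical limit for this sketch */

/* Hypothetical texture-dependent part of a program key. */
struct tex_key {
    unsigned short swizzles[MAX_SAMPLERS];
};

/*
 * Recompute the key: for each sampler, record the swizzle of the texture
 * bound to the unit that sampler points at.  unit_swizzle[] is indexed by
 * texture unit and uses an arbitrary encoding.
 */
static void
populate_tex_key(struct tex_key *key,
                 const unsigned char *sampler_units, size_t num_samplers,
                 const unsigned short *unit_swizzle)
{
    assert(num_samplers <= MAX_SAMPLERS);

    memset(key, 0, sizeof(*key));   /* unused slots compare equal */
    for (size_t s = 0; s < num_samplers; s++)
        key->swizzles[s] = unit_swizzle[sampler_units[s]];
}

static bool
tex_key_equal(const struct tex_key *a, const struct tex_key *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

Pointing a sampler at a different unit whose texture has the same swizzle
produces an identical key (a cache hit); only an actual swizzle change
produces a different one.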

Those who talked to me earlier might be surprised to see that this doesn't
have separate VS/FS sampler state tables.  I realized that since the
indices are actually part of the linked program, we basically already have
a single combined table anyway; splitting them actually just wasted space
(and required a lot of unnecessary code churn).

No Piglit regressions on Sandybridge.  Untested on Ivybridge.
And no, I haven't run oglconform.

Please review.  It would be great to get this into 9.0.

--Ken