[Mesa-dev] Clamp/saturate optimizations v3
Abdiel Janulgue
abdiel.janulgue at linux.intel.com
Mon Aug 18 05:17:41 PDT 2014
v3 of clamp and saturate optimizations
Changes since v1:
- Only remove the old try_emit_saturate operations after the new optimizations are
in place. (Matt, Ian)
- Output [min/max](saturate(x),b) instead of saturate([min/max](x,b)) as suggested
by Ilia Mirkin.
- The change above required some refactoring in the fs/vec4 backend to allow
propagation of certain instructions with saturate flag to SEL. For other instructions,
we don't propagate saturate instructions, similar to the previous behaviour.
Since v2:
- Fix comments to reflect we are doing a commutative operation, add missing conditions
when optimizing clamp in opt_algebraic pass.
- Refactor try_emit_saturate() in i965/fs instead of completely removing it. This fixed a
a regression where the changes emitted an (extra) unnecessary saturated mov when the
expression generating src can do saturate directly instead.
- Fix regression in the i965/vec4 copy-propagate optimization caused by ignoring
channels in the propagated instruction.
- Count generated loops from the fs/vec4 generator.
Results from our shader-db:
total instructions in shared programs: 4538627 -> 4560104 (0.47%)
instructions in affected programs: 45144 -> 66621 (47.57%)
total loops in shared programs: 887 -> 711 (-19.84%)
GAINED: 0
LOST: 36
I modified shader-db a bit to catch loops unrolls. The shaders that show increase in
instruction count are all due to the loop unroll pass triggered by this optimization
on games that contain looped clamp/saturate operation. The unroll pass also
resulted in a few shaders with looped clamp/sat skipping SIMD16 generation.
** No piglit regressions observed **
Abdiel Janulgue (17):
i965/vec4/fs: Count loops in shader debug
glsl: Add ir_unop_saturate
glsl: Add constant evaluation of ir_unop_saturate
glsl: Add a pass to lower ir_unop_saturate to clamp(x, 0, 1)
ir_to_mesa, glsl_to_tgsi: lower ir_unop_saturate
ir_to_mesa, glsl_to_tgsi: Add support for ir_unop_saturate
i965/fs: Add support for ir_unop_saturate
i965/vec4: Add support for ir_unop_saturate
glsl: Implement saturate as ir_unop_saturate
glsl: Optimize clamp(x, 0, 1) as saturate(x)
glsl: Optimize clamp(x, 0.0, b), where b < 1.0 as min(saturate(x),b)
glsl: Optimize clamp(x, b, 1.0), where b > 0.0 as max(saturate(x),b)
i965/fs: Allow propagation of instructions with saturate flag to sel
i965/vec4: Allow propagation of instructions with saturate flag to sel
ir_to_mesa, glsl_to_tgsi: Remove try_emit_saturate
i965/fs: Refactor try_emit_saturate
i965/vec4: Remove try_emit_saturate
src/glsl/ir.cpp | 2 +
src/glsl/ir.h | 1 +
src/glsl/ir_builder.cpp | 6 +-
src/glsl/ir_constant_expression.cpp | 6 ++
src/glsl/ir_optimization.h | 1 +
src/glsl/ir_validate.cpp | 1 +
src/glsl/lower_instructions.cpp | 29 ++++++++
src/glsl/opt_algebraic.cpp | 98 ++++++++++++++++++++++++++
src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp | 1 +
src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 18 ++++-
src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 6 +-
src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 27 ++++---
src/mesa/drivers/dri/i965/brw_vec4.h | 2 +-
src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 85 +++++++++++++++-------
src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 6 +-
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 25 ++-----
src/mesa/program/ir_to_mesa.cpp | 59 +++-------------
src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 63 +++--------------
18 files changed, 261 insertions(+), 175 deletions(-)
More information about the mesa-dev
mailing list