[Mesa-dev] Clamp/saturate optimizations v3

Ian Romanick idr at freedesktop.org
Fri Aug 22 16:57:14 PDT 2014


Patches 2, 3, 4, 5, 6, 9, 10, 11, 12, 15, and 17 are

Reviewed-by: Ian Romanick <ian.d.romanick at intel.com>

(Additional question below.)

On 08/18/2014 05:17 AM, Abdiel Janulgue wrote:
> v3 of clamp and saturate optimizations
> 
> Changes since v1: 
>  - Only remove the old try_emit_saturate operations after the new optimizations are
>    in place. (Matt, Ian)
>  - Output [min/max](saturate(x),b) instead of saturate([min/max](x,b)) as suggested
>    by Ilia Mirkin.
>  - The change above required some refactoring in the fs/vec4 backends to allow
>    propagation of certain instructions with the saturate flag to SEL. For other instructions,
>    we don't propagate saturate instructions, matching the previous behaviour.
> Since v2:
>  - Fix comments to reflect that we are doing a commutative operation; add missing conditions
>    when optimizing clamp in the opt_algebraic pass.
>  - Refactor try_emit_saturate() in i965/fs instead of completely removing it. This fixed a
>    regression where the changes emitted an unnecessary extra saturated mov when the
>    expression generating src can do the saturate directly instead.
>  - Fix regression in the i965/vec4 copy-propagate optimization caused by ignoring 
>    channels in the propagated instruction.
>  - Count generated loops from the fs/vec4 generator.
> 
> Results from our shader-db:
> 
> total instructions in shared programs: 4538627 -> 4560104 (0.47%)
> instructions in affected programs:     45144 -> 66621 (47.57%)
> total loops in shared programs:        887 -> 711 (-19.84%)
> GAINED:                                0
> LOST:                                  36

Can we try benchmarking the applications that have shaders that lost
SIMD16 before pushing these changes?  I'd hate to have an "optimization"
that actually makes performance worse. :(
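
For reference, the percentages in the shader-db summary quoted above can be re-derived from the raw counts. This is a minimal Python sketch, not part of shader-db; the helper name `percent_change` and the dictionary labels are made up for illustration, and only the before/after counts come from the report.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Before/after counts copied from the shader-db summary in this thread.
results = {
    "total instructions in shared programs": (4538627, 4560104),
    "instructions in affected programs": (45144, 66621),
    "total loops in shared programs": (887, 711),
}

for name, (before, after) in results.items():
    change = percent_change(before, after)
    print(f"{name}: {before} -> {after} ({change:.2f}%)")
```

Both instruction rows differ by the same 21477 instructions, so the whole increase comes from the affected programs.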

> I modified shader-db a bit to catch loop unrolls. The shaders that show an increase in
> instruction count are all due to the loop unroll pass triggered by this optimization
> on games that contain looped clamp/saturate operations. The unroll pass also
> resulted in a few shaders with looped clamp/sat skipping SIMD16 generation.
> 
> ** No piglit regressions observed **
> 
> Abdiel Janulgue (17):
>       i965/vec4/fs: Count loops in shader debug
>       glsl: Add ir_unop_saturate
>       glsl: Add constant evaluation of ir_unop_saturate
>       glsl: Add a pass to lower ir_unop_saturate to clamp(x, 0, 1)
>       ir_to_mesa, glsl_to_tgsi: lower ir_unop_saturate
>       ir_to_mesa, glsl_to_tgsi: Add support for ir_unop_saturate
>       i965/fs: Add support for ir_unop_saturate
>       i965/vec4: Add support for ir_unop_saturate
>       glsl: Implement saturate as ir_unop_saturate
>       glsl: Optimize clamp(x, 0, 1) as saturate(x)
>       glsl: Optimize clamp(x, 0.0, b), where b < 1.0 as min(saturate(x),b)
>       glsl: Optimize clamp(x, b, 1.0), where b > 0.0 as max(saturate(x),b)
>       i965/fs: Allow propagation of instructions with saturate flag to sel
>       i965/vec4: Allow propagation of instructions with saturate flag to sel
>       ir_to_mesa, glsl_to_tgsi: Remove try_emit_saturate
>       i965/fs: Refactor try_emit_saturate
>       i965/vec4: Remove try_emit_saturate
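
The algebraic rewrites named in the patch titles above (clamp(x, 0, 1) as saturate(x), and the min/max forms for a one-sided bound b) rest on simple identities. The following is a minimal Python sketch, not Mesa code: the function names `clamp_upper` and `clamp_lower` are illustrative assumptions, `clamp` is modelled as min(max(x, lo), hi), and NaN handling is ignored.

```python
def clamp(x, lo, hi):
    # Modelled as min(max(x, lo), hi); NaN inputs are not considered here.
    return min(max(x, lo), hi)


def saturate(x):
    # saturate(x) is clamp(x, 0, 1).
    return clamp(x, 0.0, 1.0)


def clamp_upper(x, b):
    # Rewritten form of clamp(x, 0.0, b), intended for b < 1.0.
    return min(saturate(x), b)


def clamp_lower(x, b):
    # Rewritten form of clamp(x, b, 1.0), intended for b > 0.0.
    return max(saturate(x), b)


# Sample inputs from -2.0 to 3.0 in steps of 1/8 (exactly representable floats).
samples = [i / 8.0 for i in range(-16, 25)]
bounds = [0.25, 0.5, 0.75]

identities_hold = all(
    clamp(x, 0.0, b) == clamp_upper(x, b)
    and clamp(x, b, 1.0) == clamp_lower(x, b)
    for x in samples
    for b in bounds
)
print(identities_hold)
```

The rewritten forms leave the saturate on the inner operand, which is the [min/max](saturate(x),b) shape described in the changelog.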
> 
>  src/glsl/ir.cpp                                          |  2 +
>  src/glsl/ir.h                                            |  1 +
>  src/glsl/ir_builder.cpp                                  |  6 +-
>  src/glsl/ir_constant_expression.cpp                      |  6 ++
>  src/glsl/ir_optimization.h                               |  1 +
>  src/glsl/ir_validate.cpp                                 |  1 +
>  src/glsl/lower_instructions.cpp                          | 29 ++++++++
>  src/glsl/opt_algebraic.cpp                               | 98 ++++++++++++++++++++++++++
>  src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp |  1 +
>  src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp    | 18 ++++-
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp           |  6 +-
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp             | 27 ++++---
>  src/mesa/drivers/dri/i965/brw_vec4.h                     |  2 +-
>  src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp  | 85 +++++++++++++++-------
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp         |  6 +-
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp           | 25 ++-----
>  src/mesa/program/ir_to_mesa.cpp                          | 59 +++-------------
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp               | 63 +++--------------
>  18 files changed, 261 insertions(+), 175 deletions(-)
> 