[Mesa-dev] Clamp/saturate optimizations v3

Tue Aug 26 05:17:14 PDT 2014

On 23.08.2014 02:57, Ian Romanick wrote:
> Patches 2, 3, 4, 5, 6, 9, 10, 11, 12, 15, and 17 are
> 
> Reviewed-by: Ian Romanick <ian.d.romanick at intel.com>
> 
> (Additional question below.)
> 
> On 08/18/2014 05:17 AM, Abdiel Janulgue wrote:
>> v3 of clamp and saturate optimizations
>>
>> Changes since v1: 
>>  - Only remove the old try_emit_saturate operations after the new optimizations are
>>    in place. (Matt, Ian)
>>  - Output [min/max](saturate(x),b) instead of saturate([min/max](x,b)) as suggested
>>    by Ilia Mirkin.
>>  - The change above required some refactoring in the fs/vec4 backend to allow
>>    propagation of certain instructions with saturate flag to SEL. For other instructions,
>>    we don't propagate saturate instructions, similar to the previous behaviour.
>> Since v2:
>>  - Fix comments to reflect we are doing a commutative operation, add missing conditions
>>    when optimizing clamp in opt_algebraic pass.
>>  - Refactor try_emit_saturate() in i965/fs instead of completely removing it. This fixed a
>>    a regression where the changes emitted an (extra) unnecessary saturated mov when the 
>>    expression generating src can do saturate directly instead.
>>  - Fix regression in the i965/vec4 copy-propagate optimization caused by ignoring 
>>    channels in the propagated instruction.
>>  - Count generated loops from the fs/vec4 generator.
>>
>> Results from our shader-db:
>>
>> total instructions in shared programs: 4538627 -> 4560104 (0.47%)
>> instructions in affected programs:     45144 -> 66621 (47.57%)
>> total loops in shared programs:        887 -> 711 (-19.84%)
>> GAINED:                                0
>> LOST:                                  36
> 
> Can we try benchmarking the applications that have shaders that lost
> SIMD16 before pushing these changes?  I'd hate to have an "optimization"
> that actually makes performance worse. :(

There were a couple of games that lost simd16: heroes-of-newerth and
savage2. However, it is interesting that the same games also reported an
even higher amount of shaders that gained improvements (loop reductions
and instruction count reduction) with the saturate optimizations
switched on.

I did a preliminary comparison on HoN:

x Heroes_Of_Newearth_master
+ Heroes_Of_Newearth_sat_fixes

    N           Min           Max        Median           Avg    Stddev
x  11          53.1          55.8          55.1     54.872727 0.83317574
+  11          54.2            56          55.1     55.036364 0.59375538
No difference proven at 95.0% confidence

I'll post the results of savage2 as soon as I get it running.

> 
>> I modified shader-db a bit to catch loops unrolls. The shaders that show increase in
>> instruction count are all due to the loop unroll pass triggered by this optimization
>> on games that contain looped clamp/saturate operation. The unroll pass also
>> resulted in a few shaders with looped clamp/sat skipping SIMD16 generation.
>>
>> ** No piglit regressions observed **
>>
>> Abdiel Janulgue (17):
>>       i965/vec4/fs: Count loops in shader debug
>>       glsl: Add ir_unop_saturate
>>       glsl: Add constant evaluation of ir_unop_saturate
>>       glsl: Add a pass to lower ir_unop_saturate to clamp(x, 0, 1)
>>       ir_to_mesa, glsl_to_tgsi: lower ir_unop_saturate
>>       ir_to_mesa, glsl_to_tgsi: Add support for ir_unop_saturate
>>       i965/fs: Add support for ir_unop_saturate
>>       i965/vec4: Add support for ir_unop_saturate
>>       glsl: Implement saturate as ir_unop_saturate
>>       glsl: Optimize clamp(x, 0, 1) as saturate(x)
>>       glsl: Optimize clamp(x, 0.0, b), where b < 1.0 as min(saturate(x),b)
>>       glsl: Optimize clamp(x, b, 1.0), where b > 0.0 as max(saturate(x),b)
>>       i965/fs: Allow propagation of instructions with saturate flag to sel
>>       i965/vec4: Allow propagation of instructions with saturate flag to sel
>>       ir_to_mesa, glsl_to_tgsi: Remove try_emit_saturate
>>       i965/fs: Refactor try_emit_saturate
>>       i965/vec4: Remove try_emit_saturate
>>
>>  src/glsl/ir.cpp                                          |  2 +
>>  src/glsl/ir.h                                            |  1 +
>>  src/glsl/ir_builder.cpp                                  |  6 +-
>>  src/glsl/ir_constant_expression.cpp                      |  6 ++
>>  src/glsl/ir_optimization.h                               |  1 +
>>  src/glsl/ir_validate.cpp                                 |  1 +
>>  src/glsl/lower_instructions.cpp                          | 29 ++++++++
>>  src/glsl/opt_algebraic.cpp                               | 98 ++++++++++++++++++++++++++
>>  src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp |  1 +
>>  src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp    | 18 ++++-
>>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp           |  6 +-
>>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp             | 27 ++++---
>>  src/mesa/drivers/dri/i965/brw_vec4.h                     |  2 +-
>>  src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp  | 85 +++++++++++++++-------
>>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp         |  6 +-
>>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp           | 25 ++-----
>>  src/mesa/program/ir_to_mesa.cpp                          | 59 +++-------------
>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp               | 63 +++--------------
>>  18 files changed, 261 insertions(+), 175 deletions(-)
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
> 
>