[Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

Mon Mar 10 19:21:32 PDT 2014

On 11.03.2014 01:23, Ian Romanick wrote:
> I had a pretty similar patch on the top of my pow-optimization branch.
> I also expand x**3 and x**4.  I had hoped that would enable some cases
> to expand then merge to MADs.  It should also be faster on older GENs
> where POW perf sucks.  I didn't send it out because I wanted to add a
> similar optimization in the back end that would turn x*x*x*x back into
> x**4 on GPUs where the POW would be faster.
I have no idea what performance POW has on newer Intel GPU hardware
(unlike the older pre-SNB hardware with its separate mathbox, the manual
doesn't list throughput for extended math functions, or at least I never
found it), but I find it highly unlikely that a POW costs less than
2 muls anywhere.

Roland
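
To make the "2 muls" comparison concrete, here is a minimal sketch of the
expansions discussed in this thread, written as plain C++ rather than GLSL
IR. The helper names pow2, pow3 and pow4 are hypothetical and do not exist
in Mesa; they only illustrate how many multiplies each expansion needs,
assuming the x**4 case squares x and then squares the result.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers (not Mesa code): hand-expanded small integer powers.

// x**2 as a single multiply, the rewrite the patch performs for pow(x, 2).
static float pow2(float x) { return x * x; }

// x**3 as two multiplies.
static float pow3(float x) { return x * x * x; }

// x**4 as two multiplies: square x once, then square the result.
static float pow4(float x)
{
    const float x2 = x * x;
    return x2 * x2;
}

// Compares an expansion against std::pow within a small relative tolerance.
static bool close_to_pow(float expanded, float x, float exponent)
{
    const float reference = std::pow(x, exponent);
    return std::fabs(expanded - reference) <= 1e-5f * std::fabs(reference);
}
```

For example, close_to_pow(pow4(1.5f), 1.5f, 4.0f) compares the two-multiply
expansion against the library pow. Whether two multiplies actually beat a
hardware POW on a given GPU is the open question in this thread; the sketch
only shows the operation counts.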


> I also didn't have anything in shader-db that benefitted from x**2 or
> x**3.  It seems like there were a couple that would be modified by a
> x**5 flattening, but I think that would universally be slower....
> 
> On 03/10/2014 03:54 PM, Matt Turner wrote:
>> Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
>> ---
>>  src/glsl/opt_algebraic.cpp | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
>> index 5c49a78..8494bd9 100644
>> --- a/src/glsl/opt_algebraic.cpp
>> +++ b/src/glsl/opt_algebraic.cpp
>> @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>>        if (is_vec_two(op_const[0]))
>>           return expr(ir_unop_exp2, ir->operands[1]);
>>  
>> +      if (is_vec_two(op_const[1])) {
>> +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
>> +                                              ir_var_temporary);
>> +         base_ir->insert_before(x);
>> +         base_ir->insert_before(assign(x, ir->operands[0]));
>> +         return mul(x, x);
>> +      }
>> +
>>        break;
>>  
>>     case ir_unop_rcp:
>>
>