[Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

Tue Mar 11 09:29:03 PDT 2014

On 03/10/2014 07:21 PM, Roland Scheidegger wrote:
> On 11.03.2014 01:23, Ian Romanick wrote:
>> I had a pretty similar patch on the top of my pow-optimization branch.
>> I also expand x**3 and x**4.  I had hoped that would enable some cases
>> to expand then merge to MADs.  It should also be faster on older GENs
>> where POW perf sucks.  I didn't send it out because I wanted to add a
>> similar optimization in the back end that would turn x*x*x*x back into
>> x**4 on GPUs where the POW would be faster.
> I have no idea what performance POW has on newer intel gpu hw (since in
> contrast to older pre-snb hw with separate mathbox the manual doesn't
> list throughput for extended math functions, at least I never found it),
> but I find it highly unlikely that a POW has a cost lower than 2 muls
> anywhere.

The architecture has changed quite a bit, so "math box" is kind of a
thing of the past... and there was much rejoicing.  The timings that we
use in the compiler backend are 22 cycles for POW, and 14 cycles for MUL
on Haswell.  The numbers are similar (but slightly longer) on
Sandybridge and Ivybridge.
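
As a back-of-the-envelope check of those numbers, here is a rough sketch
that compares one POW against the chain of multiplies needed for small
integer exponents.  The helper names (mul_count, expanded_cost,
expansion_wins) are made up for illustration and are not part of the
backend.  It also assumes the multiplies are fully dependent so that
their latencies simply add, which ignores pipelining and any merging
into MADs.

```cpp
#include <cassert>

// Scheduler latencies quoted above for Haswell, in cycles.
static const int kPowCycles = 22;
static const int kMulCycles = 14;

// Number of multiplies needed to compute x**n by repeated squaring.
// Hypothetical helper for illustration; requires n >= 1.
static int mul_count(unsigned n)
{
   assert(n >= 1);
   int count = 0;
   while (n > 1) {
      // One squaring per halving step, plus one extra multiply when n is odd.
      count += (n & 1) ? 2 : 1;
      n >>= 1;
   }
   return count;
}

// Summed latency of the expanded multiply chain (assumes no overlap).
static int expanded_cost(unsigned n)
{
   return mul_count(n) * kMulCycles;
}

// True when the multiply chain is cheaper than a single POW under this model.
static bool expansion_wins(unsigned n)
{
   return expanded_cost(n) < kPowCycles;
}
```

Under this simple model only n = 2 comes out ahead (14 vs. 22 cycles);
n = 3 and n = 4 both cost 28, and n = 5 costs 42.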

> Roland
> 
> 
>> I also didn't have anything in shader-db that benefitted from x**2 or
>> x**3.  It seems like there were a couple that would be modified by a
>> x**5 flattening, but I think that would universally be slower....
>>
>> On 03/10/2014 03:54 PM, Matt Turner wrote:
>>> Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
>>> ---
>>>  src/glsl/opt_algebraic.cpp | 8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
>>> index 5c49a78..8494bd9 100644
>>> --- a/src/glsl/opt_algebraic.cpp
>>> +++ b/src/glsl/opt_algebraic.cpp
>>> @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>>>        if (is_vec_two(op_const[0]))
>>>           return expr(ir_unop_exp2, ir->operands[1]);
>>>  
>>> +      if (is_vec_two(op_const[1])) {
>>> +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
>>> +                                              ir_var_temporary);
>>> +         base_ir->insert_before(x);
>>> +         base_ir->insert_before(assign(x, ir->operands[0]));
>>> +         return mul(x, x);
>>> +      }
>>> +
>>>        break;
>>>  
>>>     case ir_unop_rcp:
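
For anyone following along without the GLSL IR in their head, the shape
of the rewrite in the hunk above can be sketched on a toy expression
tree.  Everything here (Expr, Assign, lower_pow2) is a hypothetical
stand-in, not Mesa's ir_expression or ir_builder API.  The point is only
that the base is copied into a temporary first, so the operand is
evaluated once and then multiplied by itself.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy expression node; a stand-in for illustration, not Mesa's IR.
struct Expr {
   enum Kind { Var, Const, Pow, Mul } kind;
   std::string name;                // used by Var only
   double value = 0.0;              // used by Const only
   std::shared_ptr<Expr> lhs, rhs;  // used by Pow and Mul only
};
typedef std::shared_ptr<Expr> ExprPtr;

// "temp = value", emitted ahead of the instruction being rewritten.
struct Assign {
   std::string temp;
   ExprPtr value;
};

static ExprPtr make(Expr::Kind kind, ExprPtr lhs = nullptr, ExprPtr rhs = nullptr)
{
   ExprPtr e = std::make_shared<Expr>();
   e->kind = kind;
   e->lhs = lhs;
   e->rhs = rhs;
   return e;
}

static ExprPtr var(const std::string &name)
{
   ExprPtr e = make(Expr::Var);
   e->name = name;
   return e;
}

static ExprPtr constant(double v)
{
   ExprPtr e = make(Expr::Const);
   e->value = v;
   return e;
}

// Rewrite pow(base, 2.0) into tmp * tmp, appending "tmp = base" to `before`.
// Any other expression is returned unchanged.
static ExprPtr lower_pow2(const ExprPtr &e, std::vector<Assign> &before)
{
   assert(e != nullptr);
   if (e->kind != Expr::Pow || e->rhs->kind != Expr::Const || e->rhs->value != 2.0)
      return e;

   std::string tmp = "x" + std::to_string(before.size());
   before.push_back({tmp, e->lhs});
   return make(Expr::Mul, var(tmp), var(tmp));
}
```

Calling lower_pow2 on pow(a, 2.0) yields a Mul of two references to the
same temporary plus one hoisted assignment; pow(a, 3.0) is left alone.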