[Mesa-dev] [RFC] glsl: Recognize pattern for roundEven().

Wed Jan 28 13:53:15 PST 2015

On 01/28/2015 11:51 AM, Matt Turner wrote:
> On Wed, Jan 28, 2015 at 11:20 AM, Ian Romanick <idr at freedesktop.org> wrote:
>> On 01/28/2015 10:31 AM, Matt Turner wrote:
>>> Note: this will round differently for x.5 where x is even.
>>>
>>> total instructions in shared programs: 5953897 -> 5948654 (-0.09%)
>>> instructions in affected programs:     88619 -> 83376 (-5.92%)
>>> helped:                                696
>>> ---
>>> If we implemented round() differently from roundEven(), we should
>>> use it instead.
>>>
>>> (mul (floor (add (abs x) 0.5)) (sign x)) is 6 i965 instructions.
>>> (roundEven x) is 1 instruction.
>>>
>>> Most shaders with this pattern wrap it in int(...), which increases
>>> the counts by one, to 7 and 2 respectively.
>>>
>>> Alternatively, we could optimize this as
>>>
>>> (trunc (add f (mul 0.5 (sign f)))), which would be 6 instructions,
>>> and the int() conversion would be free. We could also apply f's sign
>>> to 0.5 in two instructions, cutting the total to 4.
>>>
>>> What do you think? Should we do precisely as they say? All but two
>>> of the affected shaders seem to be translated from DX.
>>>
>>>  src/glsl/opt_algebraic.cpp | 32 ++++++++++++++++++++++++++++++++
>>>  1 file changed, 32 insertions(+)
>>>
>>> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
>>> index c6f4a9c..eaa5f47 100644
>>> --- a/src/glsl/opt_algebraic.cpp
>>> +++ b/src/glsl/opt_algebraic.cpp
>>> @@ -514,6 +514,38 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>>>        if (op_const[1] && !op_const[0])
>>>        reassociate_constant(ir, 1, op_const[1], op_expr[0]);
>>>
>>> +      /* Optimizes
>>> +       *
>>> +       *    (mul (floor (add (abs x) 0.5)) (sign x))
>>
>> If I'm not mistaken, this isn't round-to-even.  Doesn't this round 4.5
>> to 5?  roundEven(4.5) should be 4.  This looks like "half-up" rounding,
>> which is very different.  See
>> http://userguide.icu-project.org/formatparse/numbers/rounding-modes
> 
> Isn't this what I said?
> 
> I've suggested a way to cut a six instruction sequence to one, with
> the caveat that it doesn't do the right thing for x.5 where x is even.

*blush*  I missed the tiny commit message for the much larger addendum
message.

> What I'm asking is whether we suspect that they specifically want
> half-up behavior (speculation, so not likely insightful), or if
> there's a way we can emulate round-half-up behavior using round-even
> in fewer than four instructions.

They may or may not want half-up or round-even or something else.  If we
change it, someone will see different pixels, and they will probably
report a bug.  It seems better to play it safe.
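
To make the difference concrete, here is a minimal C++ sketch (not Mesa
code; the function names are made up for illustration) comparing the
shader pattern with round-to-even.  It assumes the default floating-point
rounding mode, under which std::nearbyint rounds ties to even:

```cpp
#include <cmath>

// The pattern found in the shaders: floor(abs(x) + 0.5) * sign(x).
// Ties (x.5) are rounded away from zero.
float round_half_away(float x)
{
    // sign(x) as in GLSL: -1, 0, or +1.
    float sign = static_cast<float>((x > 0.0f) - (x < 0.0f));
    return std::floor(std::fabs(x) + 0.5f) * sign;
}

// Stand-in for roundEven(): std::nearbyint honors the current rounding
// mode, which is round-to-nearest-even unless the program changed it.
float round_even(float x)
{
    return std::nearbyint(x);
}
```

The two agree everywhere except on ties whose integer part is even, such
as 4.5 or -2.5, which is exactly the case the commit message calls out.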

I think 'trunc(x + (intToFloatBits(floatToIntBits(.5) |
(floatToIntBits(x) & 0x80000000))))' should produce the same result.
That should be 4 instructions, I think.  Were you thinking of using CSEL
for your 4 instruction version?
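
A rough C++ sketch of the sign-copy idea, for checking the arithmetic
only.  The helper names are hypothetical and this is not how the i965
backend would emit it.  The final step here is a truncation toward zero:
adding a sign-matched 0.5 and then truncating reproduces
floor(abs(x) + 0.5) * sign(x), whereas a round-to-nearest step would turn
4.2 + 0.5 into 5:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Bit-cast helpers; memcpy avoids undefined behavior from type punning.
uint32_t float_bits(float f)
{
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

float bits_float(uint32_t u)
{
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Reference: the six-instruction pattern from the shaders.
float pattern(float x)
{
    float sign = static_cast<float>((x > 0.0f) - (x < 0.0f));
    return std::floor(std::fabs(x) + 0.5f) * sign;
}

// AND + OR copy x's sign bit onto 0.5, then ADD and truncate toward zero.
float sign_copy_trunc(float x)
{
    uint32_t half = float_bits(0.5f) | (float_bits(x) & 0x80000000u);
    return std::trunc(x + bits_float(half));
}
```

That is one AND, one OR, one ADD and one round-toward-zero, which is
where the count of four comes from.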

Since this is in common code, we have to be careful about how this will
affect drivers that don't support bit-wise operations.