[Mesa-dev] [PATCH 9/9] glsl: Convert mul/div by power-of-two factors to shift expressions.

Mon Apr 7 10:38:16 PDT 2014

Am 07.04.2014 15:52, schrieb Ian Romanick:
> On 04/06/2014 11:49 PM, Kenneth Graunke wrote:
>> Integer shifts are basically always well supported and efficient; that
>> isn't always true of integer division, and sometimes even integer
>> multiplication isn't without issues.
>>
>> On some Intel hardware, INTDIV can't be used in SIMD16 mode.  It also
>> doesn't support immediate operands (on any generation), while ASR can.
>>
>> On Haswell, this cuts the number of instructions in dolphin/efb2ram by
>> 7.94%.  It also removes a single MOV in dolphin/realxfb (due to ASR
>> supporting immediates), and gains SIMD16 support (due to no INTDIV).
>>
>> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
>> ---
>>  src/glsl/opt_algebraic.cpp | 70 ++++++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 68 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
>> index 9d55392..eccc2eb 100644
>> --- a/src/glsl/opt_algebraic.cpp
>> +++ b/src/glsl/opt_algebraic.cpp
>> @@ -34,6 +34,7 @@
>>  #include "ir_optimization.h"
>>  #include "ir_builder.h"
>>  #include "glsl_types.h"
>> +#include "main/macros.h"
>>  
>>  using namespace ir_builder;
>>  
>> @@ -68,6 +69,8 @@ public:
>>  			     int op2);
>>     ir_rvalue *swizzle_if_required(ir_expression *expr,
>>  				  ir_rvalue *operand);
>> +   ir_rvalue *convert_int_math_to_shifts(ir_expression *ir,
>> +                                         ir_constant *op_const_array[4]);
>>  
>>     void *mem_ctx;
>>  
>> @@ -185,6 +188,59 @@ ir_algebraic_visitor::reassociate_constant(ir_expression *ir1, int const_index,
>>     return false;
>>  }
>>  
>> +/**
>> + * Transform integer multiplication/division by a constant power-of-two
>> + * factor into shift instructions.
>> + */
>> +ir_rvalue *
>> +ir_algebraic_visitor::convert_int_math_to_shifts(ir_expression *ir,
>> +                                                 ir_constant *op_const_array[4])
>> +{
>> +   /* This optimization only makes sense for GPUs with native integers. */
>> +   if (!native_integers)
>> +      return NULL;
>> +
>> +   assert(ir->operation == ir_binop_mul || ir->operation == ir_binop_div);
>> +
>> +   /* Shifts only work for integer types. */
>> +   if (!ir->type->is_integer())
>> +      return NULL;
> 
> After the previous conversation about converting division to shifts, I
> think at least division only generally works for unsigned. -1 / 2 => 0,
> but -1 >> 1 => -1.  I don't know what the rules are for multiplication
> overflow... I think the result of int32_t(0x70000000)*2 is undefined, so
> that should be fine.
Hmm c99 seems to say _unsigned_ multiplication silently wraps (as it
essentially does result modulo largest value + 1), but signed one indeed
could have undefined behavior. I think though usually everything out
there will do the same modulo arithmetic in any case - and shift should
provide the same result.
glsl actually does have this as defined behavior: "Unsigned integers
have exactly 32 bits of precision. Signed integers use 32 bits,
including a sign bit, in two's complement form. Operations resulting in
overflow or underflow will not cause any exception, nor will they
saturate, rather they will “wrap” to yield the low-order 32 bits of the
result."

I'm still unsure about division actually, I could not find anything
indicating it has to conform to c99 rules in glsl spec. If it is defined
it is well hidden, otherwise it would be nice if the spec would specify
this one way or another... It does say though modulo operation is
undefined if any of the two operands is negative, which is a hint that
rounding towards negative infinity for division might be ok too...
(I was looking at slightly older glsl 4.10 since the server seems to be
down.)
Though if division actually does rounding toward zero, and you'll happen
to hit the optimization and it then does rounding toward negative
infinity, that would be a bit awkward indeed.

Roland

>> +
>> +   ir_constant *const_op;
>> +   ir_rvalue *other_op;
>> +   if (op_const_array[0]) {
>> +      const_op = op_const_array[0];
>> +      other_op = ir->operands[1];
>> +   } else if (op_const_array[1]) {
>> +      const_op = op_const_array[1];
>> +      other_op = ir->operands[0];
>> +   } else {
>> +      /* If neither is a constant, we can't check for powers of two. */
>> +      return NULL;
>> +   }
>> +
>> +   ir_constant_data shift_data;
>> +   for (int i = 0; i < const_op->type->vector_elements; i++) {
>> +      if (const_op->type->base_type == GLSL_TYPE_INT &&
>> +          const_op->value.i[i] <= 0) {
>> +         /* Negative values aren't powers of two. */
>> +         return NULL;
>> +      }
> 
> But... we could convert x*-4 into -x*4... assuming negation is free on
> integer operands.  Seems like that would be good follow-on work...
> 
>> +
>> +      if (!is_power_of_two(const_op->value.u[i]))
>> +         return NULL;
>> +
>> +      shift_data.u[i] = ffs(const_op->value.u[i]) - 1;
>> +   }
>> +
>> +   ir_constant *shifts = new(mem_ctx) ir_constant(ir->type, &shift_data);
>> +
>> +   if (ir->operation == ir_binop_mul)
>> +      return lshift(other_op, shifts);
>> +   else
>> +      return rshift(other_op, shifts);
>> +}
>> +
>>  /* When eliminating an expression and just returning one of its operands,
>>   * we may need to swizzle that operand out to a vector if the expression was
>>   * vector type.
>> @@ -389,7 +445,7 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>>  	 return ir->operands[0];
>>        break;
>>  
>> -   case ir_binop_mul:
>> +   case ir_binop_mul: {
>>        if (is_vec_one(op_const[0]))
>>  	 return ir->operands[1];
>>        if (is_vec_one(op_const[1]))
>> @@ -403,6 +459,9 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>>        if (is_vec_negative_one(op_const[1]))
>>           return neg(ir->operands[0]);
>>  
>> +      ir_rvalue *shift_expr = convert_int_math_to_shifts(ir, op_const);
>> +      if (shift_expr)
>> +         return shift_expr;
>>  
>>        /* Reassociate multiplication of constants so that we can do
>>         * constant folding.
>> @@ -413,8 +472,9 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>>  	 reassociate_constant(ir, 1, op_const[1], op_expr[0]);
>>  
>>        break;
>> +   }
>>  
>> -   case ir_binop_div:
>> +   case ir_binop_div: {
>>        if (is_vec_one(op_const[0]) && ir->type->base_type == GLSL_TYPE_FLOAT) {
>>  	 return new(mem_ctx) ir_expression(ir_unop_rcp,
>>  					   ir->operands[1]->type,
>> @@ -423,7 +483,13 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>>        }
>>        if (is_vec_one(op_const[1]))
>>  	 return ir->operands[0];
>> +
>> +      ir_rvalue *shift_expr = convert_int_math_to_shifts(ir, op_const);
>> +      if (shift_expr)
>> +         return shift_expr;
>> +
>>        break;
>> +   }
>>  
>>     case ir_binop_dot:
>>        if (is_vec_zero(op_const[0]) || is_vec_zero(op_const[1]))
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://urldefense.proofpoint.com/v1/url?u=http://lists.freedesktop.org/mailman/listinfo/mesa-dev&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=F4msKE2WxRzA%2BwN%2B25muztFm5TSPwE8HKJfWfR2NgfY%3D%0A&m=hpIPy%2BEG17aLhkB3c6oUCLBYCvhagiUdGifRkvM3Cu4%3D%0A&s=2d182d2f43a70d4111487ae377ca480dce35149e3d93af89cace4289460326b2
>