[Mesa-dev] [PATCH 9/9] glsl: Convert mul/div by power-of-two factors to shift expressions.

Mon Apr 7 11:00:33 PDT 2014

On Sun, Apr 6, 2014 at 11:49 PM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> Integer shifts are basically always well supported and efficient; that
> isn't always true of integer division, and sometimes even integer
> multiplication isn't without issues.
>
> On some Intel hardware, INTDIV can't be used in SIMD16 mode.  It also
> doesn't support immediate operands (on any generation), while ASR can.
>
> On Haswell, this cuts the number of instructions in dolphin/efb2ram by
> 7.94%.  It also removes a single MOV in dolphin/realxfb (due to ASR
> supporting immediates), and gains SIMD16 support (due to no INTDIV).

Just a note that the upstream shaders have already been changed to not
do integer division. That's the reason I didn't bother fixing up my
patch a couple of weeks ago.

> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> ---
>  src/glsl/opt_algebraic.cpp | 70 ++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 68 insertions(+), 2 deletions(-)
>
> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
> index 9d55392..eccc2eb 100644
> --- a/src/glsl/opt_algebraic.cpp
> +++ b/src/glsl/opt_algebraic.cpp
> @@ -34,6 +34,7 @@
>  #include "ir_optimization.h"
>  #include "ir_builder.h"
>  #include "glsl_types.h"
> +#include "main/macros.h"
>
>  using namespace ir_builder;
>
> @@ -68,6 +69,8 @@ public:
>                              int op2);
>     ir_rvalue *swizzle_if_required(ir_expression *expr,
>                                   ir_rvalue *operand);
> +   ir_rvalue *convert_int_math_to_shifts(ir_expression *ir,
> +                                         ir_constant *op_const_array[4]);
>
>     void *mem_ctx;
>
> @@ -185,6 +188,59 @@ ir_algebraic_visitor::reassociate_constant(ir_expression *ir1, int const_index,
>     return false;
>  }
>
> +/**
> + * Transform integer multiplication/division by a constant power-of-two
> + * factor into shift instructions.
> + */
> +ir_rvalue *
> +ir_algebraic_visitor::convert_int_math_to_shifts(ir_expression *ir,
> +                                                 ir_constant *op_const_array[4])
> +{
> +   /* This optimization only makes sense for GPUs with native integers. */
> +   if (!native_integers)
> +      return NULL;
> +
> +   assert(ir->operation == ir_binop_mul || ir->operation == ir_binop_div);
> +
> +   /* Shifts only work for integer types. */
> +   if (!ir->type->is_integer())
> +      return NULL;
> +
> +   ir_constant *const_op;
> +   ir_rvalue *other_op;
> +   if (op_const_array[0]) {
> +      const_op = op_const_array[0];
> +      other_op = ir->operands[1];
> +   } else if (op_const_array[1]) {
> +      const_op = op_const_array[1];
> +      other_op = ir->operands[0];
> +   } else {
> +      /* If neither is a constant, we can't check for powers of two. */
> +      return NULL;
> +   }
> +
> +   ir_constant_data shift_data;
> +   for (int i = 0; i < const_op->type->vector_elements; i++) {
> +      if (const_op->type->base_type == GLSL_TYPE_INT &&
> +          const_op->value.i[i] <= 0) {
> +         /* Negative values aren't powers of two. */

I was going to comment on the is_power_of_two() function returning
true for 0 and how this was going to break things, but I see this
check for <= 0 should handle it.

I don't know what is_power_of_two() should return for zero. The
comment "Negative values" would be more correct if we returned false
from is_power_of_two() though.