[Mesa-dev] [PATCH (gles3) 15/20] glsl: Add lowering pass for GLSL ES 3.00 pack/unpack operations (v2)

Tue Jan 22 21:19:12 PST 2013

On 21 January 2013 00:49, Chad Versace <chad.versace at linux.intel.com> wrote:

> Lower them to arithmetic and bit manipulation expressions.
>
> v2:
>   - Rewrite using ir_builder. [for idr]
>   - In lowering packHalf2x16, don't truncate subnormal float16 values to
>     zero. And round to even rather than to zero.  [for stereotype441]
>
> CC: Ian Romanick <idr at freedesktop.org>
> CC: Paul Berry <stereotype441 at gmail.com>
> Signed-off-by: Chad Versace <chad.versace at linux.intel.com>
> ---
>  src/glsl/Makefile.sources           |    1 +
>  src/glsl/ir_optimization.h          |   20 +
>  src/glsl/lower_packing_builtins.cpp | 1043 +++++++++++++++++++++++++++++++++++
>  3 files changed, 1064 insertions(+)
>  create mode 100644 src/glsl/lower_packing_builtins.cpp
>
>
(snip)

> +   void
> +   setup_factory(void *mem_ctx)
> +   {
> +      assert(factory.mem_ctx == NULL);
> +      factory.mem_ctx = mem_ctx;
> +
> +      /* Avoid making a new list for each call to handle_rvalue(). Make a
> +       * single list and reuse it.
> +       */
> +      if (factory.instructions == NULL) {
> +         factory.instructions = new(NULL) exec_list();
> +      } else {
> +         assert(factory.instructions->is_empty());
> +      }
> +   }
>

Do we need factory.instructions to be heap-allocated?  How about just
making a private exec_list inside lower_packing_builtins_visitor and
setting factory.instructions to point to it in the
lower_packing_builtins_visitor constructor?
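
To make the suggestion concrete, here is a rough sketch of the shape I have in mind. The types `exec_list_stub`, `ir_factory_stub`, and `visitor_sketch` are hypothetical stand-ins invented for this illustration, not Mesa's real `exec_list`, `ir_factory`, or visitor class; only the ownership pattern is the point.

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for exec_list: only is_empty() is modeled.
struct exec_list_stub {
    std::list<int> nodes;
    bool is_empty() const { return nodes.empty(); }
};

// Hypothetical stand-in for ir_factory: just the two fields used here.
struct ir_factory_stub {
    exec_list_stub *instructions = nullptr;
    void *mem_ctx = nullptr;
};

class visitor_sketch {
public:
    visitor_sketch()
    {
        // The list is a plain member, so no heap allocation is needed and
        // its lifetime matches the visitor's.
        factory.instructions = &factory_instructions;
    }

    // Copying would leave factory.instructions pointing at the other object.
    visitor_sketch(const visitor_sketch &) = delete;
    visitor_sketch &operator=(const visitor_sketch &) = delete;

    void setup_factory(void *mem_ctx)
    {
        assert(factory.mem_ctx == nullptr);
        assert(factory.instructions->is_empty());
        factory.mem_ctx = mem_ctx;
    }

    ir_factory_stub factory;

private:
    exec_list_stub factory_instructions;
};
```

With this layout setup_factory() no longer needs the NULL check on factory.instructions, since the pointer is set once in the constructor.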

(snip)

> +   /**
> +    * \brief Lower the component-wise calculation of packHalf2x16.
> +    *
> +    * \param f_rval is one component of packHalf2x16's input
> +    * \param e_rval is the unshifted exponent bits of f_rval
> +    * \param m_rval is the unshifted mantissa bits of f_rval
> +    *
> +    * \return a uint rvalue that encodes a float16 in its lower 16 bits
> +    */
> +   ir_rvalue*
> +   pack_half_1x16_nosign(ir_rvalue *f_rval,
> +                         ir_rvalue *e_rval,
> +                         ir_rvalue *m_rval)
> +   {
> +      assert(e_rval->type == glsl_type::uint_type);
> +      assert(m_rval->type == glsl_type::uint_type);
> +
> +      /* uint u16; */
> +      ir_variable *u16 = factory.make_temp(glsl_type::uint_type,
> +                                           "tmp_pack_half_1x16_u16");
> +
> +      /* float f = FLOAT_RVAL; */
> +      ir_variable *f = factory.make_temp(glsl_type::float_type,
> +                                          "tmp_pack_half_1x16_f");
> +      factory.emit(assign(f, f_rval));
> +
> +      /* uint e = E_RVAL; */
> +      ir_variable *e = factory.make_temp(glsl_type::uint_type,
> +                                          "tmp_pack_half_1x16_e");
> +      factory.emit(assign(e, e_rval));
> +
> +      /* uint m = M_RVAL; */
> +      ir_variable *m = factory.make_temp(glsl_type::uint_type,
> +                                          "tmp_pack_half_1x16_m");
> +      factory.emit(assign(m, m_rval));
> +
> +      /* Preliminaries
> +       * -------------
> +       *
> +       * For a float16, the bit layout is:
> +       *
> +       *   sign:     15
> +       *   exponent: 10:14
> +       *   mantissa: 0:9
> +       *
> +       * Let f16 be a float16 value. The sign, exponent, and mantissa
> +       * determine its value thus:
> +       *
> +       *   if e16 = 0 and m16 = 0, then zero:       (-1)^s16 * 0                               (1)
> +       *   if e16 = 0 and m16 != 0, then subnormal: (-1)^s16 * 2^(e16 - 14) * (m16 / 2^10)     (2)
> +       *   if 0 < e16 < 31, then normal:            (-1)^s16 * 2^(e16 - 15) * (1 + m16 / 2^10) (3)
> +       *   if e16 = 31 and m16 = 0, then infinite:  (-1)^s16 * inf                             (4)
> +       *   if e16 = 31 and m16 != 0, then           NaN                                        (5)
> +       *
> +       * where 0 <= m16 < 2^10.
> +       *
> +       * For a float32, the bit layout is:
> +       *
> +       *   sign: 31
> +       *   exponent: 23:30
> +       *   mantissa: 0:22
> +       *
> +       * Let f32 be a float32 value. The sign, exponent, and mantissa
> +       * determine its value thus:
> +       *
> +       *   if e32 = 0 and m32 = 0, then zero:       (-1)^s * 0                                (10)
> +       *   if e32 = 0 and m32 != 0, then subnormal: (-1)^s * 2^(e32 - 126) * (m32 / 2^23)     (11)
> +       *   if 0 < e32 < 255, then normal:           (-1)^s * 2^(e32 - 127) * (1 + m32 / 2^23) (12)
> +       *   if e32 = 255 and m32 = 0, then infinite: (-1)^s * inf                              (13)
> +       *   if e32 = 255 and m32 != 0, then          NaN                                       (14)
> +       *
> +       * where 0 <= m32 < 2^23.
> +       *
> +       * The minimum and maximum normal float16 values are
> +       *
> +       *   min_norm16 = 2^(1 - 15) * (1 + 0 / 2^10) = 2^(-14)   (20)
> +       *   max_norm16 = 2^(30 - 15) * (1 + 1023 / 2^10)         (21)
> +       *
> +       * The step at max_norm16 is
> +       *
> +       *   max_step16 = 2^5                                     (22)
> +       *
> +       * Observe that the float16 boundary values in equations 20-21 lie in the
> +       * range of normal float32 values.
> +       *
> +       *
> +       * Rounding Behavior
> +       * -----------------
> +       * Not all float32 values can be exactly represented as a float16. We
> +       * round all such intermediate float32 values to the nearest float16; if
> +       * the float32 is exactly between two float16 values, we round to the one
> +       * with an even mantissa. This rounding behavior has several benefits:
> +       *
> +       *   - It has no sign bias.
> +       *
> +       *   - It reproduces the behavior of real hardware: opcode F32TO16 in Intel's
> +       *     GPU ISA.
> +       *
> +       *   - By reproducing the behavior of the GPU (at least on Intel hardware),
> +       *     compile-time evaluation of constant packHalf2x16 GLSL expressions will
> +       *     result in the same value as if the expression were executed on the
> +       *     GPU.
> +       *
> +       * Calculation
> +       * -----------
> +       * Our task is to compute s16, e16, m16 given f32.  Since this function
> +       * ignores the sign bit, assume that s32 = s16 = 0.  There are several
> +       * cases to consider.
> +       */
> +
> +      factory.emit(
> +
> +         /* Case 1) f32 is NaN
> +          *
> +          *   The resultant f16 will also be NaN.
> +          */
> +
> +         /* if (e32 == 255 && m32 != 0) { */
> +         if_tree(logic_and(equal(e, constant(0xffu << 23u)),
> +                           logic_not(equal(m, constant(0u)))),
> +
> +            assign(u16, constant(0x7fffu)),
> +
> +         /* Case 2) f32 lies in the range [0, min_norm16).
> +          *
> +          *   The resultant float16 will be either zero, subnormal, or normal.
> +          *
> +          *   Solving
> +          *
> +          *     f32 = min_norm16       (30)
> +          *
> +          *   gives
> +          *
> +          *     e32 = 113 and m32 = 0  (31)
> +          *
> +          *   Therefore this case occurs if and only if
> +          *
> +          *     e32 < 113              (32)
> +          */
> +
> +         /* } else if (e32 < 113) { */
> +         if_tree(less(e, constant(113u << 23u)),
> +
> +            /* u16 = uint(round_to_even(abs(f32) * float(1u << 24u))); */
> +            assign(u16, f2u(round_even(mul(expr(ir_unop_abs, f),
> +                                           constant((float) (1 << 24)))))),
> +
> +         /* Case 3) f32 lies in the range
> +          *         [min_norm16, max_norm16 + max_step16).
> +          *
> +          *   The resultant float16 will be either normal or infinite.
> +          *
> +          *   Solving
> +          *
> +          *     f32 = max_norm16 + max_step16           (40)
> +          *         = 2^15 * (1 + 1023 / 2^10) + 2^5    (41)
> +          *         = 2^16                              (42)
> +          *   gives
> +          *
> +          *     e32 = 142 and m32 = 0                   (43)
>

I calculate this to be 143, not 142.

> +          *
> +          *   We already solved the boundary condition f32 = min_norm16 above
> +          *   in equation 31. Therefore this case occurs if and only if
> +          *
> +          *     113 <= e32 and e32 < 142
>

So this should be e32 < 143.

> +          */
> +
> +         /* } else if (e32 < 142) { */
> +         if_tree(lequal(e, constant(142u << 23u)),
>

Fortunately, since you use "lequal" here, you get the correct effect.
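
For anyone who wants to double-check the boundary on the CPU, here is a small sketch. The helper names `biased_exponent32` and `passes_case3_test` are made up for this mail, and the code assumes a 32-bit IEEE 754 `float`. It extracts the biased exponent of 2^16 and of max_norm16 (65504), and mirrors the patch's `lequal(e, 142u << 23u)` comparison on the unshifted exponent bits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Biased exponent field (bits 23:30) of a float32.
uint32_t biased_exponent32(float f)
{
    static_assert(sizeof(float) == sizeof(uint32_t), "assumes 32-bit float");
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return (bits >> 23) & 0xffu;
}

// Mirrors lequal(e, constant(142u << 23u)), where e holds the unshifted
// exponent bits. Equivalent to e32 <= 142, i.e. e32 < 143.
bool passes_case3_test(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    uint32_t e = bits & 0x7f800000u;
    return e <= (142u << 23);
}
```

Here biased_exponent32(65536.0f) is 143 and biased_exponent32(65504.0f) is 142, so the "less than or equal to 142" test admits max_norm16 and rejects 2^16, which is the intended split between cases 3 and 4.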

> +
> +            /* The addition below handles the case where the mantissa rounds
> +             * up to 1024 and bumps the exponent.
> +             *
> +             * u16 = ((e - (112u << 23u)) >> 13u)
> +             *     + round_to_even(float(m) / (1u << 13u));
> +             */
> +            assign(u16, add(rshift(sub(e, constant(112u << 23u)),
> +                                   constant(13u)),
> +                            f2u(round_even(
> +                                  div(u2f(m), constant((float) (1 << 13))))))),
> +
> +         /* Case 4) f32 lies in the range [max_norm16 + max_step16, inf].
> +          *
> +          *   The resultant float16 will be infinite.
> +          *
> +          *   The cases above caught all float32 values in the range
> +          *   [0, max_norm16 + max_step16), so this is the fall-through case.
> +          */
> +
> +         /* } else { */
> +
> +            assign(u16, constant(31u << 10u))))));
> +
> +         /* } */
> +
> +       return deref(u16).val;
> +   }
>
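
As a sanity check on the four cases, the same logic can be sketched in plain C++ on the CPU. This is only an illustration under stated assumptions, not the patch's IR: the function name `pack_half_1x16_nosign_cpu` is hypothetical, `float` is assumed to be IEEE 754 binary32, and `std::nearbyint` is assumed to run in the default round-to-nearest-even mode.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// CPU sketch of the lowered packHalf2x16 component calculation, sign ignored.
// Returns a float16 encoding in the low 16 bits.
uint32_t pack_half_1x16_nosign_cpu(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    uint32_t e = bits & 0x7f800000u;   // unshifted exponent bits
    uint32_t m = bits & 0x007fffffu;   // unshifted mantissa bits

    // Case 1: NaN stays NaN.
    if (e == (0xffu << 23) && m != 0u)
        return 0x7fffu;

    // Case 2: [0, min_norm16). Scaling by 2^24 yields the subnormal
    // mantissa; rounding may carry into the smallest normal.
    if (e < (113u << 23))
        return (uint32_t) std::nearbyint(std::fabs(f) * 16777216.0f);

    // Case 3: [min_norm16, 2^16). Rebias the exponent and add the rounded
    // mantissa; a mantissa that rounds to 1024 bumps the exponent.
    if (e <= (142u << 23))
        return ((e - (112u << 23)) >> 13)
             + (uint32_t) std::nearbyint((float) m / 8192.0f);

    // Case 4: everything else becomes infinity.
    return 31u << 10;
}
```

For example, 1.0f packs to 0x3c00, 65504.0f (max_norm16) packs to 0x7bff, and 65520.0f, which lies exactly halfway between max_norm16 and 2^16, rounds up to 0x7c00 (infinity).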

(snip)

> +   /**
> +    * \brief Lower the component-wise calculation of unpackHalf2x16.
> +    *
> +    * Given a uint that encodes a float16 in its lower 16 bits, this function
> +    * returns a uint that encodes a float32 with the same value. The sign bit
> +    * of the float16 is ignored.
> +    *
> +    * \param e_rval is the unshifted exponent bits of a float16
> +    * \param m_rval is the unshifted mantissa bits of a float16
> +    * \return a uint rvalue that encodes a float32
> +    */
> +   ir_rvalue*
> +   unpack_half_1x16_nosign(ir_rvalue *e_rval, ir_rvalue *m_rval)
> +   {
> +      assert(e_rval->type == glsl_type::uint_type);
> +      assert(m_rval->type == glsl_type::uint_type);
> +
> +      /* uint u32; */
> +      ir_variable *u32 = factory.make_temp(glsl_type::uint_type,
> +                                           "tmp_unpack_half_1x16_u32");
> +
> +      /* uint e = E_RVAL; */
> +      ir_variable *e = factory.make_temp(glsl_type::uint_type,
> +                                          "tmp_unpack_half_1x16_e");
> +      factory.emit(assign(e, e_rval));
> +
> +      /* uint m = M_RVAL; */
> +      ir_variable *m = factory.make_temp(glsl_type::uint_type,
> +                                          "tmp_unpack_half_1x16_m");
> +      factory.emit(assign(m, m_rval));
> +
> +      /* Preliminaries
> +       * -------------
> +       *
> +       * For a float16, the bit layout is:
> +       *
> +       *   sign:     15
> +       *   exponent: 10:14
> +       *   mantissa: 0:9
> +       *
> +       * Let f16 be a float16 value. The sign, exponent, and mantissa
> +       * determine its value thus:
> +       *
> +       *   if e16 = 0 and m16 = 0, then zero:       (-1)^s16 * 0                               (1)
> +       *   if e16 = 0 and m16 != 0, then subnormal: (-1)^s16 * 2^(e16 - 14) * (m16 / 2^10)     (2)
> +       *   if 0 < e16 < 31, then normal:            (-1)^s16 * 2^(e16 - 15) * (1 + m16 / 2^10) (3)
> +       *   if e16 = 31 and m16 = 0, then infinite:  (-1)^s16 * inf                             (4)
> +       *   if e16 = 31 and m16 != 0, then           NaN                                        (5)
> +       *
> +       * where 0 <= m16 < 2^10.
> +       *
> +       * For a float32, the bit layout is:
> +       *
> +       *   sign: 31
> +       *   exponent: 23:30
> +       *   mantissa: 0:22
> +       *
> +       * Let f32 be a float32 value. The sign, exponent, and mantissa
> +       * determine its value thus:
> +       *
> +       *   if e32 = 0 and m32 = 0, then zero:       (-1)^s * 0                                (10)
> +       *   if e32 = 0 and m32 != 0, then subnormal: (-1)^s * 2^(e32 - 126) * (m32 / 2^23)     (11)
> +       *   if 0 < e32 < 255, then normal:           (-1)^s * 2^(e32 - 127) * (1 + m32 / 2^23) (12)
> +       *   if e32 = 255 and m32 = 0, then infinite: (-1)^s * inf                              (13)
> +       *   if e32 = 255 and m32 != 0, then          NaN                                       (14)
> +       *
> +       * where 0 <= m32 < 2^23.
> +       *
> +       * Calculation
> +       * -----------
> +       * Our task is to compute s32, e32, m32 given f16.  Since this function
> +       * ignores the sign bit, assume that s32 = s16 = 0.  There are several
> +       * cases to consider.
> +       */
> +
> +      factory.emit(
> +
> +         /* Case 1) f16 is zero or subnormal.
> +          *
> +          *   The simplest method of calculating f32 in this case is
> +          *
> +          *     f32 = f16                       (20)
> +          *         = 2^(-14) * (m16 / 2^10)    (21)
> +          *         = m16 / 2^24                (22)
> +          */
> +
> +         /* if (e16 == 0) { */
> +         if_tree(equal(e, constant(0u)),
> +
> +            /* u32 = bitcast_f2u(float(m) / float(1 << 24)); */
> +            assign(u32, expr(ir_unop_bitcast_f2u,
> +                                div(u2f(m), constant((float)(1 << 24))))),
> +
> +         /* Case 2) f16 is normal.
> +          *
> +          *   The equation
> +          *
> +          *     f32 = f16                              (30)
> +          *     2^(e32 - 127) * (1 + m32 / 2^23) =     (31)
> +          *       2^(e16 - 15) * (1 + m16 / 2^10)
> +          *
> +          *   can be decomposed into two
> +          *
> +          *     2^(e32 - 127) = 2^(e16 - 15)           (32)
> +          *     1 + m32 / 2^23 = 1 + m16 / 2^10        (33)
> +          *
> +          *   which solve to
> +          *
> +          *     e32 = e16 + 112                        (34)
> +          *     m32 = m16 * 2^13                       (35)
> +          */
> +
> +         /* } else if (e16 < 31) { */
> +         if_tree(less(e, constant(31u << 10u)),
> +
> +              /* u32 = ((e << 13) + (112 << 23))
> +               *     | (m << 13);
> +               */
> +              assign(u32, bit_or(add(lshift(e, constant(13u)),
> +                                     constant(112u << 23u)),
> +                                 lshift(m, constant(13u)))),
>

I believe you can save one operation by factoring out the "<< 13" to get:

assign(u32, lshift(bit_or(add(e, constant(112u << 10u)), m),
                   constant(13u)))
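
A quick way to convince yourself the two forms agree is to brute-force every normal float16 on the CPU. The sketch below uses plain integer arithmetic rather than ir_builder; the function names are hypothetical, and `e` and `m` are the unshifted float16 exponent and mantissa bits, as in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Form used in the patch: shift e and m separately.
uint32_t unpack_normal_original(uint32_t e, uint32_t m)
{
    return ((e << 13) + (112u << 23)) | (m << 13);
}

// Factored form: rebias in float16 bit positions, then shift once.
uint32_t unpack_normal_factored(uint32_t e, uint32_t m)
{
    return ((e + (112u << 10)) | m) << 13;
}

// Compare both forms over every normal float16 (0 < e16 < 31).
bool forms_agree_for_all_normals()
{
    for (uint32_t e16 = 1; e16 < 31; ++e16) {
        for (uint32_t m = 0; m < 1024; ++m) {
            uint32_t e = e16 << 10;
            if (unpack_normal_original(e, m) != unpack_normal_factored(e, m))
                return false;
        }
    }
    return true;
}
```

The bit_or with m is safe in the factored form because e + (112u << 10) always has its low 10 bits clear. As a spot check, e16 = 15 and m = 0 (float16 1.0) gives 0x3f800000, which is float32 1.0.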

> +
> +         /* Case 3) f16 is infinite. */
> +         if_tree(equal(m, constant(0u)),
> +
> +                 assign(u32, constant(255u << 23u)),
> +
> +         /* Case 4) f16 is NaN. */
> +         /* } else { */
> +
> +            assign(u32, constant(0x7fffffffu))))));
> +
> +         /* } */
> +
> +      return deref(u32).val;
> +   }
> +
>

(snip)

Well done!  This is a tour de force, Chad.  The only comment that I
consider blocking is the 142 vs 143 mix-up I noted above, and even that is
only in the comments.  With that fixed, this patch is:

Reviewed-by: Paul Berry <stereotype441 at gmail.com>