<div dir="ltr">On 21 January 2013 00:49, Chad Versace <span dir="ltr"><<a href="mailto:chad.versace@linux.intel.com" target="_blank">chad.versace@linux.intel.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Lower them to arithmetic and bit manipulation expressions.<br>
<br>
v2:<br>
- Rewrite using ir_builder. [for idr]<br>
- In lowering packHalf2x16, don't truncate subnormal float16 values to zero.<br>
And round to even rather than to zero. [for stereotype441]<br>
<br>
CC: Ian Romanick <<a href="mailto:idr@freedesktop.org">idr@freedesktop.org</a>><br>
CC: Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>><br>
Signed-off-by: Chad Versace <<a href="mailto:chad.versace@linux.intel.com">chad.versace@linux.intel.com</a>><br>
---<br>
src/glsl/Makefile.sources | 1 +<br>
src/glsl/ir_optimization.h | 20 +<br>
src/glsl/lower_packing_builtins.cpp | 1043 +++++++++++++++++++++++++++++++++++<br>
3 files changed, 1064 insertions(+)<br>
create mode 100644 src/glsl/lower_packing_builtins.cpp<br>
<br></blockquote><div><br></div><div>(snip)<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+ void<br>
+ setup_factory(void *mem_ctx)<br>
+ {<br>
+ assert(factory.mem_ctx == NULL);<br>
+ factory.mem_ctx = mem_ctx;<br>
+<br>
+ /* Avoid making a new list for each call to handle_rvalue(). Make a<br>
+ * single list and reuse it.<br>
+ */<br>
+ if (factory.instructions == NULL) {<br>
+ factory.instructions = new(NULL) exec_list();<br>
+ } else {<br>
+ assert(factory.instructions->is_empty());<br>
+ }<br>
+ }<br></blockquote><div><br></div><div>Do we need factory.instructions to be heap-allocated? How about just making a private exec_list inside lower_packing_builtins_visitor and setting factory.instructions to point to it in the lower_packing_builtins_visitor constructor?<br>
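Roughly what I have in mind, as a minimal standalone sketch. The types below are hypothetical stand-ins for exec_list and ir_factory rather than the real Mesa classes, and teardown_factory() is a name made up for illustration:<br>
<pre>
```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Mesa's exec_list (assumption, not the real class).
struct exec_list_sketch {
    std::vector<int> nodes;
    bool is_empty() const { return nodes.empty(); }
};

// Hypothetical stand-in for ir_factory: it only borrows the list it emits into.
struct ir_factory_sketch {
    exec_list_sketch *instructions = nullptr;
    void *mem_ctx = nullptr;
};

class visitor_sketch {
public:
    // Point the factory at the member list once, in the constructor.
    visitor_sketch() { factory.instructions = &factory_instructions; }

    // The factory holds a pointer into this object, so forbid copying.
    visitor_sketch(const visitor_sketch &) = delete;
    visitor_sketch &operator=(const visitor_sketch &) = delete;

    void setup_factory(void *mem_ctx) {
        assert(factory.mem_ctx == nullptr);
        // The list is reused between calls, so it must have been drained.
        assert(factory.instructions->is_empty());
        factory.mem_ctx = mem_ctx;
    }

    // Hypothetical counterpart that releases the memory context again.
    void teardown_factory() {
        assert(factory.instructions->is_empty());
        factory.mem_ctx = nullptr;
    }

    ir_factory_sketch factory;

private:
    // Owned by the visitor: no heap allocation and nothing to free.
    exec_list_sketch factory_instructions;
};
```
</pre>
With the list owned by the visitor, setup_factory() no longer needs the NULL check and only has to assert that the list is empty.<br>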
<br></div><div>(snip)<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">+ /**<br>
+ * \brief Lower the component-wise calculation of packHalf2x16.<br>
+ *<br>
+ * \param f_rval is one component of packHalf2x16's input<br>
+ * \param e_rval is the unshifted exponent bits of f_rval<br>
+ * \param m_rval is the unshifted mantissa bits of f_rval<br>
+ *<br>
+ * \return a uint rvalue that encodes a float16 in its lower 16 bits<br>
+ */<br>
+ ir_rvalue*<br>
+ pack_half_1x16_nosign(ir_rvalue *f_rval,<br>
+ ir_rvalue *e_rval,<br>
+ ir_rvalue *m_rval)<br>
+ {<br>
+ assert(e_rval->type == glsl_type::uint_type);<br>
+ assert(m_rval->type == glsl_type::uint_type);<br>
+<br>
+ /* uint u16; */<br>
+ ir_variable *u16 = factory.make_temp(glsl_type::uint_type,<br>
+ "tmp_pack_half_1x16_u16");<br>
+<br>
+ /* float f = FLOAT_RVAL; */<br>
+ ir_variable *f = factory.make_temp(glsl_type::float_type,<br>
+ "tmp_pack_half_1x16_f");<br>
+ factory.emit(assign(f, f_rval));<br>
+<br>
+ /* uint e = E_RVAL; */<br>
+ ir_variable *e = factory.make_temp(glsl_type::uint_type,<br>
+ "tmp_pack_half_1x16_e");<br>
+ factory.emit(assign(e, e_rval));<br>
+<br>
+ /* uint m = M_RVAL; */<br>
+ ir_variable *m = factory.make_temp(glsl_type::uint_type,<br>
+ "tmp_pack_half_1x16_m");<br>
+ factory.emit(assign(m, m_rval));<br>
+<br>
+ /* Preliminaries<br>
+ * -------------<br>
+ *<br>
+ * For a float16, the bit layout is:<br>
+ *<br>
+ * sign: 15<br>
+ * exponent: 10:14<br>
+ * mantissa: 0:9<br>
+ *<br>
+ * Let f16 be a float16 value. The sign, exponent, and mantissa<br>
+ * determine its value thus:<br>
+ *<br>
+ * if e16 = 0 and m16 = 0, then zero: (-1)^s16 * 0 (1)<br>
+ * if e16 = 0 and m16 != 0, then subnormal: (-1)^s16 * 2^(e16 - 14) * (m16 / 2^10) (2)<br>
+ * if 0 < e16 < 31, then normal: (-1)^s16 * 2^(e16 - 15) * (1 + m16 / 2^10) (3)<br>
+ * if e16 = 31 and m16 = 0, then infinite: (-1)^s16 * inf (4)<br>
+ * if e16 = 31 and m16 != 0, then NaN (5)<br>
+ *<br>
+ * where 0 <= m16 < 2^10.<br>
+ *<br>
+ * For a float32, the bit layout is:<br>
+ *<br>
+ * sign: 31<br>
+ * exponent: 23:30<br>
+ * mantissa: 0:22<br>
+ *<br>
+ * Let f32 be a float32 value. The sign, exponent, and mantissa<br>
+ * determine its value thus:<br>
+ *<br>
+ * if e32 = 0 and m32 = 0, then zero: (-1)^s * 0 (10)<br>
+ * if e32 = 0 and m32 != 0, then subnormal: (-1)^s * 2^(e32 - 126) * (m32 / 2^23) (11)<br>
+ * if 0 < e32 < 255, then normal: (-1)^s * 2^(e32 - 127) * (1 + m32 / 2^23) (12)<br>
+ * if e32 = 255 and m32 = 0, then infinite: (-1)^s * inf (13)<br>
+ * if e32 = 255 and m32 != 0, then NaN (14)<br>
+ *<br>
+ * where 0 <= m32 < 2^23.<br>
+ *<br>
+ * The minimum and maximum normal float16 values are<br>
+ *<br>
+ * min_norm16 = 2^(1 - 15) * (1 + 0 / 2^10) = 2^(-14) (20)<br>
+ * max_norm16 = 2^(30 - 15) * (1 + 1023 / 2^10) (21)<br>
+ *<br>
+ * The step at max_norm16 is<br>
+ *<br>
+ * max_step16 = 2^5 (22)<br>
+ *<br>
+ * Observe that the float16 boundary values in equations 20-21 lie in the<br>
+ * range of normal float32 values.<br>
+ *<br>
+ *<br>
+ * Rounding Behavior<br>
+ * -----------------<br>
+ * Not all float32 values can be exactly represented as a float16. We<br>
+ * round all such intermediate float32 values to the nearest float16; if<br>
+ * the float32 is exactly between two float16 values, we round to the one<br>
+ * with an even mantissa. This rounding behavior has several benefits:<br>
+ *<br>
+ * - It has no sign bias.<br>
+ *<br>
+ * - It reproduces the behavior of real hardware: opcode F32TO16 in Intel's<br>
+ * GPU ISA.<br>
+ *<br>
+ * - By reproducing the behavior of the GPU (at least on Intel hardware),<br>
+ * compile-time evaluation of constant packHalf2x16 GLSL expressions will<br>
+ * result in the same value as if the expression were executed on the<br>
+ * GPU.<br>
+ *<br>
+ * Calculation<br>
+ * -----------<br>
+ * Our task is to compute s16, e16, m16 given f32. Since this function<br>
+ * ignores the sign bit, assume that s32 = s16 = 0. There are several<br>
+ * cases to consider.<br>
+ */<br>
+<br>
+ factory.emit(<br>
+<br>
+ /* Case 1) f32 is NaN<br>
+ *<br>
+ * The resultant f16 will also be NaN.<br>
+ */<br>
+<br>
+ /* if (e32 == 255 && m32 != 0) { */<br>
+ if_tree(logic_and(equal(e, constant(0xffu << 23u)),<br>
+ logic_not(equal(m, constant(0u)))),<br>
+<br>
+ assign(u16, constant(0x7fffu)),<br>
+<br>
+ /* Case 2) f32 lies in the range [0, min_norm16).<br>
+ *<br>
+ * The resultant float16 will be either zero, subnormal, or normal.<br>
+ *<br>
+ * Solving<br>
+ *<br>
+ * f32 = min_norm16 (30)<br>
+ *<br>
+ * gives<br>
+ *<br>
+ * e32 = 113 and m32 = 0 (31)<br>
+ *<br>
+ * Therefore this case occurs if and only if<br>
+ *<br>
+ * e32 < 113 (32)<br>
+ */<br>
+<br>
+ /* } else if (e32 < 113) { */<br>
+ if_tree(less(e, constant(113u << 23u)),<br>
+<br>
+ /* u16 = uint(round_to_even(abs(f32) * float(1u << 24u))); */<br>
+ assign(u16, f2u(round_even(mul(expr(ir_unop_abs, f),<br>
+ constant((float) (1 << 24)))))),<br>
+<br>
+ /* Case 3) f32 lies in the range<br>
+ * [min_norm16, max_norm16 + max_step16).<br>
+ *<br>
+ * The resultant float16 will be either normal or infinite.<br>
+ *<br>
+ * Solving<br>
+ *<br>
+ * f32 = max_norm16 + max_step16 (40)<br>
+ * = 2^15 * (1 + 1023 / 2^10) + 2^5 (41)<br>
+ * = 2^16 (42)<br>
+ * gives<br>
+ *<br>
+ * e32 = 142 and m32 = 0 (43)<br></blockquote><div><br></div><div>I calculate this to be 143, not 142: solving 2^16 = 2^(e32 - 127) * (1 + 0 / 2^23) gives e32 = 16 + 127 = 143.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+ *<br>
+ * We already solved the boundary condition f32 = min_norm16 above<br>
+ * in equation 31. Therefore this case occurs if and only if<br>
+ *<br>
+ * 113 <= e32 and e32 < 142<br></blockquote><div><br></div><div>So this should be e32 < 143.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+ */<br>
+<br>
+ /* } else if (e32 < 142) { */<br>
+ if_tree(lequal(e, constant(142u << 23u)),<br></blockquote><div><br></div><div>Fortunately, since the code uses "lequal" against 142u << 23u, the test is equivalent to e32 < 143 and you get the correct effect.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+<br>
+ /* The addition below handles the case where the mantissa rounds<br>
+ * up to 1024 and bumps the exponent.<br>
+ *<br>
+ * u16 = ((e - (112u << 23u)) >> 13u)<br>
+ * + round_to_even(float(m) / (1u << 13u));<br>
+ */<br>
+ assign(u16, add(rshift(sub(e, constant(112u << 23u)),<br>
+ constant(13u)),<br>
+ f2u(round_even(<br>
+ div(u2f(m), constant((float) (1 << 13))))))),<br>
+<br>
+ /* Case 4) f32 lies in the range [max_norm16 + max_step16, inf].<br>
+ *<br>
+ * The resultant float16 will be infinite.<br>
+ *<br>
+ * The cases above caught all float32 values in the range<br>
+ * [0, max_norm16 + max_step16), so this is the fall-through case.<br>
+ */<br>
+<br>
+ /* } else { */<br>
+<br>
+ assign(u16, constant(31u << 10u))))));<br>
+<br>
+ /* } */<br>
+<br>
+ return deref(u16).val;<br>
+ }<br></blockquote><div><br></div><div>(snip)<br> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+ /**<br>
+ * \brief Lower the component-wise calculation of unpackHalf2x16.<br>
+ *<br>
+ * Given a uint that encodes a float16 in its lower 16 bits, this function<br>
+ * returns a uint that encodes a float32 with the same value. The sign bit<br>
+ * of the float16 is ignored.<br>
+ *<br>
+ * \param e_rval is the unshifted exponent bits of a float16<br>
+ * \param m_rval is the unshifted mantissa bits of a float16<br>
+ * \return a uint rvalue that encodes a float32<br>
+ */<br>
+ ir_rvalue*<br>
+ unpack_half_1x16_nosign(ir_rvalue *e_rval, ir_rvalue *m_rval)<br>
+ {<br>
+ assert(e_rval->type == glsl_type::uint_type);<br>
+ assert(m_rval->type == glsl_type::uint_type);<br>
+<br>
+ /* uint u32; */<br>
+ ir_variable *u32 = factory.make_temp(glsl_type::uint_type,<br>
+ "tmp_unpack_half_1x16_u32");<br>
+<br>
+ /* uint e = E_RVAL; */<br>
+ ir_variable *e = factory.make_temp(glsl_type::uint_type,<br>
+ "tmp_unpack_half_1x16_e");<br>
+ factory.emit(assign(e, e_rval));<br>
+<br>
+ /* uint m = M_RVAL; */<br>
+ ir_variable *m = factory.make_temp(glsl_type::uint_type,<br>
+ "tmp_unpack_half_1x16_m");<br>
+ factory.emit(assign(m, m_rval));<br>
+<br>
+ /* Preliminaries<br>
+ * -------------<br>
+ *<br>
+ * For a float16, the bit layout is:<br>
+ *<br>
+ * sign: 15<br>
+ * exponent: 10:14<br>
+ * mantissa: 0:9<br>
+ *<br>
+ * Let f16 be a float16 value. The sign, exponent, and mantissa<br>
+ * determine its value thus:<br>
+ *<br>
+ * if e16 = 0 and m16 = 0, then zero: (-1)^s16 * 0 (1)<br>
+ * if e16 = 0 and m16 != 0, then subnormal: (-1)^s16 * 2^(e16 - 14) * (m16 / 2^10) (2)<br>
+ * if 0 < e16 < 31, then normal: (-1)^s16 * 2^(e16 - 15) * (1 + m16 / 2^10) (3)<br>
+ * if e16 = 31 and m16 = 0, then infinite: (-1)^s16 * inf (4)<br>
+ * if e16 = 31 and m16 != 0, then NaN (5)<br>
+ *<br>
+ * where 0 <= m16 < 2^10.<br>
+ *<br>
+ * For a float32, the bit layout is:<br>
+ *<br>
+ * sign: 31<br>
+ * exponent: 23:30<br>
+ * mantissa: 0:22<br>
+ *<br>
+ * Let f32 be a float32 value. The sign, exponent, and mantissa<br>
+ * determine its value thus:<br>
+ *<br>
+ * if e32 = 0 and m32 = 0, then zero: (-1)^s * 0 (10)<br>
+ * if e32 = 0 and m32 != 0, then subnormal: (-1)^s * 2^(e32 - 126) * (m32 / 2^23) (11)<br>
+ * if 0 < e32 < 255, then normal: (-1)^s * 2^(e32 - 127) * (1 + m32 / 2^23) (12)<br>
+ * if e32 = 255 and m32 = 0, then infinite: (-1)^s * inf (13)<br>
+ * if e32 = 255 and m32 != 0, then NaN (14)<br>
+ *<br>
+ * where 0 <= m32 < 2^23.<br>
+ *<br>
+ * Calculation<br>
+ * -----------<br>
+ * Our task is to compute s32, e32, m32 given f16. Since this function<br>
+ * ignores the sign bit, assume that s32 = s16 = 0. There are several<br>
+ * cases to consider.<br>
+ */<br>
+<br>
+ factory.emit(<br>
+<br>
+ /* Case 1) f16 is zero or subnormal.<br>
+ *<br>
+ * The simplest method of calculating f32 in this case is<br>
+ *<br>
+ * f32 = f16 (20)<br>
+ * = 2^(-14) * (m16 / 2^10) (21)<br>
+ * = m16 / 2^24 (22)<br>
+ */<br>
+<br>
+ /* if (e16 == 0) { */<br>
+ if_tree(equal(e, constant(0u)),<br>
+<br>
+ /* u32 = bitcast_f2u(float(m) / float(1 << 24)); */<br>
+ assign(u32, expr(ir_unop_bitcast_f2u,<br>
+ div(u2f(m), constant((float)(1 << 24))))),<br>
+<br>
+ /* Case 2) f16 is normal.<br>
+ *<br>
+ * The equation<br>
+ *<br>
+ * f32 = f16 (30)<br>
+ * 2^(e32 - 127) * (1 + m32 / 2^23) = (31)<br>
+ * 2^(e16 - 15) * (1 + m16 / 2^10)<br>
+ *<br>
+ * can be decomposed into two<br>
+ *<br>
+ * 2^(e32 - 127) = 2^(e16 - 15) (32)<br>
+ * 1 + m32 / 2^23 = 1 + m16 / 2^10 (33)<br>
+ *<br>
+ * which solve to<br>
+ *<br>
+ * e32 = e16 + 112 (34)<br>
+ * m32 = m16 * 2^13 (35)<br>
+ */<br>
+<br>
+ /* } else if (e16 < 31) { */<br>
+ if_tree(less(e, constant(31u << 10u)),<br>
+<br>
+ /* u32 = ((e << 13) + (112 << 23))<br>
+ * | (m << 13);<br>
+ */<br>
+ assign(u32, bit_or(add(lshift(e, constant(13u)),<br>
+ constant(112u << 23u)),<br>
+ lshift(m, constant(13u)))),<br></blockquote><div><br></div><div>I believe you can save one operation by factoring out the "<< 13" to get:<br><br>assign(u32, lshift(bit_or(add(e, constant(112u << 10u)), m),<br>
constant(13u)))<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+<br>
+ /* Case 3) f16 is infinite. */<br>
+ if_tree(equal(m, constant(0u)),<br>
+<br>
+ assign(u32, constant(255u << 23u)),<br>
+<br>
+ /* Case 4) f16 is NaN. */<br>
+ /* } else { */<br>
+<br>
+ assign(u32, constant(0x7fffffffu))))));<br>
+<br>
+ /* } */<br>
+<br>
+ return deref(u32).val;<br>
+ }<br>
+<br></blockquote><div><br></div><div>(snip)<br><br>Well done! This is a tour de force, Chad. The only comment that I consider blocking is the 142 vs 143 mix-up I noted above, and even that is only in the comments. With that fixed, this patch is:<br>
<br></div><div>Reviewed-by: Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>> <br></div></div></div></div>
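<div>For anyone who wants to double-check the two arithmetic points raised in this review (the exponent of 2^16 and the factored "<< 13"), here is a minimal standalone sketch. It is plain C++ rather than GLSL IR, and the function names are hypothetical; e and m are the unshifted float16 exponent and mantissa fields, as in the patch.<br>
<pre>
```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Biased exponent field (bits 23:30) of an IEEE 754 binary32 value.
inline uint32_t biased_exponent32(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return (bits >> 23) & 0xffu;
}

// Normal-case unpack as written in the patch: two shifts by 13.
inline uint32_t unpack_normal_two_shifts(uint32_t e, uint32_t m) {
    return ((e << 13) + (112u << 23)) | (m << 13);
}

// The same case with the shift factored out, as suggested in the review.
inline uint32_t unpack_normal_one_shift(uint32_t e, uint32_t m) {
    return ((e + (112u << 10)) | m) << 13;
}

// Compare both forms over every normal float16 encoding (0 < e16 < 31).
inline bool unpack_forms_agree() {
    for (uint32_t e16 = 1; e16 < 31; ++e16) {
        for (uint32_t m16 = 0; m16 < 1024; ++m16) {
            const uint32_t e = e16 << 10;
            if (unpack_normal_two_shifts(e, m16) !=
                unpack_normal_one_shift(e, m16)) {
                return false;
            }
        }
    }
    return true;
}

// Runs both checks; aborts via assert if either one fails.
inline void check_review_points() {
    // 2^16 = max_norm16 + max_step16 has biased float32 exponent 143.
    assert(biased_exponent32(65536.0f) == 143u);
    // Factoring out the shift does not change any normal-case result.
    assert(unpack_forms_agree());
}
```
</pre>
Calling check_review_points() from a small test program exercises both claims.<br></div>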