<div dir="ltr">On 23 January 2013 12:44, Chad Versace <span dir="ltr"><<a href="mailto:chad.versace@linux.intel.com" target="_blank">chad.versace@linux.intel.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 01/22/2013 09:19 PM, Paul Berry wrote:<br>
> On 21 January 2013 00:49, Chad Versace <<a href="mailto:chad.versace@linux.intel.com">chad.versace@linux.intel.com</a>> wrote:<br>
><br>
>> Lower them to arithmetic and bit manipulation expressions.<br>
>><br>
>> v2:<br>
>> - Rewrite using ir_builder. [for idr]<br>
>> - In lowering packHalf2x16, don't truncate subnormal float16 values to<br>
>> zero.<br>
>> And round to even rather than to zero. [for stereotype441]<br>
>><br>
>> CC: Ian Romanick <<a href="mailto:idr@freedesktop.org">idr@freedesktop.org</a>><br>
>> CC: Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>><br>
>> Signed-off-by: Chad Versace <<a href="mailto:chad.versace@linux.intel.com">chad.versace@linux.intel.com</a>><br>
>> ---<br>
>> src/glsl/Makefile.sources | 1 +<br>
>> src/glsl/ir_optimization.h | 20 +<br>
>> src/glsl/lower_packing_builtins.cpp | 1043<br>
>> +++++++++++++++++++++++++++++++++++<br>
>> 3 files changed, 1064 insertions(+)<br>
>> create mode 100644 src/glsl/lower_packing_builtins.cpp<br>
<br>
<br>
<br>
</div><div class="im">>> + void<br>
>> + setup_factory(void *mem_ctx)<br>
>> + {<br>
>> + assert(factory.mem_ctx == NULL);<br>
>> + factory.mem_ctx = mem_ctx;<br>
>> +<br>
>> + /* Avoid making a new list for each call to handle_rvalue(). Make a<br>
>> + * single list and reuse it.<br>
>> + */<br>
>> + if (factory.instructions == NULL) {<br>
>> + factory.instructions = new(NULL) exec_list();<br>
>> + } else {<br>
>> + assert(factory.instructions->is_empty());<br>
>> + }<br>
>> + }<br>
>><br>
><br>
> Do we need factory.instructions to be heap-allocated? How about just<br>
> making a private exec_list inside lower_packing_builtins_visitor and<br>
> setting factory.instructions to point to it in the<br>
> lower_packing_builtins_visitor constructor?<br>
><br>
> (snip)<br>
<br>
</div>That seems reasonable. It saves a new/delete pair on each call to<br>
lower_packing_builtins(). I'll add that change.<br>
<br>
I assume that this change is minimal enough that I don't need to<br>
repost the patch.<br></blockquote><div><br></div><div style>Agreed.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
>> + /* Case 3) f32 lies in the range<br>
>> + * [min_norm16, max_norm16 + max_step16).<br>
>> + *<br>
>> + * The resultant float16 will be either normal or infinite.<br>
>> + *<br>
>> + * Solving<br>
>> + *<br>
>> + * f32 = max_norm16 + max_step16 (40)<br>
>> + * = 2^15 * (1 + 1023 / 2^10) + 2^5 (41)<br>
>> + * = 2^16 (42)<br>
>> + * gives<br>
>> + *<br>
>> + * e32 = 142 and m32 = 0 (43)<br>
>><br>
><br>
> I calculate this to be 143, not 142.<br>
><br>
><br>
>> + *<br>
>> + * We already solved the boundary condition f32 = min_norm16<br>
>> above<br>
>> + * in equation 31. Therefore this case occurs if and only if<br>
>> + *<br>
>> + * 113 <= e32 and e32 < 142<br>
>><br>
><br>
> So this should be e32 < 143.<br>
<br>
</div>Fixed.<br>
<div class="im"><br>
><br>
><br>
>> + */<br>
>> +<br>
>> + /* } else if (e32 < 142) { */<br>
>> + if_tree(lequal(e, constant(142u << 23u)),<br>
>><br>
><br>
> Fortunately, since you use "lequal" here, you get the correct effect.<br>
<br>
</div>And fixed the code here to match the fixed comments.<br>
<br>
<br>
/* } else if (e32 < 143) { */<br>
if_tree(less(e, constant(143u << 23u)),<br>
<div class="im"><br>
<br>
>> + /* } else if (e16 < 31)) { */<br>
>> + if_tree(less(e, constant(31u << 10u)),<br>
>> +<br>
>> + /* u32 = ((e << 13) + (112 << 23))<br>
>> + * | (m << 13);<br>
>> + */<br>
>> + assign(u32, bit_or(add(lshift(e, constant(13u)),<br>
>> + constant(112u << 23u)),<br>
>> + lshift(m, constant(13u)))),<br>
>><br>
><br>
> I believe you can save one operation by factoring out the "<< 13" to get:<br>
><br>
> assign(u32, lshift(bit_or(add(e, constant(112u << 10u)), m),<br>
> constant(13u)))<br>
<br>
</div>Fixed.<br>
<div class="im"><br>
<br>
><br>
> Well done! This is a tour de force, Chad. The only comment that I<br>
> consider blocking is the 142 vs 143 mix-up I noted above, and even that is<br>
> only in the comments. With that fixed, this patch is:<br>
><br>
> Reviewed-by: Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>><br>
<br>
</div>Thanks for the thorough review!<br>
<br>
</blockquote></div><br></div></div>