<div dir="ltr">On 23 January 2013 12:44, Chad Versace <<a href="mailto:chad.versace@linux.intel.com" target="_blank">chad.versace@linux.intel.com</a>> wrote: <div class="gmail_extra"><div class="gmail_quote"> <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 01/22/2013 09:19 PM, Paul Berry wrote: > On 21 January 2013 00:49, Chad Versace <<a href="mailto:chad.versace@linux.intel.com">chad.versace@linux.intel.com</a>> wrote: > >> Lower them to arithmetic and bit manipulation expressions. >> >> v2: >> - Rewrite using ir_builder. [for idr] >> - In lowering packHalf2x16, don't truncate subnormal float16 values to >> zero. >> And round to even rather than to zero. [for stereotype441] >> >> CC: Ian Romanick <<a href="mailto:idr@freedesktop.org">idr@freedesktop.org</a>> >> CC: Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>> >> Signed-off-by: Chad Versace <<a href="mailto:chad.versace@linux.intel.com">chad.versace@linux.intel.com</a>> >> --- >> src/glsl/Makefile.sources | 1 + >> src/glsl/ir_optimization.h | 20 + >> src/glsl/lower_packing_builtins.cpp | 1043 >> +++++++++++++++++++++++++++++++++++ >> 3 files changed, 1064 insertions(+) >> create mode 100644 src/glsl/lower_packing_builtins.cpp </div><div class="im">>> + void >> + setup_factory(void *mem_ctx) >> + { >> + assert(factory.mem_ctx == NULL); >> + factory.mem_ctx = mem_ctx; >> + >> + /* Avoid making a new list for each call to handle_rvalue(). Make a >> + * single list and reuse it. >> + */ >> + if (factory.instructions == NULL) { >> + factory.instructions = new(NULL) exec_list(); >> + } else { >> + assert(factory.instructions->is_empty()); >> + } >> + } >> > > Do we need factory.instructions to be heap-allocated? How about just > making a private exec_list inside lower_packing_builtins_visitor and > setting factory.instructions to point to it in the > lower_packing_builtins_visitor constructor? > > (snip) </div>That seems reasonable. It saves a new/delete pair on each call to lower_packing_builtins(). I'll add that change. I assume that this change is minimal enough that I don't need to repost the patch. </blockquote><div> </div><div style>Agreed.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <div class="im"> >> + /* Case 3) f32 lies in the range >> + * [min_norm16, max_norm16 + max_step16). >> + * >> + * The resultant float16 will be either normal or infinite. >> + * >> + * Solving >> + * >> + * f32 = max_norm16 + max_step16 (40) >> + * = 2^15 * (1 + 1023 / 2^10) + 2^5 (41) >> + * = 2^16 (42) >> + * gives >> + * >> + * e32 = 142 and m32 = 0 (43) >> > > I calculate this to be 143, not 142. > > >> + * >> + * We already solved the boundary condition f32 = min_norm16 >> above >> + * in equation 31. Therefore this case occurs if and only if >> + * >> + * 113 <= e32 and e32 < 142 >> > > So this should be e32 < 143. </div>Fixed. <div class="im"> > > >> + */ >> + >> + /* } else if (e32 < 142) { */ >> + if_tree(lequal(e, constant(142u << 23u)), >> > > Fortunately, since you use "lequal" here, you get the correct effect. </div>And fixed the code here to match the fixed comments. /* } else if (e32 < 143) { */ if_tree(less(e, constant(143u << 23u)), <div class="im"> >> + /* } else if (e16 < 31)) { */ >> + if_tree(less(e, constant(31u << 10u)), >> + >> + /* u32 = ((e << 13) + (112 << 23)) >> + * | (m << 13); >> + */ >> + assign(u32, bit_or(add(lshift(e, constant(13u)), >> + constant(112u << 23u)), >> + lshift(m, constant(13u)))), >> > > I believe you can save one operation by factoring out the "<< 13" to get: > > assign(u32, lshift(bit_or(add(e, constant(112u << 10u)), m), > constant(13u))) </div>Fixed. <div class="im"> > > Well done! This is a tour de force, Chad. The only comment that I > consider blocking is the 142 vs 143 mix-up I noted above, and even that is > only in the comments. With that fixed, this patch is: > > Reviewed-by: Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>> </div>Thanks for the thorough review! </blockquote></div> </div></div>