[Mesa-dev] [PATCH (gles3) 15/20] glsl: Add lowering pass for GLSL ES 3.00 pack/unpack operations (v2)
Chad Versace
chad.versace at linux.intel.com
Wed Jan 23 12:44:50 PST 2013
On 01/22/2013 09:19 PM, Paul Berry wrote:
> On 21 January 2013 00:49, Chad Versace <chad.versace at linux.intel.com> wrote:
>
>> Lower them to arithmetic and bit manipulation expressions.
>>
>> v2:
>> - Rewrite using ir_builder. [for idr]
>> - In lowering packHalf2x16, don't truncate subnormal float16 values to
>> zero.
>> And round to even rather than to zero. [for stereotype441]
>>
>> CC: Ian Romanick <idr at freedesktop.org>
>> CC: Paul Berry <stereotype441 at gmail.com>
>> Signed-off-by: Chad Versace <chad.versace at linux.intel.com>
>> ---
>> src/glsl/Makefile.sources | 1 +
>> src/glsl/ir_optimization.h | 20 +
>> src/glsl/lower_packing_builtins.cpp | 1043
>> +++++++++++++++++++++++++++++++++++
>> 3 files changed, 1064 insertions(+)
>> create mode 100644 src/glsl/lower_packing_builtins.cpp
>> + void
>> + setup_factory(void *mem_ctx)
>> + {
>> + assert(factory.mem_ctx == NULL);
>> + factory.mem_ctx = mem_ctx;
>> +
>> + /* Avoid making a new list for each call to handle_rvalue(). Make a
>> + * single list and reuse it.
>> + */
>> + if (factory.instructions == NULL) {
>> + factory.instructions = new(NULL) exec_list();
>> + } else {
>> + assert(factory.instructions->is_empty());
>> + }
>> + }
>>
>
> Do we need factory.instructions to be heap-allocated? How about just
> making a private exec_list inside lower_packing_builtins_visitor and
> setting factory.instructions to point to it in the
> lower_packing_builtins_visitor constructor?
>
> (snip)
That seems reasonable. It saves a new/delete pair on each call to
lower_packing_builtins(). I'll add that change.
I assume that this change is minimal enough that I don't need to
repost the patch.
>> + /* Case 3) f32 lies in the range
>> + * [min_norm16, max_norm16 + max_step16).
>> + *
>> + * The resultant float16 will be either normal or infinite.
>> + *
>> + * Solving
>> + *
>> + * f32 = max_norm16 + max_step16 (40)
>> + * = 2^15 * (1 + 1023 / 2^10) + 2^5 (41)
>> + * = 2^16 (42)
>> + * gives
>> + *
>> + * e32 = 142 and m32 = 0 (43)
>>
>
> I calculate this to be 143, not 142.
>
>
>> + *
>> + * We already solved the boundary condition f32 = min_norm16
>> above
>> + * in equation 31. Therefore this case occurs if and only if
>> + *
>> + * 113 <= e32 and e32 < 142
>>
>
> So this should be e32 < 143.
Fixed.
>
>
>> + */
>> +
>> + /* } else if (e32 < 142) { */
>> + if_tree(lequal(e, constant(142u << 23u)),
>>
>
> Fortunately, since you use "lequal" here, you get the correct effect.
And fixed the code here to match the fixed comments.
/* } else if (e32 < 143) { */
if_tree(less(e, constant(143u << 23u)),
>> + /* } else if (e16 < 31)) { */
>> + if_tree(less(e, constant(31u << 10u)),
>> +
>> + /* u32 = ((e << 13) + (112 << 23))
>> + * | (m << 13);
>> + */
>> + assign(u32, bit_or(add(lshift(e, constant(13u)),
>> + constant(112u << 23u)),
>> + lshift(m, constant(13u)))),
>>
>
> I believe you can save one operation by factoring out the "<< 13" to get:
>
> assign(u32, lshift(bit_or(add(e, constant(112u << 10u)), m),
> constant(13u)))
Fixed.
>
> Well done! This is a tour de force, Chad. The only comment that I
> consider blocking is the 142 vs 143 mix-up I noted above, and even that is
> only in the comments. With that fixed, this patch is:
>
> Reviewed-by: Paul Berry <stereotype441 at gmail.com>
Thanks for the thorough review!
More information about the mesa-dev
mailing list