[Mesa-dev] [PATCH v2] glsl: provide the option of using BFE for unpack builting lowering
Matt Turner
mattst88 at gmail.com
Thu Aug 27 23:25:36 PDT 2015
On Tue, Aug 25, 2015 at 9:39 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> This greatly improves generated code, especially for the snorm variants,
> since it is able to get rid of the lshift/rshift for sext, as well as
> replacing each shift + mask with a single op.
>
> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
> ---
>
> v1 -> v2: Only use BFE for .yz of a -> uvec4 conversion.
>
> src/glsl/ir_builder.cpp | 6 ++
> src/glsl/ir_builder.h | 1 +
> src/glsl/ir_optimization.h | 1 +
> src/glsl/lower_packing_builtins.cpp | 103 +++++++++++++++++++++++++----
> src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 3 +-
> 5 files changed, 100 insertions(+), 14 deletions(-)
>
> diff --git a/src/glsl/ir_builder.cpp b/src/glsl/ir_builder.cpp
> index cd03859..c9cf124 100644
> --- a/src/glsl/ir_builder.cpp
> +++ b/src/glsl/ir_builder.cpp
> @@ -567,6 +567,12 @@ csel(operand a, operand b, operand c)
> }
>
> ir_expression *
> +bitfield_extract(operand a, operand b, operand c)
> +{
> + return expr(ir_triop_bitfield_extract, a, b, c);
> +}
> +
> +ir_expression *
> bitfield_insert(operand a, operand b, operand c, operand d)
> {
> void *mem_ctx = ralloc_parent(a.val);
> diff --git a/src/glsl/ir_builder.h b/src/glsl/ir_builder.h
> index f76453f..b483ebf 100644
> --- a/src/glsl/ir_builder.h
> +++ b/src/glsl/ir_builder.h
> @@ -200,6 +200,7 @@ ir_expression *interpolate_at_sample(operand a, operand b);
> ir_expression *fma(operand a, operand b, operand c);
> ir_expression *lrp(operand x, operand y, operand a);
> ir_expression *csel(operand a, operand b, operand c);
> +ir_expression *bitfield_extract(operand a, operand b, operand c);
> ir_expression *bitfield_insert(operand a, operand b, operand c, operand d);
>
> ir_swizzle *swizzle(operand a, int swizzle, int components);
> diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
> index b955874..265b223 100644
> --- a/src/glsl/ir_optimization.h
> +++ b/src/glsl/ir_optimization.h
> @@ -69,6 +69,7 @@ enum lower_packing_builtins_op {
> LOWER_UNPACK_UNORM_4x8 = 0x0800,
>
> LOWER_PACK_USE_BFI = 0x1000,
> + LOWER_PACK_USE_BFE = 0x2000,
> };
>
> bool do_common_optimization(exec_list *ir, bool linked,
> diff --git a/src/glsl/lower_packing_builtins.cpp b/src/glsl/lower_packing_builtins.cpp
> index be17e1d..1bd635e 100644
> --- a/src/glsl/lower_packing_builtins.cpp
> +++ b/src/glsl/lower_packing_builtins.cpp
> @@ -118,6 +118,7 @@ public:
> *rvalue = split_unpack_half_2x16(op0);
> break;
> case LOWER_PACK_UNPACK_NONE:
> + default:
Put the new enum values here instead of default (and put
LOWER_PACK_USE_BFI in the first patch)
> assert(!"not reached");
> break;
> }
> @@ -307,6 +308,39 @@ private:
> }
>
> /**
> + * \brief Unpack a uint32 into two int16's.
> + *
> + * Specifically each 16-bit value is sign-extended to the full width of an
> + * int32 on return.
> + */
> + ir_rvalue *
> + unpack_uint_to_ivec2(ir_rvalue *uint_rval)
> + {
> + assert(uint_rval->type == glsl_type::uint_type);
> +
> + if (!(op_mask & LOWER_PACK_USE_BFE)) {
> + return rshift(lshift(u2i(unpack_uint_to_uvec2(uint_rval)),
> + constant(16u)),
> + constant(16u));
> + }
> +
> + ir_variable *i = factory.make_temp(glsl_type::int_type,
> + "tmp_unpack_uint_to_ivec2_i");
Every instance of '"tmp' in this patch is indented one space too many,
probably because the rest of the file is like that -- probably by
mistake.
With those couple of things fixed, both patches are
Reviewed-by: Matt Turner <mattst88 at gmail.com>
I'll follow up with a patch for i965 to use this as well. It looks
like it cuts a few instructions there too.
More information about the mesa-dev
mailing list