[Mesa-dev] [PATCH v2] glsl: provide the option of using BFE for unpack builting lowering

Thu Aug 27 23:41:25 PDT 2015

On Fri, Aug 28, 2015 at 2:25 AM, Matt Turner <mattst88 at gmail.com> wrote:
> On Tue, Aug 25, 2015 at 9:39 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> This greatly improves generated code, especially for the snorm variants,
>> since it is able to get rid of the lshift/rshift for sext, as well as
>> replacing each shift + mask with a single op.
>>
>> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
>> ---
>>
>> v1 -> v2: Only use BFE for .yz of a -> uvec4 conversion.
>>
>>  src/glsl/ir_builder.cpp                    |   6 ++
>>  src/glsl/ir_builder.h                      |   1 +
>>  src/glsl/ir_optimization.h                 |   1 +
>>  src/glsl/lower_packing_builtins.cpp        | 103 +++++++++++++++++++++++++----
>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp |   3 +-
>>  5 files changed, 100 insertions(+), 14 deletions(-)
>>
>> diff --git a/src/glsl/ir_builder.cpp b/src/glsl/ir_builder.cpp
>> index cd03859..c9cf124 100644
>> --- a/src/glsl/ir_builder.cpp
>> +++ b/src/glsl/ir_builder.cpp
>> @@ -567,6 +567,12 @@ csel(operand a, operand b, operand c)
>>  }
>>
>>  ir_expression *
>> +bitfield_extract(operand a, operand b, operand c)
>> +{
>> +   return expr(ir_triop_bitfield_extract, a, b, c);
>> +}
>> +
>> +ir_expression *
>>  bitfield_insert(operand a, operand b, operand c, operand d)
>>  {
>>     void *mem_ctx = ralloc_parent(a.val);
>> diff --git a/src/glsl/ir_builder.h b/src/glsl/ir_builder.h
>> index f76453f..b483ebf 100644
>> --- a/src/glsl/ir_builder.h
>> +++ b/src/glsl/ir_builder.h
>> @@ -200,6 +200,7 @@ ir_expression *interpolate_at_sample(operand a, operand b);
>>  ir_expression *fma(operand a, operand b, operand c);
>>  ir_expression *lrp(operand x, operand y, operand a);
>>  ir_expression *csel(operand a, operand b, operand c);
>> +ir_expression *bitfield_extract(operand a, operand b, operand c);
>>  ir_expression *bitfield_insert(operand a, operand b, operand c, operand d);
>>
>>  ir_swizzle *swizzle(operand a, int swizzle, int components);
>> diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
>> index b955874..265b223 100644
>> --- a/src/glsl/ir_optimization.h
>> +++ b/src/glsl/ir_optimization.h
>> @@ -69,6 +69,7 @@ enum lower_packing_builtins_op {
>>     LOWER_UNPACK_UNORM_4x8               = 0x0800,
>>
>>     LOWER_PACK_USE_BFI                   = 0x1000,
>> +   LOWER_PACK_USE_BFE                   = 0x2000,
>>  };
>>
>>  bool do_common_optimization(exec_list *ir, bool linked,
>> diff --git a/src/glsl/lower_packing_builtins.cpp b/src/glsl/lower_packing_builtins.cpp
>> index be17e1d..1bd635e 100644
>> --- a/src/glsl/lower_packing_builtins.cpp
>> +++ b/src/glsl/lower_packing_builtins.cpp
>> @@ -118,6 +118,7 @@ public:
>>           *rvalue = split_unpack_half_2x16(op0);
>>           break;
>>        case LOWER_PACK_UNPACK_NONE:
>> +      default:
>
> Put the new enum values here instead of default (and put
> LOWER_PACK_USE_BFI in the first patch)

OK.

>
>>           assert(!"not reached");
>>           break;
>>        }
>> @@ -307,6 +308,39 @@ private:
>>     }
>>
>>     /**
>> +    * \brief Unpack a uint32 into two int16's.
>> +    *
>> +    * Specifically each 16-bit value is sign-extended to the full width of an
>> +    * int32 on return.
>> +    */
>> +   ir_rvalue *
>> +   unpack_uint_to_ivec2(ir_rvalue *uint_rval)
>> +   {
>> +      assert(uint_rval->type == glsl_type::uint_type);
>> +
>> +      if (!(op_mask & LOWER_PACK_USE_BFE)) {
>> +         return rshift(lshift(u2i(unpack_uint_to_uvec2(uint_rval)),
>> +                              constant(16u)),
>> +                       constant(16u));
>> +      }
>> +
>> +      ir_variable *i = factory.make_temp(glsl_type::int_type,
>> +                                          "tmp_unpack_uint_to_ivec2_i");
>
> Every instance of '"tmp' in this patch is indented one space too many,
> probably because the rest of the file is like that -- probably by
> mistake.

Oooops, not sure how this happened. I probably edited the first line
without reindenting the second line. Will definitely fix.

>
> With those couple of things fixed, both patches are
>
> Reviewed-by: Matt Turner <mattst88 at gmail.com>
>
> I'll follow up with a patch for i965 to use this as well. It looks
> like it cuts a few instructions there too.

Cool thanks! I think you should be able to use a triop bfi instead of
the quadop, which will help more on i965.