[Mesa-dev] [PATCH] glsl: use bitfield_insert instead of and + shift + or for packing

Thu Aug 27 23:35:56 PDT 2015

On Fri, Aug 28, 2015 at 2:20 AM, Matt Turner <mattst88 at gmail.com> wrote:
> On Thu, Aug 20, 2015 at 6:00 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> It is fairly tricky to detect the proper conditions for using bitfield
>> insert, but easy to just use it up front. This removes a lot of
>> instructions on nvc0 when invoking the packing builtins.
>>
>> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
>> ---
>>
>> Not sure if all backends will benefit from this, but nvc0 definitely
>> does, since it has a "insert bitfield" instruction which handles
>> everything all-in-one (the combined offset/bits argument takes the
>> position of the third source, which requires no extra logic when
>> they're both immediate).
>>
>> It would be just as easy to use the 3-arg bfi op (no masking necessary
>> since we know that we start with a clean value), but gallium doesn't
>> support it. (Not for the least of reasons is that nvc0 doesn't support
>> it and I'm the one that added BFI to TGSI.)
>>
>> Unpacking could use BFE, really the snorm logic would greatly benefit
>> from using an IBFE (thus avoiding the shifts back and forth),
>> but... later.
>>
>>  src/glsl/ir_optimization.h                 |  4 +++-
>>  src/glsl/lower_packing_builtins.cpp        | 24 +++++++++++++++++++++++-
>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp |  3 +++
>>  3 files changed, 29 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
>> index eef107e..b955874 100644
>> --- a/src/glsl/ir_optimization.h
>> +++ b/src/glsl/ir_optimization.h
>> @@ -66,7 +66,9 @@ enum lower_packing_builtins_op {
>>     LOWER_UNPACK_SNORM_4x8               = 0x0200,
>>
>>     LOWER_PACK_UNORM_4x8                 = 0x0400,
>> -   LOWER_UNPACK_UNORM_4x8               = 0x0800
>> +   LOWER_UNPACK_UNORM_4x8               = 0x0800,
>> +
>> +   LOWER_PACK_USE_BFI                   = 0x1000,
>>  };
>>
>>  bool do_common_optimization(exec_list *ir, bool linked,
>> diff --git a/src/glsl/lower_packing_builtins.cpp b/src/glsl/lower_packing_builtins.cpp
>> index a6fb8a8..be17e1d 100644
>> --- a/src/glsl/lower_packing_builtins.cpp
>> +++ b/src/glsl/lower_packing_builtins.cpp
>> @@ -225,6 +225,14 @@ private:
>>                                            "tmp_pack_uvec2_to_uint");
>>        factory.emit(assign(u, uvec2_rval));
>>
>> +      if (op_mask & LOWER_PACK_USE_BFI) {
>> +         return bitfield_insert(
>> +            bit_and(swizzle_x(u), constant(0xffffu)),
>> +            swizzle_y(u),
>> +            constant(16),
>> +            constant(16));
>
> Indent bitfield_insert's arguments to align with its "(" (and probably
> put bit_and on the same line)

Hm, well there was an earlier time when I had more arguments and it
wouldn't fit in 80 chars. However I believe that this indentation is
perfectly valid. This is how I've always done it if I put the first
arg on the next line. emacs agrees.

Anyways, I'll try to fit the first bit_and on the same line, in which
case indenting to the ( will make sense.

>
>> +      }
>> +
>>        /* return (u.y << 16) | (u.x & 0xffff); */
>>        return bit_or(lshift(swizzle_y(u), constant(16u)),
>>                      bit_and(swizzle_x(u), constant(0xffffu)));
>> @@ -242,9 +250,23 @@ private:
>>     {
>>        assert(uvec4_rval->type == glsl_type::uvec4_type);
>>
>> -      /* uvec4 u = UVEC4_RVAL; */
>>        ir_variable *u = factory.make_temp(glsl_type::uvec4_type,
>>                                            "tmp_pack_uvec4_to_uint");
>> +
>> +      if (op_mask & LOWER_PACK_USE_BFI) {
>> +         /* uvec4 u = UVEC4_RVAL; */
>> +         factory.emit(assign(u, uvec4_rval));
>> +
>> +         return bitfield_insert(
>> +            bitfield_insert(
>> +               bitfield_insert(
>> +                  bit_and(swizzle_x(u), constant(0xffu)),
>> +                  swizzle_y(u), constant(8), constant(8)),
>> +               swizzle_z(u), constant(16), constant(8)),
>> +            swizzle_w(u), constant(24), constant(8));
>
> I don't know if there's a nice way to indent such a long expression,
> but I'd indent the lines after the return three spaces more than the
> first bitfield_insert.

Again, this is how emacs does it, and also the way I've always done it
(Have I always done it that way because that's how emacs does it? Hard
to tell by now, but I think I did it like this before I used emacs). I
could put the first bitfield_insert on the next line after the return,
perhaps that'll result in a more pleasing indentation? i.e.

return
   bitfield_insert(
       bitfield_insert(
...

If you'd like to make rules for emacs for the glsl dir that do things
just the way you like them, I probably won't object too much, but I
tend to let the author determine such smaller details as long as its
within the regular rules of indentation, even if it's not exactly the
way I'd do it.

>
>> +      }
>> +
>> +      /* uvec4 u = UVEC4_RVAL & 0xff */
>>        factory.emit(assign(u, bit_and(uvec4_rval, constant(0xffu))));
>>
>>        /* return (u.w << 24) | (u.z << 16) | (u.y << 8) | u.x; */
>> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> index eb47685..d3c7238 100644
>> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> @@ -5995,6 +5995,9 @@ st_link_shader(struct gl_context *ctx, struct gl_shader_program *prog)
>>                                 LOWER_PACK_HALF_2x16 |
>>                                 LOWER_UNPACK_HALF_2x16;
>>
>> +         if (ctx->Extensions.ARB_gpu_shader5)
>> +            lower_inst |= LOWER_PACK_USE_BFI;
>> +
>>           lower_packing_builtins(ir, lower_inst);
>>        }