[Mesa-dev] [PATCH (gles3) 20/20] i965/fs/gen7: Emit code for GLSL 3.00 pack/unpack operations (v2)

Chad Versace chad.versace at linux.intel.com
Mon Jan 21 15:04:43 PST 2013


On 01/21/2013 01:18 PM, Eric Anholt wrote:
> Chad Versace <chad.versace at linux.intel.com> writes:
>>     ir->remove();
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>> index 324e665..0ff296c 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>> @@ -923,6 +923,34 @@ fs_generator::generate_set_global_offset(fs_inst *inst,
>>  }
>>  
>>  void
>> +fs_generator::generate_unpack_half_2x16_split_y(fs_inst *inst,
>> +                                                struct brw_reg dst,
>> +                                                struct brw_reg src)
>> +{
>> +   assert(intel->gen >= 7);
>> +
>> +   /* src has the form of unpackHalf2x16's input:
>> +    *
>> +    *         w     z     y          x
>> +    *   |undef|undef|undef|0xhhhhllll|
> 
> I don't understand what "w z y x" mean here.  I thought src was just a
> scalar uint (one uint per pixel).

You're right. I had vec4 on the mind when I wrote the comment. How about
changing it to this?

   /* Each channel of src has the form of unpackHalf2x16's input: 0xhhhhllll.
    * We wish to access only the "hhhh" bits of each channel, and hence
    * must access the register with a 16 bit subregister offset.  To do so, we
    * must halve the size of the source data type from UD to UW and compensate
    * by doubling the stride.
    */

> 
>> +   assert(src.type == BRW_REGISTER_TYPE_UD);
>> +   src.type = BRW_REGISTER_TYPE_UW;
>> +   if (src.vstride > 0)
>> +      ++src.vstride;
>> +   if (src.hstride > 0)
>> +      ++src.hstride;
> 
> This ++hstride/++vstride looks totally wrong to me.  We use a vstride of
> 8 in 16-wide for temporaries -- doesn't this totally trash that?
> 
> I think what you want here is an unconditional *= 2 for both values.

This indeed does *= 2, from the hardware's perspective. My mistake for
not providing an explanation.

src.hstride, before the adjustment, is one of the BRE_HORIZONTAL_STRIDE_*
values, which are defined as

#define BRW_HORIZONTAL_STRIDE_0   0
#define BRW_HORIZONTAL_STRIDE_1   1
#define BRW_HORIZONTAL_STRIDE_2   2
#define BRW_HORIZONTAL_STRIDE_4   3

That's why `if (hstride > 0) ++hstride` is the right thing to do here. Likewise
for vstride. In patch v2, I'll add comments explaining all this.

> 
>> +   src.subnr += 2;
>> +
>> +   brw_F16TO32(p, dst, src);
> 
> 
>> @@ -2261,6 +2276,69 @@ fs_visitor::emit_fb_writes()
>>  }
>>  
>>  void
>> +fs_visitor::emit_pack_half_2x16_split(fs_reg dst, fs_reg x, fs_reg y)
>> +{
>> +   if (intel->gen < 7)
>> +      assert(!"packHalf2x16 should be lowered");
>> +
>> +   /* uint dst; */
>> +   assert(dst.type == BRW_REGISTER_TYPE_UD);
>> +
>> +   /* float x; */
>> +   assert(x.type == BRW_REGISTER_TYPE_F);
>> +
>> +   /* float y; */
>> +   assert(y.type == BRW_REGISTER_TYPE_F);
>> +
>> +   /* uint tmp; */
>> +   fs_reg tmp(this, glsl_type::uint_type);
>> +
>> +   /* dst = f32to16(x); */
>> +   emit(BRW_OPCODE_F32TO16, dst, x);
>> +
>> +   /* tmp = f32to16(y); */
>> +   emit(BRW_OPCODE_F32TO16, tmp, y);
>> +
>> +   /* tmp <<= 16; */
>> +   emit(BRW_OPCODE_SHL, tmp, tmp, fs_reg(16u));
>> +
>> +   /* dst |= tmp; */
>> +   emit(BRW_OPCODE_OR, dst, dst, tmp);
>> +}
> 
> These comments before every op aren't helping me, since they just say
> the same as what the code below says.

Since this was my first time delving into i965 codegen, it helped me
to write these comments before writing the actual code. But now,
my eyes agree with yours: the comments are noise rather than signal.
I'll remove them.
 
> Actually, I think this code can bet simpler, given:
> 
> "The destination must be DWord aligned, the result is stored in the
> lower Word for each channel, data in the upper Word of the destination
> is untouched"
> 
> So:
> 
>    emit(BRW_OPCODE_F32TO16, dst, y);
>    emit(BRW_OPCODE_SHL, dst, dst, fs_reg(16u));
>    emit(BRW_OPCODE_F32TO16, dst, x);

I missed to see that "the upper Word of the destination is untouched."
I'll make this change in v2.

> 
>> +void
>> +fs_visitor::emit_unpack_half_2x16_split_x(fs_reg dst, fs_reg src0)
>> +{
>> +   if (intel->gen < 7)
>> +      assert(!"unpackHalf2x16 should be lowered");
>> +
>> +   /* float dst; */
>> +   assert(dst.type == BRW_REGISTER_TYPE_F);
>> +
>> +   /* uint src0; */
>> +   assert(src0.type == BRW_REGISTER_TYPE_UD);
>> +
>> +   /* dst = f16to32(src0); */
>> +   emit(BRW_OPCODE_F16TO32, dst, src0);
>> +}
> 
> Drop these comments.

Will do.



More information about the mesa-dev mailing list