[Mesa-dev] [PATCH] i965/vs/gen7: Emit code for GLSL ES 3.00 pack/unpack operations (v2)

Thu Jan 24 13:55:18 PST 2013

On 01/23/2013 07:18 PM, Eric Anholt wrote:
> Chad Versace <chad.versace at linux.intel.com> writes:
>> +void
>> +vec4_visitor::emit_unpack_half_2x16(dst_reg dst, src_reg src0)
>> +{
>> +   if (intel->gen < 7)
>> +      assert(!"ir_unop_unpack_half_2x16 should be lowered");
>> +
>> +   assert(dst.type == BRW_REGISTER_TYPE_F);
>> +   assert(src0.type == BRW_REGISTER_TYPE_UD);
>> +
>> +   /* From the Ivybridge PRM, Vol4, Part3, Section 6.26 f32to16:
>> +    *
>> +    *   Because this instruction does not have a 16-bit floating-point type,
>> +    *   the source data type must be Word (W). The destination type must be
>> +    *   F (Float).
>> +    *
>> +    * To use W as the source data type, we must adjust horizontal strides,
>> +    * which is only possible in align1 mode. All my [chadv] attempts at
>> +    * emitting align1 instructions for unpackHalf2x16 failed to pass the
>> +    * Piglit tests, so I gave up.
>> +    *
>> +    * I've verified that, on gen7, it is safe to emit f16to32 in align16 mode
>> +    * with UD as source data type.
>> +    */
> 
> Have you tested this on something like:
> 
> in uvec4 v;
> vec2 result = unpackHalf2x16(v.w);
> 
> Those kinds of "the type must be X and the stride must be Y" have
> sometimes meant that it's just hardcoded and they don't look at what you
> program, so I'm concerned that some of your regioning
> (swizzle/abs/neg/uniformness) will just get thrown out by the hardware.
> 
> But if it's passing on your tests with uniforms, it's probably OK.

In the brw code generated by my vs-packHalf2x16 test on IVB, the source to f32to16
is swizzled as yz. If I recall correctly, in my vs-unpackHalf2x16 test the
source to f16to32 was also swizzled to a non-x channel. So I think it's safe
to say that this does the right thing.

>> +   dst_reg tmp_dst(this, glsl_type::uvec2_type);
>> +   src_reg tmp_src(tmp_dst);
>> +
>> +   /* tmp.x = src0 & 0xffffu; */
>> +   tmp_dst.writemask = WRITEMASK_X;
>> +   emit(new(mem_ctx) vec4_instruction(this, BRW_OPCODE_AND,
>> +                                      tmp_dst, src0, src_reg(0xffffu)));
> 
> These ought to use the helper functions for simplicity:
> "emit(AND(tmp_dst, src0, src_reg(0xffffu)));" Check out the ALU1 macro
> for how to set up one of those to have a similar helper for F16TO32 if
> you want to match up the style.

Will do.
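
To make the suggested style concrete, here is a minimal, self-contained sketch
of the macro-generated helper pattern. It is only an illustration: `reg`,
`instruction`, `visitor`, and the `OPCODE_*` names are simplified stand-ins
that I made up so the snippet compiles on its own. They are not the real
dst_reg, src_reg, vec4_instruction, or vec4_visitor, and the real ALU1 macro
may differ in detail.

```cpp
#include <vector>

// Hypothetical stand-ins for the driver's opcode and register types.
enum opcode { OPCODE_AND, OPCODE_SHR, OPCODE_F16TO32 };

struct reg { unsigned value; };

struct instruction {
    opcode op;
    reg dst;
    reg src[2];
};

// Hypothetical stand-in for the visitor: it only records what is emitted.
struct visitor {
    std::vector<instruction> instructions;

    // One-source and two-source helpers, defined by the macros below.
    instruction F16TO32(reg dst, reg src0);
    instruction AND(reg dst, reg src0, reg src1);
    instruction SHR(reg dst, reg src0, reg src1);

    void emit(const instruction &inst) { instructions.push_back(inst); }
};

// Each macro expands to a helper that builds an instruction for one opcode,
// so call sites no longer spell out the constructor and opcode by hand.
#define ALU1(op)                                                \
    instruction visitor::op(reg dst, reg src0)                  \
    {                                                           \
        return instruction{OPCODE_##op, dst, {src0, reg{0}}};   \
    }

#define ALU2(op)                                                \
    instruction visitor::op(reg dst, reg src0, reg src1)        \
    {                                                           \
        return instruction{OPCODE_##op, dst, {src0, src1}};     \
    }

ALU1(F16TO32)
ALU2(AND)
ALU2(SHR)
```

With helpers like these, a call site shrinks to something of the form
`emit(AND(tmp_dst, src0, mask))`, which is the shape suggested above.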

FWIW, I'll also extend the "I've experimentally verified that the hardware does
what I want it to do" comments to state that the simulator does it too,
without complaint.

>> +
>> +   /* tmp.y = src0 >> 16u; */
>> +   tmp_dst.writemask = WRITEMASK_Y;
>> +   emit(new(mem_ctx) vec4_instruction(this, BRW_OPCODE_SHR,
>> +                                      tmp_dst, src0, src_reg(16u)));
>> +
>> +   /* dst.xy = f16to32(tmp); */
>> +   dst.writemask = WRITEMASK_XY;
>> +   emit(new(mem_ctx) vec4_instruction(this, BRW_OPCODE_F16TO32,
>> +                                      dst, tmp_src));
>> +}
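
For reference, the three instructions above (AND, SHR, F16TO32) correspond to
the following CPU-side computation. This is only a sketch for checking
expected values, assuming the input holds two IEEE 754 binary16 values; the
names `half_to_float`, `vec2`, and `unpack_half_2x16` are hypothetical and are
not part of the patch.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: convert one IEEE 754 binary16 value to float.
static float half_to_float(uint16_t h)
{
    const uint32_t sign = (h >> 15) & 0x1u;
    const uint32_t exp  = (h >> 10) & 0x1fu;
    const uint32_t man  = h & 0x3ffu;

    float value;
    if (exp == 0) {
        // Zero or subnormal: man * 2^-24.
        value = std::ldexp(static_cast<float>(man), -24);
    } else if (exp == 31) {
        // All-ones exponent: infinity if the mantissa is zero, else NaN.
        value = man ? NAN : INFINITY;
    } else {
        // Normal: (1024 + man) * 2^(exp - 25), i.e. 1.man * 2^(exp - 15).
        value = std::ldexp(static_cast<float>(man | 0x400u),
                           static_cast<int>(exp) - 25);
    }
    return sign ? -value : value;
}

struct vec2 { float x, y; };

// Mirrors the emitted sequence: low half goes to .x, high half to .y.
static vec2 unpack_half_2x16(uint32_t src0)
{
    const uint32_t tmp_x = src0 & 0xffffu;  /* tmp.x = src0 & 0xffffu; */
    const uint32_t tmp_y = src0 >> 16u;     /* tmp.y = src0 >> 16u;    */
    return { half_to_float(static_cast<uint16_t>(tmp_x)),
             half_to_float(static_cast<uint16_t>(tmp_y)) };
}
```

For example, 0x3c00 is 1.0 and 0xc000 is -2.0 in binary16, so the input
0xc0003c00 unpacks to (1.0, -2.0).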