[Mesa-dev] [PATCH] i965/vs/gen7: Emit code for GLSL ES 3.00 pack/unpack operations (v2)
chad.versace at linux.intel.com
Thu Jan 24 13:55:18 PST 2013
On 01/23/2013 07:18 PM, Eric Anholt wrote:
> Chad Versace <chad.versace at linux.intel.com> writes:
>> +vec4_visitor::emit_unpack_half_2x16(dst_reg dst, src_reg src0)
>> + if (intel->gen < 7)
>> + assert(!"ir_unop_unpack_half_2x16 should be lowered");
>> + assert(dst.type == BRW_REGISTER_TYPE_F);
>> + assert(src0.type == BRW_REGISTER_TYPE_UD);
>> + /* From the Ivybridge PRM, Vol4, Part3, Section 6.26 f16to32:
>> + *
>> + * Because this instruction does not have a 16-bit floating-point type,
>> + * the source data type must be Word (W). The destination type must be
>> + * F (Float).
>> + *
>> + * To use W as the source data type, we must adjust horizontal strides,
>> + * which is only possible in align1 mode. All my [chadv] attempts at
>> + * emitting align1 instructions for unpackHalf2x16 failed to pass the
>> + * Piglit tests, so I gave up.
>> + *
>> + * I've verified that, on gen7, it is safe to emit f16to32 in align16 mode
>> + * with UD as source data type.
>> + */
> Have you tested this on something like:
> in uvec4 v;
> vec2 result = unpackHalf2x16(v.w);
> Those kinds of "the type must be X and the stride must be Y" have
> sometimes meant that it's just hardcoded and they don't look at what you
> program, so I'm concerned that some of your regioning
> (swizzle/abs/neg/uniformness) will just get thrown out by the hardware.
> But if it's passing on your tests with uniforms, it's probably OK.
In the brw code generated by my vs-packHalf2x16 test on IVB, the source to f32to16
is swizzled as yz. If I recall correctly, for my vs-unpackHalf2x16 test,
the source to f16to32 was also swizzled to the non-x channel. So I think
it's safe to say that this does the right thing.
>> + dst_reg tmp_dst(this, glsl_type::uvec2_type);
>> + src_reg tmp_src(tmp_dst);
>> + /* tmp.x = src0 & 0xffffu; */
>> + tmp_dst.writemask = WRITEMASK_X;
>> + emit(new(mem_ctx) vec4_instruction(this, BRW_OPCODE_AND,
>> + tmp_dst, src0, src_reg(0xffffu)));
> These ought to use the helper functions for simplicity:
> "emit(AND(tmp_dst, src0, src_reg(0xffffu)));" Check out the ALU1 macro
> for how to set up one of those to have a similar helper for F16TO32 if
> you want to match up the style.
FWIW, I'll also amend the "I've experimentally verified that the hardware does
what I want it to do" comments to state that the simulator does it too, without complaint.
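For reference, here is a rough sketch of the helper style Eric describes: a
macro stamps out one builder method per opcode, so call sites shrink to
emit(AND(...)). This is not the real Mesa code. The types (reg, instruction,
visitor) and the OPCODE_* enum are simplified stand-ins I made up for
illustration; only the names ALU1, AND, SHR, and F16TO32 come from this thread.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins, far simpler than Mesa's dst_reg/src_reg/vec4_instruction.
enum opcode { OPCODE_AND, OPCODE_SHR, OPCODE_F16TO32 };

struct reg {
    int nr;  // register number; -1 means "unused" in this sketch
};

struct instruction {
    opcode op;
    reg dst;
    reg src0;
    reg src1;
};

class visitor {
public:
    // One builder per opcode; the bodies are generated by the macros below.
    instruction AND(reg dst, reg src0, reg src1);
    instruction SHR(reg dst, reg src0, reg src1);
    instruction F16TO32(reg dst, reg src0);

    // Append an instruction to the stream.
    void emit(const instruction &inst) { insts.push_back(inst); }

    std::vector<instruction> insts;
};

// Builder for a one-source ALU instruction.
#define ALU1(op)                                               \
    instruction visitor::op(reg dst, reg src0)                 \
    {                                                          \
        return instruction{OPCODE_##op, dst, src0, reg{-1}};   \
    }

// Builder for a two-source ALU instruction.
#define ALU2(op)                                               \
    instruction visitor::op(reg dst, reg src0, reg src1)       \
    {                                                          \
        return instruction{OPCODE_##op, dst, src0, src1};      \
    }

ALU2(AND)
ALU2(SHR)
ALU1(F16TO32)
```

With something like that in place, adding a builder for a new one-source opcode
is a single ALU1(...) line plus a declaration.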
>> + /* tmp.y = src0 >> 16u; */
>> + tmp_dst.writemask = WRITEMASK_Y;
>> + emit(new(mem_ctx) vec4_instruction(this, BRW_OPCODE_SHR,
>> + tmp_dst, src0, src_reg(16u)));
>> + /* dst.xy = f16to32(tmp); */
>> + dst.writemask = WRITEMASK_XY;
>> + emit(new(mem_ctx) vec4_instruction(this, BRW_OPCODE_F16TO32,
>> + dst, tmp_src));
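As a sanity check on the three steps above (AND, SHR, f16to32), here is a small
CPU-side sketch of what the sequence is meant to compute. It is only a
reference model, assuming standard IEEE 754 binary16 encoding; it does not
model anything hardware-specific. The names half_to_float, unpack_half_2x16,
and vec2 are made up for this example and are not Mesa functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Convert one IEEE 754 binary16 bit pattern to a 32-bit float.
static float half_to_float(uint16_t h)
{
    const uint32_t sign = (h >> 15) & 0x1u;
    const uint32_t exp  = (h >> 10) & 0x1fu;
    const uint32_t mant = h & 0x3ffu;
    float magnitude;

    if (exp == 0) {
        // Zero or subnormal: mant * 2^-24.
        magnitude = std::ldexp(static_cast<float>(mant), -24);
    } else if (exp == 31) {
        // Infinity or NaN.
        magnitude = mant ? NAN : INFINITY;
    } else {
        // Normal: (1024 + mant) * 2^(exp - 25).
        magnitude = std::ldexp(static_cast<float>(mant | 0x400u),
                               static_cast<int>(exp) - 25);
    }
    return sign ? -magnitude : magnitude;
}

struct vec2 {
    float x;
    float y;
};

// Mirrors the emitted sequence: split the dword, then convert both halves.
static vec2 unpack_half_2x16(uint32_t src0)
{
    const uint16_t tmp_x = static_cast<uint16_t>(src0 & 0xffffu);  // tmp.x = src0 & 0xffffu
    const uint16_t tmp_y = static_cast<uint16_t>(src0 >> 16);      // tmp.y = src0 >> 16u
    return vec2{half_to_float(tmp_x), half_to_float(tmp_y)};       // dst.xy = f16to32(tmp)
}
```

For example, 0xc0003c00 has 0x3c00 (1.0) in the low half and 0xc000 (-2.0) in
the high half, so the result should be (1.0, -2.0).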