[Mesa-dev] [PATCH (gles3) 19/20] i965/vs/gen7: Emit code for GLSL ES 3.00 pack/unpack operations
Eric Anholt
eric at anholt.net
Mon Jan 21 13:31:16 PST 2013
Chad Versace <chad.versace at linux.intel.com> writes:
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index ebf8990..b5f1aae 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -348,6 +348,143 @@ vec4_visitor::emit_math(enum opcode opcode,
> }
>
> void
> +vec4_visitor::emit_pack_half_2x16(dst_reg dst, src_reg src0)
> +{
> + if (intel->gen < 7)
> + assert(!"ir_unop_pack_half_2x16 should be lowered");
> +
> + /* uint dst; */
> + assert(dst.type == BRW_REGISTER_TYPE_UD);
> +
> + /* vec2 src0; */
> + assert(src0.type == BRW_REGISTER_TYPE_F);
> +
> + /* uvec2 tmp;
> + *
> + * The PRM lists the destination type of f32to16 as W. However, I've
> + * experimentally confirmed on gen7 that it must be a 32-bit size, such as
> + * UD, in align16 mode.
> + */
> + dst_reg tmp_dst(this, glsl_type::uvec2_type);
> + src_reg tmp_src(tmp_dst);
> +
> + /* tmp.xy = f32to16(src0); */
> + tmp_dst.writemask = WRITEMASK_XY;
> + emit(new(mem_ctx) vec4_instruction(this, BRW_OPCODE_F32TO16,
> + tmp_dst, src0));
> +
> + /* The result's high 16 bits are in the low 16 bits of the temporary
> + * register's Y channel. The result's low 16 bits are in the low 16 bits
> + * of the X channel.
> + *
> + * In experiments on gen7 I've found the that, in the temporary register,
> + * the hight 16 bits of the X and Y channels are zeros. This is critical
Typos: "hight" should be "high", and "the that" on the line above
should be "that".
> + * for the SHL and OR instructions below to work as expected.
> + */
The docs say that f32to16 leaves the high bits of the destination
unchanged. The temporary reg will often happen to contain 0 already,
but not always. Have you confirmed that the high bits of the x channel
are actually cleared to 0 when you initialize them to non-zero before
the f32to16?
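
To illustrate why only the x channel matters here, below is a minimal
CPU-side sketch of the SHL + OR combination, assuming it computes
(tmp.y << 16) | tmp.x as the comment describes. The function names and
the stale 0xdead pattern are hypothetical and not part of the patch;
the masked variant is only one possible way to stop relying on the
high bits being zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the SHL + OR step, not code from the patch.
// x and y stand for the temporary's X and Y channels after f32to16.
constexpr uint32_t combine_unmasked(uint32_t x, uint32_t y)
{
    return (y << 16) | x;
}

// Same combination, but the high 16 bits of x are cleared first.
constexpr uint32_t combine_masked(uint32_t x, uint32_t y)
{
    return (y << 16) | (x & 0xffffu);
}

constexpr uint32_t lo = 0x3c00u;        // half-float 1.0
constexpr uint32_t hi = 0xc000u;        // half-float -2.0
constexpr uint32_t stale = 0xdead0000u; // made-up leftover high bits

// Clean temporary: both variants agree.
static_assert(combine_unmasked(lo, hi) == 0xc0003c00u, "clean case");

// Stale high bits in y are harmless: the shift discards them.
static_assert(combine_unmasked(lo, stale | hi) == 0xc0003c00u, "y case");

// Stale high bits in x leak into the result through the OR.
static_assert(combine_unmasked(stale | lo, hi) != 0xc0003c00u, "x case");

// Masking x makes the result independent of the stale bits.
static_assert(combine_masked(stale | lo, hi) == 0xc0003c00u, "masked");
```

If the hardware really does zero the high bits, the unmasked form is
fine; if it preserves them as documented, something like the masked
form (or zeroing the temporary first) would be needed.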
> + /* Idea for reducing the above number of registers and instructions
> + * ----------------------------------------------------------------
> + *
> + * It should be possible to remove the temporary register and replace the
> + * SHL and OR instructions above with a single MOV instruction mode in
> + * align1 mode that uses clever register region addressing. (It is
> + * impossible to specify the necessary register regions in align16 mode).
> + * Unfortunately, it is difficult to emit an align1 instruction here.
> + *
> + * In particular, I want to do this:
> + *
> + * # Give dst the form:
> + * #
> + * # w z y x w z y x
> + * # |0|0|0x0000hhhh|0x0000llll|0|0|0x0000hhhh|0x0000llll|
> + * #
> + * f32to16(8) dst<1>.xy:UD src<4;4,1>:F {align16}
> + *
> + * # Transform dst into the form of packHalf2x16's output.
> + * #
> + * # w z y x w z y x
> + * # |0|0|0x00000000|0xhhhhllll|0|0|0x00000000|0xhhhhllll|
> + * #
> + * # Use width=2 in order to move the Y channel's high 16 bits
> + * # into the low 16 bits, thus clearing the Y channel to zero.
> + * #
> + * mov(4) dst.1<1>:UW dst.2<8;2,1>:UW {align1}
> + */
I like the sound of this. It would be a matter of adding a new
VS_OPCODE and implementing it in the generator.
> +}