[Mesa-stable] [Mesa-dev] [PATCH 2/4] i965/vec4: Handle ir_triop_lrp on Gen4-5 as well.
Eric Anholt
eric at anholt.net
Mon Feb 24 10:15:38 PST 2014
Kenneth Graunke <kenneth at whitecape.org> writes:
> When the vec4 backend encountered an ir_triop_lrp, it always emitted an
> actual LRP instruction, which only exists on Gen6+. Gen4-5 used
> lower_instructions() to decompose ir_triop_lrp at the IR level.
>
> Since commit 8d37e9915a3b21 ("glsl: Optimize open-coded lrp into lrp."),
> we've had a bug where lower_instructions translates ir_triop_lrp into
> arithmetic, but opt_algebraic reassembles it back into a lrp.
>
> To avoid this ordering concern, just handle ir_triop_lrp in the backend.
> The FS backend already does this, so we may as well do likewise.
>
> Cc: "10.1" <mesa-stable at lists.freedesktop.org>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75253
> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> ---
> src/mesa/drivers/dri/i965/brw_vec4.h | 3 +++
> src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 36 +++++++++++++++++++++-----
> 2 files changed, 32 insertions(+), 7 deletions(-)
>
> This patch fixes a regression from 10.0 to 10.1, and really needs to be
> cherry-picked before the final 10.1.0 release.
>
> Technically, it's the only one that needs to be cherry-picked, but I figured
> I may as well CC the whole series and leave it up to the stable maintainers.
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
> index 6bd8b80..fb5c0a6 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> @@ -506,6 +506,9 @@ public:
>
> void emit_minmax(uint32_t condmod, dst_reg dst, src_reg src0, src_reg src1);
>
> + void emit_lrp(const dst_reg &dst,
> + const src_reg &x, const src_reg &y, const src_reg &a);
> +
> void emit_block_move(dst_reg *dst, src_reg *src,
> const struct glsl_type *type, uint32_t predicate);
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index 95e0064..d4f1899 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -1132,6 +1132,34 @@ vec4_visitor::emit_minmax(uint32_t conditionalmod, dst_reg dst,
> }
> }
>
> +void
> +vec4_visitor::emit_lrp(const dst_reg &dst,
> +                       const src_reg &x, const src_reg &y, const src_reg &a)
> +{
> +   if (brw->gen >= 6) {
> +      /* Note that the instruction's argument order is reversed from GLSL
> +       * and the IR.
> +       */
> +      emit(LRP(dst,
> +               fix_3src_operand(a), fix_3src_operand(y), fix_3src_operand(x)));
> +   } else {
> +      /* Earlier generations don't support three source operations, so we
> +       * need to emit x*(1-a) + y*a.
> +       */
> +      dst_reg y_times_a = dst_reg(this, glsl_type::vec4_type);
> +      dst_reg one_minus_a = dst_reg(this, glsl_type::vec4_type);
> +      dst_reg x_times_one_minus_a = dst_reg(this, glsl_type::vec4_type);
> +      y_times_a.writemask = dst.writemask;
> +      one_minus_a.writemask = dst.writemask;
> +      x_times_one_minus_a.writemask = dst.writemask;
> +
> +      emit(MUL(y_times_a, y, a));
> +      emit(ADD(one_minus_a, negate(a), src_reg(1.0f)));
> +      emit(MUL(x_times_one_minus_a, x, src_reg(one_minus_a)));
> +      emit(ADD(dst, src_reg(x_times_one_minus_a), src_reg(y_times_a)));
> +   }
> +}

I think we would do better by emitting

    ADD(y_minus_x, y, negate(x))
    MAC(dst, x, y_minus_x, a)

Then gen4/5 get a win from the algebraic pass existing, like gen6+.

Other than that, I like the series.
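
For reference, the two-instruction form relies on the identity x*(1-a) + y*a = x + (y-x)*a. The sketch below checks that identity with plain scalar floats. It is not i965 backend code: the function names lrp_four_op, lrp_two_op and check_lrp_forms are made up for illustration, and the MAC line is read as shorthand for "dst = x + y_minus_x * a" without modelling how the hardware instruction actually receives its addend.

```cpp
#include <cassert>
#include <cmath>

// Mirrors the four operations the patch emits on Gen4-5:
// MUL, ADD, MUL, ADD computing x*(1-a) + y*a.
static float lrp_four_op(float x, float y, float a)
{
    float y_times_a = y * a;
    float one_minus_a = -a + 1.0f;
    float x_times_one_minus_a = x * one_minus_a;
    return x_times_one_minus_a + y_times_a;
}

// Mirrors the suggested form: one ADD for (y - x), then a
// multiply-accumulate computing x + (y - x)*a.
static float lrp_two_op(float x, float y, float a)
{
    float y_minus_x = y + (-x);
    return x + y_minus_x * a;
}

// Hypothetical self-check: both forms agree at the endpoints and at an
// interior point, up to a small floating-point tolerance.
static void check_lrp_forms()
{
    assert(lrp_four_op(2.0f, 6.0f, 0.0f) == 2.0f);
    assert(lrp_two_op(2.0f, 6.0f, 0.0f) == 2.0f);
    assert(lrp_four_op(2.0f, 6.0f, 1.0f) == 6.0f);
    assert(lrp_two_op(2.0f, 6.0f, 1.0f) == 6.0f);
    assert(std::fabs(lrp_four_op(0.1f, 0.7f, 0.3f) -
                     lrp_two_op(0.1f, 0.7f, 0.3f)) < 1e-6f);
}
```

The two forms are algebraically equal but can round differently in floating point, which is why the interior-point check uses a tolerance rather than exact equality.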