[Mesa-stable] [Mesa-dev] [PATCH 2/4] i965/vec4: Handle ir_triop_lrp on Gen4-5 as well.

Mon Feb 24 10:15:38 PST 2014

Kenneth Graunke <kenneth at whitecape.org> writes:

> When the vec4 backend encountered an ir_triop_lrp, it always emitted an
> actual LRP instruction, which only exists on Gen6+.  Gen4-5 used
> lower_instructions() to decompose ir_triop_lrp at the IR level.
>
> Since commit 8d37e9915a3b21 ("glsl: Optimize open-coded lrp into lrp."),
> we've had a bug where lower_instructions translates ir_triop_lrp into
> arithmetic, but opt_algebraic reassembles it back into a lrp.
>
> To avoid this ordering concern, just handle ir_triop_lrp in the backend.
> The FS backend already does this, so we may as well do likewise.
>
> Cc: "10.1" <mesa-stable at lists.freedesktop.org>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75253
> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.h           |  3 +++
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 36 +++++++++++++++++++++-----
>  2 files changed, 32 insertions(+), 7 deletions(-)
>
> This patch fixes a regression from 10.0 to 10.1, and really needs to be
> cherry-picked before the final 10.1.0 release.
>
> Technically, it's the only one that needs to be cherry-picked, but I figured
> I may as well CC the whole series and leave it up to the stable maintainers.
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
> index 6bd8b80..fb5c0a6 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> @@ -506,6 +506,9 @@ public:
>  
>     void emit_minmax(uint32_t condmod, dst_reg dst, src_reg src0, src_reg src1);
>  
> +   void emit_lrp(const dst_reg &dst,
> +                 const src_reg &x, const src_reg &y, const src_reg &a);
> +
>     void emit_block_move(dst_reg *dst, src_reg *src,
>  			const struct glsl_type *type, uint32_t predicate);
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index 95e0064..d4f1899 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -1132,6 +1132,34 @@ vec4_visitor::emit_minmax(uint32_t conditionalmod, dst_reg dst,
>     }
>  }
>  
> +void
> +vec4_visitor::emit_lrp(const dst_reg &dst,
> +                       const src_reg &x, const src_reg &y, const src_reg &a)
> +{
> +   if (brw->gen >= 6) {
> +      /* Note that the instruction's argument order is reversed from GLSL
> +       * and the IR.
> +       */
> +      emit(LRP(dst,
> +               fix_3src_operand(a), fix_3src_operand(y), fix_3src_operand(x)));
> +   } else {
> +      /* Earlier generations don't support three source operations, so we
> +       * need to emit x*(1-a) + y*a.
> +       */
> +      dst_reg y_times_a           = dst_reg(this, glsl_type::vec4_type);
> +      dst_reg one_minus_a         = dst_reg(this, glsl_type::vec4_type);
> +      dst_reg x_times_one_minus_a = dst_reg(this, glsl_type::vec4_type);
> +      y_times_a.writemask           = dst.writemask;
> +      one_minus_a.writemask         = dst.writemask;
> +      x_times_one_minus_a.writemask = dst.writemask;
> +
> +      emit(MUL(y_times_a, y, a));
> +      emit(ADD(one_minus_a, negate(a), src_reg(1.0f)));
> +      emit(MUL(x_times_one_minus_a, x, src_reg(one_minus_a)));
> +      emit(ADD(dst, src_reg(x_times_one_minus_a), src_reg(y_times_a)));
> +   }
> +}
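The sketch below makes the two branches of `emit_lrp` concrete. It is a minimal host-side C++ sketch, not Mesa code. It models one vec4 register as four floats and checks that two paths write the same enabled channels: the Gen6+ path with its reversed `(a, y, x)` operand order, and the Gen4-5 four-instruction path. The `Vec4` type, the `kWrite*` bits and all function names are hypothetical. The LRP semantics `src0 * src1 + (1 - src0) * src2` are an assumption, chosen to match the operand reversal described in the patch comment.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of one vec4 register: four float channels (x, y, z, w).
using Vec4 = std::array<float, 4>;

// Illustrative channel-enable bits, one per component.
constexpr unsigned kWriteX = 1u << 0;
constexpr unsigned kWriteY = 1u << 1;
constexpr unsigned kWriteZ = 1u << 2;
constexpr unsigned kWriteW = 1u << 3;

// Store only the channels enabled in `writemask`; the rest keep their old values.
void write_masked(Vec4 &dst, const Vec4 &val, unsigned writemask) {
   assert(writemask <= (kWriteX | kWriteY | kWriteZ | kWriteW));
   for (int c = 0; c < 4; ++c)
      if (writemask & (1u << c))
         dst[c] = val[c];
}

// Assumed Gen6+ LRP semantics, per channel: src0 * src1 + (1 - src0) * src2.
Vec4 hw_lrp(const Vec4 &src0, const Vec4 &src1, const Vec4 &src2) {
   Vec4 r{};
   for (int c = 0; c < 4; ++c)
      r[c] = src0[c] * src1[c] + (1.0f - src0[c]) * src2[c];
   return r;
}

// Gen6+ path: operands passed in reversed order (a, y, x), as in the patch.
void emit_lrp_gen6(Vec4 &dst, unsigned writemask,
                   const Vec4 &x, const Vec4 &y, const Vec4 &a) {
   write_masked(dst, hw_lrp(a, y, x), writemask);
}

// Gen4-5 path: the patch's MUL, ADD, MUL, ADD sequence for x*(1-a) + y*a.
// Temporaries are computed on every channel here; only the final store is
// masked, which leaves the same enabled channels as masking each temporary.
void emit_lrp_gen4(Vec4 &dst, unsigned writemask,
                   const Vec4 &x, const Vec4 &y, const Vec4 &a) {
   Vec4 result{};
   for (int c = 0; c < 4; ++c) {
      float y_times_a = y[c] * a[c];                      // MUL
      float one_minus_a = -a[c] + 1.0f;                   // ADD with negate(a)
      float x_times_one_minus_a = x[c] * one_minus_a;     // MUL
      result[c] = x_times_one_minus_a + y_times_a;        // ADD
   }
   write_masked(dst, result, writemask);
}
```

Consider `x = (0, 1, 2, 4)`, `y = (1, 3, 6, 8)`, `a = (0.5, 0.5, 0.25, 1)` and a writemask of `.xyz`. Both paths store `(0.5, 2, 3)` in the enabled channels and leave `.w` untouched.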

I think we would do better by emitting
ADD(y_minus_x, y, negate(x))
MAC(dst, x, y_minus_x, a)

That way gen4/5 also win from the algebraic pass recognizing lrp, just like gen6+.
Other than that, I like the series.
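The two formulations can be compared on the host. The following is a minimal C++ sketch, not driver code. It evaluates one channel of the patch's four-op sequence and one channel of the ADD-plus-MAC form suggested above. It reads `MAC(dst, x, y_minus_x, a)` as `dst = x + y_minus_x * a`. On the hardware MAC takes its addend from the accumulator, and the sketch assumes the accumulator already holds `x` without modeling that setup. The function names are hypothetical.

```cpp
#include <cassert>

// Patch's Gen4-5 sequence, one channel: x*(1-a) + y*a via MUL, ADD, MUL, ADD.
float lrp_four_ops(float x, float y, float a) {
   float y_times_a = y * a;                          // MUL
   float one_minus_a = -a + 1.0f;                    // ADD with negate(a)
   float x_times_one_minus_a = x * one_minus_a;      // MUL
   return x_times_one_minus_a + y_times_a;           // ADD
}

// Suggested alternative, one channel: x + (y - x)*a via ADD then MAC.
// Assumption: the accumulator already holds x; that setup is not modeled.
float lrp_two_ops(float x, float y, float a) {
   float y_minus_x = y + -x;                         // ADD(y_minus_x, y, negate(x))
   float acc = x;                                    // assumed accumulator contents
   return y_minus_x * a + acc;                       // MAC: src0 * src1 + acc
}

// Spot checks: the forms agree on ordinary inputs, but at a == 1 the
// two-op form can miss y when y - x rounds.
void check_lrp_forms() {
   assert(lrp_four_ops(2.0f, 6.0f, 0.25f) == 3.0f);
   assert(lrp_two_ops(2.0f, 6.0f, 0.25f) == 3.0f);
   assert(lrp_four_ops(1e8f, 1.0f, 1.0f) == 1.0f);
   // In single precision 1.0f - 1e8f rounds to -1e8f, so x + (y - x) is 0.
   assert(lrp_two_ops(1e8f, 1.0f, 1.0f) == 0.0f);
}
```

The shorter form computes the same lerp mathematically, but it is not bit-identical to the four-op sequence. In IEEE single precision `y - x` can round, so `x + (y - x) * a` need not return `y` exactly at `a == 1`, as the last check shows.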