[Mesa-dev] [PATCH 2/2] i965: Change vec4_visitor::emit_lrp to use MAC for gen < 6

Fri Mar 14 02:37:07 PDT 2014

On 03/10/2014 07:59 AM, Juha-Pekka Heikkila wrote:
>

I might add this to the commit message:

This allows us to emit ADD/MUL/MAC instead of MUL/ADD/MUL/ADD, saving
one instruction and two temporary registers.

> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila at gmail.com>
> ---
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 26 +++++++-------------------
>  1 file changed, 7 insertions(+), 19 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index dc58457..4e4ab6e 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -1160,26 +1160,14 @@ vec4_visitor::emit_lrp(const dst_reg &dst,
>        emit(LRP(dst,
>                 fix_3src_operand(a), fix_3src_operand(y), fix_3src_operand(x)));
>     } else {
> -      /* Earlier generations don't support three source operations, so we
> -       * need to emit x*(1-a) + y*a.
> -       *
> -       * A better way to do this would be:
> -       *    ADD one_minus_a, negate(a), 1.0f
> -       *    MUL null, y, a
> -       *    MAC dst, x, one_minus_a
> -       * but we would need to support MAC and implicit accumulator.
> -       */
> -      dst_reg y_times_a           = dst_reg(this, glsl_type::vec4_type);
> -      dst_reg one_minus_a         = dst_reg(this, glsl_type::vec4_type);
> -      dst_reg x_times_one_minus_a = dst_reg(this, glsl_type::vec4_type);
> -      y_times_a.writemask           = dst.writemask;
> -      one_minus_a.writemask         = dst.writemask;
> -      x_times_one_minus_a.writemask = dst.writemask;
> -
> -      emit(MUL(y_times_a, y, a));
> +      dst_reg one_minus_a   = dst_reg(this, glsl_type::vec4_type);
> +      one_minus_a.writemask = dst.writemask;
> +
> +      struct brw_reg acc = retype(brw_acc_reg(), dst.type);
> +
>        emit(ADD(one_minus_a, negate(a), src_reg(1.0f)));
> -      emit(MUL(x_times_one_minus_a, x, src_reg(one_minus_a)));
> -      emit(ADD(dst, src_reg(x_times_one_minus_a), src_reg(y_times_a)));
> +      emit(MUL(acc, y, a));
> +      emit(MAC(dst, x, src_reg(one_minus_a)));
>     }
>  }

I think the intention was to do:

vec4_instruction *mul = emit(MUL(dst_null_f(), y, a));
mul->writes_accumulator = true;

but (and I feel really stupid now), I think your code using the
accumulator as an explicit destination will work just fine, too.

I'm not sure if there's any advantage to doing it one way or the other.
Matt, any thoughts?
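
For reference, here is a rough scalar sketch of why ADD/MUL/MAC computes
the same x*(1-a) + y*a as the old MUL/ADD/MUL/ADD sequence. This is a toy
model only, not Mesa code: ToyAlu, mul_to_acc, lrp_four_ops and lrp_mac
are hypothetical names made up for illustration, and the model assumes
MAC simply adds the accumulator contents to the product of its sources.
It also does not distinguish between writing the accumulator as an
explicit destination and writing it implicitly.

```cpp
// Toy scalar model of the accumulator. Not Mesa code and not a hardware
// specification; all names here are hypothetical.
struct ToyAlu {
    float acc = 0.0f;  // stands in for the accumulator register

    // ADD: plain two-source add into a regular destination.
    float add(float a, float b) { return a + b; }

    // MUL whose result lands in the accumulator. The toy model treats an
    // explicit accumulator destination and an implicit accumulator write
    // identically.
    void mul_to_acc(float a, float b) { acc = a * b; }

    // MAC (assumed semantics): product of the sources plus the accumulator.
    float mac(float a, float b) { return a * b + acc; }
};

// Old sequence: MUL / ADD / MUL / ADD, three temporaries.
float lrp_four_ops(float x, float y, float a) {
    float y_times_a = y * a;                      // MUL
    float one_minus_a = -a + 1.0f;                // ADD
    float x_times_one_minus_a = x * one_minus_a;  // MUL
    return x_times_one_minus_a + y_times_a;       // ADD
}

// New sequence: ADD / MUL / MAC, one temporary plus the accumulator.
float lrp_mac(float x, float y, float a) {
    ToyAlu alu;
    float one_minus_a = alu.add(-a, 1.0f);  // ADD one_minus_a, -a, 1.0f
    alu.mul_to_acc(y, a);                   // MUL acc, y, a
    return alu.mac(x, one_minus_a);         // MAC dst, x, one_minus_a
}
```

With x = 2, y = 4, a = 0.5, both lrp_four_ops and lrp_mac return 3.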

Either way, this patch is:
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
