[Mesa-dev] [PATCH] nv50/ir: remove dnz flag when converting MAD to ADD due to optimizations

Sun Nov 25 03:08:48 UTC 2018

Yeah, sounds fine; I wasn't 100% sure what the dnz flag does. With the
addition below:
Reviewed-by: Karol Herbst <kherbst at redhat.com>

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 307d8762506..202faf0746a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1094,6 +1094,7 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
          if (imm0.isNegative())
             i->src(t).mod = i->src(t).mod ^ Modifier(NV50_IR_MOD_NEG);
          i->op = OP_ADD;
+         i->dnz = 0;
          i->setSrc(s, i->getSrc(t));
          i->src(s).mod = i->src(t).mod;
       } else

shader:
FRAG
PROPERTY FS_COORD_ORIGIN UPPER_LEFT
PROPERTY MUL_ZERO_WINS 1
DCL IN[0], COLOR, COLOR
DCL IN[1], TEXCOORD[0], PERSPECTIVE
DCL OUT[0], COLOR
DCL OUT[1], COLOR[1]
DCL CONST[0][0..129]
DCL TEMP[0..2]
IMM[0] FLT32 {   -0.0000,    -1.0000,     2.0000,    -0.5000}
 0: ADD TEMP[0].x, -CONST[0][112].yyyy, IN[1].wwww
 1: CMP TEMP[0], TEMP[0].xxxx, IMM[0].yyyy, IMM[0].xxxx
 2: KILL_IF TEMP[0]
 3: MUL TEMP[0].xyz, CONST[0][0], IN[0]
 4: MOV TEMP[0].w, IN[0].wwww
 5: MUL TEMP[1].xyz, TEMP[0], IMM[0].zzzz
 6: MUL OUT[0].w, TEMP[0].wwww, CONST[0][0].wwww
 7: MAD_SAT TEMP[0].w, IN[1].xxxx, CONST[0][128].xxxx, CONST[0][128].yyyy
 8: MUL TEMP[0].w, TEMP[0].wwww, CONST[0][129].wwww
 9: MOV TEMP[2].z, IMM[0].zzzz
10: MAD TEMP[0].xyz, TEMP[2].zzzz, -TEMP[0], CONST[0][129]
11: MAD OUT[0].xyz, TEMP[0].wwww, TEMP[0], TEMP[1]
12: MOV OUT[1], -IMM[0].wwwy
13: END

On Sun, Nov 25, 2018 at 3:58 AM Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>
> The dnz flag only applies to multiplications (e.g. to make 0 * Infinity
> become 0 instead of NaN). Once we optimize a MAD into an ADD, the dnz
> flag no longer makes sense, and it upsets the GM107 emitter (since it
> looks at the ftz and dnz flags together).
>
> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index 04d26dcbf53..307d8762506 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -740,6 +740,7 @@ ConstantFolding::expr(Instruction *i,
>        // restrictions, so move it into a separate LValue.
>        bld.setPosition(i, false);
>        i->op = OP_ADD;
> +      i->dnz = 0;
>        i->setSrc(1, bld.mkMov(bld.getSSA(type), i->getSrc(0), type)->getDef(0));
>        i->setSrc(0, i->getSrc(2));
>        i->src(0).mod = i->src(2).mod;
> @@ -1131,6 +1132,7 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
>           i->setSrc(1, i->getSrc(2));
>           i->src(1).mod = i->src(2).mod;
>           i->setSrc(2, NULL);
> +         i->dnz = 0;
>           i->op = OP_ADD;
>        } else
>        if (!isFloatType(i->dType) && !i->subOp && !i->src(t).mod && !i->src(2).mod) {
> --
> 2.18.1
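
For anyone else unsure what dnz does, here is a minimal standalone sketch of
the idea in the commit message above. Everything in it is hypothetical and for
illustration only: ToyInsn, toy_mul and fold_mad_by_one are made-up names, not
the nv50_ir types or the ConstantFolding code, and the "multiplier is 1.0"
case is just one assumed example of a fold that leaves no multiply behind.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-ins for illustration only; not Mesa's nv50_ir types.
enum class Op { MAD, ADD };

struct ToyInsn {
   Op op;
   bool dnz;      // "zero wins" flag, only meaningful for a multiply
   float src[3];  // MAD: src[0] * src[1] + src[2]; ADD: src[0] + src[1]
};

// Multiply with optional dnz semantics: a zero operand forces a zero
// result, so 0 * Infinity gives 0 instead of the IEEE 754 result NaN.
float toy_mul(float a, float b, bool dnz)
{
   if (dnz && (a == 0.0f || b == 0.0f))
      return 0.0f;
   return a * b;
}

// Fold MAD(x, 1.0, c) into ADD(x, c). No multiply is left afterwards,
// so the flag is cleared along with the opcode change.
void fold_mad_by_one(ToyInsn &i)
{
   assert(i.op == Op::MAD && i.src[1] == 1.0f);
   i.op = Op::ADD;
   i.dnz = false;
   i.src[1] = i.src[2];
   i.src[2] = 0.0f;  // unused by ADD in this toy model
}
```

With this toy model, toy_mul(0.0f, INFINITY, true) returns 0 while the same
call with dnz off returns NaN, and a folded instruction ends up as an ADD with
dnz cleared, which is the state the patch wants the emitter to see.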