[Mesa-dev] odd translation from glsl to tgsi for ir_unop_any_nequal

Wed May 7 19:55:27 PDT 2014

On Wed, May 7, 2014 at 8:38 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> So... this shader (from
> generated_tests/spec/glsl-1.10/execution/built-in-functions/fs-op-eq-mat2-mat2.shader_test):
>
> uniform mat2 arg0;
> uniform mat2 arg1;
>
> void main()
> {
>   bool result = (arg0 == arg1);
>   gl_FragColor = vec4(result, 0.0, 0.0, 0.0);
> }
>
> Which becomes the following IR:
>
> (
> (declare (shader_out ) vec4 gl_FragColor)
> (declare (temporary ) vec4 gl_FragColor)
> (declare (uniform ) mat2 arg0)
> (declare (uniform ) mat2 arg1)
> (function main
>   (signature void
>     (parameters
>     )
>     (
>       (declare (temporary ) vec4 vec_ctor)
>       (assign  (yzw) (var_ref vec_ctor)  (constant vec3 (0.0 0.0 0.0)) )
>       (declare (temporary ) bvec2 mat_cmp_bvec)
>       (assign  (x) (var_ref mat_cmp_bvec)  (expression bool any_nequal
> (array_ref (var_ref arg1) (constant int (0)) ) (array_ref (var_ref
> arg0) (constant int (0)) ) ) )
>       (assign  (y) (var_ref mat_cmp_bvec)  (expression bool any_nequal
> (array_ref (var_ref arg1) (constant int (1)) ) (array_ref (var_ref
> arg0) (constant int (1)) ) ) )
>       (assign  (x) (var_ref vec_ctor)  (expression float b2f
> (expression bool ! (expression bool any (var_ref mat_cmp_bvec) ) ) ) )
>       (assign  (xyzw) (var_ref gl_FragColor)  (var_ref vec_ctor) )
>       (assign  (xyzw) (var_ref gl_FragColor@4)  (var_ref gl_FragColor) )
>     ))
>
> )
>
>
> When converted to TGSI, this becomes:
>
> FRAG
> PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
> DCL OUT[0], COLOR
> DCL CONST[0..3]
> DCL TEMP[0..2], LOCAL
> IMM[0] FLT32 {    0.0000,     1.0000,     0.0000,     0.0000}
> IMM[1] INT32 {0, 0, 0, 0}
>   0: MOV TEMP[0].yzw, IMM[0].xxxx
>   1: FSNE TEMP[1].xy, CONST[2].xyyy, CONST[0].xyyy
>   2: OR TEMP[1].x, TEMP[1].xxxx, TEMP[1].yyyy
>   3: FSNE TEMP[2].xy, CONST[3].xyyy, CONST[1].xyyy
>   4: OR TEMP[2].x, TEMP[2].xxxx, TEMP[2].yyyy
>   5: MOV TEMP[1].y, TEMP[2].xxxx
>   6: DP2 TEMP[1].x, TEMP[1].xyyy, TEMP[1].xyyy
>   7: USNE TEMP[1].x, TEMP[1].xxxx, IMM[1].xxxx
>   8: NOT TEMP[1].x, TEMP[1].xxxx
>   9: AND TEMP[0].x, TEMP[1].xxxx, IMM[0].yyyy
>  10: MOV OUT[0], TEMP[0]
>  11: END
>
> Note that FSNE/OR are used, implying that the integer version of these
> is expected. However then it goes on to use DP2, which, as I
> understand, does a floating point multiply + add. Now, this _happens_
> to work out, since the integer representations of float 0 and int 0
> are the same, and those are really the only possibilities we care
> about.
>
> However this seems really dodgy... wouldn't it be clearer to use
> either SNE + OR (which would still work!) + DP2, or alternatively AND
> them all together instead of SNE/DP2? This seems to come in via
> ir_unop_any_nequal. IMO the latter would be better since it keeps

Erm, sorry -- the email subject and this sentence aren't _quite_
accurate. That should be ir_unop_any. ir_binop_any_nequal is what
generates the FSNE/OR combos. But everything else still holds :)

> things in integer space, and presumably AND's are cheaper than
> fmul/fadd.
>
> I noticed this because nouveau's codegen logic isn't able to optimize
> this intelligently and I was trying to figure out why.
>
> Thoughts?
>
>   -ilia
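
To make the bit-pattern argument above concrete, here is a small C
sketch that models the two ways of reducing a bvec2 of integer
booleans. The helper names (as_float, as_bits, fsne, any_via_dp2,
any_via_or) are made up for illustration and are not Mesa or TGSI
APIs. It assumes IEEE 754 single precision, default compiler flags (no
-ffast-math), and the ~0/0 encoding of true/false that FSNE produces.
Also note that the integer-space variant is written as an OR of the
components, which is my reading of what any() needs, not something the
message spells out.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret raw register bits as a float and back (hypothetical helpers). */
static float as_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

static uint32_t as_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Model of FSNE: float comparison, integer boolean result (~0 or 0). */
static uint32_t fsne(float a, float b)
{
    return (a != b) ? 0xFFFFFFFFu : 0u;
}

/* any() as in the TGSI dump: DP2 on the raw bits, then USNE against 0.
 * 0xFFFFFFFF read as a float is a NaN, so the dot product is NaN (nonzero
 * bits) whenever either input is true, and +0.0 (all-zero bits) otherwise. */
static uint32_t any_via_dp2(uint32_t x, uint32_t y)
{
    float fx = as_float(x);
    float fy = as_float(y);
    float dot = fx * fx + fy * fy;
    return (as_bits(dot) != 0u) ? 0xFFFFFFFFu : 0u;
}

/* any() kept entirely in integer space: a single bitwise OR. */
static uint32_t any_via_or(uint32_t x, uint32_t y)
{
    return x | y;
}
```

Under those assumptions both reductions agree for all four input
combinations, but the DP2 path only works because a NaN never has an
all-zero bit pattern, which is the part that looks dodgy.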