[Mesa-dev] Mesa (shader-work): glsl: introduce ir_binop_all_equal and ir_binop_any_equal, allow vector cmps

Wed Sep 8 11:26:49 PDT 2010

On Wed, 8 Sep 2010 18:57:46 +0200, Luca Barbieri <luca at luca-barbieri.com> wrote:
> > I'd been wanting to do this.  Only, I was thinking that instead of
> > adding an ir_binop_all_equal and ir_binop_any_equal, those would just be
> > expressed as not(any(nequal())) and any(equal()).  And I say that as
> > probably one of the few that has a backend that wants to recognize
> > all_equal.  What do you think?
> 
> I think it makes sense: by the way, ir_to_mesa emits the current
> nequal, or my new any_nequal, exactly as it does emit any(equal()).
> 
> Of course, what ir_to_mesa does is not really optimal, because it
> should use predicates/condition codes, which are however badly
> supported everywhere.
> 
> In general if(any(nequal(a, b))) should become, in pseudo-code,
> assuming a vector predicate register,
> 
> SNE_update_pred NONE, a, b
> IFC pred.xyzw:
> 
> and certainly not anything using DP4 for optimal performance on
> hardware that has predicates like nv30/nv40.
> 
> Modern/scalar hardware would probably prefer that representation too,
> since unlike DP4 it can be readily scalarized.
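
To make the comparison in the quoted text concrete, here is a minimal
sketch in Python (not Mesa code) of the two reductions being discussed.
The function names are hypothetical, and the DP4 variant only assumes
that the per-component result is a 0.0/1.0 vector that gets reduced with
a dot product, as the DP4 mention above suggests.

```python
def sne(a, b):
    """Per-component set-on-not-equal: 1.0 where components differ, else 0.0."""
    return [1.0 if x != y else 0.0 for x, y in zip(a, b)]


def any_nequal_dp4(a, b):
    """Vector-style reduction: dot the 0.0/1.0 flag vector with itself."""
    flags = sne(a, b)
    # The dot product is nonzero exactly when at least one flag is set.
    return sum(f * f for f in flags) != 0.0


def any_nequal_scalar(a, b):
    """Scalarized form: one compare per channel, combined with logical or."""
    result = False
    for x, y in zip(a, b):
        result = result or (x != y)
    return result


def all_equal(a, b):
    """all_equal expressed as not(any(nequal())), as proposed in the thread."""
    return not any_nequal_scalar(a, b)
```

Both reductions agree on every input; the scalar form simply has no
cross-channel operation left in it, which is why it splits cleanly into
per-channel instructions.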

As for scalar hardware: right now, in the 965 FS backend, for
	if (any(lessThan(args, vec4(3.0))))
		gl_FragColor = vec4(0.0, 1.0, 0.0, 0.0);
	else
		gl_FragColor = vec4(1.0, 0.0, 0.0, 0.0);

I'm getting:

   (expression bool || (swiz w (expression bool < (swiz w (var_ref args at 0x8753010) )(constant float (3.000000)) ) )(expression bool || (swiz z (expression bool < (swiz z (var_ref args at 0x8753010) )(constant float (3.000000)) ) )(expression bool || (expression bool < (swiz x (var_ref args at 0x8753010) )(constant float (3.000000)) ) (swiz y (expression bool < (swiz y (var_ref args at 0x8753010) )(constant float (3.000000)) ) )) ) ) 
mov(8)          g19<1>F         g3.3<0,1,0>F                    { align1 };
mov(8)          g20<1>F         3F                              { align1 };
cmp.l(8)        g21<1>D         g19<8,8,1>F     g20<8,8,1>F     { align1 };
and(8)          g21<1>D         g21<8,8,1>D     1D              { align1 };
mov(8)          g22<1>D         g24<8,8,1>D                     { align1 };
mov(8)          g23<1>F         g3.2<0,1,0>F                    { align1 };
mov(8)          g24<1>F         3F                              { align1 };
cmp.l(8)        g25<1>D         g23<8,8,1>F     g24<8,8,1>F     { align1 };
and(8)          g25<1>D         g25<8,8,1>D     1D              { align1 };
mov(8)          g26<1>D         g27<8,8,1>D                     { align1 };
mov(8)          g27<1>F         g3<0,1,0>F                      { align1 };
mov(8)          g28<1>F         3F                              { align1 };
cmp.l(8)        g29<1>D         g27<8,8,1>F     g28<8,8,1>F     { align1 };
and(8)          g29<1>D         g29<8,8,1>D     1D              { align1 };
mov(8)          g30<1>F         g3.1<0,1,0>F                    { align1 };
mov(8)          g31<1>F         3F                              { align1 };
cmp.l(8)        g32<1>D         g30<8,8,1>F     g31<8,8,1>F     { align1 };
and(8)          g32<1>D         g32<8,8,1>D     1D              { align1 };
mov(8)          g33<1>D         g33<8,8,1>D                     { align1 };
or(8)           g34<1>D         g29<8,8,1>D     g33<8,8,1>D     { align1 };
or(8)           g35<1>D         g26<8,8,1>D     g34<8,8,1>D     { align1 };
or(8)           g36<1>D         g22<8,8,1>D     g35<8,8,1>D     { align1 };
mov.ne(8)       null            g36<8,8,1>D                     { align1 };
(+f0) if(8)                     ip              18D             { align1 switch };

[...]

So the current implementation of if(any()) is looking pretty OK ("or or
or mov.ne if"), though there's a register dependency between the
chained ors that could be eliminated with a bit of juggling, and we
could move the cond update up to the last or.  Also, I'm not sure
whether to make bools be 0,1 immediately at cmp time, or to do that at
b2f/b2i time.  (I'm ignoring the gratuitous moves in that code, which
will get eliminated later, and obviously register allocation isn't done
yet.)
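
The 0,1 question can be sketched in Python (illustrative only, not
backend code).  RAW_TRUE is an assumed stand-in for whatever nonzero
value a raw compare leaves in the register, and the function names are
hypothetical.  The point is that normalizing each compare result right
away and normalizing once after the or chain give the same answer; they
differ only in how many "and" instructions get emitted.

```python
RAW_TRUE = 0xFFFFFFFF  # assumed raw "true" value from a compare; illustrative


def cmp_lt(a, b):
    """Raw per-channel compare: nonzero mask for true, 0 for false."""
    return RAW_TRUE if a < b else 0


def any_lt_early(args, limit):
    """Normalize to 0/1 at cmp time (one 'and' per channel, as in the dump)."""
    flags = [cmp_lt(a, limit) & 1 for a in args]
    acc = flags[0]
    for f in flags[1:]:
        acc |= f  # the "or or or" chain
    return acc


def any_lt_late(args, limit):
    """Keep raw masks through the or chain, normalize once at the end."""
    flags = [cmp_lt(a, limit) for a in args]
    acc = flags[0]
    for f in flags[1:]:
        acc |= f
    return acc & 1  # normalization deferred to the b2f/b2i point
```

For a vec4 the early variant does four ands and the late variant does
one, and if the result only feeds an if, the late variant does not need
the final and at all, since any nonzero value will do for the condition.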

If you're working on a driver for a scalar chip, you might want to pull
brw_fs_channel_expressions and brw_fs_vector_splitting up and get them
used -- it should make sensible codegen a lot easier for them.

Hmm, that dependency thing might be nice in general.  Take the
or(or(or(a,b),c),d) and make or(or(a,b), or(c,d)) when you've got an
associative operator (probably make sure component counts are equal, or
the type changes can be painful).
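
A rough sketch of that rebalancing, in Python over a toy tuple
representation ("op", left, right) rather than the real IR.  All names
here are hypothetical, and the component-count check mentioned above is
deliberately left out.

```python
def flatten(expr, op):
    """Collect the operands of a nested chain of the same binary operator."""
    if isinstance(expr, tuple) and expr[0] == op:
        return flatten(expr[1], op) + flatten(expr[2], op)
    return [expr]


def balance(operands, op):
    """Build a balanced tree over the operands, preserving their order."""
    if len(operands) == 1:
        return operands[0]
    mid = len(operands) // 2
    return (op, balance(operands[:mid], op), balance(operands[mid:], op))


def rebalance(expr, op):
    """Rewrite a chain like or(or(or(a,b),c),d) as or(or(a,b), or(c,d)).

    Only valid when op is associative.
    """
    return balance(flatten(expr, op), op)


def depth(expr):
    """Length of the longest dependency chain in the expression tree."""
    if not isinstance(expr, tuple):
        return 0
    return 1 + max(depth(expr[1]), depth(expr[2]))


chain = ("or", ("or", ("or", "a", "b"), "c"), "d")
balanced = rebalance(chain, "or")
```

Operand order is preserved, so only associativity is needed, not
commutativity.  For four operands the dependency chain drops from three
ors to two.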