[Mesa-dev] [PATCH 8/9] mesa: Add partial constant propagation pass for Mesa IR
Eric Anholt
eric at anholt.net
Mon Aug 15 17:23:56 PDT 2011
On Mon, 15 Aug 2011 16:17:20 -0700, Ian Romanick <idr at freedesktop.org> wrote:
> On 08/15/2011 01:44 PM, Eric Anholt wrote:
> > On Mon, 15 Aug 2011 12:02:42 -0700, "Ian Romanick" <idr at freedesktop.org> wrote:
> >> From: Ian Romanick <ian.d.romanick at intel.com>
> >>
> >> This cleans up some code generated by the IR-to-Mesa pass for i915.
> >> In particular, some shaders involving arrays of constant matrices
> >> result in really bad code.
> >
> > I'm curious what sort of constructs led to this being needed at this
> > level but not at GLSL IR level. I suspect that some of it (SEQ temp, a,
> > a handling, for example) might be things that we should be doing in
> > opt_algebraic and just failing to do. Then one comment below.
>
> So... I wrote that commit message back in February, so some of the
> problems may have been fixed by then. However, this pass does reduce
> the OpenGL ES2 conformance test acos_float_frag_xvary from 109
> instructions to 93. Looking at the instruction diffs, it appears that a
> fair amount of the constant folding opportunities derive from changes
> made by earlier Mesa IR optimization passes (e.g., CMP simplification).
>
> For example, this sequence:
>
> 28: (expression bool all_equal (var_ref arr0) (constant int (0)) )
> SEQ TEMP[30].x, TEMP[23].xxxx, CONST[2].xxxx;
> 29: (assign (x) (var_ref if_to_cond_assign_then) (expression bool
> all_equal (var_ref arr0) (constant int (0)) ) )
> MOV TEMP[31], TEMP[30].xxxx;
> 30: (assign (var_ref if_to_cond_assign_then) (x) (var_ref a)
> (array_ref (var_ref asinValues) (constant int (0)) ) )
> CMP TEMP[32], TEMP[30].-x-x-x-x, CONST[0].xxxx, TEMP[32];
> 31: (expression float + (array_ref (var_ref asinValues) (constant int
> (1)) ) (expression float neg (var_ref a) ) )
> ADD TEMP[34].x, CONST[0].yyyy, TEMP[32].-x-x-x-x;
>
> becomes
>
> 7: SEQ TEMP[0].x, TEMP[2].xxxx, CONST[2].xxxx;
> 8: ADD TEMP[1].x, CONST[0].yyyy, CONST[0].-x-x-x-x;
If I'm following this right, it looks like this happened because of
the undefined temporary value access optimization on Mesa IR? (CMP ADD
TEMP[32], TEMP[32] before TEMP[32] was initialized). I think our
decision on that was that we would like to get that into GLSL IR by
promoting conditional moves to undefined values into unconditional
moves. However, it wouldn't help this case unless we also had an array
splitting pass.
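A minimal sketch of the promotion described above, followed by the propagation step that removes the temporary from the ADD. The tuple-based instruction format and both function names are hypothetical, not Mesa's data structures. It assumes straight-line code and ignores swizzles, negation and write masks.

```python
# Hypothetical instruction form: (opcode, dst, (src0, src1, ...)).
# CMP dst, cond, a, b selects a or b per channel based on cond.

def promote_undefined_cmp(instructions):
    """Turn CMP dst, cond, a, dst into MOV dst, a when dst was never written.

    If dst is still undefined, the "else" value is undefined too, so the
    result may as well be `a` regardless of the condition.
    """
    written = set()
    out = []
    for op, dst, srcs in instructions:
        if op == "CMP" and srcs[2] == dst and dst not in written:
            op, srcs = "MOV", (srcs[1],)
        written.add(dst)
        out.append((op, dst, srcs))
    return out


def propagate_constants(instructions):
    """Replace reads of a temporary that holds a plain copy of a constant."""
    known = {}  # temporary name -> constant register it copies
    out = []
    for op, dst, srcs in instructions:
        srcs = tuple(known.get(s, s) for s in srcs)
        if op == "MOV" and srcs[0].startswith("CONST"):
            known[dst] = srcs[0]
        else:
            known.pop(dst, None)  # dst no longer holds a known constant
        out.append((op, dst, srcs))
    return out


# Simplified version of the sequence quoted above.
example = [
    ("SEQ", "TEMP[30]", ("TEMP[23]", "CONST[2]")),
    ("CMP", "TEMP[32]", ("TEMP[30]", "CONST[0]", "TEMP[32]")),
    ("ADD", "TEMP[34]", ("CONST[1]", "TEMP[32]")),
]
optimized = propagate_constants(promote_undefined_cmp(example))
```

After both steps the ADD reads only constant registers, and the leftover MOV to TEMP[32] is dead code that a later pass could remove.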
> and the constant folding eliminates the ADD.
>
> We also emit sequences of DP, SEQ, SLT, etc. for some GLSL IR opcodes.
> For example, ir_binop_any_equal(a,b) becomes:
>
> SNE temp, a, b;
> DP4 temp, temp, temp;
> SLT temp, -temp, 0.0;
>
> Previous to the earlier part of this series, the SLT would have been a SEQ.
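A small emulation of the three-instruction sequence above, to make the arithmetic concrete. The helper names are hypothetical, and registers are modeled as plain 4-element lists of floats. Read literally, the sequence yields 1.0 when at least one channel of a and b differs.

```python
def sne(a, b):
    """SNE: per channel, 1.0 where a != b, else 0.0."""
    return [1.0 if x != y else 0.0 for x, y in zip(a, b)]


def dp4(a, b):
    """DP4: four-component dot product, replicated to every channel."""
    d = sum(x * y for x, y in zip(a, b))
    return [d] * 4


def slt(a, b):
    """SLT: per channel, 1.0 where a < b, else 0.0."""
    return [1.0 if x < y else 0.0 for x, y in zip(a, b)]


def negate(a):
    """Source negation modifier (the '-temp' operand)."""
    return [-x for x in a]


def lowered_compare(a, b):
    temp = sne(a, b)          # 1.0 in each channel that differs
    temp = dp4(temp, temp)    # number of differing channels
    return slt(negate(temp), [0.0] * 4)  # 1.0 if that count is nonzero
```

With identical inputs the dot product is 0.0, so the SLT compares -0.0 against 0.0 and produces 0.0 in every channel.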
>
> I hacked up the optimizer to dump which instructions are (or could be)
> optimized. Here are the results for a full piglit run (with ES2
> conform). I looked at the code that generated some of these, and it's
> not clear how they could be optimized at the GLSL IR level.
>
> ADD: 408
> CMP: 260
> DP2: 21
> DP3: 20
> DP4: 280
> MAD: 4
> MUL: 19
> RCP: 6
> SEQ: 37
> SEQ (same register): 2
> SGE: 3
> SGE (same register): 2
> SGT: 10
> SGT (same register): 1
> SLE (same register): 4
> SLT: 177
> SLT (same register): 2
> SNE: 604
> SNE (same register): 57
> TRUNC: 2
Very cool.
I wasn't objecting to this code, I just wanted to figure out if there were
cases where it should have (also) been handled at a higher level so that
everyone wins. I can definitely understand codegen ending up with stuff
that needs optimization like this even if the upper level does it too --
see also the projective texturing optimization stuff in brw_fs.cpp, and
I anticipate much more as we get into real array access on 965.
> I think it happens to work because the swizzles of constants for these
> opcodes put 0.0 in the unused slots. That is pretty fragile, though. I
> can fix that.
We don't have swizzles of 0 in the Mesa IR we generate. We swizzle in
the last valid channel. My guess is that it's just hard to manage to
get a DP3 of constants to actually get produced where it isn't just used
for the saturated ir_unop_all_equals result or whatever.
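To illustrate the "last valid channel" point, here is a rough sketch with hypothetical helper names (this is not the actual ir_to_mesa code). It pads a swizzle for a value with fewer than four components, then shows why a fold that assumed 0.0 in the unused slot would disagree with a DP4 of the swizzled operand.

```python
def pad_swizzle(size):
    """Build a 4-channel swizzle that repeats the last valid channel."""
    channels = "xyzw"[:size]
    return channels + channels[-1] * (4 - size)


def apply_swizzle(register, swizzle):
    """Read a 4-slot register through a swizzle string such as 'xyzz'."""
    return [register["xyzw".index(c)] for c in swizzle]


def dp3(a, b):
    """DP3 only reads the first three channels."""
    return sum(x * y for x, y in zip(a[:3], b[:3]))


def dp4(a, b):
    """DP4 reads all four channels, including the padded one."""
    return sum(x * y for x, y in zip(a, b))


# A vec3 constant stored in a 4-slot register; the w slot is unused.
const = [1.0, 2.0, 3.0, 99.0]
operand = apply_swizzle(const, pad_swizzle(3))  # reads x, y, z, z
```

Here dp3(operand, operand) is 14.0 while dp4(operand, operand) is 23.0, so the padded channel holds a copy of z, not 0.0, and only opcodes that ignore it are safe to fold without checking.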