[Mesa-dev] [PATCH 8/9] mesa: Add partial constant propagation pass for Mesa IR

Mon Aug 15 16:17:20 PDT 2011

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 08/15/2011 01:44 PM, Eric Anholt wrote:
> On Mon, 15 Aug 2011 12:02:42 -0700, "Ian Romanick" <idr at freedesktop.org> wrote:
>> From: Ian Romanick <ian.d.romanick at intel.com>
>>
>> This cleans up some code generated by the IR-to-Mesa pass for i915.
>> In particular, some shaders involving arrays of constant matrices
>> result in really bad code.
> 
> I'm curious what sort of constructs led to this being needed at this
> level but not at GLSL IR level.  I suspect that some of it (SEQ temp, a,
> a handling, for example) might be things that we should be doing in
> opt_algebraic and just failing to do.  Then one comment below.

So... I wrote that commit message back in February, so some of the
problems may have been fixed since then.  However, this pass reduces
the OpenGL ES2 conformance test acos_float_frag_xvary from 109
instructions to 93.  Looking at the instruction diffs, it appears that a
fair number of the constant-folding opportunities come from changes
made by earlier Mesa IR optimization passes (e.g., CMP simplification).

For example, this sequence (each GLSL IR instruction followed by the
Mesa IR generated for it):

 28: (expression bool all_equal (var_ref arr0) (constant int (0)) )
     SEQ TEMP[30].x, TEMP[23].xxxx, CONST[2].xxxx;
 29: (assign  (x) (var_ref if_to_cond_assign_then)  (expression bool all_equal (var_ref arr0) (constant int (0)) ) )
     MOV TEMP[31], TEMP[30].xxxx;
 30: (assign (var_ref if_to_cond_assign_then)  (x) (var_ref a) (array_ref (var_ref asinValues) (constant int (0)) ) )
     CMP TEMP[32], TEMP[30].-x-x-x-x, CONST[0].xxxx, TEMP[32];
 31: (expression float + (array_ref (var_ref asinValues) (constant int (1)) ) (expression float neg (var_ref a) ) )
     ADD TEMP[34].x, CONST[0].yyyy, TEMP[32].-x-x-x-x;

becomes

  7: SEQ TEMP[0].x, TEMP[2].xxxx, CONST[2].xxxx;
  8: ADD TEMP[1].x, CONST[0].yyyy, CONST[0].-x-x-x-x;

and, since both operands of the ADD are now constants, constant folding
eliminates the ADD entirely.
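To make the fold concrete, here is a minimal C sketch of what folding
"ADD TEMP[1].x, CONST[0].yyyy, CONST[0].-x-x-x-x" involves: read each
constant operand through its swizzle, apply its negation, and add.
struct const_src, read_const_src() and fold_add() are hypothetical,
simplified stand-ins, not Mesa's actual register structures or helpers.

```c
#include <assert.h>

/* Hypothetical, simplified model of a constant source operand (not
 * Mesa's real source-register struct): which CONST[] slot it reads,
 * a per-channel swizzle (0 = x ... 3 = w), and a 4-bit negate mask
 * where bit i negates channel i. */
struct const_src {
   int index;
   unsigned swizzle[4];
   unsigned negate;
};

/* Value delivered to each channel: apply the swizzle, then negation. */
static void read_const_src(const float consts[][4],
                           const struct const_src *src, float out[4])
{
   for (int i = 0; i < 4; i++) {
      assert(src->swizzle[i] < 4);
      float v = consts[src->index][src->swizzle[i]];
      out[i] = (src->negate & (1u << i)) ? -v : v;
   }
}

/* Fold "ADD dst, a, b" when both operands are constants: the sum is
 * known at compile time, so the instruction can be rewritten as a MOV
 * of a new constant. */
static void fold_add(const float consts[][4], const struct const_src *a,
                     const struct const_src *b, float result[4])
{
   float va[4], vb[4];
   read_const_src(consts, a, va);
   read_const_src(consts, b, vb);
   for (int i = 0; i < 4; i++)
      result[i] = va[i] + vb[i];
}
```

With made-up values CONST[0] = (0.25, 1.0, 0.0, 0.0), the .yyyy and
.-x-x-x-x operands fold to 0.75 in every channel; because of the .x
write mask, only the first channel actually matters.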

We also emit sequences of DP, SEQ, SLT, etc. for some GLSL IR opcodes.
For example, ir_binop_any_nequal(a, b) becomes:

	SNE	temp, a, b;
	DP4	temp, temp, temp;
	SLT	temp, -temp, 0.0;

Before the earlier patches in this series, the SLT would have been a SEQ.
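The SNE/DP4/SLT sequence above computes "does any component of a
differ from b", which is why it folds completely when a and b are
constants.  A minimal C emulation, assuming the usual 1.0/0.0 results
of the set-on-compare opcodes (emulate_any_nequal() is a hypothetical
name):

```c
#include <assert.h>

/* Scalar emulation of the three Mesa IR instructions in the sequence.
 * SNE and SLT write 1.0 for true and 0.0 for false. */
static float emulate_any_nequal(const float a[4], const float b[4])
{
   float sne[4];

   /* SNE temp, a, b: 1.0 where the components differ */
   for (int i = 0; i < 4; i++)
      sne[i] = (a[i] != b[i]) ? 1.0f : 0.0f;

   /* DP4 temp, temp, temp: counts the differing components */
   float dp = 0.0f;
   for (int i = 0; i < 4; i++)
      dp += sne[i] * sne[i];
   assert(dp >= 0.0f && dp <= 4.0f);

   /* SLT temp, -temp, 0.0: -count < 0 exactly when count > 0.
    * For count == 0, -0.0f < 0.0f is false, so the result is 0.0. */
   return (-dp < 0.0f) ? 1.0f : 0.0f;
}
```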

I hacked up the optimizer to dump which instructions are (or could be)
optimized.  Here are the per-opcode counts for a full piglit run
(including the ES2 conformance suite).  I looked at the code that
generated some of these, and it is not clear how they could be optimized
at the GLSL IR level.

ADD: 408
CMP: 260
DP2: 21
DP3: 20
DP4: 280
MAD: 4
MUL: 19
RCP: 6
SEQ: 37
SEQ (same register): 2
SGE: 3
SGE (same register): 2
SGT: 10
SGT (same register): 1
SLE (same register): 4
SLT: 177
SLT (same register): 2
SNE: 604
SNE (same register): 57
TRUNC: 2
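The "(same register)" entries are comparisons of a value against itself,
like the "SEQ temp, a, a" case mentioned above.  A minimal sketch of that
fold, assuming "same register" means both operands read the same
register with identical swizzle and negation, and assuming no NaN inputs
(NaN compares unequal to itself).  The enum and function names are
hypothetical:

```c
#include <assert.h>

/* Hypothetical subset of the set-on-compare opcodes. */
enum set_opcode { OP_SEQ, OP_SNE, OP_SLT, OP_SLE, OP_SGT, OP_SGE };

/* With identical operands, every component compares a value against
 * itself, so the result is a known constant: 1.0 for the reflexive
 * comparisons (==, <=, >=) and 0.0 for the others.  Not valid if an
 * operand can be NaN. */
static float fold_same_register(enum set_opcode op)
{
   switch (op) {
   case OP_SEQ:
   case OP_SLE:
   case OP_SGE:
      return 1.0f;
   case OP_SNE:
   case OP_SLT:
   case OP_SGT:
      return 0.0f;
   }
   assert(!"unknown set-on-compare opcode");
   return 0.0f;
}
```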

In any case, I have another version of this patch coming.

>> diff --git a/src/mesa/program/prog_opt_constant_fold.c b/src/mesa/program/prog_opt_constant_fold.c
>> new file mode 100644
>> index 0000000..2acd4f35
>> --- /dev/null
>> +++ b/src/mesa/program/prog_opt_constant_fold.c
> 
>> +      case OPCODE_DP2:
>> +      case OPCODE_DP3:
>> +      case OPCODE_DP4:
>> +	 if (src_regs_are_constant(inst, 2)) {
>> +	    float a[4];
>> +	    float b[4];
>> +	    float result;
>> +
>> +	    get_value(prog, &inst->SrcReg[0], a);
>> +	    get_value(prog, &inst->SrcReg[1], b);
>> +
>> +	    result = (a[0] * b[0]) + (a[1] * b[1])
>> +	       + (a[2] * b[2]) + (a[3] * b[3]);
>> +
>> +	    inst->Opcode = OPCODE_MOV;
>> +	    inst->SrcReg[0] = src_reg_for_float(prog, result);
>> +	    memset(& inst->SrcReg[1], 0, sizeof(inst->SrcReg[1]));
>> +
>> +	    progress = true;
>> +	 }
>> +	 break;
> 
> This seems unlikely to be correct for DP2, DP3.

I think it happens to work because, for these opcodes, the swizzles of
the constant operands put 0.0 in the unused components.  That is pretty
fragile, though.  I can fix that.
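One way to make the DP fold robust, sketched in C: sum only the
components the opcode actually reads, so whatever the swizzle left in
the unused slots cannot affect the result.  This assumes the operands
have already been read through their swizzles into float[4], as
get_value() appears to do in the patch; enum dp_opcode, dp_components()
and fold_dp() are hypothetical names, not the real pass.

```c
#include <assert.h>

/* Hypothetical opcode subset; the pass itself uses OPCODE_DP2/3/4. */
enum dp_opcode { OP_DP2, OP_DP3, OP_DP4 };

/* Number of components each dot-product opcode reads. */
static int dp_components(enum dp_opcode op)
{
   switch (op) {
   case OP_DP2: return 2;
   case OP_DP3: return 3;
   case OP_DP4: return 4;
   }
   assert(!"not a dot-product opcode");
   return 0;
}

/* Fold a dot product of two swizzled constant operands, ignoring the
 * components beyond what the opcode reads. */
static float fold_dp(enum dp_opcode op, const float a[4], const float b[4])
{
   const int n = dp_components(op);
   float result = 0.0f;

   for (int i = 0; i < n; i++)
      result += a[i] * b[i];
   return result;
}
```

For example, with a = (1, 2, 3, 100) and b = (4, 5, 6, 100), DP3 folds
to 32 regardless of the garbage in the w components.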
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk5JqQAACgkQX1gOwKyEAw+TZwCfVQJcPHFNQrrCJwMFm7pJa3RC
6pYAnRId3mh/6axlUvbfAbF7b6vhrDsU
=KaNc
-----END PGP SIGNATURE-----