[Mesa-dev] [RFC] GL fixed function fragment shaders

Jakob Bornecrantz wallbraker at gmail.com
Fri Mar 18 13:31:33 PDT 2011


On Mon, Jan 17, 2011 at 10:40 PM, Eric Anholt <eric at anholt.net> wrote:
> On Thu, 13 Jan 2011 17:40:39 +0100, Roland Scheidegger <sroland at vmware.com> wrote:
>> Am 12.01.2011 23:04, schrieb Eric Anholt:
>> > This is a work-in-progress patch series to switch texenvprogram.c from
>> > generating ARB_fp style Mesa IR to generating GLSL IR as its product.
>> > For drivers without native GLSL codegen, that is then turned into the
>> > Mesa IR that can be consumed.  However, for 965 we don't use the Mesa
>> > IR product and just use the GLSL output, producing much better code
>> > thanks to the new backend.  This is part of a long term goal to get
>> > Mesa drivers off of Mesa IR and producing their instruction stream
>> > directly from the GLSL IR.
>> >
>> > I'm not planning on committing this series immediately, as I've still
>> > got a regression in the 965 driver with texrect-many on the last
>> > commit.
>> >
>> > As a comparison, here's one of the shaders from openarena before:
>>
>> So what's the code looking like after conversion to mesa IR? As long
>> as
>

[SNIP]

>
> So, there's one extra Mesa IR move added where we could compute into the
> destination reg but don't.  This is a general problem with
> ir_to_mesa.cpp that affects GLSL pretty badly.

I found pretty much the same thing when looking into tunnel (output of
the new GLSL IR path first, output of the old code second):

# Fragment Program/Shader 0
 0: TXP TEMP[0], INPUT[4].xyyw, texture[0], 2D;
 1: MUL TEMP[1].xyz, TEMP[0], INPUT[1];
 2: MOV TEMP[0].xyz, TEMP[1].xyzx;
 3: MOV TEMP[0].w, INPUT[1].wwww;
 4: MOV TEMP[2], TEMP[0];
 5: MUL TEMP[0].x, INPUT[3].xxxx, STATE[1].wwww;
 6: MUL TEMP[3].x, TEMP[0].xxxx, TEMP[0].xxxx;
 7: EX2 TEMP[0].x, TEMP[3].-x-x-x-x;
 8: MOV_SAT TEMP[3].x, TEMP[0].xxxx;
 9: ADD TEMP[0].x, CONST[4].xxxx, TEMP[3].-x-x-x-x;
10: MUL TEMP[4].xyz, STATE[2].xyzz, TEMP[0].xxxx;
11: MAD TEMP[2].xyz, TEMP[1].xyzx, TEMP[3].xxxx, TEMP[4].xyzx;
12: MOV OUTPUT[2], TEMP[2];
13: END

# Fragment Program/Shader 0
 0: TXP TEMP[0], INPUT[4], texture[0], 2D;
 1: MUL_SAT TEMP[1].xyz, TEMP[0], INPUT[1];
 2: MOV_SAT TEMP[1].w, INPUT[1];
 3: MUL TEMP[2].x, STATE[0].wwww, INPUT[3].xxxx;
 4: MUL TEMP[2].x, TEMP[2].xxxx, TEMP[2].xxxx;
 5: EX2_SAT TEMP[2].x, TEMP[2].-x-x-x-x;
 6: LRP OUTPUT[2].xyz, TEMP[2].xxxx, TEMP[1], STATE[1];
 7: MOV OUTPUT[2].w, TEMP[1];
 8: END

I got similar results, though the effects are more visible here. Also
note that the new shader uses 5 temps compared to 3 for the old one.
The FF setup, I think, only uses fog (or one texenv modulate), so it's
not just hard-to-program texenv setups that are affected by this change.
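To put numbers on that comparison, here is a small sketch that counts
instructions, distinct temporaries and MOV-family opcodes in the two
listings above. The helper names (parse, summarize) are made up for this
mail and are not part of Mesa; the listings are copied verbatim.

```python
import re

# First listing in this mail: output of the new GLSL IR path.
NEW = """\
 0: TXP TEMP[0], INPUT[4].xyyw, texture[0], 2D;
 1: MUL TEMP[1].xyz, TEMP[0], INPUT[1];
 2: MOV TEMP[0].xyz, TEMP[1].xyzx;
 3: MOV TEMP[0].w, INPUT[1].wwww;
 4: MOV TEMP[2], TEMP[0];
 5: MUL TEMP[0].x, INPUT[3].xxxx, STATE[1].wwww;
 6: MUL TEMP[3].x, TEMP[0].xxxx, TEMP[0].xxxx;
 7: EX2 TEMP[0].x, TEMP[3].-x-x-x-x;
 8: MOV_SAT TEMP[3].x, TEMP[0].xxxx;
 9: ADD TEMP[0].x, CONST[4].xxxx, TEMP[3].-x-x-x-x;
10: MUL TEMP[4].xyz, STATE[2].xyzz, TEMP[0].xxxx;
11: MAD TEMP[2].xyz, TEMP[1].xyzx, TEMP[3].xxxx, TEMP[4].xyzx;
12: MOV OUTPUT[2], TEMP[2];
13: END
"""

# Second listing in this mail: output of the old texenvprogram.c code.
OLD = """\
 0: TXP TEMP[0], INPUT[4], texture[0], 2D;
 1: MUL_SAT TEMP[1].xyz, TEMP[0], INPUT[1];
 2: MOV_SAT TEMP[1].w, INPUT[1];
 3: MUL TEMP[2].x, STATE[0].wwww, INPUT[3].xxxx;
 4: MUL TEMP[2].x, TEMP[2].xxxx, TEMP[2].xxxx;
 5: EX2_SAT TEMP[2].x, TEMP[2].-x-x-x-x;
 6: LRP OUTPUT[2].xyz, TEMP[2].xxxx, TEMP[1], STATE[1];
 7: MOV OUTPUT[2].w, TEMP[1];
 8: END
"""


def parse(listing):
    """Strip the 'N:' prefixes and drop END, returning instruction strings."""
    ops = []
    for line in listing.splitlines():
        m = re.match(r"\s*\d+:\s*(.+)", line)
        if m and m.group(1).strip() != "END":
            ops.append(m.group(1).strip())
    return ops


def summarize(listing):
    """Count instructions, distinct TEMP registers and MOV/MOV_SAT opcodes."""
    ops = parse(listing)
    temps = {int(n) for op in ops for n in re.findall(r"TEMP\[(\d+)\]", op)}
    movs = sum(1 for op in ops if op.split()[0].startswith("MOV"))
    return {"instructions": len(ops), "temps": len(temps), "movs": movs}


if __name__ == "__main__":
    print("new:", summarize(NEW))
    print("old:", summarize(OLD))
```

For these two listings that comes out as 13 instructions, 5 temps and
5 MOVs for the new path against 8 instructions, 3 temps and 2 MOVs for
the old one (END not counted).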

Now looking at how this is generated, the new code seems to generate
it quite similarly to the old. After that, though, things get
interesting: after the generation step the old code is done, and its
output is already in the optimized form you see above. The new code,
however, is far from done. It first goes through various common GLSL IR
optimization steps (see the attached text file; the second and third
shaders in the file are the same, just with and without the GLSL IR
printed inline). Finally it calls _mesa_optimize_program, which gets it
to its current form.
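One concrete thing the new pipeline does not recover is the fog blend:
the old output finishes with a single LRP, while the new output spends
an ADD, a MUL and a MAD on the same value. The sketch below checks that
the two sequences compute the same result. It assumes the ARB_fp
definition LRP(a, b, c) = a*b + (1 - a)*c and that CONST[4].x holds 1.0
(which is what the GLSL IR in the attachment suggests); the function
names are made up for illustration.

```python
import math


def lrp(a, b, c):
    """ARB_fp style LRP, per component: a*b + (1 - a)*c."""
    return [ai * bi + (1.0 - ai) * ci for ai, bi, ci in zip(a, b, c)]


def old_path(fog_factor, color, fog_color):
    # LRP OUTPUT[2].xyz, TEMP[2].xxxx, TEMP[1], STATE[1];
    return lrp([fog_factor] * 3, color, fog_color)


def new_path(fog_factor, color, fog_color):
    # ADD TEMP[0].x, CONST[4].xxxx, -TEMP[3].x   (assumes CONST[4].x == 1.0)
    one_minus_f = 1.0 + (-fog_factor)
    # MUL TEMP[4].xyz, STATE[2].xyzz, TEMP[0].xxxx
    scaled_fog = [c * one_minus_f for c in fog_color]
    # MAD TEMP[2].xyz, TEMP[1].xyzx, TEMP[3].xxxx, TEMP[4].xyzx
    return [c * fog_factor + s for c, s in zip(color, scaled_fog)]


if __name__ == "__main__":
    color = [1.0, 0.5, 0.0]
    fog_color = [0.5, 0.5, 0.5]
    for f in (0.0, 0.25, 1.0):
        a = old_path(f, color, fog_color)
        b = new_path(f, color, fog_color)
        assert all(math.isclose(x, y) for x, y in zip(a, b))
        print(f, a)
```

So the three-instruction sequence is just LRP spelled out, and it also
costs the extra temps for the intermediate values.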

As for the code itself, it doesn't look as bad as I thought it would.
There are a lot of allocations and a fair bit of extra typing, though
the LOC count in the commit stays about the same or even drops. The
reason is that texenv has its own implementation of ureg; not counting
that, a conversion to GLSL IR would instead add extra LOC.

>
> Of course, talking about optimality of Mesa IR is kind of a joke, as for
> the drivers that directly consume it (i915, 965 VS, r200, and I'm
> discounting r300+ as they have their own IR that Mesa IR gets translated
> to and actually optimized), we miss huge opportunities to reduce
> instruction count due to swizzle sources including -1, 0, 1 as options
> but Mesa IR not taking advantage of it.  If we were doing that right,
> then the other MOV-reduction pass would hit and that extra move just
> added here would go away, resulting in a net win.

This could be done with any of the IRs (provided numeric swizzling is
added) and is something that I have been thinking about adding to TGSI,
as pretty much all hw supports it natively (the exception being svga).

>
> Similarly, we add an extra indirection phase according to 915's
> accounting of those on the second shader, but the fact that we don't
> schedule those in our GLSL output anyway is a big issue for GLSL on
> hardware with indirection limits.
>
>> it's not worse than the original I guess this should be ok, though for
>> those drivers consuming mesa IR I guess it's just more cpu time without
>> any real benefit?
>
> Assuming that the setup the app did was already optimal for a
> programmable GPU, yes.  But I suspect that isn't generally the case --
> while OA has reasonable looking fixed function setup (other than Mesa IR
> we produce not using the swizzles), given how painful it is to program
> using texenv I suspect there are a lot of "suboptimal" shader setups out
> there that we could actually improve.

You posted some GLSL IR CPU optimization patches after pushing this
code, but only gave the delta between pre and post optimization. What
is the delta between the old Mesa IR code and the GLSL IR code? If you
didn't do any testing, can you give an estimate? We seem to be doing a
lot more CPU crunching for worse results.

>> For gallium we should probably address this some way
>> or another, it seems quite backward to do ff->glsl->mesa ir->tgsi.
>
> I'm surprised you guys haven't forked off ir_to_mesa.cpp to something
> that produces TGSI, since you seem to prefer it as the thing for drivers
> to consume over GLSL IR.  At least with sized variables, you could then
> adapt the Mesa IR optimization passes on TGSI so that they wouldn't all
> be disabled whenever relative addressing occurred.  I'm only interested
> in Mesa IR for hardware that doesn't have relative addressing of temps,
> so it's not really an issue to me.

While an ir_to_tgsi is needed, I'm quite worried that the old
_mesa_optimize_program was needed at all to even get it close to
comparable output.

Cheers Jakob.
-------------- next part --------------
GLSL IR for linked fragment program 0:
(
(declare (uniform ) sampler2D sampler_0 at 0x23c7fd0)
(declare (out ) vec4 gl_FragColor at 0x23cbde0)
(declare (in ) vec4 gl_Color at 0x23cbf00)
(declare (in ) float gl_FogFragCoord at 0x23cc020)
(declare (uniform ) vec4 gl_MESAFogParamsOptimized at 0x23ccaa0)
(declare (uniform ) gl_FogParameters gl_Fog at 0x23ce9f0)
(declare (in ) (array vec4 1) gl_TexCoord at 0x23cea80)
(function main
  (signature void
    (parameters
    )
    (
      (declare (temporary ) vec4 texenv_combine at 0x23ceeb0)
      (assign  (xyz) (var_ref texenv_combine at 0x23ceeb0)  (swiz xyz (expression vec4 * (tex (var_ref sampler_0 at 0x23c7fd0)  (swiz xy (array_ref (var_ref gl_TexCoord at 0x23cea80) (constant uint (0)) ) ) 0 (swiz w (array_ref (var_ref gl_TexCoord at 0x23cea80) (constant uint (0)) ) ) () )(var_ref gl_Color at 0x23cbf00) ) )) 
      (assign  (w) (var_ref texenv_combine at 0x23ceeb0)  (swiz w (var_ref gl_Color at 0x23cbf00) )) 
      (declare () vec4 fog_result at 0x23cf2f0)
      (assign  (xyzw) (var_ref fog_result at 0x23cf2f0)  (var_ref texenv_combine at 0x23ceeb0) ) 
      (declare () float fog_factor at 0x23cf400)
      (declare () float fog_temp at 0x23cf490)
      (assign  (x) (var_ref fog_temp at 0x23cf490)  (expression float * (var_ref gl_FogFragCoord at 0x23cc020) (swiz w (var_ref gl_MESAFogParamsOptimized at 0x23ccaa0) )) ) 
      (assign  (x) (var_ref fog_factor at 0x23cf400)  (expression float max (expression float min (expression float exp2 (expression float neg (expression float * (var_ref fog_temp at 0x23cf490) (var_ref fog_temp at 0x23cf490) ) ) ) (constant float (1.000000)) ) (constant float (0.000000)) ) ) 
      (assign  (xyz) (var_ref fog_result at 0x23cf2f0)  (expression vec3 + (expression vec3 * (swiz xyz (record_ref (var_ref gl_Fog at 0x23ce9f0)  color) )(expression float + (constant float (1.000000)) (expression float neg (var_ref fog_factor at 0x23cf400) ) ) ) (expression vec3 * (swiz xyz (var_ref texenv_combine at 0x23ceeb0) )(var_ref fog_factor at 0x23cf400) ) ) ) 
      (assign  (xyzw) (var_ref gl_FragColor at 0x23cbde0)  (var_ref fog_result at 0x23cf2f0) ) 
    ))

)


)

Mesa IR for linked fragment program 0:
  0: (declare (uniform ) gl_FogParameters gl_Fog at 0x23ce9f0)
     MOV TEMP[1], STATE[2];
  1: MOV TEMP[2], STATE[3].xxxx;
  2: MOV TEMP[3], STATE[3].yyyy;
  3: MOV TEMP[4], STATE[3].zzzz;
  4: MOV TEMP[5], STATE[3].wwww;
  5: (tex (var_ref sampler_0 at 0x23c7fd0)  (swiz xy (array_ref (var_ref gl_TexCoord at 0x23cea80) (constant uint (0)) ) ) 0 (swiz w (array_ref (var_ref gl_TexCoord at 0x23cea80) (constant uint (0)) ) ) () )
     MOV TEMP[6], INPUT[4].xyyy;
  6: MOV TEMP[6].w, INPUT[4].wwww;
  7: TXP TEMP[7], INPUT[4].xyyw, texture[0], 2D;
  8: (expression vec4 * (tex (var_ref sampler_0 at 0x23c7fd0)  (swiz xy (array_ref (var_ref gl_TexCoord at 0x23cea80) (constant uint (0)) ) ) 0 (swiz w (array_ref (var_ref gl_TexCoord at 0x23cea80) (constant uint (0)) ) ) () )(var_ref gl_Color at 0x23cbf00) ) 
     MUL TEMP[8], TEMP[7], INPUT[1];
  9: (assign  (xyz) (var_ref texenv_combine at 0x23ceeb0)  (swiz xyz (expression vec4 * (tex (var_ref sampler_0 at 0x23c7fd0)  (swiz xy (array_ref (var_ref gl_TexCoord at 0x23cea80) (constant uint (0)) ) ) 0 (swiz w (array_ref (var_ref gl_TexCoord at 0x23cea80) (constant uint (0)) ) ) () )(var_ref gl_Color at 0x23cbf00) ) )) 
     MOV TEMP[9].xyz, TEMP[8].xyzx;
 10: (assign  (w) (var_ref texenv_combine at 0x23ceeb0)  (swiz w (var_ref gl_Color at 0x23cbf00) )) 
     MOV TEMP[9].w, INPUT[1].wwww;
 11: (assign  (xyzw) (var_ref fog_result at 0x23cf2f0)  (var_ref texenv_combine at 0x23ceeb0) ) 
     MOV TEMP[10], TEMP[9];
 12: (expression float * (var_ref gl_FogFragCoord at 0x23cc020) (swiz w (var_ref gl_MESAFogParamsOptimized at 0x23ccaa0) )) 
     MUL TEMP[11].x, INPUT[3].xxxx, STATE[1].wwww;
 13: (assign  (x) (var_ref fog_temp at 0x23cf490)  (expression float * (var_ref gl_FogFragCoord at 0x23cc020) (swiz w (var_ref gl_MESAFogParamsOptimized at 0x23ccaa0) )) ) 
     MOV TEMP[12], TEMP[11].xxxx;
 14: (expression float * (var_ref fog_temp at 0x23cf490) (var_ref fog_temp at 0x23cf490) ) 
     MUL TEMP[13].x, TEMP[11].xxxx, TEMP[11].xxxx;
 15: (expression float exp2 (expression float neg (expression float * (var_ref fog_temp at 0x23cf490) (var_ref fog_temp at 0x23cf490) ) ) ) 
     EX2 TEMP[15].x, TEMP[13].-x-x-x-x;
 16: (expression float max (expression float min (expression float exp2 (expression float neg (expression float * (var_ref fog_temp at 0x23cf490) (var_ref fog_temp at 0x23cf490) ) ) ) (constant float (1.000000)) ) (constant float (0.000000)) ) 
     MOV_SAT TEMP[16], TEMP[15].xxxx;
 17: (assign  (x) (var_ref fog_factor at 0x23cf400)  (expression float max (expression float min (expression float exp2 (expression float neg (expression float * (var_ref fog_temp at 0x23cf490) (var_ref fog_temp at 0x23cf490) ) ) ) (constant float (1.000000)) ) (constant float (0.000000)) ) ) 
     MOV TEMP[17], TEMP[16].xxxx;
 18: (expression float + (constant float (1.000000)) (expression float neg (var_ref fog_factor at 0x23cf400) ) ) 
     ADD TEMP[19].x, CONST[4].xxxx, TEMP[16].-x-x-x-x;
 19: (expression vec3 * (swiz xyz (record_ref (var_ref gl_Fog at 0x23ce9f0)  color) )(expression float + (constant float (1.000000)) (expression float neg (var_ref fog_factor at 0x23cf400) ) ) ) 
     MUL TEMP[20].xyz, STATE[2].xyzz, TEMP[19].xxxx;
 20: (expression vec3 + (expression vec3 * (swiz xyz (record_ref (var_ref gl_Fog at 0x23ce9f0)  color) )(expression float + (constant float (1.000000)) (expression float neg (var_ref fog_factor at 0x23cf400) ) ) ) (expression vec3 * (swiz xyz (var_ref texenv_combine at 0x23ceeb0) )(var_ref fog_factor at 0x23cf400) ) ) 
     MAD TEMP[21], TEMP[8].xyzz, TEMP[16].xxxx, TEMP[20].xyzz;
 21: (assign  (xyz) (var_ref fog_result at 0x23cf2f0)  (expression vec3 + (expression vec3 * (swiz xyz (record_ref (var_ref gl_Fog at 0x23ce9f0)  color) )(expression float + (constant float (1.000000)) (expression float neg (var_ref fog_factor at 0x23cf400) ) ) ) (expression vec3 * (swiz xyz (var_ref texenv_combine at 0x23ceeb0) )(var_ref fog_factor at 0x23cf400) ) ) ) 
     MOV TEMP[10].xyz, TEMP[21].xyzx;
 22: (assign  (xyzw) (var_ref gl_FragColor at 0x23cbde0)  (var_ref fog_result at 0x23cf2f0) ) 
     MOV OUTPUT[2], TEMP[10];
 23: END

Mesa IR pre Mesa IR optimizations
# Fragment Program/Shader 0
  0: MOV TEMP[1], STATE[2];
  1: MOV TEMP[2], STATE[3].xxxx;
  2: MOV TEMP[3], STATE[3].yyyy;
  3: MOV TEMP[4], STATE[3].zzzz;
  4: MOV TEMP[5], STATE[3].wwww;
  5: MOV TEMP[6], INPUT[4].xyyy;
  6: MOV TEMP[6].w, INPUT[4].wwww;
  7: TXP TEMP[7], INPUT[4].xyyw, texture[0], 2D;
  8: MUL TEMP[8], TEMP[7], INPUT[1];
  9: MOV TEMP[9].xyz, TEMP[8].xyzx;
 10: MOV TEMP[9].w, INPUT[1].wwww;
 11: MOV TEMP[10], TEMP[9];
 12: MUL TEMP[11].x, INPUT[3].xxxx, STATE[1].wwww;
 13: MOV TEMP[12], TEMP[11].xxxx;
 14: MUL TEMP[13].x, TEMP[11].xxxx, TEMP[11].xxxx;
 15: EX2 TEMP[15].x, TEMP[13].-x-x-x-x;
 16: MOV_SAT TEMP[16], TEMP[15].xxxx;
 17: MOV TEMP[17], TEMP[16].xxxx;
 18: ADD TEMP[19].x, CONST[4].xxxx, TEMP[16].-x-x-x-x;
 19: MUL TEMP[20].xyz, STATE[2].xyzz, TEMP[19].xxxx;
 20: MAD TEMP[21], TEMP[8].xyzz, TEMP[16].xxxx, TEMP[20].xyzz;
 21: MOV TEMP[10].xyz, TEMP[21].xyzx;
 22: MOV OUTPUT[2], TEMP[10];
 23: END

Mesa IR post Mesa IR optimizations
# Fragment Program/Shader 0
  0: TXP TEMP[0], INPUT[4].xyyw, texture[0], 2D;
  1: MUL TEMP[1].xyz, TEMP[0], INPUT[1];
  2: MOV TEMP[0].xyz, TEMP[1].xyzx;
  3: MOV TEMP[0].w, INPUT[1].wwww;
  4: MOV TEMP[2], TEMP[0];
  5: MUL TEMP[0].x, INPUT[3].xxxx, STATE[1].wwww;
  6: MUL TEMP[3].x, TEMP[0].xxxx, TEMP[0].xxxx;
  7: EX2 TEMP[0].x, TEMP[3].-x-x-x-x;
  8: MOV_SAT TEMP[3].x, TEMP[0].xxxx;
  9: ADD TEMP[0].x, CONST[4].xxxx, TEMP[3].-x-x-x-x;
 10: MUL TEMP[4].xyz, STATE[2].xyzz, TEMP[0].xxxx;
 11: MAD TEMP[2].xyz, TEMP[1].xyzx, TEMP[3].xxxx, TEMP[4].xyzx;
 12: MOV OUTPUT[2], TEMP[2];
 13: END

