[Mesa-dev] [RFC] GL fixed function fragment shaders

Fri Mar 18 15:42:07 PDT 2011

On 03/18/2011 02:31 PM, Jakob Bornecrantz wrote:
> On Mon, Jan 17, 2011 at 10:40 PM, Eric Anholt<eric at anholt.net>  wrote:
>> On Thu, 13 Jan 2011 17:40:39 +0100, Roland Scheidegger<sroland at vmware.com>  wrote:
>>> Am 12.01.2011 23:04, schrieb Eric Anholt:
>>>> This is a work-in-progress patch series to switch texenvprogram.c from
>>>> generating ARB_fp style Mesa IR to generating GLSL IR as its product.
>>>> For drivers without native GLSL codegen, that is then turned into the
>>>> Mesa IR that can be consumed.  However, for 965 we don't use the Mesa
>>>> IR product and just use the GLSL output, producing much better code
>>>> thanks to the new backend.  This is part of a long term goal to get
>>>> Mesa drivers off of Mesa IR and producing their instruction stream
>>>> directly from the GLSL IR.
>>>>
>>>> I'm not planning on committing this series immediately, as I've still
>>>> got a regression in the 965 driver with texrect-many on the last
>>>> commit.
>>>>
>>>> As a comparison, here's one of the shaders from openarena before:
>>>
>>> So what's the code looking like after conversion to mesa IR? As long
>>> as
>>
>
> [SNIP]
>
>>
>> So, there's one extra Mesa IR move added where we could compute into the
>> destination reg but don't.  This is a general problem with
>> ir_to_mesa.cpp that affects GLSL pretty badly.
>
> I found pretty much the same thing when looking into tunnel:
>
> # Fragment Program/Shader 0
>   0: TXP TEMP[0], INPUT[4].xyyw, texture[0], 2D;
>   1: MUL TEMP[1].xyz, TEMP[0], INPUT[1];
>   2: MOV TEMP[0].xyz, TEMP[1].xyzx;
>   3: MOV TEMP[0].w, INPUT[1].wwww;
>   4: MOV TEMP[2], TEMP[0];
>   5: MUL TEMP[0].x, INPUT[3].xxxx, STATE[1].wwww;
>   6: MUL TEMP[3].x, TEMP[0].xxxx, TEMP[0].xxxx;
>   7: EX2 TEMP[0].x, TEMP[3].-x-x-x-x;
>   8: MOV_SAT TEMP[3].x, TEMP[0].xxxx;
>   9: ADD TEMP[0].x, CONST[4].xxxx, TEMP[3].-x-x-x-x;
>  10: MUL TEMP[4].xyz, STATE[2].xyzz, TEMP[0].xxxx;
>  11: MAD TEMP[2].xyz, TEMP[1].xyzx, TEMP[3].xxxx, TEMP[4].xyzx;
>  12: MOV OUTPUT[2], TEMP[2];
>  13: END
>
> # Fragment Program/Shader 0
>   0: TXP TEMP[0], INPUT[4], texture[0], 2D;
>   1: MUL_SAT TEMP[1].xyz, TEMP[0], INPUT[1];
>   2: MOV_SAT TEMP[1].w, INPUT[1];
>   3: MUL TEMP[2].x, STATE[0].wwww, INPUT[3].xxxx;
>   4: MUL TEMP[2].x, TEMP[2].xxxx, TEMP[2].xxxx;
>   5: EX2_SAT TEMP[2].x, TEMP[2].-x-x-x-x;
>   6: LRP OUTPUT[2].xyz, TEMP[2].xxxx, TEMP[1], STATE[1];
>   7: MOV OUTPUT[2].w, TEMP[1];
>   8: END
>
> I got similar results, though the effects are more visible here. Also
> note that the new shader (the first dump) uses 5 temps compared to 3.
> The FF setup, I think, only uses fog (or one texenv modulate), so it's
> not just hard-to-program texenv setups that are affected by this change.
>
> Now looking at how this is generated, the new code seems to generate
> it quite similarly to the old. After that, though, things get
> interesting: after the generation step the old code is done and is
> already in the optimized form you see above. The new code, however, is
> far from done. It first goes through various common GLSL IR
> optimization steps (in the attached text file, the second and third
> shaders are the same, just with and without the inlining of GLSL IR).
> Finally it calls _mesa_optimize_program, which gets it to its current
> form.
>
> As for the code itself, it doesn't look as bad as I thought it would.
> There are a lot of allocations and a fair bit of extra typing, though
> the LOC count in the commit stays about the same or even drops. The
> reason is that texenv has its own implementation of ureg; not counting
> that, a conversion to GLSL IR would instead add extra LOC.
>
>>
>> Of course, talking about optimality of Mesa IR is kind of a joke, as for
>> the drivers that directly consume it (i915, 965 VS, r200, and I'm
>> discounting r300+ as they have their own IR that Mesa IR gets translated
>> to and actually optimized), we miss huge opportunities to reduce
>> instruction count due to swizzle sources including -1, 0, 1 as options
>> but Mesa IR not taking advantage of it.  If we were doing that right,
>> then the other MOV-reduction pass would hit and that extra move just
>> added here would go away, resulting in a net win.
>
> This could be done with any of the IRs (provided numeric swizzling is
> added), and it is something that I have been thinking about adding to
> TGSI, as pretty much all hw supports it natively (the exception being
> svga).
>
>>
>> Similarly, we add an extra indirection phase according to 915's
>> accounting of those on the second shader, but the fact that we don't
>> schedule those in our GLSL output anyway is a big issue for GLSL on
>> hardware with indirection limits.
>>
>>> it's not worse than the original I guess this should be ok, though for
>>> those drivers consuming mesa IR I guess it's just more cpu time without
>>> any real benefit?
>>
>> Assuming that the setup the app did was already optimal for a
>> programmable GPU, yes.  But I suspect that isn't generally the case --
>> while OA has reasonable looking fixed function setup (other than Mesa IR
>> we produce not using the swizzles), given how painful it is to program
>> using texenv I suspect there are a lot of "suboptimal" shader setups out
>> there that we could actually improve.
>
> You posted some GLSL IR CPU optimization patches after pushing this
> code, but reported only the delta between pre- and post-optimization.
> What is the delta between the old Mesa IR code and the GLSL IR code?
> If you didn't do any testing, can you give an estimate? We seem to be
> doing a lot more CPU crunching for worse results.
>
>>> For gallium we should probably address this some way
>>> or another, it seems quite backward to do ff->glsl->mesa ir->tgsi.
>>
>> I'm surprised you guys haven't forked off ir_to_mesa.cpp to something
>> that produces TGSI, since you seem to prefer it as the thing for drivers
>> to consume over GLSL IR.

It would probably be a good step, but it's quite a bit of work.

>>  At least with sized variables, you could then
>> adapt the Mesa IR optimization passes on TGSI so that they wouldn't all
>> be disabled whenever relative addressing occurred.

Yeah, that's been a problem.

>>  I'm only interested
>> in Mesa IR for hardware that doesn't have relative addressing of temps,
>> so it's not really an issue to me.
>
> While an ir_to_tgsi is needed, I'm quite worried that the old
> _mesa_optimize_program was needed at all to even get it close to
> comparable output.

Since the texenvprogram.c and ff_fragment_shader.cpp code have the 
same simple entrypoint, both could coexist for a while and we could 
choose between them with a simple flag.  I'm not too crazy about 
switching to the new path if it means a substantial performance hit.
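
As a rough sketch of the kind of switch I mean (every name here is
hypothetical: the "glsl" flag value, the enum, and the stand-in generator
functions are made up for illustration and are not in the tree), the
selection could be about this small:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: which generator builds the fixed-function fragment program. */
enum ff_generator {
   FF_GEN_MESA_IR,   /* old path, as in texenvprogram.c */
   FF_GEN_GLSL_IR    /* new path, as in ff_fragment_shader.cpp */
};

/* Hypothetical common entrypoint type; the stand-ins below only return the
 * name of the file that would do the work. */
typedef const char *(*ff_generate_func)(void);

static const char *generate_mesa_ir(void) { return "texenvprogram.c"; }
static const char *generate_glsl_ir(void) { return "ff_fragment_shader.cpp"; }

/* Map a flag string to a generator.  A missing or unrecognized value falls
 * back to the old path, so the new one stays opt-in. */
static enum ff_generator
select_ff_generator(const char *flag)
{
   if (flag != NULL && strcmp(flag, "glsl") == 0)
      return FF_GEN_GLSL_IR;
   return FF_GEN_MESA_IR;
}

/* Return the entrypoint for the chosen generator. */
static ff_generate_func
get_ff_generate_func(enum ff_generator gen)
{
   assert(gen == FF_GEN_MESA_IR || gen == FF_GEN_GLSL_IR);
   return gen == FF_GEN_GLSL_IR ? generate_glsl_ir : generate_mesa_ir;
}
```

The flag string could come from wherever is convenient, for example
getenv() at context creation.  Defaulting to the old path keeps the
existing behavior until the new path's output and CPU cost are comparable.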

-Brian