[Mesa-dev] [RFC] GL fixed function fragment shaders
brianp at vmware.com
Fri Mar 18 15:42:07 PDT 2011
On 03/18/2011 02:31 PM, Jakob Bornecrantz wrote:
> On Mon, Jan 17, 2011 at 10:40 PM, Eric Anholt<eric at anholt.net> wrote:
>> On Thu, 13 Jan 2011 17:40:39 +0100, Roland Scheidegger<sroland at vmware.com> wrote:
>>> Am 12.01.2011 23:04, schrieb Eric Anholt:
>>>> This is a work-in-progress patch series to switch texenvprogram.c from
>>>> generating ARB_fp style Mesa IR to generating GLSL IR as its product.
>>>> For drivers without native GLSL codegen, that is then turned into the
>>>> Mesa IR that can be consumed. However, for 965 we don't use the Mesa
>>>> IR product and just use the GLSL output, producing much better code
>>>> thanks to the new backend. This is part of a long term goal to get
>>>> Mesa drivers off of Mesa IR and producing their instruction stream
>>>> directly from the GLSL IR.
>>>> I'm not planning on committing this series immediately, as I've still
>>>> got a regression in the 965 driver with texrect-many on the last
>>>> As a comparison, here's one of the shaders from openarena before:
>>> So what's the code looking like after conversion to mesa IR? As long
>> So, there's one extra Mesa IR move added where we could compute into the
>> destination reg but don't. This is a general problem with
>> ir_to_mesa.cpp that affects GLSL pretty badly.
> I found pretty much the same thing when looking into tunnel:
> # Fragment Program/Shader 0
> 0: TXP TEMP, INPUT.xyyw, texture, 2D;
> 1: MUL TEMP.xyz, TEMP, INPUT;
> 2: MOV TEMP.xyz, TEMP.xyzx;
> 3: MOV TEMP.w, INPUT.wwww;
> 4: MOV TEMP, TEMP;
> 5: MUL TEMP.x, INPUT.xxxx, STATE.wwww;
> 6: MUL TEMP.x, TEMP.xxxx, TEMP.xxxx;
> 7: EX2 TEMP.x, TEMP.-x-x-x-x;
> 8: MOV_SAT TEMP.x, TEMP.xxxx;
> 9: ADD TEMP.x, CONST.xxxx, TEMP.-x-x-x-x;
> 10: MUL TEMP.xyz, STATE.xyzz, TEMP.xxxx;
> 11: MAD TEMP.xyz, TEMP.xyzx, TEMP.xxxx, TEMP.xyzx;
> 12: MOV OUTPUT, TEMP;
> 13: END
> # Fragment Program/Shader 0
> 0: TXP TEMP, INPUT, texture, 2D;
> 1: MUL_SAT TEMP.xyz, TEMP, INPUT;
> 2: MOV_SAT TEMP.w, INPUT;
> 3: MUL TEMP.x, STATE.wwww, INPUT.xxxx;
> 4: MUL TEMP.x, TEMP.xxxx, TEMP.xxxx;
> 5: EX2_SAT TEMP.x, TEMP.-x-x-x-x;
> 6: LRP OUTPUT.xyz, TEMP.xxxx, TEMP, STATE;
> 7: MOV OUTPUT.w, TEMP;
> 8: END
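As a reading aid for the two listings above: if the anonymized registers are read the way the opcodes suggest (this mapping is an assumption, since the dump prints every register only as TEMP, INPUT, STATE or CONST), both programs end with the same fog blend. The longer listing builds it from ADD, MUL and MAD; the shorter one uses a single LRP. A minimal C sketch with hypothetical function names, for one colour channel:

```c
/* Clamp to [0, 1], which is what the _SAT instruction suffix does. */
static float saturate(float v)
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Longer listing: ADD (1 - f), MUL (fog * (1 - f)), then MAD. */
static float blend_add_mul_mad(float f, float color, float fog)
{
    float inv = 1.0f - f;         /* ADD TEMP.x, CONST.xxxx, TEMP.-x...  */
    float fog_part = fog * inv;   /* MUL TEMP.xyz, STATE.xyzz, TEMP.xxxx */
    return color * f + fog_part;  /* MAD                                 */
}

/* Shorter listing: one LRP, defined as f * a + (1 - f) * b. */
static float blend_lrp(float f, float color, float fog)
{
    return f * color + (1.0f - f) * fog;
}
```

For a fog factor f already clamped to [0, 1] (saturate() above), the two functions agree up to rounding, so under this reading the listings differ in instruction and temp count rather than in the result.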
> I got similar results, though the effects are more visible here. Also
> note that the new shader uses 5 temps compared to 3. The FF setup, I
> think, only uses fog (or one texenv modulate), so it's not just
> hard-to-program texenv setups that are affected by this change.
> Now looking at how this is generated, the new code seems to generate
> it quite similarly to the old. After that, though, things get
> interesting: after the generation step the old code is done and is
> already in the optimized form you see above. The new code, however, is
> far from done. It first goes through various common GLSL IR
> optimization steps (in the attached text file, the second and third
> shaders are the same, just with and without the inlining of GLSL IR).
> Finally it calls _mesa_optimize_program, which gets it to its current
> form.
> As for the code itself, it doesn't look as bad as I thought it would.
> There are a lot of allocations and a fair bit of extra typing, though
> the line count in the commit stays about the same or even drops. The
> reason is that texenv has its own implementation of ureg; not
> counting that, a conversion to GLSL IR would instead add lines.
>> Of course, talking about optimality of Mesa IR is kind of a joke, as for
>> the drivers that directly consume it (i915, 965 VS, r200, and I'm
>> discounting r300+ as they have their own IR that Mesa IR gets translated
>> to and actually optimized), we miss huge opportunities to reduce
>> instruction count due to swizzle sources including -1, 0, 1 as options
>> but Mesa IR not taking advantage of it. If we were doing that right,
>> then the other MOV-reduction pass would hit and that extra move just
>> added here would go away, resulting in a net win.
> This could be done with any of the IRs (provided numeric swizzling is
> added), and it is something that I have been thinking about adding to
> TGSI, as pretty much all hw supports it natively (the exception being
> svga).
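To illustrate what numeric swizzling buys: a source operand whose per-component selector can also name the constants 0 and 1 (with a negate bit giving -1) lets an instruction read, say, (x, y, z, 1) directly, with no constant-register read or extra MOV to set up w. The following C sketch uses hypothetical names and is not taken from Mesa IR, TGSI or any driver:

```c
/* Hypothetical extended swizzle selectors: four components plus 0 and 1. */
enum swz { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_ZERO, SWZ_ONE };

/* Fetch one source component, honouring constant selectors and negation. */
static float swizzle_component(const float src[4], enum swz sel, int negate)
{
    float v;
    switch (sel) {
    case SWZ_ZERO: v = 0.0f; break;
    case SWZ_ONE:  v = 1.0f; break;
    default:       v = src[sel]; break;  /* SWZ_X..SWZ_W index src */
    }
    return negate ? -v : v;
}

/* Build a whole 4-component operand from src in a single step. */
static void apply_swizzle(float dst[4], const float src[4],
                          const enum swz sel[4], const int negate[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = swizzle_component(src, sel[i], negate[i]);
}
```

With selectors {SWZ_X, SWZ_Y, SWZ_Z, SWZ_ONE} the operand's w reads as 1 without touching a constant register; setting the negate bit on that component yields -1.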
>> Similarly, we add an extra indirection phase according to 915's
>> accounting of those on the second shader, but the fact that we don't
>> schedule those in our GLSL output anyway is a big issue for GLSL on
>> hardware with indirection limits.
>>> it's not worse than the original I guess this should be ok, though for
>>> those drivers consuming mesa IR I guess it's just more cpu time without
>>> any real benefit?
>> Assuming that the setup the app did was already optimal for a
>> programmable GPU, yes. But I suspect that isn't generally the case --
>> while OA has reasonable looking fixed function setup (other than Mesa IR
>> we produce not using the swizzles), given how painful it is to program
>> using texenv I suspect there are a lot of "suboptimal" shader setups out
>> there that we could actually improve.
> You posted some GLSL IR CPU optimization patches after pushing this
> code, and only the delta between pre- and post-optimization. What is
> the delta between the old Mesa IR code and the GLSL IR code? If you
> didn't do any testing, can you give an estimate? We seem to be doing a
> lot more CPU crunching for worse results.
>>> For gallium we should probably address this some way
>>> or another, it seems quite backward to do ff->glsl->mesa ir->tgsi.
>> I'm surprised you guys haven't forked off ir_to_mesa.cpp to something
>> that produces TGSI, since you seem to prefer it as the thing for drivers
>> to consume over GLSL IR.
It would probably be a good step, but it's quite a bit of work.
>> At least with sized variables, you could then
>> adapt the Mesa IR optimization passes on TGSI so that they wouldn't all
>> be disabled whenever relative addressing occurred.
Yeah, that's been a problem.
>> I'm only interested
>> in Mesa IR for hardware that doesn't have relative addressing of temps,
>> so it's not really an issue to me.
> While an ir_to_tgsi is needed, I'm quite worried that the old
> _mesa_optimize_program was needed at all to even get it close to
> comparable output.
Since the texenvprogram.c and ff_fragment_shader.cpp code have the
same simple entrypoint, both could coexist for a while and we could
choose between them with a simple flag. I'm not too crazy about
switching to the new path if it means a substantial performance hit.
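A rough sketch of what choosing between the two paths with a flag could look like. Every name here is hypothetical (the generator stubs, the function-pointer type and the FF_USE_GLSL_IR environment variable are assumptions for illustration, not existing Mesa symbols); the point is only that a shared entrypoint signature makes the switch a single selection:

```c
#include <stdlib.h>
#include <string.h>

/* Both generators share one signature, so callers need not care which runs. */
typedef const char *(*ff_generator)(void);

/* Stand-ins for the two real code paths; each just names its source file. */
static const char *generate_via_mesa_ir(void) { return "texenvprogram.c"; }
static const char *generate_via_glsl_ir(void) { return "ff_fragment_shader.cpp"; }

/* Pick the GLSL IR path only when the flag is exactly "1"; default to old. */
static ff_generator choose_generator(const char *flag)
{
    if (flag != NULL && strcmp(flag, "1") == 0)
        return generate_via_glsl_ir;
    return generate_via_mesa_ir;
}

/* Convenience wrapper reading the (hypothetical) environment variable. */
static ff_generator choose_generator_from_env(void)
{
    return choose_generator(getenv("FF_USE_GLSL_IR"));
}
```

Defaulting to the old path keeps existing behaviour unless someone opts in, which also makes it easy to compare output and CPU cost of the two paths on the same application.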