<div class="gmail_quote">On Thu, Apr 28, 2011 at 5:23 AM, Brian Paul <span dir="ltr"><<a href="mailto:brian.e.paul@gmail.com" target="_blank">brian.e.paul@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>On Tue, Apr 26, 2011 at 12:26 AM, Bryan Cain <<a href="mailto:bryancain3@gmail.com" target="_blank">bryancain3@gmail.com</a>> wrote:<br>
> Hi,<br>
><br>
> In the last week or so, I've been working on a direct translator from<br>
> GLSL IR to TGSI that does not go through Mesa IR. Although it is still<br>
> a work in progress, it is now working and very usable. So before I go<br>
> on, here is a link to the branch I've pushed to GitHub:<br>
><br>
> <a href="https://github.com/Plombo/mesa/tree/glsl-130" target="_blank">https://github.com/Plombo/mesa/tree/glsl-130</a><br>
><br>
> My main objective with this work is to make GLSL 1.30 support feasible<br>
> on Gallium drivers. From what I understand, it would be difficult or<br>
> impossible to implement integer-specific opcodes such as shifting and<br>
> bit masking in Mesa IR, since it only supports floats. TGSI, on the<br>
> other hand, doesn't have this problem, and already supports most or all<br>
> of the functionality required by GLSL 1.30.<br>
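The following small C sketch shows why a float-only register model is awkward for integers. It is not code from Mesa. An IEEE-754 single-precision float has a 24-bit significand, so an integer written to a float-only register does not always read back unchanged. The helper name roundtrip_through_float is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simulate storing an integer in a float-only register and reading it back.
 * Single precision has a 24-bit significand, so integers above 2^24 are not
 * all representable and get rounded to a neighbouring value. */
static uint32_t roundtrip_through_float(uint32_t x)
{
    /* Keep x below 2^31 so converting the float back to uint32_t is defined. */
    assert(x < 0x80000000u);
    float f = (float)x;     /* what a float-only IR would hold */
    return (uint32_t)f;     /* value read back as an integer */
}
```

Any value up to 2^24 survives. 2^24 + 1 (16777217) comes back as 16777216, and bitwise operations such as AND or shifts have no direct float equivalent at all.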
<br>
</div>Unfortunately, TGSI doesn't have everything we need yet. There are<br>
opcodes for bitwise AND, OR, XOR, etc. and a few integer operations,<br>
but the set is incomplete. It shouldn't be a big deal to add what's missing,<br>
but it'll take a little time.<br>
<br>
I think everyone agrees that we want to eventually ditch Mesa's IR. I<br>
_think_ swrast is the only classic Mesa driver that still uses Mesa IR<br>
and has neither been deprecated by a Gallium driver nor already been<br>
weaned from Mesa IR. How much does the i965 driver still rely on swrast<br>
for fallbacks? Do the Intel people see a need for a GLSL IR executor<br>
for swrast?<br>
<div><br>
<br>
> The translator started as a modified version of ir_to_mesa, and that<br>
> origin is still obvious from reading the code. Many parts of ir_to_mesa<br>
> are still untouched; glsl_to_tgsi is still a long way from<br>
> eliminating all traces of Mesa IR. It also contains a significant<br>
> amount of code adapted from st_mesa_to_tgsi, but modified to generate<br>
> TGSI code from the glsl_to_tgsi_instruction class instead of using Mesa<br>
> IR. (It actually still generates Mesa IR instructions, but that could<br>
> be safely removed at some point since the generated Mesa IR instructions<br>
> are not actually used for anything.) I'm planning to push more of the<br>
> conversion to TGSI higher up in the stack in the future, although the<br>
> remaining parts of Mesa IR (such as the Mesa IR opcodes used by most<br>
> of glsl_to_tgsi) aren't doing any harm.<br>
<br>
</div>I finally found a little time to look over your code. As you said,<br>
it's basically a copy & paste of the ir_to_mesa.cpp and<br>
st_mesa_to_tgsi.c code at this time. Do you plan to eliminate all<br>
remnants of Mesa IR there before adding support for GLSL 1.30? One<br>
easy step would be to replace use of Mesa IR opcodes with TGSI opcodes<br>
and add new TGSI opcodes for integer ops.<br>
<div><div></div><div><br>
<br>
> Since the _mesa_optimize_program function is vital to generating<br>
> optimized code with ir_to_mesa, and it cannot be used without<br>
> Mesa IR, I've written some new optimization passes for<br>
> glsl_to_tgsi_visitor that perform dead code elimination and<br>
> consolidation of the temporary register space. Although they are rather<br>
> simple, they do make a huge difference in the quality of the output. As<br>
> an example, here is what it generates for the vertex shader in the<br>
> Mandelbrot GLSL demo from the Mesa demos repository:<br>
><br>
> VERT<br>
> DCL IN[0]<br>
> DCL IN[1]<br>
> DCL IN[2]<br>
> DCL OUT[0], POSITION<br>
> DCL OUT[1], GENERIC[10]<br>
> DCL OUT[2], GENERIC[11]<br>
> DCL CONST[0..14]<br>
> DCL TEMP[0..4]<br>
> IMM FLT32 { 2.0000, 0.0000, -0.5000, 5.0000}<br>
> 0: MUL TEMP[0], CONST[4], IN[0].xxxx<br>
> 1: MAD TEMP[0], CONST[5], IN[0].yyyy, TEMP[0]<br>
> 2: MAD TEMP[0], CONST[6], IN[0].zzzz, TEMP[0]<br>
> 3: MAD TEMP[0], CONST[7], IN[0].wwww, TEMP[0]<br>
> 4: MUL TEMP[1].xyz, CONST[12].xyzz, IN[1].xxxx<br>
> 5: MAD TEMP[1], CONST[13].xyzz, IN[1].yyyy, TEMP[1].xyzz<br>
> 6: MAD TEMP[1], CONST[14].xyzz, IN[1].zzzz, TEMP[1].xyzz<br>
> 7: DP3 TEMP[2].x, TEMP[1].xyzz, TEMP[1].xyzz<br>
> 8: RSQ TEMP[2].x, TEMP[2].xxxx<br>
> 9: MUL TEMP[1].xyz, TEMP[1].xyzz, TEMP[2].xxxx<br>
> 10: ADD TEMP[2].xyz, CONST[3].xyzz, -TEMP[0].xyzz<br>
> 11: DP3 TEMP[3].x, TEMP[2].xyzz, TEMP[2].xyzz<br>
> 12: RSQ TEMP[3].x, TEMP[3].xxxx<br>
> 13: MUL TEMP[2].xyz, TEMP[2].xyzz, TEMP[3].xxxx<br>
> 14: MOV TEMP[3].xyz, -TEMP[2].xyzx<br>
> 15: MOV TEMP[0].xyz, -TEMP[0].xyzx<br>
> 16: DP3 TEMP[4].x, TEMP[1].xyzz, TEMP[3].xyzz<br>
> 17: MUL TEMP[4].xyz, TEMP[4].xxxx, TEMP[1].xyzz<br>
> 18: MUL TEMP[4].xyz, IMM[0].xxxx, TEMP[4].xyzz<br>
> 19: ADD TEMP[3].xyz, TEMP[3].xyzz, -TEMP[4].xyzz<br>
> 20: DP3 TEMP[4].x, TEMP[0].xyzz, TEMP[0].xyzz<br>
> 21: RSQ TEMP[4].x, TEMP[4].xxxx<br>
> 22: MUL TEMP[0].xyz, TEMP[0].xyzz, TEMP[4].xxxx<br>
> 23: DP3 TEMP[0].x, TEMP[3].xyzz, TEMP[0].xyzz<br>
> 24: MAX TEMP[0].x, TEMP[0].xxxx, IMM[0].yyyy<br>
> 25: POW TEMP[0].x, TEMP[0].xxxx, CONST[0].xxxx<br>
> 26: DP3 TEMP[1].x, TEMP[2].xyzz, TEMP[1].xyzz<br>
> 27: MAX TEMP[1].x, TEMP[1].xxxx, IMM[0].yyyy<br>
> 28: MUL TEMP[1].x, CONST[1].xxxx, TEMP[1].xxxx<br>
> 29: MAD TEMP[0], CONST[2].xxxx, TEMP[0].xxxx, TEMP[1].xxxx<br>
> 30: MOV OUT[2], TEMP[0].xxxx<br>
> 31: ADD TEMP[0], IN[2], IMM[0].zzzz<br>
> 32: MUL TEMP[0].xyz, TEMP[0].xyzz, IMM[0].wwww<br>
> 33: MOV OUT[1].xyz, TEMP[0].xyzx<br>
> 34: MUL TEMP[0], CONST[8], IN[0].xxxx<br>
> 35: MAD TEMP[0], CONST[9], IN[0].yyyy, TEMP[0]<br>
> 36: MAD TEMP[0], CONST[10], IN[0].zzzz, TEMP[0]<br>
> 37: MAD TEMP[0], CONST[11], IN[0].wwww, TEMP[0]<br>
> 38: MOV OUT[0], TEMP[0]<br>
> 39: END<br>
><br>
> Here is the same shader as generated by ir_to_mesa and st_mesa_to_tgsi<br>
> in Mesa master:<br>
><br>
> VERT<br>
> DCL IN[0]<br>
> DCL IN[1]<br>
> DCL IN[2]<br>
> DCL OUT[0], POSITION<br>
> DCL OUT[1], GENERIC[10]<br>
> DCL OUT[2], GENERIC[11]<br>
> DCL CONST[0..14]<br>
> DCL TEMP[0..4]<br>
> IMM FLT32 { 2.0000, 0.0000, -0.5000, 5.0000}<br>
> 0: MUL TEMP[0], CONST[4], IN[0].xxxx<br>
> 1: MAD TEMP[0], CONST[5], IN[0].yyyy, TEMP[0]<br>
> 2: MAD TEMP[0], CONST[6], IN[0].zzzz, TEMP[0]<br>
> 3: MAD TEMP[0], CONST[7], IN[0].wwww, TEMP[0]<br>
> 4: MUL TEMP[1].xyz, CONST[12].xyzz, IN[1].xxxx<br>
> 5: MAD TEMP[1].xyz, CONST[13].xyzz, IN[1].yyyy, TEMP[1].xyzz<br>
> 6: MAD TEMP[1].xyz, CONST[14].xyzz, IN[1].zzzz, TEMP[1].xyzz<br>
> 7: DP3 TEMP[2].x, TEMP[1].xyzz, TEMP[1].xyzz<br>
> 8: RSQ TEMP[2].x, TEMP[2].xxxx<br>
> 9: MUL TEMP[1].xyz, TEMP[1].xyzz, TEMP[2].xxxx<br>
> 10: ADD TEMP[2].xyz, CONST[3].xyzz, -TEMP[0].xyzz<br>
> 11: DP3 TEMP[3].x, TEMP[2].xyzz, TEMP[2].xyzz<br>
> 12: RSQ TEMP[3].x, TEMP[3].xxxx<br>
> 13: MUL TEMP[2].xyz, TEMP[2].xyzz, TEMP[3].xxxx<br>
> 14: MOV TEMP[3].xyz, -TEMP[2].xyzx<br>
> 15: MOV TEMP[0].xyz, -TEMP[0].xyzx<br>
> 16: DP3 TEMP[4].x, TEMP[1].xyzz, TEMP[3].xyzz<br>
> 17: MUL TEMP[4].xyz, TEMP[4].xxxx, TEMP[1].xyzz<br>
> 18: MUL TEMP[4].xyz, IMM[0].xxxx, TEMP[4].xyzz<br>
> 19: ADD TEMP[3].xyz, TEMP[3].xyzz, -TEMP[4].xyzz<br>
> 20: DP3 TEMP[4].x, TEMP[0].xyzz, TEMP[0].xyzz<br>
> 21: RSQ TEMP[4].x, TEMP[4].xxxx<br>
> 22: MUL TEMP[0].xyz, TEMP[0].xyzz, TEMP[4].xxxx<br>
> 23: DP3 TEMP[0].x, TEMP[3].xyzz, TEMP[0].xyzz<br>
> 24: MAX TEMP[0].x, TEMP[0].xxxx, IMM[0].yyyy<br>
> 25: POW TEMP[0].x, TEMP[0].xxxx, CONST[0].xxxx<br>
> 26: DP3 TEMP[1].x, TEMP[2].xyzz, TEMP[1].xyzz<br>
> 27: MAX TEMP[1].x, TEMP[1].xxxx, IMM[0].yyyy<br>
> 28: MUL TEMP[1].x, CONST[1].xxxx, TEMP[1].xxxx<br>
> 29: MAD OUT[2], CONST[2].xxxx, TEMP[0].xxxx, TEMP[1].xxxx<br>
> 30: ADD TEMP[0], IN[2], IMM[0].zzzz<br>
> 31: MUL OUT[1].xyz, TEMP[0].xyzx, IMM[0].wwwx<br>
> 32: MUL TEMP[0], CONST[8], IN[0].xxxx<br>
> 33: MAD TEMP[0], CONST[9], IN[0].yyyy, TEMP[0]<br>
> 34: MAD TEMP[0], CONST[10], IN[0].zzzz, TEMP[0]<br>
> 35: MAD OUT[0], CONST[11], IN[0].wwww, TEMP[0]<br>
> 36: END<br>
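Both listings declare only TEMP[0..4]. The consolidation of the temporary register space described earlier can be approximated by letting temporaries with non-overlapping live ranges share a register. The C sketch that follows is a simplified, hypothetical model. The insn struct and consolidate_temps are illustrative names, not glsl_to_tgsi's actual data structures. It also assumes straight-line code with no loops or branches.

```c
#include <assert.h>

#define MAX_TEMPS 64
#define MAX_SRCS 3

/* Simplified instruction: one destination temp and up to MAX_SRCS source
 * temps. -1 means "not a temporary" (input, constant, output or unused). */
struct insn {
    int dst;
    int src[MAX_SRCS];
};

/* Renumber temporaries so that temps whose live ranges do not overlap share
 * one register. Assumes straight-line code and temp indices below MAX_TEMPS.
 * Returns the number of registers still in use. */
static int consolidate_temps(struct insn *code, int n)
{
    int first[MAX_TEMPS], last[MAX_TEMPS], map[MAX_TEMPS];
    int busy_until[MAX_TEMPS]; /* last use of the temp occupying each register */
    int nregs = 0;

    for (int t = 0; t < MAX_TEMPS; t++)
        first[t] = last[t] = map[t] = -1;

    /* Pass 1: live range [first, last] of every temp that appears. */
    for (int i = 0; i < n; i++) {
        int regs[1 + MAX_SRCS] = { code[i].dst, code[i].src[0],
                                   code[i].src[1], code[i].src[2] };
        for (int k = 0; k < 1 + MAX_SRCS; k++) {
            int t = regs[k];
            if (t < 0)
                continue;
            assert(t < MAX_TEMPS);
            if (first[t] < 0)
                first[t] = i;
            last[t] = i;
        }
    }

    /* Pass 2: when a live range starts, reuse the lowest-numbered register
     * whose previous occupant died strictly before this instruction. */
    for (int i = 0; i < n; i++) {
        for (int t = 0; t < MAX_TEMPS; t++) {
            if (first[t] != i)
                continue;
            int r = 0;
            while (r < nregs && busy_until[r] >= i)
                r++;
            if (r == nregs)
                nregs++;
            busy_until[r] = last[t];
            map[t] = r;
        }
    }

    /* Pass 3: rewrite every operand with its new register number. */
    for (int i = 0; i < n; i++) {
        if (code[i].dst >= 0)
            code[i].dst = map[code[i].dst];
        for (int k = 0; k < MAX_SRCS; k++)
            if (code[i].src[k] >= 0)
                code[i].src[k] = map[code[i].src[k]];
    }
    return nregs;
}
```

In a loop, a value can stay live across the back-edge, so the simple [first, last] ranges used here would be too short. That is one reason loops need extra care.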
><br>
> With neither the new optimization passes nor _mesa_optimize_program, the<br>
> shader has 44 instructions and 40 temporaries. Both optimized shaders<br>
> have only 5 temporaries declared. For every shader I've tried, in fact,<br>
> my register consolidation passes result in exactly the same number of<br>
> temporaries being used as when _mesa_optimize_program is used. In terms<br>
> of instruction count, the only visible optimization that Mesa master<br>
> implements but the GLSL IR to TGSI converter does not is copy<br>
> propagation to output registers, which accounts for 2 of the 3 extra<br>
> instructions in the st_glsl_to_tgsi version of the shader.<br>
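The copy propagation to output registers is visible in the two listings. In the first listing, MAD TEMP[0], ... followed by MOV OUT[0], TEMP[0] corresponds to a single MAD OUT[0], ... in the second. Here is a minimal C sketch of that rewrite. It assumes a simplified instruction model with no swizzles, writemasks or negation, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum reg_file { REG_NONE, REG_TEMP, REG_OUTPUT, REG_OTHER };
enum opcode { OP_MOV, OP_ALU };

struct operand {
    enum reg_file file;
    int index;
};

/* Simplified instruction: swizzles, negation and writemasks are left out,
 * so every MOV is a plain full-register copy of src[0]. */
struct insn {
    enum opcode op;
    struct operand dst;
    struct operand src[3];
};

static bool reads_temp(const struct insn *in, int t)
{
    for (int k = 0; k < 3; k++)
        if (in->src[k].file == REG_TEMP && in->src[k].index == t)
            return true;
    return false;
}

/* Fold "OP TEMP[t], ...; MOV OUT[k], TEMP[t]" into "OP OUT[k], ..." when no
 * later instruction reads TEMP[t]. Only adjacent pairs in straight-line code
 * are considered. Compacts the array in place; returns the new count. */
static int propagate_to_outputs(struct insn *code, int n)
{
    assert(n >= 0);
    int out = 0;
    for (int i = 0; i < n; i++) {
        struct insn cur = code[i];
        if (i + 1 < n && cur.dst.file == REG_TEMP) {
            const struct insn *next = &code[i + 1];
            bool foldable = next->op == OP_MOV &&
                            next->dst.file == REG_OUTPUT &&
                            next->src[0].file == REG_TEMP &&
                            next->src[0].index == cur.dst.index;
            for (int j = i + 2; foldable && j < n; j++)
                if (reads_temp(&code[j], cur.dst.index))
                    foldable = false;   /* TEMP[t] still needed later */
            if (foldable) {
                cur.dst = next->dst;    /* write the output directly */
                code[out++] = cur;
                i++;                    /* drop the now-redundant MOV */
                continue;
            }
        }
        code[out++] = cur;
    }
    return out;
}
```

Checking only adjacent pairs keeps the sketch short. A real pass would also have to respect writemasks and swizzles such as the .xxxx in MOV OUT[2], TEMP[0].xxxx.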
><br>
> One current weakness of my new optimization passes is that they don't<br>
> optimize code inside loops as well as they should, although, to the best<br>
> of my knowledge and testing, they at least don't break code that uses<br>
> loops.<br>
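The message does not show how the dead code elimination pass works. A common approach is a single backward liveness scan, sketched here in C. The sketch assumes whole-register writes and straight-line code, and its names are hypothetical. Handling loops properly requires iterating the liveness computation to a fixed point around back-edges, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TEMPS 64

/* REG_DEAD is an internal marker for instructions removed by the pass. */
enum reg_file { REG_NONE, REG_TEMP, REG_OUTPUT, REG_OTHER, REG_DEAD };

struct operand {
    enum reg_file file;
    int index;
};

/* Simplified instruction: every write is assumed to cover the whole
 * register (no writemasks), so a write ends the previous value's lifetime. */
struct insn {
    struct operand dst;
    struct operand src[3];
};

/* Backward dead code elimination for straight-line code. Writes to outputs
 * and instructions without a destination are always kept. A write to a temp
 * whose value is never read (before the end or the next write to that temp)
 * is removed. Returns the new instruction count. */
static int eliminate_dead_code(struct insn *code, int n)
{
    bool live[MAX_TEMPS] = { false };

    for (int i = n - 1; i >= 0; i--) {
        struct insn *in = &code[i];
        if (in->dst.file == REG_TEMP) {
            assert(in->dst.index >= 0 && in->dst.index < MAX_TEMPS);
            if (!live[in->dst.index]) {
                in->dst.file = REG_DEAD;  /* result unused: drop it */
                continue;                 /* so its sources are not needed */
            }
            live[in->dst.index] = false;  /* full write kills the old value */
        }
        for (int k = 0; k < 3; k++) {
            if (in->src[k].file == REG_TEMP) {
                assert(in->src[k].index >= 0 && in->src[k].index < MAX_TEMPS);
                live[in->src[k].index] = true;
            }
        }
    }

    /* Compact the surviving instructions, preserving their order. */
    int out = 0;
    for (int i = 0; i < n; i++)
        if (code[i].dst.file != REG_DEAD)
            code[out++] = code[i];
    return out;
}
```

A backward scan removes whole chains of unused computations in one pass. When an instruction is dropped, its sources are never marked live, so the instructions that fed it become dead too.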
><br>
> I'd very much appreciate any comments, feedback, patches, or testing.<br>
<br>
</div></div>I don't have any spare time to test anything right now. The only<br>
feedback I have for now would be superficial (whitespace<br>
inconsistencies, comments, etc.). But I'm glad you're taking on this<br>
project.<br></blockquote></div><br>FWIW, in order to keep all the other drivers working, especially those that can't support integer opcodes, there should be a way for a driver to report that it doesn't accept those opcodes, in which case glsl_to_tgsi shouldn't generate them. The cap could be, e.g., PIPE_CAP_SM4, or PIPE_CAP_SHADER_MODEL returning a number >= 4.<br>
<br>Marek<br>