<div class="gmail_quote">On Thu, Apr 28, 2011 at 5:23 AM, Brian Paul <<a href="mailto:brian.e.paul@gmail.com" target="_blank">brian.e.paul@gmail.com</a>> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>On Tue, Apr 26, 2011 at 12:26 AM, Bryan Cain <<a href="mailto:bryancain3@gmail.com" target="_blank">bryancain3@gmail.com</a>> wrote:
> Hi,
>
> In the last week or so, I've been working on a direct translator from
> GLSL IR to TGSI that does not go through Mesa IR. Although it is still
> a work in progress, it is now working and very usable. So before I go
> on, here is a link to the branch I've pushed to GitHub:
>
> <a href="https://github.com/Plombo/mesa/tree/glsl-130" target="_blank">https://github.com/Plombo/mesa/tree/glsl-130</a>
>
> My main objective with this work is to make GLSL 1.30 support feasible
> on Gallium drivers. From what I understand, it would be difficult or
> impossible to implement integer-specific opcodes such as shifting and
> bit masking in Mesa IR, since it only supports floats. TGSI, on the
> other hand, doesn't have this problem, and already supports most or all
> of the functionality required by GLSL 1.30.
</div>Unfortunately, TGSI doesn't have everything we need yet. There are opcodes for binary AND, OR, XOR, etc. and a few integer operations, but the set is incomplete. It shouldn't be a big deal to add what's missing, but it'll take a little time.

I think everyone agrees that we want to eventually ditch Mesa's IR. I _think_ that the only classic Mesa driver that uses Mesa IR and hasn't been either deprecated by a Gallium driver or already weaned from Mesa IR is swrast. How much does the i965 driver still rely on swrast for fallbacks? Do the Intel people see a need for a GLSL IR executor for swrast?
<div>
> The translator started as a modified version of ir_to_mesa, and that
> origin is still obvious from reading the code.
Many parts of ir_to_mesa
> are still untouched - glsl_to_tgsi is still a long way away from
> eliminating all traces of Mesa IR. It also contains a significant
> amount of code adapted from st_mesa_to_tgsi, but modified to generate
> TGSI code from the glsl_to_tgsi_instruction class instead of using Mesa
> IR. (It actually still generates Mesa IR instructions, but that could
> be safely removed at some point since the generated Mesa IR instructions
> are not actually used for anything.) I'm planning to push more of the
> conversion to TGSI higher up in the stack in the future, although the
> remaining remnants of Mesa IR (such as the Mesa IR opcodes used by most
> of glsl_to_tgsi) aren't doing any harm.
</div>I finally found a little time to look over your code. As you said, it's basically a copy & paste of the ir_to_mesa.cpp and st_mesa_to_tgsi.c code at this time. Do you plan to eliminate all remnants of Mesa IR there before adding support for GLSL 1.30? One easy step would be to replace use of Mesa IR opcodes with TGSI opcodes and add new TGSI opcodes for integer ops.
<div><div></div><div>
> Since the _mesa_optimize_program function is vital to generating
> optimized code with ir_to_mesa, and it is not available when not using
> Mesa IR, I've written some new optimization passes for
> glsl_to_tgsi_visitor that perform dead code elimination and
> consolidation of the temporary register space. Although they are rather
> simple, they do make a huge difference in the quality of the output.
As
> an example, here is what it generates for the vertex shader in the
> Mandelbrot GLSL demo from the Mesa demos repository:
>
> VERT
> DCL IN[0]
> DCL IN[1]
> DCL IN[2]
> DCL OUT[0], POSITION
> DCL OUT[1], GENERIC[10]
> DCL OUT[2], GENERIC[11]
> DCL CONST[0..14]
> DCL TEMP[0..4]
> IMM FLT32 { 2.0000, 0.0000, -0.5000, 5.0000}
> 0: MUL TEMP[0], CONST[4], IN[0].xxxx
> 1: MAD TEMP[0], CONST[5], IN[0].yyyy, TEMP[0]
> 2: MAD TEMP[0], CONST[6], IN[0].zzzz, TEMP[0]
> 3: MAD TEMP[0], CONST[7], IN[0].wwww, TEMP[0]
> 4: MUL TEMP[1].xyz, CONST[12].xyzz, IN[1].xxxx
> 5: MAD TEMP[1], CONST[13].xyzz, IN[1].yyyy, TEMP[1].xyzz
> 6: MAD TEMP[1], CONST[14].xyzz, IN[1].zzzz, TEMP[1].xyzz
> 7: DP3 TEMP[2].x, TEMP[1].xyzz, TEMP[1].xyzz
> 8: RSQ TEMP[2].x, TEMP[2].xxxx
> 9: MUL TEMP[1].xyz, TEMP[1].xyzz, TEMP[2].xxxx
> 10: ADD TEMP[2].xyz, CONST[3].xyzz, -TEMP[0].xyzz
> 11: DP3 TEMP[3].x, TEMP[2].xyzz, TEMP[2].xyzz
> 12: RSQ TEMP[3].x, TEMP[3].xxxx
> 13: MUL TEMP[2].xyz, TEMP[2].xyzz, TEMP[3].xxxx
> 14: MOV TEMP[3].xyz, -TEMP[2].xyzx
> 15: MOV TEMP[0].xyz, -TEMP[0].xyzx
> 16: DP3 TEMP[4].x, TEMP[1].xyzz, TEMP[3].xyzz
> 17: MUL TEMP[4].xyz, TEMP[4].xxxx, TEMP[1].xyzz
> 18: MUL TEMP[4].xyz, IMM[0].xxxx, TEMP[4].xyzz
> 19: ADD TEMP[3].xyz, TEMP[3].xyzz, -TEMP[4].xyzz
> 20: DP3 TEMP[4].x, TEMP[0].xyzz, TEMP[0].xyzz
> 21: RSQ TEMP[4].x, TEMP[4].xxxx
> 22: MUL TEMP[0].xyz, TEMP[0].xyzz, TEMP[4].xxxx
> 23: DP3 TEMP[0].x, TEMP[3].xyzz, TEMP[0].xyzz
> 24: MAX TEMP[0].x, TEMP[0].xxxx, IMM[0].yyyy
> 25: POW TEMP[0].x, TEMP[0].xxxx, CONST[0].xxxx
> 26: DP3 TEMP[1].x, TEMP[2].xyzz, TEMP[1].xyzz
> 27: MAX TEMP[1].x, TEMP[1].xxxx, IMM[0].yyyy
> 28: MUL TEMP[1].x, CONST[1].xxxx, TEMP[1].xxxx
> 29: MAD TEMP[0], CONST[2].xxxx, TEMP[0].xxxx, TEMP[1].xxxx
> 30: MOV OUT[2], TEMP[0].xxxx
> 31: ADD TEMP[0], IN[2], IMM[0].zzzz
> 32: MUL TEMP[0].xyz, TEMP[0].xyzz, IMM[0].wwww
> 33: MOV OUT[1].xyz, TEMP[0].xyzx
> 34: MUL TEMP[0], CONST[8], IN[0].xxxx
> 35: MAD TEMP[0], CONST[9], IN[0].yyyy, TEMP[0]
> 36: MAD TEMP[0], CONST[10], IN[0].zzzz, TEMP[0]
> 37: MAD TEMP[0], CONST[11], IN[0].wwww, TEMP[0]
> 38: MOV OUT[0], TEMP[0]
> 39: END
>
> Here is the same shader as generated by ir_to_mesa and st_mesa_to_tgsi
> in Mesa master:
>
> VERT
> DCL IN[0]
> DCL IN[1]
> DCL IN[2]
> DCL OUT[0], POSITION
> DCL OUT[1], GENERIC[10]
> DCL OUT[2], GENERIC[11]
> DCL CONST[0..14]
> DCL TEMP[0..4]
> IMM FLT32 { 2.0000, 0.0000, -0.5000, 5.0000}
> 0: MUL TEMP[0], CONST[4], IN[0].xxxx
> 1: MAD TEMP[0], CONST[5], IN[0].yyyy, TEMP[0]
> 2: MAD TEMP[0], CONST[6], IN[0].zzzz, TEMP[0]
> 3: MAD TEMP[0], CONST[7], IN[0].wwww, TEMP[0]
> 4: MUL TEMP[1].xyz, CONST[12].xyzz, IN[1].xxxx
> 5: MAD TEMP[1].xyz, CONST[13].xyzz, IN[1].yyyy, TEMP[1].xyzz
> 6: MAD TEMP[1].xyz, CONST[14].xyzz, IN[1].zzzz, TEMP[1].xyzz
> 7: DP3 TEMP[2].x, TEMP[1].xyzz, TEMP[1].xyzz
> 8: RSQ TEMP[2].x, TEMP[2].xxxx
> 9: MUL TEMP[1].xyz, TEMP[1].xyzz, TEMP[2].xxxx
> 10: ADD TEMP[2].xyz, CONST[3].xyzz, -TEMP[0].xyzz
> 11: DP3 TEMP[3].x, TEMP[2].xyzz, TEMP[2].xyzz
> 12: RSQ TEMP[3].x, TEMP[3].xxxx
> 13: MUL TEMP[2].xyz, TEMP[2].xyzz, TEMP[3].xxxx
> 14: MOV TEMP[3].xyz, -TEMP[2].xyzx
> 15: MOV TEMP[0].xyz, -TEMP[0].xyzx
> 16: DP3 TEMP[4].x, TEMP[1].xyzz, TEMP[3].xyzz
> 17: MUL TEMP[4].xyz, TEMP[4].xxxx, TEMP[1].xyzz
> 18: MUL TEMP[4].xyz, IMM[0].xxxx, TEMP[4].xyzz
> 19: ADD TEMP[3].xyz, TEMP[3].xyzz, -TEMP[4].xyzz
> 20: DP3 TEMP[4].x, TEMP[0].xyzz, TEMP[0].xyzz
> 21: RSQ TEMP[4].x, TEMP[4].xxxx
> 22: MUL TEMP[0].xyz, TEMP[0].xyzz, TEMP[4].xxxx
> 23: DP3 TEMP[0].x, TEMP[3].xyzz, TEMP[0].xyzz
> 24: MAX TEMP[0].x, TEMP[0].xxxx, IMM[0].yyyy
> 25: POW TEMP[0].x, TEMP[0].xxxx, CONST[0].xxxx
> 26: DP3 TEMP[1].x, TEMP[2].xyzz, TEMP[1].xyzz
> 27: MAX TEMP[1].x, TEMP[1].xxxx, IMM[0].yyyy
> 28: MUL TEMP[1].x, CONST[1].xxxx, TEMP[1].xxxx
> 29: MAD OUT[2], CONST[2].xxxx, TEMP[0].xxxx, TEMP[1].xxxx
> 30: ADD TEMP[0], IN[2], IMM[0].zzzz
> 31: MUL OUT[1].xyz, TEMP[0].xyzx, IMM[0].wwwx
> 32: MUL TEMP[0], CONST[8], IN[0].xxxx
> 33: MAD TEMP[0], CONST[9], IN[0].yyyy, TEMP[0]
> 34: MAD TEMP[0], CONST[10], IN[0].zzzz, TEMP[0]
> 35: MAD OUT[0], CONST[11], IN[0].wwww, TEMP[0]
> 36: END
>
> With neither the new optimization passes nor _mesa_optimize_program, the
> shader has 44 instructions and 40 temporaries. Both optimized shaders
> have only 5 temporaries declared. For every shader I've tried, in fact,
> my register consolidation passes result in exactly the same number of
> temporaries being used as when _mesa_optimize_program is used. In terms
> of instruction count, the only optimization visible that is implemented
> in Mesa master but not in the GLSL IR to TGSI converter is copy
> propagation to output registers, which accounts for 2 of the 3 extra
> instructions in the st_glsl_to_tgsi version of the shader.
>
> One current weakness of my new optimization passes is that they don't
> optimize code inside of loops as well as they should, although at least
> they don't break code that uses loops to the best of my knowledge and
> testing.
>
> I'd very much appreciate any comments, feedback, patches, or testing.
</div></div>I don't have any spare time to test anything right now. The only feedback I have for now would be superficial (whitespace inconsistencies, comments, etc.). But I'm glad you're taking on this project.
</blockquote></div>
FWIW, in order to keep all the other drivers working, especially those which can't support integer opcodes, there should be a way for a driver to report that it doesn't accept those opcodes, and glsl_to_tgsi shouldn't generate them in that case. The cap could be e.g. PIPE_CAP_SM4, or PIPE_CAP_SHADER_MODEL returning a number >= 4.

Marek
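To make the cap idea concrete, here is a minimal sketch, assuming a PIPE_CAP_SHADER_MODEL-style query that returns a shader model number. Only the cap names and the >= 4 threshold come from the suggestion above; the struct and function names (sketch_screen, sketch_get_param, sketch_use_integer_opcodes) are hypothetical and are not Gallium's actual interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap identifier, standing in for the proposed
 * PIPE_CAP_SHADER_MODEL. The enum is an assumption of this sketch. */
enum sketch_cap {
    SKETCH_CAP_SHADER_MODEL
};

/* Hypothetical stand-in for a driver's screen object. */
struct sketch_screen {
    int shader_model; /* e.g. 3 for float-only hardware, 4 for integer-capable */
};

/* Hypothetical query: returns the value a driver reports for a cap. */
static int sketch_get_param(const struct sketch_screen *screen,
                            enum sketch_cap cap)
{
    assert(screen != NULL);
    switch (cap) {
    case SKETCH_CAP_SHADER_MODEL:
        return screen->shader_model;
    }
    return 0; /* unknown caps are reported as unsupported */
}

/* The translator would consult this before emitting integer opcodes,
 * and fall back to float-only code generation when it returns false. */
static bool sketch_use_integer_opcodes(const struct sketch_screen *screen)
{
    return sketch_get_param(screen, SKETCH_CAP_SHADER_MODEL) >= 4;
}
```

A single numeric cap keeps the decision in one place: drivers that cannot handle integer opcodes simply report a lower number and keep receiving the shaders they accept today.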