[Mesa-dev] GLSL IR to TGSI translator

Thu Apr 28 03:06:10 PDT 2011

On Thu, Apr 28, 2011 at 5:23 AM, Brian Paul <brian.e.paul at gmail.com> wrote:

> On Tue, Apr 26, 2011 at 12:26 AM, Bryan Cain <bryancain3 at gmail.com> wrote:
> > Hi,
> >
> > In the last week or so, I've been working on a direct translator from
> > GLSL IR to TGSI that does not go through Mesa IR.  Although it is still
> > a work in progress, it is now working and very usable.  So before I go
> > on, here is a link to the branch I've pushed to GitHub:
> >
> > https://github.com/Plombo/mesa/tree/glsl-130
> >
> > My main objective with this work is to make GLSL 1.30 support feasible
> > on Gallium drivers.  From what I understand, it would be difficult or
> > impossible to implement integer-specific opcodes such as shifting and
> > bit masking in Mesa IR, since it only supports floats.  TGSI, on the
> > other hand, doesn't have this problem, and already supports most or all
> > of the functionality required by GLSL 1.30.
>
> Unfortunately, TGSI doesn't have everything we need yet.  There are
> opcodes for binary AND, OR, XOR, etc. and a few integer operations,
> but the set is incomplete.  It shouldn't be a big deal to add what's
> missing, but it'll take a little time.
>
> I think everyone agrees that we want to eventually ditch Mesa's IR.  I
> _think_ that swrast is the only classic Mesa driver that uses Mesa IR
> and hasn't either been deprecated by a Gallium driver or already been
> weaned from Mesa IR.  How much does the i965 driver still rely on
> swrast for fallbacks?  Do the Intel people see a need for a GLSL IR
> executor for swrast?
>
>
> > The translator started as a modified version of ir_to_mesa, and that
> > origin is still obvious from reading the code.  Many parts of ir_to_mesa
> > are still untouched - glsl_to_tgsi is still a long way away from
> > eliminating all traces of Mesa IR.  It also contains a significant
> > amount of code adapted from st_mesa_to_tgsi, but modified to generate
> > TGSI code from the glsl_to_tgsi_instruction class instead of using Mesa
> > IR.  (It actually still generates Mesa IR instructions, but that could
> > be safely removed at some point since the generated Mesa IR instructions
> > are not actually used for anything.)  I'm planning to push more of the
> > conversion to TGSI higher up in the stack in the future, although the
> > remaining remnants of Mesa IR (such as the Mesa IR opcodes used by most
> > of glsl_to_tgsi) aren't doing any harm.
>
> I finally found a little time to look over your code.  As you said,
> it's basically a copy & paste of the ir_to_mesa.cpp and
> st_mesa_to_tgsi.c code at this time.  Do you plan to eliminate all
> remnants of Mesa IR there before adding support for GLSL 1.30?  One
> easy step would be to replace use of Mesa IR opcodes with TGSI opcodes
> and add new TGSI opcodes for integer ops.
>
>
> > Since the _mesa_optimize_program function is vital to generating
> > optimized code with ir_to_mesa, and it is not available when not using
> > Mesa IR, I've written some new optimization passes for
> > glsl_to_tgsi_visitor that perform dead code elimination and
> > consolidation of the temporary register space.  Although they are rather
> > simple, they do make a huge difference in the quality of the output.  As
> > an example, here is what it generates for the vertex shader in the
> > Mandelbrot GLSL demo from the Mesa demos repository:
> >
> > VERT
> > DCL IN[0]
> > DCL IN[1]
> > DCL IN[2]
> > DCL OUT[0], POSITION
> > DCL OUT[1], GENERIC[10]
> > DCL OUT[2], GENERIC[11]
> > DCL CONST[0..14]
> > DCL TEMP[0..4]
> > IMM FLT32 {    2.0000,     0.0000,    -0.5000,     5.0000}
> >  0: MUL TEMP[0], CONST[4], IN[0].xxxx
> >  1: MAD TEMP[0], CONST[5], IN[0].yyyy, TEMP[0]
> >  2: MAD TEMP[0], CONST[6], IN[0].zzzz, TEMP[0]
> >  3: MAD TEMP[0], CONST[7], IN[0].wwww, TEMP[0]
> >  4: MUL TEMP[1].xyz, CONST[12].xyzz, IN[1].xxxx
> >  5: MAD TEMP[1], CONST[13].xyzz, IN[1].yyyy, TEMP[1].xyzz
> >  6: MAD TEMP[1], CONST[14].xyzz, IN[1].zzzz, TEMP[1].xyzz
> >  7: DP3 TEMP[2].x, TEMP[1].xyzz, TEMP[1].xyzz
> >  8: RSQ TEMP[2].x, TEMP[2].xxxx
> >  9: MUL TEMP[1].xyz, TEMP[1].xyzz, TEMP[2].xxxx
> >  10: ADD TEMP[2].xyz, CONST[3].xyzz, -TEMP[0].xyzz
> >  11: DP3 TEMP[3].x, TEMP[2].xyzz, TEMP[2].xyzz
> >  12: RSQ TEMP[3].x, TEMP[3].xxxx
> >  13: MUL TEMP[2].xyz, TEMP[2].xyzz, TEMP[3].xxxx
> >  14: MOV TEMP[3].xyz, -TEMP[2].xyzx
> >  15: MOV TEMP[0].xyz, -TEMP[0].xyzx
> >  16: DP3 TEMP[4].x, TEMP[1].xyzz, TEMP[3].xyzz
> >  17: MUL TEMP[4].xyz, TEMP[4].xxxx, TEMP[1].xyzz
> >  18: MUL TEMP[4].xyz, IMM[0].xxxx, TEMP[4].xyzz
> >  19: ADD TEMP[3].xyz, TEMP[3].xyzz, -TEMP[4].xyzz
> >  20: DP3 TEMP[4].x, TEMP[0].xyzz, TEMP[0].xyzz
> >  21: RSQ TEMP[4].x, TEMP[4].xxxx
> >  22: MUL TEMP[0].xyz, TEMP[0].xyzz, TEMP[4].xxxx
> >  23: DP3 TEMP[0].x, TEMP[3].xyzz, TEMP[0].xyzz
> >  24: MAX TEMP[0].x, TEMP[0].xxxx, IMM[0].yyyy
> >  25: POW TEMP[0].x, TEMP[0].xxxx, CONST[0].xxxx
> >  26: DP3 TEMP[1].x, TEMP[2].xyzz, TEMP[1].xyzz
> >  27: MAX TEMP[1].x, TEMP[1].xxxx, IMM[0].yyyy
> >  28: MUL TEMP[1].x, CONST[1].xxxx, TEMP[1].xxxx
> >  29: MAD TEMP[0], CONST[2].xxxx, TEMP[0].xxxx, TEMP[1].xxxx
> >  30: MOV OUT[2], TEMP[0].xxxx
> >  31: ADD TEMP[0], IN[2], IMM[0].zzzz
> >  32: MUL TEMP[0].xyz, TEMP[0].xyzz, IMM[0].wwww
> >  33: MOV OUT[1].xyz, TEMP[0].xyzx
> >  34: MUL TEMP[0], CONST[8], IN[0].xxxx
> >  35: MAD TEMP[0], CONST[9], IN[0].yyyy, TEMP[0]
> >  36: MAD TEMP[0], CONST[10], IN[0].zzzz, TEMP[0]
> >  37: MAD TEMP[0], CONST[11], IN[0].wwww, TEMP[0]
> >  38: MOV OUT[0], TEMP[0]
> >  39: END
> >
> > Here is the same shader as generated by ir_to_mesa and st_mesa_to_tgsi
> > in Mesa master:
> >
> > VERT
> > DCL IN[0]
> > DCL IN[1]
> > DCL IN[2]
> > DCL OUT[0], POSITION
> > DCL OUT[1], GENERIC[10]
> > DCL OUT[2], GENERIC[11]
> > DCL CONST[0..14]
> > DCL TEMP[0..4]
> > IMM FLT32 {    2.0000,     0.0000,    -0.5000,     5.0000}
> >  0: MUL TEMP[0], CONST[4], IN[0].xxxx
> >  1: MAD TEMP[0], CONST[5], IN[0].yyyy, TEMP[0]
> >  2: MAD TEMP[0], CONST[6], IN[0].zzzz, TEMP[0]
> >  3: MAD TEMP[0], CONST[7], IN[0].wwww, TEMP[0]
> >  4: MUL TEMP[1].xyz, CONST[12].xyzz, IN[1].xxxx
> >  5: MAD TEMP[1].xyz, CONST[13].xyzz, IN[1].yyyy, TEMP[1].xyzz
> >  6: MAD TEMP[1].xyz, CONST[14].xyzz, IN[1].zzzz, TEMP[1].xyzz
> >  7: DP3 TEMP[2].x, TEMP[1].xyzz, TEMP[1].xyzz
> >  8: RSQ TEMP[2].x, TEMP[2].xxxx
> >  9: MUL TEMP[1].xyz, TEMP[1].xyzz, TEMP[2].xxxx
> >  10: ADD TEMP[2].xyz, CONST[3].xyzz, -TEMP[0].xyzz
> >  11: DP3 TEMP[3].x, TEMP[2].xyzz, TEMP[2].xyzz
> >  12: RSQ TEMP[3].x, TEMP[3].xxxx
> >  13: MUL TEMP[2].xyz, TEMP[2].xyzz, TEMP[3].xxxx
> >  14: MOV TEMP[3].xyz, -TEMP[2].xyzx
> >  15: MOV TEMP[0].xyz, -TEMP[0].xyzx
> >  16: DP3 TEMP[4].x, TEMP[1].xyzz, TEMP[3].xyzz
> >  17: MUL TEMP[4].xyz, TEMP[4].xxxx, TEMP[1].xyzz
> >  18: MUL TEMP[4].xyz, IMM[0].xxxx, TEMP[4].xyzz
> >  19: ADD TEMP[3].xyz, TEMP[3].xyzz, -TEMP[4].xyzz
> >  20: DP3 TEMP[4].x, TEMP[0].xyzz, TEMP[0].xyzz
> >  21: RSQ TEMP[4].x, TEMP[4].xxxx
> >  22: MUL TEMP[0].xyz, TEMP[0].xyzz, TEMP[4].xxxx
> >  23: DP3 TEMP[0].x, TEMP[3].xyzz, TEMP[0].xyzz
> >  24: MAX TEMP[0].x, TEMP[0].xxxx, IMM[0].yyyy
> >  25: POW TEMP[0].x, TEMP[0].xxxx, CONST[0].xxxx
> >  26: DP3 TEMP[1].x, TEMP[2].xyzz, TEMP[1].xyzz
> >  27: MAX TEMP[1].x, TEMP[1].xxxx, IMM[0].yyyy
> >  28: MUL TEMP[1].x, CONST[1].xxxx, TEMP[1].xxxx
> >  29: MAD OUT[2], CONST[2].xxxx, TEMP[0].xxxx, TEMP[1].xxxx
> >  30: ADD TEMP[0], IN[2], IMM[0].zzzz
> >  31: MUL OUT[1].xyz, TEMP[0].xyzx, IMM[0].wwwx
> >  32: MUL TEMP[0], CONST[8], IN[0].xxxx
> >  33: MAD TEMP[0], CONST[9], IN[0].yyyy, TEMP[0]
> >  34: MAD TEMP[0], CONST[10], IN[0].zzzz, TEMP[0]
> >  35: MAD OUT[0], CONST[11], IN[0].wwww, TEMP[0]
> >  36: END
> >
> > With neither the new optimization passes nor _mesa_optimize_program, the
> > shader has 44 instructions and 40 temporaries.  Both optimized shaders
> > have only 5 temporaries declared.  For every shader I've tried, in fact,
> > my register consolidation passes result in exactly the same number of
> > temporaries being used as when _mesa_optimize_program is used.  In terms
> > of instruction count, the only optimization visible that is implemented
> > in Mesa master but not in the GLSL IR to TGSI converter is copy
> > propagation to output registers, which accounts for 2 of the 3 extra
> > instructions in the st_glsl_to_tgsi version of the shader.
> >
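
For what it's worth, the missing copy propagation to output registers
could be sketched roughly as below.  This is only a minimal sketch under
stated assumptions, not code from the branch: the sk_* types and
sk_propagate_to_outputs() are hypothetical and much simpler than the
real glsl_to_tgsi_instruction, and it only handles straight-line code
(no loops or branches), a MOV with an identity swizzle, and matching
writemasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction model; not Mesa's types. */
enum sk_file { SK_NONE, SK_TEMP, SK_OUT, SK_IN, SK_CONST };
enum sk_op { SK_MOV, SK_MAD, SK_MUL };

struct sk_reg { enum sk_file file; int index; };

struct sk_inst {
   enum sk_op op;
   struct sk_reg dst;
   unsigned writemask;     /* bit 0 = x ... bit 3 = w */
   struct sk_reg src[3];
   bool src0_identity;     /* src[0] is read with the .xyzw swizzle */
   bool dead;              /* set when the pass removes the instruction */
};

static bool sk_same_reg(struct sk_reg a, struct sk_reg b)
{
   return a.file == b.file && a.index == b.index;
}

/* Conservative check: is r read in [start, n) before being fully rewritten? */
static bool sk_read_later(const struct sk_inst *prog, size_t n, size_t start,
                          struct sk_reg r)
{
   for (size_t i = start; i < n; i++) {
      if (prog[i].dead)
         continue;
      for (int s = 0; s < 3; s++) {
         if (sk_same_reg(prog[i].src[s], r))
            return true;
      }
      if (sk_same_reg(prog[i].dst, r) && prog[i].writemask == 0xf)
         return false;
   }
   return false;
}

/* Fold "OP TEMP[t], ...; MOV OUT[o], TEMP[t]" into "OP OUT[o], ...".
 * Returns the number of MOVs marked dead. */
int sk_propagate_to_outputs(struct sk_inst *prog, size_t n)
{
   int removed = 0;

   assert(prog != NULL || n == 0);
   for (size_t i = 0; i + 1 < n; i++) {
      struct sk_inst *def = &prog[i];
      struct sk_inst *mov = &prog[i + 1];

      if (def->dead || mov->dead || mov->op != SK_MOV)
         continue;
      if (def->dst.file != SK_TEMP || mov->dst.file != SK_OUT)
         continue;
      if (!sk_same_reg(mov->src[0], def->dst))
         continue;
      if (!mov->src0_identity || mov->writemask != def->writemask)
         continue;
      /* The temporary must not be needed after the MOV. */
      if (sk_read_later(prog, n, i + 2, def->dst))
         continue;

      def->dst = mov->dst;   /* write the output directly */
      mov->dead = true;      /* the copy is now redundant */
      removed++;
   }
   return removed;
}
```

Run on a pair like instructions 37 and 38 of the first dump, this would
retarget the MAD to OUT[0] and mark the MOV dead.  The MOVs at 30 and 33
read swizzled sources, so they would need extra swizzle handling that
this sketch leaves out.
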
> > One current weakness of my new optimization passes is that they don't
> > optimize code inside of loops as well as they should, although, to the
> > best of my knowledge and testing, they at least don't break code that
> > uses loops.
> >
> > I'd very much appreciate any comments, feedback, patches, or testing.
>
> I don't have any spare time to test anything right now.  The only
> feedback I have for now would be superficial (whitespace
> inconsistencies, comments, etc).  But I'm glad you're taking on this
> project.
>

FWIW, in order to keep all the other drivers working, especially those
that can't support integer opcodes, there should be a way for a driver to
report that it doesn't accept those opcodes, so that glsl_to_tgsi doesn't
generate them for it. The cap could be e.g. PIPE_CAP_SM4, or
PIPE_CAP_SHADER_MODEL returning a number >= 4.

Marek