[Mesa-dev] GLSL IR to TGSI translator

Thu Apr 28 08:19:08 PDT 2011

On 4/27/2011 10:23 PM, Brian Paul wrote:
> On Tue, Apr 26, 2011 at 12:26 AM, Bryan Cain <bryancain3 at gmail.com> wrote:
>> Hi,
>>
>> In the last week or so, I've been working on a direct translator from
>> GLSL IR to TGSI that does not go through Mesa IR.  Although it is still
>> a work in progress, it is now working and very usable.  So before I go
>> on, here is a link to the branch I've pushed to GitHub:
>>
>> https://github.com/Plombo/mesa/tree/glsl-130
>>
>> My main objective with this work is to make GLSL 1.30 support feasible
>> on Gallium drivers.  From what I understand, it would be difficult or
>> impossible to implement integer-specific opcodes such as shifting and
>> bit masking in Mesa IR, since it only supports floats.  TGSI, on the
>> other hand, doesn't have this problem, and already supports most or all
>> of the functionality required by GLSL 1.30.
> Unfortunately, TGSI doesn't have everything we need yet.  There are
> opcodes for binary AND, OR, XOR, etc. and a few integer operations,
> but the set is incomplete.  It shouldn't be a big deal to add what's
> missing but it'll take a little time.
>
> I think everyone agrees that we want to eventually ditch Mesa's IR.  I
> _think_ that the only classic Mesa driver that uses Mesa IR and has
> neither been deprecated by a Gallium driver nor already been weaned
> from Mesa IR is swrast.  How much does the i965 driver still rely on
> swrast for fallbacks?  Do the Intel people see a need for a GLSL IR
> executor for swrast?

I must not have noticed the integer functionality missing from TGSI.  I
assume what's missing is just the arithmetic opcodes?

>> The translator started as a modified version of ir_to_mesa, and that
>> origin is still obvious from reading the code.  Many parts of ir_to_mesa
>> are still untouched - glsl_to_tgsi is still a long way away from
>> eliminating all traces of Mesa IR.  It also contains a significant
>> amount of code adapted from st_mesa_to_tgsi, but modified to generate
>> TGSI code from the glsl_to_tgsi_instruction class instead of using Mesa
>> IR.  (It actually still generates Mesa IR instructions, but that could
>> be safely removed at some point since the generated Mesa IR instructions
>> are not actually used for anything.)  I'm planning to push more of the
>> conversion to TGSI higher up in the stack in the future, although the
>> remaining remnants of Mesa IR (such as the Mesa IR opcodes used by most
>> of glsl_to_tgsi) aren't doing any harm.
> I finally found a little time to look over your code.  As you said,
> it's basically a copy & paste of the ir_to_mesa.cpp and
> st_mesa_to_tgsi.c code at this time.  Do you plan to eliminate all
> remnants of Mesa IR there before adding support for GLSL 1.30?  One
> easy step would be to replace use of Mesa IR opcodes with TGSI opcodes
> and add new TGSI opcodes for integer ops.

I do plan to eliminate the Mesa IR remnants, or the opcodes at the very
least, before working on GLSL 1.30 support.  The main reason I haven't
replaced the Mesa IR opcodes yet is that the code relies on
_mesa_num_src_regs and _mesa_num_dst_regs.  Are there equivalents to
these that work with TGSI opcodes?
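
To make the question concrete, here is a minimal sketch of the kind of
lookup meant: a per-opcode table of destination and source register
counts.  It is written in Python purely for illustration and is not Mesa
code.  The table OPCODE_INFO and both helper names are hypothetical, and
the counts cover only the opcodes that appear in the shader dumps in this
message.

```python
# Hypothetical operand-count table keyed by TGSI opcode mnemonic.
# Each entry is (number of destination registers, number of source registers).
# Only opcodes that appear in the shader dumps in this thread are listed.
OPCODE_INFO = {
    "MOV": (1, 1),
    "RSQ": (1, 1),
    "ADD": (1, 2),
    "MUL": (1, 2),
    "MAX": (1, 2),
    "POW": (1, 2),
    "DP3": (1, 2),
    "MAD": (1, 3),
    "END": (0, 0),
}


def num_dst_regs(opcode):
    """Return how many destination registers the opcode writes."""
    return OPCODE_INFO[opcode][0]


def num_src_regs(opcode):
    """Return how many source registers the opcode reads."""
    return OPCODE_INFO[opcode][1]
```

With something like this keyed on TGSI opcodes, the visitor would no
longer need the Mesa IR opcode numbers just to count operands.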

>> Since the _mesa_optimize_program function is vital to generating
>> optimized code with ir_to_mesa, and it is not available when not using
>> Mesa IR, I've written some new optimization passes for
>> glsl_to_tgsi_visitor that perform dead code elimination and
>> consolidation of the temporary register space.  Although they are rather
>> simple, they do make a huge difference in the quality of the output.  As
>> an example, here is what it generates for the vertex shader in the
>> Mandelbrot GLSL demo from the Mesa demos repository:
>>
>> VERT
>> DCL IN[0]
>> DCL IN[1]
>> DCL IN[2]
>> DCL OUT[0], POSITION
>> DCL OUT[1], GENERIC[10]
>> DCL OUT[2], GENERIC[11]
>> DCL CONST[0..14]
>> DCL TEMP[0..4]
>> IMM FLT32 {    2.0000,     0.0000,    -0.5000,     5.0000}
>>   0: MUL TEMP[0], CONST[4], IN[0].xxxx
>>   1: MAD TEMP[0], CONST[5], IN[0].yyyy, TEMP[0]
>>   2: MAD TEMP[0], CONST[6], IN[0].zzzz, TEMP[0]
>>   3: MAD TEMP[0], CONST[7], IN[0].wwww, TEMP[0]
>>   4: MUL TEMP[1].xyz, CONST[12].xyzz, IN[1].xxxx
>>   5: MAD TEMP[1], CONST[13].xyzz, IN[1].yyyy, TEMP[1].xyzz
>>   6: MAD TEMP[1], CONST[14].xyzz, IN[1].zzzz, TEMP[1].xyzz
>>   7: DP3 TEMP[2].x, TEMP[1].xyzz, TEMP[1].xyzz
>>   8: RSQ TEMP[2].x, TEMP[2].xxxx
>>   9: MUL TEMP[1].xyz, TEMP[1].xyzz, TEMP[2].xxxx
>>   10: ADD TEMP[2].xyz, CONST[3].xyzz, -TEMP[0].xyzz
>>   11: DP3 TEMP[3].x, TEMP[2].xyzz, TEMP[2].xyzz
>>   12: RSQ TEMP[3].x, TEMP[3].xxxx
>>   13: MUL TEMP[2].xyz, TEMP[2].xyzz, TEMP[3].xxxx
>>   14: MOV TEMP[3].xyz, -TEMP[2].xyzx
>>   15: MOV TEMP[0].xyz, -TEMP[0].xyzx
>>   16: DP3 TEMP[4].x, TEMP[1].xyzz, TEMP[3].xyzz
>>   17: MUL TEMP[4].xyz, TEMP[4].xxxx, TEMP[1].xyzz
>>   18: MUL TEMP[4].xyz, IMM[0].xxxx, TEMP[4].xyzz
>>   19: ADD TEMP[3].xyz, TEMP[3].xyzz, -TEMP[4].xyzz
>>   20: DP3 TEMP[4].x, TEMP[0].xyzz, TEMP[0].xyzz
>>   21: RSQ TEMP[4].x, TEMP[4].xxxx
>>   22: MUL TEMP[0].xyz, TEMP[0].xyzz, TEMP[4].xxxx
>>   23: DP3 TEMP[0].x, TEMP[3].xyzz, TEMP[0].xyzz
>>   24: MAX TEMP[0].x, TEMP[0].xxxx, IMM[0].yyyy
>>   25: POW TEMP[0].x, TEMP[0].xxxx, CONST[0].xxxx
>>   26: DP3 TEMP[1].x, TEMP[2].xyzz, TEMP[1].xyzz
>>   27: MAX TEMP[1].x, TEMP[1].xxxx, IMM[0].yyyy
>>   28: MUL TEMP[1].x, CONST[1].xxxx, TEMP[1].xxxx
>>   29: MAD TEMP[0], CONST[2].xxxx, TEMP[0].xxxx, TEMP[1].xxxx
>>   30: MOV OUT[2], TEMP[0].xxxx
>>   31: ADD TEMP[0], IN[2], IMM[0].zzzz
>>   32: MUL TEMP[0].xyz, TEMP[0].xyzz, IMM[0].wwww
>>   33: MOV OUT[1].xyz, TEMP[0].xyzx
>>   34: MUL TEMP[0], CONST[8], IN[0].xxxx
>>   35: MAD TEMP[0], CONST[9], IN[0].yyyy, TEMP[0]
>>   36: MAD TEMP[0], CONST[10], IN[0].zzzz, TEMP[0]
>>   37: MAD TEMP[0], CONST[11], IN[0].wwww, TEMP[0]
>>   38: MOV OUT[0], TEMP[0]
>>   39: END
>>
>> Here is the same shader as generated by ir_to_mesa and st_mesa_to_tgsi
>> in Mesa master:
>>
>> VERT
>> DCL IN[0]
>> DCL IN[1]
>> DCL IN[2]
>> DCL OUT[0], POSITION
>> DCL OUT[1], GENERIC[10]
>> DCL OUT[2], GENERIC[11]
>> DCL CONST[0..14]
>> DCL TEMP[0..4]
>> IMM FLT32 {    2.0000,     0.0000,    -0.5000,     5.0000}
>>   0: MUL TEMP[0], CONST[4], IN[0].xxxx
>>   1: MAD TEMP[0], CONST[5], IN[0].yyyy, TEMP[0]
>>   2: MAD TEMP[0], CONST[6], IN[0].zzzz, TEMP[0]
>>   3: MAD TEMP[0], CONST[7], IN[0].wwww, TEMP[0]
>>   4: MUL TEMP[1].xyz, CONST[12].xyzz, IN[1].xxxx
>>   5: MAD TEMP[1].xyz, CONST[13].xyzz, IN[1].yyyy, TEMP[1].xyzz
>>   6: MAD TEMP[1].xyz, CONST[14].xyzz, IN[1].zzzz, TEMP[1].xyzz
>>   7: DP3 TEMP[2].x, TEMP[1].xyzz, TEMP[1].xyzz
>>   8: RSQ TEMP[2].x, TEMP[2].xxxx
>>   9: MUL TEMP[1].xyz, TEMP[1].xyzz, TEMP[2].xxxx
>>   10: ADD TEMP[2].xyz, CONST[3].xyzz, -TEMP[0].xyzz
>>   11: DP3 TEMP[3].x, TEMP[2].xyzz, TEMP[2].xyzz
>>   12: RSQ TEMP[3].x, TEMP[3].xxxx
>>   13: MUL TEMP[2].xyz, TEMP[2].xyzz, TEMP[3].xxxx
>>   14: MOV TEMP[3].xyz, -TEMP[2].xyzx
>>   15: MOV TEMP[0].xyz, -TEMP[0].xyzx
>>   16: DP3 TEMP[4].x, TEMP[1].xyzz, TEMP[3].xyzz
>>   17: MUL TEMP[4].xyz, TEMP[4].xxxx, TEMP[1].xyzz
>>   18: MUL TEMP[4].xyz, IMM[0].xxxx, TEMP[4].xyzz
>>   19: ADD TEMP[3].xyz, TEMP[3].xyzz, -TEMP[4].xyzz
>>   20: DP3 TEMP[4].x, TEMP[0].xyzz, TEMP[0].xyzz
>>   21: RSQ TEMP[4].x, TEMP[4].xxxx
>>   22: MUL TEMP[0].xyz, TEMP[0].xyzz, TEMP[4].xxxx
>>   23: DP3 TEMP[0].x, TEMP[3].xyzz, TEMP[0].xyzz
>>   24: MAX TEMP[0].x, TEMP[0].xxxx, IMM[0].yyyy
>>   25: POW TEMP[0].x, TEMP[0].xxxx, CONST[0].xxxx
>>   26: DP3 TEMP[1].x, TEMP[2].xyzz, TEMP[1].xyzz
>>   27: MAX TEMP[1].x, TEMP[1].xxxx, IMM[0].yyyy
>>   28: MUL TEMP[1].x, CONST[1].xxxx, TEMP[1].xxxx
>>   29: MAD OUT[2], CONST[2].xxxx, TEMP[0].xxxx, TEMP[1].xxxx
>>   30: ADD TEMP[0], IN[2], IMM[0].zzzz
>>   31: MUL OUT[1].xyz, TEMP[0].xyzx, IMM[0].wwwx
>>   32: MUL TEMP[0], CONST[8], IN[0].xxxx
>>   33: MAD TEMP[0], CONST[9], IN[0].yyyy, TEMP[0]
>>   34: MAD TEMP[0], CONST[10], IN[0].zzzz, TEMP[0]
>>   35: MAD OUT[0], CONST[11], IN[0].wwww, TEMP[0]
>>   36: END
>>
>> With neither the new optimization passes nor _mesa_optimize_program, the
>> shader has 44 instructions and 40 temporaries.  Both optimized shaders
>> have only 5 temporaries declared.  For every shader I've tried, in fact,
>> my register consolidation passes result in exactly the same number of
>> temporaries being used as when _mesa_optimize_program is used.  In terms
>> of instruction count, the only optimization visible that is implemented
>> in Mesa master but not in the GLSL IR to TGSI converter is copy
>> propagation to output registers, which accounts for 2 of the 3 extra
>> instructions in the st_glsl_to_tgsi version of the shader.
>>
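
The copy propagation to output registers mentioned above can be sketched
as follows.  This is an illustrative Python sketch under simplifying
assumptions, not the code in either translator: instructions are
hypothetical (opcode, dst, srcs) tuples of strings, only straight-line
code is handled, and a MOV is folded only when it copies a whole,
unswizzled temporary that is not read again afterwards.

```python
def base(reg):
    """Strip negation and swizzle: '-TEMP[0].xyzz' -> 'TEMP[0]'."""
    return reg.lstrip("-").split(".")[0]


def read_before_overwrite(insts, start, reg):
    """True if reg is read at or after index start before a full overwrite."""
    for op, dst, srcs in insts[start:]:
        if any(base(s) == reg for s in srcs):
            return True
        if dst == reg:  # a write without a writemask kills the old value
            return False
    return False


def propagate_to_outputs(insts):
    """Fold 'OP TEMP[n], ...; MOV OUT[m], TEMP[n]' into 'OP OUT[m], ...'."""
    out = []
    i = 0
    while i < len(insts):
        op, dst, srcs = insts[i]
        nxt = insts[i + 1] if i + 1 < len(insts) else None
        if (nxt is not None and nxt[0] == "MOV"
                and nxt[1].startswith("OUT[") and "." not in nxt[1]
                and dst is not None
                and dst.startswith("TEMP[") and "." not in dst
                and nxt[2] == [dst]
                and not read_before_overwrite(insts, i + 2, dst)):
            out.append((op, nxt[1], srcs))  # write the output directly
            i += 2                          # and drop the MOV
        else:
            out.append(insts[i])
            i += 1
    return out


# The last instructions of the first dump above (37, 38 and END):
tail = [
    ("MAD", "TEMP[0]", ["CONST[11]", "IN[0].wwww", "TEMP[0]"]),
    ("MOV", "OUT[0]", ["TEMP[0]"]),
    ("END", None, []),
]
folded = propagate_to_outputs(tail)
```

The sketch deliberately leaves a MOV such as "MOV OUT[2], TEMP[0].xxxx"
alone, because folding it would require reasoning about swizzles and
writemasks, which this simplified version does not attempt.
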
>> One current weakness of my new optimization passes is that they don't
>> optimize code inside of loops as well as they should, although, to the
>> best of my knowledge and testing, they at least don't break code that
>> uses loops.
>>
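
For readers who want a picture of what consolidating the temporary
register space means, here is a minimal Python sketch.  It is not the
pass in the branch: it assumes straight-line code, uses hypothetical
(opcode, dst, srcs) tuples of strings, and treats a temporary as live
from its first mention to its last.  Temporaries whose ranges do not
overlap are given the same index.

```python
import re

TEMP = re.compile(r"TEMP\[(\d+)\]")


def temps_in(operand):
    """Return the temporary indices mentioned in an operand string."""
    return [int(n) for n in TEMP.findall(operand or "")]


def consolidate_temps(insts):
    """Renumber temporaries so that non-overlapping live ranges share a slot.

    Returns the rewritten instructions and the number of slots used.
    """
    first, last = {}, {}
    for i, (op, dst, srcs) in enumerate(insts):
        for operand in [dst] + list(srcs):
            for t in temps_in(operand):
                first.setdefault(t, i)
                last[t] = i

    mapping, free, next_index = {}, [], 0
    for i in range(len(insts)):
        # Allocate a slot for every temporary that first appears here.
        for t in sorted(t for t in first if first[t] == i):
            if free:
                mapping[t] = free.pop(0)
            else:
                mapping[t] = next_index
                next_index += 1
        # Release slots only after the instruction, which is conservative.
        for t in [t for t in last if last[t] == i]:
            free.append(mapping[t])
        free.sort()

    def rename(operand):
        if operand is None:
            return None
        return TEMP.sub(lambda m: "TEMP[%d]" % mapping[int(m.group(1))],
                        operand)

    renamed = [(op, rename(dst), [rename(s) for s in srcs])
               for op, dst, srcs in insts]
    return renamed, next_index
```

Taking a live range as "first index to last index" is only sound for
straight-line code.  Inside a loop a value can stay live across the
back edge, so a real pass has to extend ranges over the loop body, which
is part of why loops need extra care.
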
>> I'd very much appreciate any comments, feedback, patches, or testing.
> I don't have any spare time to test anything right now.  The only
> feedback I have for now would be superficial (whitespace
> inconsistencies, comments, etc).  But I'm glad you're taking on this
> project.
>
> -Brian

Okay, thanks.

Bryan