[Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

Mon Mar 11 07:38:26 PDT 2013

On 11.03.2013 14:47, Christoph Bumiller wrote:
> On 11.03.2013 13:44, Christian König wrote:
>> Hi everybody,
>>
>> this problem has been open for quite some time now, with a bunch of different
>> opinions and sometimes even patches floating on the list.
> Nice, finally someone implements a proper solution.
> However, it seems like this isn't used for arrays in the IN and OUT
> files (varyings). Would it be much more work to use it there, too ?

Shouldn't be too much of a problem, but I wanted to solve temporaries
first and look at the rest once that's working.

> Fragment Shader inputs seem to be read with "if (index == 0) return
> in[0] else if (index == 1) ..." sequences.

Well, as said before, it only handles temp arrays for now. That looks like
the code that's generated when the driver reports that it has no indirect
addressing support. Do you know offhand where exactly that's handled? The
glsl_to_tgsi code is unfortunately hard to read at best.
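
For reference, the sequence quoted above is roughly equivalent to the
following C sketch. It is only an illustration of the two strategies, not
the glsl_to_tgsi code: struct vec4, load_indirect() and load_lowered() are
made-up names, and returning zero for an out-of-range index is an
assumption of the sketch.

```c
#include <stddef.h>

/* Hypothetical 4-component register, standing in for a TGSI vec4. */
struct vec4 { float x, y, z, w; };

/* With indirect addressing support: a single IN[ADDR[0].x] style load. */
static struct vec4
load_indirect(const struct vec4 *in, size_t index)
{
   return in[index];
}

/* Without it: one compare-and-select per array element, so the cost
 * grows linearly with the array size. An out-of-range index yields
 * zero here (assumption of this sketch). */
static struct vec4
load_lowered(const struct vec4 *in, size_t size, size_t index)
{
   struct vec4 result = {0.0f, 0.0f, 0.0f, 0.0f};
   size_t i;

   for (i = 0; i < size; i++) {
      if (index == i)
         result = in[i];
   }
   return result;
}
```

Both functions return the same value for any valid index; the lowered
form just trades one indexed load for a chain of comparisons.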

>
> And I may have spotted a bug in the following shader:
>
> in vec4 vertex[2];
> in vec4 color;
> out vec4 value[4];
>
> uniform int i, j;
>
> void main()
> {
>      gl_Position = vertex[i];
>
>      value[0] = vertex[0];
>      value[1] = vertex[1];
>      value[2] = vec4(0.0);
>      value[3] = vec4(0.0);
>      value[j] = color;
> }
>
> gives me
>
> DCL IN[0]
> DCL IN[1]
> DCL IN[2]
> DCL OUT[0], POSITION
> DCL OUT[1], GENERIC[12]
> DCL OUT[2], GENERIC[13]
> DCL OUT[3], GENERIC[14]
> DCL OUT[4], GENERIC[15]
> DCL CONST[0..1]
> DCL TEMP[0..3], LOCAL
> DCL TEMP[4], LOCAL
> DCL ADDR[0]
> IMM[0] FLT32 {    0.0000,     0.0000,     0.0000,     0.0000}
>    0: UARL ADDR[0].x, CONST[1].xxxx
>    1: MOV TEMP[4], IN[ADDR[0].x] <<< (not the bug) but this is invalid as
> there is no IN array, just single ones
>    2: MOV TEMP[0], IN[0]
>    3: MOV TEMP[1], IN[1]
>    4: MOV TEMP[2], IMM[0].xxxx
>    5: MOV TEMP[3], IMM[0].xxxx
>    6: UARL ADDR[0].x, CONST[0].xxxx
>    7: MOV TEMP[1][ADDR[0].x], IN[2]
> <<<
> why is this TEMP[1][] ? The array seems to be the first declaration ...

I numbered the declarations starting with 1 (and not 0), so that I could
use 0 as the special case meaning that we want to address the whole range
of registers and not just one declaration. I did this purely for
compatibility reasons, so I could look at handling temps only and not
bother too much with inputs/outputs.
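
A minimal sketch of that numbering in C, in case it helps the discussion.
The names (struct decl_range, indirect_range) are hypothetical and not
taken from the patchset; it only shows how ID 0 maps to the whole
register file and IDs >= 1 map to individual declarations.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one "DCL TEMP[first..last]" line. */
struct decl_range { unsigned first, last; };

/* Compute the register range an indirect access may touch.
 * array_id == 0: the whole register file (all declarations).
 * array_id >= 1: only the array_id-th declaration, counted from 1.
 * Returns false if there are no declarations or the ID is out of range. */
static bool
indirect_range(const struct decl_range *decls, size_t num_decls,
               unsigned array_id, struct decl_range *out)
{
   size_t i;

   if (num_decls == 0)
      return false;

   if (array_id == 0) {
      *out = decls[0];
      for (i = 1; i < num_decls; i++) {
         if (decls[i].first < out->first)
            out->first = decls[i].first;
         if (decls[i].last > out->last)
            out->last = decls[i].last;
      }
      return true;
   }

   if (array_id > num_decls)
      return false;

   *out = decls[array_id - 1];
   return true;
}
```

With the two declarations from the dump above, TEMP[0..3] and TEMP[4],
ID 1 resolves to registers 0..3, ID 2 to register 4, and ID 0 to the
full 0..4 range.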

Well, so far the patchset is just an RFC, so I want to let the list see
the patches before either implementing inputs/outputs as well or fully
documenting such quirks/hacks.

>    8: MOV OUT[1], TEMP[0]
>    9: MOV OUT[2], TEMP[1]
>   10: MOV OUT[3], TEMP[2]
>   11: MOV OUT[4], TEMP[3]
>   12: MOV OUT[0], TEMP[4]
>   13: END
>
> Ideally this would not use TEMP arrays at all though, but output arrays
> (I vaguely recall some radeon card doesn't support this though. Is that
> just outputs or also inputs ?).

More or less correct. Modern radeons don't have an "output" register
space, but instead have "export" instructions. In the current driver we
allocate registers for both temps and outputs to work around this, but in
the example above it wouldn't be necessary.

Inputs are just registers as well, either preloaded when starting the 
shader or filled in by special instructions (vector fetches, coordinate 
interpolation, etc.).

Thanks for the comments,
Christian.