[Mesa-dev] [PATCH] glsl_to_tgsi: indirect array information
Vadim Girlin
vadimgirlin at gmail.com
Tue Jan 22 17:07:21 PST 2013
On 01/23/2013 04:42 AM, Christoph Bumiller wrote:
> On 23.01.2013 01:21, Vadim Girlin wrote:
>> On 01/23/2013 03:59 AM, Vincent Lejeune wrote:
>>>
>>>
>>> ----- Original message -----
>>>> From: Vadim Girlin <vadimgirlin at gmail.com>
>>>> To: Christoph Bumiller <e0425955 at student.tuwien.ac.at>
>>>> Cc: mesa-dev at lists.freedesktop.org
>>>> Sent: Wednesday, 23 January 2013, 0:44
>>>> Subject: Re: [Mesa-dev] [PATCH] glsl_to_tgsi: indirect array information
>>>>
>>>> On 01/22/2013 10:59 PM, Christoph Bumiller wrote:
>>>>> On 21.01.2013 21:10, Vadim Girlin wrote:
>>>>>> Provide information about indirectly addressable arrays (ranges of
>>>>>> temps) in the shader to the drivers. The TGSI representation itself
>>>>>> isn't modified; array information is passed as additional data in the
>>>>>> pipe_shader_state, so the drivers can use it as a hint for optimization.
>>>>>> ---
>>>>>>
>>>>>> It's far from being an ideal solution, but I saw the discussions about
>>>>>> that problem starting from 2009 IIRC, and we still have no solution
>>>>>> (neither good nor bad) despite the years passed. I hope we can use this
>>>>>> not very intrusive approach until we get something better.
>>>>>>
>>>>>
>>>>> I'd rather not have any hacks in the interface, let alone ones that
>>>>> solve the problem only partially (you still won't know which array is
>>>>> accessed by a particular instruction, which is important for
>>>>> optimization and essential in some cases for making INPUT/OUTPUT arrays
>>>>> work), and not just because it reduces the pressure on people to
>>>>> implement a proper solution.
>>>>>
>>>>> With this, you just get to know which range of TEMPs are indirectly
>>>>> addressed and which ones are not, and you can do the same by simply
>>>>> creating multiple declarations of TEMPs, one for each array, and adding
>>>>> a single bit of info to tgsi_declaration (which has 7 bits of padding
>>>>> anyway, so ample space), which is a lot less ugly, and doesn't suffer
>>>>> from an arbitrary limit, and doesn't require any modification of drivers
>>>>> either.
>>>>>
>>>>
>>>> Array accessed by any indirect operand can be identified by the
>>>> immediate offset, e.g. TEMP[ADDR[0].x+1] implies the array starting from
>>>> 1, thus we can find its entry in the information provided by this patch
>>>> to get the addressable range for every indirect operand. If I'm not
>>>> missing something, glsl_to_tgsi accumulates all other parts of the
>>>> offset in the address register before the indirect access. If I'm wrong,
>>>> we can fix it to ensure such behavior.
>>>
>>> I'm not sure about that; when I worked on indirect addressing of const
>>> memory, I discovered when tracking vp/fo regression that the immediate
>>> offset is the result of glsl_to_tgsi constant propagation and not related
>>> to the underlying array. This means that the dynamic index can be
>>> negative, which is not always desirable depending on the hw. (In the R600
>>> case, the const fetch instruction does not support a negative index; the
>>> MOVA instruction does.)
>>>
>>> For instance, the following pseudo-code snippet is fine for an index
>>> value of -4:
>>>
>>> uniform int index;
>>>
>>> float array[4];
>>> float data = array[6 + index];
>>>
>>> and is lowered to
>>> MOV TEMP[0] TEMP[ADDR[0].x + 6];
>>>
>>
>> I tried the following shader:
>>
>>> uniform int index;
>>>
>>> void main()
>>> {
>>> float array[4] = float[4](0.1, 0.2, 0.3, 0.4);
>>> float data = array[6 + index];
>>> gl_FragColor = vec4(data, 1.0, 0.0, 1.0);
>>> }
>>
>> Resulting TGSI:
>>
>>> --------------------------------------------------------------
>>> FRAG
>>> PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
>>> DCL OUT[0], COLOR
>>> DCL CONST[0]
>>> DCL TEMP[0], LOCAL
>>> DCL TEMP[1], LOCAL
>>> DCL TEMP[2], LOCAL
>>> DCL TEMP[3], LOCAL
>>> DCL TEMP[4], LOCAL
>>> DCL TEMP[5], LOCAL
>>> DCL TEMP[6], LOCAL
>>> DCL TEMP[7], LOCAL
>>> DCL ADDR[0]
>>> IMM[0] FLT32 { 0.1000, 0.2000, 0.3000, 0.4000}
>>> IMM[1] FLT32 { 1.0000, 0.0000, 0.0000, 0.0000}
>>> IMM[2] INT32 {6, 0, 0, 0}
>>> 0: MOV TEMP[1].yzw, IMM[1].yxyx
>>> 1: MOV TEMP[2], IMM[0].xxxx
>>> 2: MOV TEMP[3], IMM[0].yyyy
>>> 3: MOV TEMP[4], IMM[0].zzzz
>>> 4: MOV TEMP[5], IMM[0].wwww
>>> 5: UADD TEMP[6].x, IMM[2].xxxx, CONST[0].xxxx
>>> 6: UARL ADDR[0].x, TEMP[6].xxxx
>>> 7: MOV TEMP[1].x, TEMP[ADDR[0].x+2].xxxx
>>> 8: MOV_SAT OUT[0], TEMP[1]
>>> 9: END
>>> --------------------------------------------------------------
>>
>> Also I tried the following:
>>
>>> uniform float array[4];
>>> uniform int index;
>>>
>>> void main()
>>> {
>>> float data = array[6 + index];
>>> gl_FragColor = vec4(data, 1.0, 0.0, 1.0);
>>> }
>>
>> Resulting TGSI:
>>
>>> --------------------------------------------------------------
>>> FRAG
>>> PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
>>> DCL OUT[0], COLOR
>>> DCL CONST[0..4]
>>> DCL TEMP[0], LOCAL
>>> DCL TEMP[1], LOCAL
>>> DCL ADDR[0]
>>> IMM[0] FLT32 { 1.0000, 0.0000, 0.0000, 0.0000}
>>> IMM[1] INT32 {6, 0, 0, 0}
>>> 0: MOV TEMP[0].yzw, IMM[0].yxyx
>>> 1: UADD TEMP[1].x, IMM[1].xxxx, CONST[0].xxxx
>>> 2: UARL ADDR[0].x, TEMP[1].xxxx
>>> 3: MOV TEMP[0].x, CONST[ADDR[0].x+1].xxxx
>>> 4: MOV_SAT OUT[0], TEMP[0]
>>> 5: END
>>> --------------------------------------------------------------
>>
>> So far the immediate offset in the indirect operand is always equal to the
>> start offset of the array. Could you please provide a more complete example
>> that demonstrates the problem?
>>
>> Vadim
>>
>
> Not really, because shaders like
>
> float array[8];
>
> uniform int pos;
>
> void main()
> {
> array[0] = 1.0;
> array[1] = 2.0;
> array[2] = 3.0;
> array[3] = 4.0;
> gl_FragColor = vec4(array[pos - 16],
> array[pos - 17],
> array[pos - 18],
> array[pos - 19]);
> }
>
> yield the terribly unoptimized
>
> 0: MOV TEMP[1].x, IMM[0].xxxx
> 1: MOV TEMP[2].x, IMM[0].yyyy
> 2: MOV TEMP[3].x, IMM[0].zzzz
> 3: MOV TEMP[4].x, IMM[0].wwww
> 4: UADD TEMP[9].x, CONST[0].xxxx, IMM[1].xxxx
> 5: UARL ADDR[0].x, TEMP[9].xxxx
> 6: MOV TEMP[10].x, TEMP[ADDR[0].x+1].xxxx
> 7: UADD TEMP[11].x, CONST[0].xxxx, IMM[1].yyyy
> 8: UARL ADDR[0].x, TEMP[11].xxxx
> 9: MOV TEMP[10].y, TEMP[ADDR[0].x+1].xxxx
> 10: UADD TEMP[12].x, CONST[0].xxxx, IMM[1].zzzz
> 11: UARL ADDR[0].x, TEMP[12].xxxx
> 12: MOV TEMP[10].z, TEMP[ADDR[0].x+1].xxxx
> 13: UADD TEMP[13].x, CONST[0].xxxx, IMM[1].wwww
> 14: UARL ADDR[0].x, TEMP[13].xxxx
> 15: MOV TEMP[10].w, TEMP[ADDR[0].x+1].xxxx
> 16: MOV OUT[0], TEMP[10]
> 17: END
>
> instead of simply adjusting the offset and NOT emitting tons of ARLs.
> But this is NOT guaranteed behaviour and neither should it be (I did
> suggest that in the past, but some people disagreed and they convinced me).
>
I agree that it's terribly unoptimized, but it doesn't change the fact
that currently we can use the immediate offset to match it to the array
info. Is anybody going to optimize this tomorrow and break this patch?
We can discuss it forever, though, and that probably doesn't make sense.
This discussion won't be any different from the previous discussions of
that problem: the result will be the same, and we'll have no working solution.
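
To make the matching concrete, here is a minimal sketch in Python (not Mesa
code) of how a driver might map an indirect operand to an array by its
immediate offset. The names ArrayRange and find_array are hypothetical, and the
TEMP[1..8] layout is only assumed from the quoted shader (float array[8] with
accesses through TEMP[ADDR[0].x+1]). It also shows the case raised above: if a
constant were folded into the offset, the lookup would find nothing.

```python
from typing import NamedTuple, Optional, Sequence


class ArrayRange(NamedTuple):
    """Hypothetical record for one indirectly addressable range of TEMPs."""
    first: int  # index of the first TEMP in the array
    last: int   # index of the last TEMP in the array (inclusive)


def find_array(arrays: Sequence[ArrayRange],
               imm_offset: int) -> Optional[ArrayRange]:
    """Match an indirect operand TEMP[ADDR[0].x + imm_offset] to an array.

    Relies on the assumption discussed in this thread: the immediate
    offset equals the start of the array, and the rest of the index is
    accumulated in the address register.
    """
    for arr in arrays:
        if arr.first == imm_offset:
            return arr
    return None


# Assumed layout for the quoted shader: float array[8] in TEMP[1..8].
arrays = [ArrayRange(first=1, last=8)]

# TEMP[ADDR[0].x+1]: the offset is the array base, so the lookup succeeds.
hit = find_array(arrays, 1)

# If a constant such as -16 were folded into the offset instead
# (TEMP[ADDR[0].x-15]), no array starts there and the lookup fails.
miss = find_array(arrays, 1 - 16)

print(hit, miss)
```

When the lookup returns nothing, a driver could presumably fall back to
treating the whole TEMP file as indirectly addressable, which is what happens
without the array information anyway.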
Vadim
> Also, mesa is not the only state tracker out there so try not to rely on
> its special perks too much.
>
>>> I didn't test your patch atm, but I think you may have to fix
>>> glsl_to_tgsi.
>>> Otherwise I'm in favor of implementing something not optimal but far
>>> better than what we have currently.
>>>
>>> Vincent
>>>
>>>>
>>>> I'll be perfectly OK with any other solution, as long as it's a really
>>>> working (already implemented) solution that I can use today, not just
>>>> some abstract ideas in the discussions. This patch isn't perfect and can
>>>> be improved, but it already works for me. I'll be very happy to use any
>>>> other solution from you or anyone else.
>>>>
>>>> Vadim
>>>>
>
>