[Mesa-dev] [PATCH] glsl_to_tgsi: indirect array information
Vincent Lejeune
vljn at ovi.com
Tue Jan 22 17:06:55 PST 2013
----- Mail original -----
> De : Vadim Girlin <vadimgirlin at gmail.com>
> À : Vincent Lejeune <vljn at ovi.com>
> Cc : "mesa-dev at lists.freedesktop.org" <mesa-dev at lists.freedesktop.org>
> Envoyé le : Mercredi 23 janvier 2013 1h21
> Objet : Re: [Mesa-dev] [PATCH] glsl_to_tgsi: indirect array information
>
> On 01/23/2013 03:59 AM, Vincent Lejeune wrote:
>>
>>
>> ----- Mail original -----
>>> De : Vadim Girlin <vadimgirlin at gmail.com>
>>> À : Christoph Bumiller <e0425955 at student.tuwien.ac.at>
>>> Cc : mesa-dev at lists.freedesktop.org
>>> Envoyé le : Mercredi 23 janvier 2013 0h44
>>> Objet : Re: [Mesa-dev] [PATCH] glsl_to_tgsi: indirect array information
>>>
>>> On 01/22/2013 10:59 PM, Christoph Bumiller wrote:
>>>> On 21.01.2013 21:10, Vadim Girlin wrote:
>>>>> Provide the information about indirectly addressable arrays
> (ranges of
>>> temps) in
>>>>> the shader to the drivers. TGSI representation itself
> isn't
>>> modified, array
>>>>> information is passed as an additional data in the
> pipe_shader_state,
>>> so the
>>>>> drivers can use it as a hint for optimization.
>>>>> ---
>>>>>
>>>>> It's far from being an ideal solution, but I saw the
> discussions
>>> about that
>>>>> problem starting from 2009 IIRC, and we still have no
> solution (neither
>>> good
>>>>> nor bad) despite the years passed. I hope we can use this not
> very
>>> intrusive
>>>>> approach until we get something better.
>>>>>
>>>>
>>>> I'd rather not have any hacks in the interface, let alone
> ones that
>>>> solve the problem only partially (you still won't know which
> array is
>>>> accessed by a particular instruction, which is important for
>>>> optimization and essential in some cases for making INPUT/OUTPUT
> arrays
>>>> work), and not just because it reduces the pressure on people to
>>>> implement a proper solution.
>>>>
>>>> With this, you just get to know which range of TEMPs are
> indirectly
>>>> addressed and which ones are not, and you can do the same by
> simply
>>>> creating multiple declarations of TEMPs, one for each array, and
> adding
>>>> a single bit of info to tgsi_declaration (which has 7 bits of
> padding
>>>> anyway, so ample space), which is a lot less ugly, and
> doesn't suffer
>>>> from an arbitrary limit, and doesn't require any modification
> of
>>> drivers
>>>> either.
>>>>
>>>
>>> Array accessed by any indirect operand can be identified by the
>>> immediate offset, e.g. TEMP[ADDR[0].x+1] implies the array starting
> from
>>> 1, thus we can find it's entry in the information provided by this
> patch
>>> to get the addressable range for every indirect operand. If I'm not
>>> missing something, glsl_to_tgsi accumulates all other parts of the
>>> offset in the address register before the indirect access. If I'm
> wrong,
>>> we can fix it to ensure such behavior.
>>
>> I'm not sure about that ; when I worked on indirect addressing of const
> memory,
>> I discovered when tracking vp/fo regression that the immediate offset is
> the result of
>> glsl_to_tgsi constant propagation and not related to the underlying
> array.
>> This means that the dynamic index can be negative, which is not always
>> desirable depending on the hw. (In R600 case, const fetch instruction does
> not
>> support negative index. MOVA inst does).
>>
>> For instance, the following pseudo code snippet is fine for an index value
> of -4 :
>>
>> uniform int index;
>>
>> float array[4];
>> float data = array[6 + index];
>>
>> and is lowered to
>> MOV TEMP[0] TEMP[ADDR[0].x + 6];
>>
>
> I tried the following shader:
>
>> uniform int index;
>>
>> void main()
>> {
>> float array[4] = float[4](0.1, 0.2, 0.3, 0.4);
>> float data = array[6 + index];
>> gl_FragColor = vec4(data, 1.0, 0.0, 1.0);
>> }
>
> Resulting TGSI:
>
>> --------------------------------------------------------------
>> FRAG
>> PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
>> DCL OUT[0], COLOR
>> DCL CONST[0]
>> DCL TEMP[0], LOCAL
>> DCL TEMP[1], LOCAL
>> DCL TEMP[2], LOCAL
>> DCL TEMP[3], LOCAL
>> DCL TEMP[4], LOCAL
>> DCL TEMP[5], LOCAL
>> DCL TEMP[6], LOCAL
>> DCL TEMP[7], LOCAL
>> DCL ADDR[0]
>> IMM[0] FLT32 { 0.1000, 0.2000, 0.3000, 0.4000}
>> IMM[1] FLT32 { 1.0000, 0.0000, 0.0000, 0.0000}
>> IMM[2] INT32 {6, 0, 0, 0}
>> 0: MOV TEMP[1].yzw, IMM[1].yxyx
>> 1: MOV TEMP[2], IMM[0].xxxx
>> 2: MOV TEMP[3], IMM[0].yyyy
>> 3: MOV TEMP[4], IMM[0].zzzz
>> 4: MOV TEMP[5], IMM[0].wwww
>> 5: UADD TEMP[6].x, IMM[2].xxxx, CONST[0].xxxx
>> 6: UARL ADDR[0].x, TEMP[6].xxxx
>> 7: MOV TEMP[1].x, TEMP[ADDR[0].x+2].xxxx
>> 8: MOV_SAT OUT[0], TEMP[1]
>> 9: END
>> --------------------------------------------------------------
>
> Also I tried the following:
>
>> uniform float array[4];
>> uniform int index;
>>
>> void main()
>> {
>> float data = array[6 + index];
>> gl_FragColor = vec4(data, 1.0, 0.0, 1.0);
>> }
>
> Resulting TGSI:
>
>> --------------------------------------------------------------
>> FRAG
>> PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
>> DCL OUT[0], COLOR
>> DCL CONST[0..4]
>> DCL TEMP[0], LOCAL
>> DCL TEMP[1], LOCAL
>> DCL ADDR[0]
>> IMM[0] FLT32 { 1.0000, 0.0000, 0.0000, 0.0000}
>> IMM[1] INT32 {6, 0, 0, 0}
>> 0: MOV TEMP[0].yzw, IMM[0].yxyx
>> 1: UADD TEMP[1].x, IMM[1].xxxx, CONST[0].xxxx
>> 2: UARL ADDR[0].x, TEMP[1].xxxx
>> 3: MOV TEMP[0].x, CONST[ADDR[0].x+1].xxxx
>> 4: MOV_SAT OUT[0], TEMP[0]
>> 5: END
>> --------------------------------------------------------------
>
> So far immediate offset in the indirect operand is always equal to the
> start offset of the array. Could you provide some more complete example
> that demonstrates the problem, please.
In fact I was wrong, glsl_to_tgsi generates good offset, the behavior only concerns
vpfp.
See the piglit tests :
/home/vlj/piglit/framework/../bin/vpfp-generic -auto tests/shaders/generic/vp-arl-neg-array.vpfp
and
/home/vlj/piglit/framework/../bin/vpfp-generic -auto tests/shaders/generic/vp-arl-neg-array-2.vpfp
>
> Vadim
>
>> I didn't test your patch atm, but I think you may have to fix
> glsl_to_tgsi.
>> Otherwise I'm in favor of implementing something not optimal but far
>> better that what we have currently.
>>
>> Vincent
>>
>>>
>>> I'll be perfectly OK with any other solution, as long as it's a
> really
>>> working (already implemented) solution that I can use today, not just
>>> some abstract ideas in the discussions. This patch isn't perfect
> and can
>>> be improved, but it already works for me. I'll be very happy to use
> any
>>> other solution from you or anyone else.
>>>
>>> Vadim
>>>
>>>>> src/gallium/auxiliary/tgsi/tgsi_ureg.c | 2 ++
>>>>> src/gallium/include/pipe/p_state.h | 13 +++++++
>>>>> src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 57
>>> +++++++++++++++++++++++++-----
>>>>> src/mesa/state_tracker/st_program.c | 4 +++
>>>>> src/mesa/state_tracker/st_program.h | 1 +
>>>>> 5 files changed, 69 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
>>> b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
>>>>> index 3c2a923..61db431 100644
>>>>> --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
>>>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
>>>>> @@ -1658,6 +1658,8 @@ void *ureg_create_shader( struct
> ureg_program
>>> *ureg,
>>>>> else
>>>>> memset(&state.stream_output, 0,
>>> sizeof(state.stream_output));
>>>>>
>>>>> + memset(&state.array_info, 0,
> sizeof(state.array_info));
>>>>> +
>>>>> if (ureg->processor == TGSI_PROCESSOR_VERTEX)
>>>>> return pipe->create_vs_state( pipe, &state );
>>>>> else
>>>>> diff --git a/src/gallium/include/pipe/p_state.h
>>> b/src/gallium/include/pipe/p_state.h
>>>>> index ab49cab..4490f2e 100644
>>>>> --- a/src/gallium/include/pipe/p_state.h
>>>>> +++ b/src/gallium/include/pipe/p_state.h
>>>>> @@ -65,6 +65,8 @@ extern "C" {
>>>>> #define PIPE_MAX_TEXTURE_LEVELS 16
>>>>> #define PIPE_MAX_SO_BUFFERS 4
>>>>>
>>>>> +#define PIPE_MAX_INDIRECT_ARRAYS 16
>>>>> +
>>>>>
>>>>> struct pipe_reference
>>>>> {
>>>>> @@ -205,11 +207,22 @@ struct pipe_stream_output_info
>>>>> } output[PIPE_MAX_SHADER_OUTPUTS];
>>>>> };
>>>>>
>>>>> +struct pipe_shader_indirect_array {
>>>>> + unsigned index:16;
>>>>> + unsigned size:16;
>>>>> +};
>>>>> +
>>>>> +struct pipe_shader_array_info
>>>>> +{
>>>>> + struct pipe_shader_indirect_array
> arrays[PIPE_MAX_INDIRECT_ARRAYS];
>>>>> + unsigned num_arrays;
>>>>> +};
>>>>>
>>>>> struct pipe_shader_state
>>>>> {
>>>>> const struct tgsi_token *tokens;
>>>>> struct pipe_stream_output_info stream_output;
>>>>> + struct pipe_shader_array_info array_info;
>>>>> };
>>>>>
>>>>>
>>>>> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>>> index 1d96e90..4ded1be 100644
>>>>> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>>> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>>> @@ -110,6 +110,7 @@ public:
>>>>> this->index2D = 0;
>>>>> this->type = type ? type->base_type :
> GLSL_TYPE_ERROR;
>>>>> this->reladdr = NULL;
>>>>> + this->size = 0;
>>>>> }
>>>>>
>>>>> st_src_reg(gl_register_file file, int index, int type)
>>>>> @@ -121,6 +122,7 @@ public:
>>>>> this->swizzle = SWIZZLE_XYZW;
>>>>> this->negate = 0;
>>>>> this->reladdr = NULL;
>>>>> + this->size = 0;
>>>>> }
>>>>>
>>>>> st_src_reg(gl_register_file file, int index, int type,
> int
>>> index2D)
>>>>> @@ -132,6 +134,7 @@ public:
>>>>> this->swizzle = SWIZZLE_XYZW;
>>>>> this->negate = 0;
>>>>> this->reladdr = NULL;
>>>>> + this->size = 0;
>>>>> }
>>>>>
>>>>> st_src_reg()
>>>>> @@ -143,6 +146,7 @@ public:
>>>>> this->swizzle = 0;
>>>>> this->negate = 0;
>>>>> this->reladdr = NULL;
>>>>> + this->size = 0;
>>>>> }
>>>>>
>>>>> explicit st_src_reg(st_dst_reg reg);
>>>>> @@ -155,6 +159,7 @@ public:
>>>>> int type; /** GLSL_TYPE_* from GLSL IR (enum
> glsl_base_type) */
>>>>> /** Register index should be offset by the integer in
> this reg. */
>>>>> st_src_reg *reladdr;
>>>>> + int size;
>>>>> };
>>>>>
>>>>> class st_dst_reg {
>>>>> @@ -244,8 +249,9 @@ public:
>>>>>
>>>>> class variable_storage : public exec_node {
>>>>> public:
>>>>> - variable_storage(ir_variable *var, gl_register_file file,
> int
>>> index)
>>>>> - : file(file), index(index), var(var)
>>>>> + variable_storage(ir_variable *var, gl_register_file file,
> int
>>> index,
>>>>> + int size = 0)
>>>>> + : file(file), index(index), var(var), size(size)
>>>>> {
>>>>> /* empty */
>>>>> }
>>>>> @@ -253,6 +259,7 @@ public:
>>>>> gl_register_file file;
>>>>> int index;
>>>>> ir_variable *var; /* variable that maps to this, if any
> */
>>>>> + int size;
>>>>> };
>>>>>
>>>>> class immediate_storage : public exec_node {
>>>>> @@ -312,6 +319,7 @@ public:
>>>>> struct gl_program *prog;
>>>>> struct gl_shader_program *shader_program;
>>>>> struct gl_shader_compiler_options *options;
>>>>> + struct pipe_shader_array_info array_info;
>>>>>
>>>>> int next_temp;
>>>>>
>>>>> @@ -445,6 +453,8 @@ public:
>>>>> void emit_block_mov(ir_assignment *ir, const struct
> glsl_type
>>> *type,
>>>>> st_dst_reg *l, st_src_reg *r);
>>>>>
>>>>> + void add_array_info(const st_src_reg *src);
>>>>> +
>>>>> void *mem_ctx;
>>>>> };
>>>>>
>>>>> @@ -1004,7 +1014,8 @@ glsl_to_tgsi_visitor::get_temp(const
> glsl_type
>>> *type)
>>>>> src.file = PROGRAM_TEMPORARY;
>>>>> src.index = next_temp;
>>>>> src.reladdr = NULL;
>>>>> - next_temp += type_size(type);
>>>>> + src.size = type_size(type);
>>>>> + next_temp += src.size;
>>>>>
>>>>> if (type->is_array() || type->is_record()) {
>>>>> src.swizzle = SWIZZLE_NOOP;
>>>>> @@ -1075,9 +1086,10 @@
> glsl_to_tgsi_visitor::visit(ir_variable *ir)
>>>>> assert((int) ir->num_state_slots ==
>>> type_size(ir->type));
>>>>>
>>>>> storage = new(mem_ctx) variable_storage(ir,
>>> PROGRAM_TEMPORARY,
>>>>> - this->next_temp);
>>>>> +
> this->next_temp,
>>>>> +
>>> type_size(ir->type));
>>>>> this->variables.push_tail(storage);
>>>>> - this->next_temp += type_size(ir->type);
>>>>> + this->next_temp += storage->size;
>>>>>
>>>>> dst = st_dst_reg(st_src_reg(PROGRAM_TEMPORARY,
>>> storage->index,
>>>>> native_integers ? ir->type->base_type
> :
>>> GLSL_TYPE_FLOAT));
>>>>> @@ -2029,10 +2041,10 @@
>>> glsl_to_tgsi_visitor::visit(ir_dereference_variable *ir)
>>>>> case ir_var_auto:
>>>>> case ir_var_temporary:
>>>>> entry = new(mem_ctx) variable_storage(var,
>>> PROGRAM_TEMPORARY,
>>>>> - this->next_temp);
>>>>> +
> this->next_temp,
>>>>> +
>>> type_size(var->type));
>>>>> this->variables.push_tail(entry);
>>>>> -
>>>>> - next_temp += type_size(var->type);
>>>>> + next_temp += entry->size;
>>>>> break;
>>>>> }
>>>>>
>>>>> @@ -2043,6 +2055,8 @@
>>> glsl_to_tgsi_visitor::visit(ir_dereference_variable *ir)
>>>>> }
>>>>>
>>>>> this->result = st_src_reg(entry->file,
> entry->index,
>>> var->type);
>>>>> + this->result.size = entry->size;
>>>>> +
>>>>> if (!native_integers)
>>>>> this->result.type = GLSL_TYPE_FLOAT;
>>>>> }
>>>>> @@ -2062,12 +2076,16 @@
>>> glsl_to_tgsi_visitor::visit(ir_dereference_array *ir)
>>>>> if (index) {
>>>>> src.index += index->value.i[0] * element_size;
>>>>> } else {
>>>>> +
>>>>> /* Variable index array dereference. It eats the
>>> "vec4" of the
>>>>> * base of the array and an index that offsets the
> TGSI
>>> register
>>>>> * index.
>>>>> */
>>>>> ir->array_index->accept(this);
>>>>>
>>>>> + if (src.size > 1 && src.file ==
> PROGRAM_TEMPORARY)
>>>>> + add_array_info(&src);
>>>>> +
>>>>> st_src_reg index_reg;
>>>>>
>>>>> if (element_size == 1) {
>>>>> @@ -2969,6 +2987,7 @@
> glsl_to_tgsi_visitor::glsl_to_tgsi_visitor()
>>>>> prog = NULL;
>>>>> shader_program = NULL;
>>>>> options = NULL;
>>>>> + array_info.num_arrays = 0;
>>>>> }
>>>>>
>>>>> glsl_to_tgsi_visitor::~glsl_to_tgsi_visitor()
>>>>> @@ -3771,6 +3790,23 @@
> glsl_to_tgsi_visitor::renumber_registers(void)
>>>>> this->next_temp = new_index;
>>>>> }
>>>>>
>>>>> +void
>>>>> +glsl_to_tgsi_visitor::add_array_info(const st_src_reg *src)
>>>>> +{
>>>>> + unsigned i;
>>>>> +
>>>>> + if (array_info.num_arrays == PIPE_MAX_INDIRECT_ARRAYS)
>>>>> + return;
>>>>> +
>>>>> + for (i = 0; i < array_info.num_arrays; ++i) {
>>>>> + if (array_info.arrays[i].index ==
> (unsigned)src->index)
>>>>> + return;
>>>>> + }
>>>>> + array_info.arrays[i].index = src->index;
>>>>> + array_info.arrays[i].size = src->size;
>>>>> + ++array_info.num_arrays;
>>>>> +}
>>>>> +
>>>>> /**
>>>>> * Returns a fragment program which implements the current
> pixel
>>> transfer ops.
>>>>> * Based on get_pixel_transfer_program in
> st_atom_pixeltransfer.c.
>>>>> @@ -4798,6 +4834,11 @@ st_translate_program(
>>>>> }
>>>>>
>>>>> if (program->indirect_addr_temps) {
>>>>> + struct pipe_shader_array_info *ai =
>>>>> + &((struct st_fragment_program*)
> proginfo)->array_info;
>>>>> +
>>>>> + memcpy(ai, &program->array_info, sizeof(struct
>>> pipe_shader_array_info));
>>>>> +
>>>>> /* If temps are accessed with indirect addressing,
> declare
>>> temporaries
>>>>> * in sequential order. Else, we declare them on
> demand
>>> elsewhere.
>>>>> * (Note: the number of temporaries is equal to
>>> program->next_temp)
>>>>> diff --git a/src/mesa/state_tracker/st_program.c
>>> b/src/mesa/state_tracker/st_program.c
>>>>> index a9111b5..1370c76 100644
>>>>> --- a/src/mesa/state_tracker/st_program.c
>>>>> +++ b/src/mesa/state_tracker/st_program.c
>>>>> @@ -739,6 +739,10 @@ st_translate_fragment_program(struct
> st_context
>>> *st,
>>>>> fs_output_semantic_index,
> FALSE,
>>>>> key->clamp_color);
>>>>>
>>>>> + if (stfp->glsl_to_tgsi)
>>>>> + memcpy(&variant->tgsi.array_info,
>>> &stfp->array_info,
>>>>> + sizeof(struct pipe_shader_array_info));
>>>>> +
>>>>> variant->tgsi.tokens = ureg_get_tokens( ureg, NULL );
>>>>> ureg_destroy( ureg );
>>>>>
>>>>> diff --git a/src/mesa/state_tracker/st_program.h
>>> b/src/mesa/state_tracker/st_program.h
>>>>> index 23a262c..d6b2b0f 100644
>>>>> --- a/src/mesa/state_tracker/st_program.h
>>>>> +++ b/src/mesa/state_tracker/st_program.h
>>>>> @@ -90,6 +90,7 @@ struct st_fragment_program
>>>>> {
>>>>> struct gl_fragment_program Base;
>>>>> struct glsl_to_tgsi_visitor* glsl_to_tgsi;
>>>>> + struct pipe_shader_array_info array_info;
>>>>>
>>>>> struct st_fp_variant *variants;
>>>>> };
>>>>>
>>>>
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>
>
More information about the mesa-dev
mailing list