[Mesa-dev] [PATCH] glsl_to_tgsi: indirect array information

Tue Jan 22 16:21:16 PST 2013

On 01/23/2013 03:59 AM, Vincent Lejeune wrote:
>
>
> ----- Mail original -----
>> De : Vadim Girlin <vadimgirlin at gmail.com>
>> À : Christoph Bumiller <e0425955 at student.tuwien.ac.at>
>> Cc : mesa-dev at lists.freedesktop.org
>> Envoyé le : Mercredi 23 janvier 2013 0h44
>> Objet : Re: [Mesa-dev] [PATCH] glsl_to_tgsi: indirect array information
>>
>> On 01/22/2013 10:59 PM, Christoph Bumiller wrote:
>>>   On 21.01.2013 21:10, Vadim Girlin wrote:
>>>>   Provide the information about indirectly addressable arrays (ranges of
>> temps) in
>>>>   the shader to the drivers. TGSI representation itself isn't
>> modified, array
>>>>   information is passed as an additional data in the pipe_shader_state,
>> so the
>>>>   drivers can use it as a hint for optimization.
>>>>   ---
>>>>
>>>>   It's far from being an ideal solution, but I saw the discussions
>> about that
>>>>   problem starting from 2009 IIRC, and we still have no solution (neither
>> good
>>>>   nor bad) despite the years passed. I hope we can use this not very
>> intrusive
>>>>   approach until we get something better.
>>>>
>>>
>>>   I'd rather not have any hacks in the interface, let alone ones that
>>>   solve the problem only partially (you still won't know which array is
>>>   accessed by a particular instruction, which is important for
>>>   optimization and essential in some cases for making INPUT/OUTPUT arrays
>>>   work), and not just because it reduces the pressure on people to
>>>   implement a proper solution.
>>>
>>>   With this, you just get to know which range of TEMPs are indirectly
>>>   addressed and which ones are not, and you can do the same by simply
>>>   creating multiple declarations of TEMPs, one for each array, and adding
>>>   a single bit of info to tgsi_declaration (which has 7 bits of padding
>>>   anyway, so ample space), which is a lot less ugly, and doesn't suffer
>>>   from an arbitrary limit, and doesn't require any modification of
>> drivers
>>>   either.
>>>
>>
>> Array accessed by any indirect operand can be identified by the
>> immediate offset, e.g. TEMP[ADDR[0].x+1] implies the array starting from
>> 1, thus we can find it's entry in the information provided by this patch
>> to get the addressable range for every indirect operand. If I'm not
>> missing something, glsl_to_tgsi accumulates all other parts of the
>> offset in the address register before the indirect access. If I'm wrong,
>> we can fix it to ensure such behavior.
>
> I'm not sure about that ; when I worked on indirect addressing of const memory,
> I discovered when tracking vp/fo regression that the immediate offset is the result of
>   glsl_to_tgsi constant propagation and not related to the underlying array.
> This means that the dynamic index can be negative, which is not always
> desirable depending on the hw. (In R600 case, const fetch instruction does not
> support negative index. MOVA inst does).
>
> For instance, the following pseudo code snippet is fine for an index value of -4 :
>
> uniform int index;
>
> float array[4];
> float data = array[6 + index];
>
> and is lowered to
> MOV TEMP[0] TEMP[ADDR[0].x + 6];
>

I tried the following shader:

> uniform int index;
>
> void main()
> {
>     float array[4] = float[4](0.1, 0.2, 0.3, 0.4);
>     float data = array[6 + index];
>     gl_FragColor = vec4(data, 1.0, 0.0, 1.0);
> }

Resulting TGSI:

> --------------------------------------------------------------
> FRAG
> PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
> DCL OUT[0], COLOR
> DCL CONST[0]
> DCL TEMP[0], LOCAL
> DCL TEMP[1], LOCAL
> DCL TEMP[2], LOCAL
> DCL TEMP[3], LOCAL
> DCL TEMP[4], LOCAL
> DCL TEMP[5], LOCAL
> DCL TEMP[6], LOCAL
> DCL TEMP[7], LOCAL
> DCL ADDR[0]
> IMM[0] FLT32 {    0.1000,     0.2000,     0.3000,     0.4000}
> IMM[1] FLT32 {    1.0000,     0.0000,     0.0000,     0.0000}
> IMM[2] INT32 {6, 0, 0, 0}
>   0: MOV TEMP[1].yzw, IMM[1].yxyx
>   1: MOV TEMP[2], IMM[0].xxxx
>   2: MOV TEMP[3], IMM[0].yyyy
>   3: MOV TEMP[4], IMM[0].zzzz
>   4: MOV TEMP[5], IMM[0].wwww
>   5: UADD TEMP[6].x, IMM[2].xxxx, CONST[0].xxxx
>   6: UARL ADDR[0].x, TEMP[6].xxxx
>   7: MOV TEMP[1].x, TEMP[ADDR[0].x+2].xxxx
>   8: MOV_SAT OUT[0], TEMP[1]
>   9: END
> --------------------------------------------------------------

Also I tried the following:

> uniform float array[4];
> uniform int index;
>
> void main()
> {
>     float data = array[6 + index];
>     gl_FragColor = vec4(data, 1.0, 0.0, 1.0);
> }

Resulting TGSI:

> --------------------------------------------------------------
> FRAG
> PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
> DCL OUT[0], COLOR
> DCL CONST[0..4]
> DCL TEMP[0], LOCAL
> DCL TEMP[1], LOCAL
> DCL ADDR[0]
> IMM[0] FLT32 {    1.0000,     0.0000,     0.0000,     0.0000}
> IMM[1] INT32 {6, 0, 0, 0}
>   0: MOV TEMP[0].yzw, IMM[0].yxyx
>   1: UADD TEMP[1].x, IMM[1].xxxx, CONST[0].xxxx
>   2: UARL ADDR[0].x, TEMP[1].xxxx
>   3: MOV TEMP[0].x, CONST[ADDR[0].x+1].xxxx
>   4: MOV_SAT OUT[0], TEMP[0]
>   5: END
> --------------------------------------------------------------

So far immediate offset in the indirect operand is always equal to the 
start offset of the array. Could you provide some more complete example 
that demonstrates the problem, please.

Vadim

> I didn't test your patch atm, but I think you may have to fix glsl_to_tgsi.
> Otherwise I'm in favor of implementing something not optimal but far
> better that what we have currently.
>
> Vincent
>
>>
>> I'll be perfectly OK with any other solution, as long as it's a really
>> working (already implemented) solution that I can use today, not just
>> some abstract ideas in the discussions. This patch isn't perfect and can
>> be improved, but it already works for me. I'll be very happy to use any
>> other solution from you or anyone else.
>>
>> Vadim
>>
>>>>     src/gallium/auxiliary/tgsi/tgsi_ureg.c     |  2 ++
>>>>     src/gallium/include/pipe/p_state.h         | 13 +++++++
>>>>     src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 57
>> +++++++++++++++++++++++++-----
>>>>     src/mesa/state_tracker/st_program.c        |  4 +++
>>>>     src/mesa/state_tracker/st_program.h        |  1 +
>>>>     5 files changed, 69 insertions(+), 8 deletions(-)
>>>>
>>>>   diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
>> b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
>>>>   index 3c2a923..61db431 100644
>>>>   --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
>>>>   +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
>>>>   @@ -1658,6 +1658,8 @@ void *ureg_create_shader( struct ureg_program
>> *ureg,
>>>>        else
>>>>           memset(&state.stream_output, 0,
>> sizeof(state.stream_output));
>>>>
>>>>   +   memset(&state.array_info, 0, sizeof(state.array_info));
>>>>   +
>>>>        if (ureg->processor == TGSI_PROCESSOR_VERTEX)
>>>>           return pipe->create_vs_state( pipe, &state );
>>>>        else
>>>>   diff --git a/src/gallium/include/pipe/p_state.h
>> b/src/gallium/include/pipe/p_state.h
>>>>   index ab49cab..4490f2e 100644
>>>>   --- a/src/gallium/include/pipe/p_state.h
>>>>   +++ b/src/gallium/include/pipe/p_state.h
>>>>   @@ -65,6 +65,8 @@ extern "C" {
>>>>     #define PIPE_MAX_TEXTURE_LEVELS   16
>>>>     #define PIPE_MAX_SO_BUFFERS        4
>>>>
>>>>   +#define PIPE_MAX_INDIRECT_ARRAYS  16
>>>>   +
>>>>
>>>>     struct pipe_reference
>>>>     {
>>>>   @@ -205,11 +207,22 @@ struct pipe_stream_output_info
>>>>        } output[PIPE_MAX_SHADER_OUTPUTS];
>>>>     };
>>>>
>>>>   +struct pipe_shader_indirect_array {
>>>>   +   unsigned index:16;
>>>>   +   unsigned size:16;
>>>>   +};
>>>>   +
>>>>   +struct pipe_shader_array_info
>>>>   +{
>>>>   +   struct pipe_shader_indirect_array arrays[PIPE_MAX_INDIRECT_ARRAYS];
>>>>   +   unsigned num_arrays;
>>>>   +};
>>>>
>>>>     struct pipe_shader_state
>>>>     {
>>>>        const struct tgsi_token *tokens;
>>>>        struct pipe_stream_output_info stream_output;
>>>>   +   struct pipe_shader_array_info array_info;
>>>>     };
>>>>
>>>>
>>>>   diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>>   index 1d96e90..4ded1be 100644
>>>>   --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>>   +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>>   @@ -110,6 +110,7 @@ public:
>>>>           this->index2D = 0;
>>>>           this->type = type ? type->base_type : GLSL_TYPE_ERROR;
>>>>           this->reladdr = NULL;
>>>>   +      this->size = 0;
>>>>        }
>>>>
>>>>        st_src_reg(gl_register_file file, int index, int type)
>>>>   @@ -121,6 +122,7 @@ public:
>>>>           this->swizzle = SWIZZLE_XYZW;
>>>>           this->negate = 0;
>>>>           this->reladdr = NULL;
>>>>   +      this->size = 0;
>>>>        }
>>>>
>>>>        st_src_reg(gl_register_file file, int index, int type, int
>> index2D)
>>>>   @@ -132,6 +134,7 @@ public:
>>>>           this->swizzle = SWIZZLE_XYZW;
>>>>           this->negate = 0;
>>>>           this->reladdr = NULL;
>>>>   +      this->size = 0;
>>>>        }
>>>>
>>>>        st_src_reg()
>>>>   @@ -143,6 +146,7 @@ public:
>>>>           this->swizzle = 0;
>>>>           this->negate = 0;
>>>>           this->reladdr = NULL;
>>>>   +      this->size = 0;
>>>>        }
>>>>
>>>>        explicit st_src_reg(st_dst_reg reg);
>>>>   @@ -155,6 +159,7 @@ public:
>>>>        int type; /** GLSL_TYPE_* from GLSL IR (enum glsl_base_type) */
>>>>        /** Register index should be offset by the integer in this reg. */
>>>>        st_src_reg *reladdr;
>>>>   +   int size;
>>>>     };
>>>>
>>>>     class st_dst_reg {
>>>>   @@ -244,8 +249,9 @@ public:
>>>>
>>>>     class variable_storage : public exec_node {
>>>>     public:
>>>>   -   variable_storage(ir_variable *var, gl_register_file file, int
>> index)
>>>>   -      : file(file), index(index), var(var)
>>>>   +   variable_storage(ir_variable *var, gl_register_file file, int
>> index,
>>>>   +                    int size = 0)
>>>>   +      : file(file), index(index), var(var), size(size)
>>>>        {
>>>>           /* empty */
>>>>        }
>>>>   @@ -253,6 +259,7 @@ public:
>>>>        gl_register_file file;
>>>>        int index;
>>>>        ir_variable *var; /* variable that maps to this, if any */
>>>>   +   int size;
>>>>     };
>>>>
>>>>     class immediate_storage : public exec_node {
>>>>   @@ -312,6 +319,7 @@ public:
>>>>        struct gl_program *prog;
>>>>        struct gl_shader_program *shader_program;
>>>>        struct gl_shader_compiler_options *options;
>>>>   +   struct pipe_shader_array_info array_info;
>>>>
>>>>        int next_temp;
>>>>
>>>>   @@ -445,6 +453,8 @@ public:
>>>>        void emit_block_mov(ir_assignment *ir, const struct glsl_type
>> *type,
>>>>                            st_dst_reg *l, st_src_reg *r);
>>>>
>>>>   +   void add_array_info(const st_src_reg *src);
>>>>   +
>>>>        void *mem_ctx;
>>>>     };
>>>>
>>>>   @@ -1004,7 +1014,8 @@ glsl_to_tgsi_visitor::get_temp(const glsl_type
>> *type)
>>>>        src.file = PROGRAM_TEMPORARY;
>>>>        src.index = next_temp;
>>>>        src.reladdr = NULL;
>>>>   -   next_temp += type_size(type);
>>>>   +   src.size = type_size(type);
>>>>   +   next_temp += src.size;
>>>>
>>>>        if (type->is_array() || type->is_record()) {
>>>>           src.swizzle = SWIZZLE_NOOP;
>>>>   @@ -1075,9 +1086,10 @@ glsl_to_tgsi_visitor::visit(ir_variable *ir)
>>>>              assert((int) ir->num_state_slots ==
>> type_size(ir->type));
>>>>
>>>>              storage = new(mem_ctx) variable_storage(ir,
>> PROGRAM_TEMPORARY,
>>>>   -                             this->next_temp);
>>>>   +                                                 this->next_temp,
>>>>   +
>> type_size(ir->type));
>>>>              this->variables.push_tail(storage);
>>>>   -         this->next_temp += type_size(ir->type);
>>>>   +         this->next_temp += storage->size;
>>>>
>>>>              dst = st_dst_reg(st_src_reg(PROGRAM_TEMPORARY,
>> storage->index,
>>>>                    native_integers ? ir->type->base_type :
>> GLSL_TYPE_FLOAT));
>>>>   @@ -2029,10 +2041,10 @@
>> glsl_to_tgsi_visitor::visit(ir_dereference_variable *ir)
>>>>           case ir_var_auto:
>>>>           case ir_var_temporary:
>>>>              entry = new(mem_ctx) variable_storage(var,
>> PROGRAM_TEMPORARY,
>>>>   -                               this->next_temp);
>>>>   +                                               this->next_temp,
>>>>   +
>> type_size(var->type));
>>>>              this->variables.push_tail(entry);
>>>>   -
>>>>   -         next_temp += type_size(var->type);
>>>>   +         next_temp += entry->size;
>>>>              break;
>>>>           }
>>>>
>>>>   @@ -2043,6 +2055,8 @@
>> glsl_to_tgsi_visitor::visit(ir_dereference_variable *ir)
>>>>        }
>>>>
>>>>        this->result = st_src_reg(entry->file, entry->index,
>> var->type);
>>>>   +   this->result.size = entry->size;
>>>>   +
>>>>        if (!native_integers)
>>>>           this->result.type = GLSL_TYPE_FLOAT;
>>>>     }
>>>>   @@ -2062,12 +2076,16 @@
>> glsl_to_tgsi_visitor::visit(ir_dereference_array *ir)
>>>>        if (index) {
>>>>           src.index += index->value.i[0] * element_size;
>>>>        } else {
>>>>   +
>>>>           /* Variable index array dereference.  It eats the
>> "vec4" of the
>>>>            * base of the array and an index that offsets the TGSI
>> register
>>>>            * index.
>>>>            */
>>>>           ir->array_index->accept(this);
>>>>
>>>>   +      if (src.size > 1 && src.file == PROGRAM_TEMPORARY)
>>>>   +         add_array_info(&src);
>>>>   +
>>>>           st_src_reg index_reg;
>>>>
>>>>           if (element_size == 1) {
>>>>   @@ -2969,6 +2987,7 @@ glsl_to_tgsi_visitor::glsl_to_tgsi_visitor()
>>>>        prog = NULL;
>>>>        shader_program = NULL;
>>>>        options = NULL;
>>>>   +   array_info.num_arrays = 0;
>>>>     }
>>>>
>>>>     glsl_to_tgsi_visitor::~glsl_to_tgsi_visitor()
>>>>   @@ -3771,6 +3790,23 @@ glsl_to_tgsi_visitor::renumber_registers(void)
>>>>        this->next_temp = new_index;
>>>>     }
>>>>
>>>>   +void
>>>>   +glsl_to_tgsi_visitor::add_array_info(const st_src_reg *src)
>>>>   +{
>>>>   +   unsigned i;
>>>>   +
>>>>   +   if (array_info.num_arrays == PIPE_MAX_INDIRECT_ARRAYS)
>>>>   +      return;
>>>>   +
>>>>   +   for (i = 0; i < array_info.num_arrays; ++i) {
>>>>   +      if (array_info.arrays[i].index == (unsigned)src->index)
>>>>   +         return;
>>>>   +   }
>>>>   +   array_info.arrays[i].index = src->index;
>>>>   +   array_info.arrays[i].size = src->size;
>>>>   +   ++array_info.num_arrays;
>>>>   +}
>>>>   +
>>>>     /**
>>>>      * Returns a fragment program which implements the current pixel
>> transfer ops.
>>>>      * Based on get_pixel_transfer_program in st_atom_pixeltransfer.c.
>>>>   @@ -4798,6 +4834,11 @@ st_translate_program(
>>>>        }
>>>>
>>>>        if (program->indirect_addr_temps) {
>>>>   +      struct pipe_shader_array_info *ai =
>>>>   +         &((struct st_fragment_program*) proginfo)->array_info;
>>>>   +
>>>>   +      memcpy(ai, &program->array_info, sizeof(struct
>> pipe_shader_array_info));
>>>>   +
>>>>           /* If temps are accessed with indirect addressing, declare
>> temporaries
>>>>            * in sequential order.  Else, we declare them on demand
>> elsewhere.
>>>>            * (Note: the number of temporaries is equal to
>> program->next_temp)
>>>>   diff --git a/src/mesa/state_tracker/st_program.c
>> b/src/mesa/state_tracker/st_program.c
>>>>   index a9111b5..1370c76 100644
>>>>   --- a/src/mesa/state_tracker/st_program.c
>>>>   +++ b/src/mesa/state_tracker/st_program.c
>>>>   @@ -739,6 +739,10 @@ st_translate_fragment_program(struct st_context
>> *st,
>>>>                                     fs_output_semantic_index, FALSE,
>>>>                                     key->clamp_color);
>>>>
>>>>   +   if (stfp->glsl_to_tgsi)
>>>>   +      memcpy(&variant->tgsi.array_info,
>> &stfp->array_info,
>>>>   +             sizeof(struct pipe_shader_array_info));
>>>>   +
>>>>        variant->tgsi.tokens = ureg_get_tokens( ureg, NULL );
>>>>        ureg_destroy( ureg );
>>>>
>>>>   diff --git a/src/mesa/state_tracker/st_program.h
>> b/src/mesa/state_tracker/st_program.h
>>>>   index 23a262c..d6b2b0f 100644
>>>>   --- a/src/mesa/state_tracker/st_program.h
>>>>   +++ b/src/mesa/state_tracker/st_program.h
>>>>   @@ -90,6 +90,7 @@ struct st_fragment_program
>>>>     {
>>>>        struct gl_fragment_program Base;
>>>>        struct glsl_to_tgsi_visitor* glsl_to_tgsi;
>>>>   +   struct pipe_shader_array_info array_info;
>>>>
>>>>        struct st_fp_variant *variants;
>>>>     };
>>>>
>>>
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>