[Mesa-dev] [PATCH] glsl_to_tgsi: indirect array information

Tue Jan 22 17:06:55 PST 2013

----- Mail original -----
> De : Vadim Girlin <vadimgirlin at gmail.com>
> À : Vincent Lejeune <vljn at ovi.com>
> Cc : "mesa-dev at lists.freedesktop.org" <mesa-dev at lists.freedesktop.org>
> Envoyé le : Mercredi 23 janvier 2013 1h21
> Objet : Re: [Mesa-dev] [PATCH] glsl_to_tgsi: indirect array information
> 
> On 01/23/2013 03:59 AM, Vincent Lejeune wrote:
>> 
>> 
>>  ----- Mail original -----
>>>  De : Vadim Girlin <vadimgirlin at gmail.com>
>>>  À : Christoph Bumiller <e0425955 at student.tuwien.ac.at>
>>>  Cc : mesa-dev at lists.freedesktop.org
>>>  Envoyé le : Mercredi 23 janvier 2013 0h44
>>>  Objet : Re: [Mesa-dev] [PATCH] glsl_to_tgsi: indirect array information
>>> 
>>>  On 01/22/2013 10:59 PM, Christoph Bumiller wrote:
>>>>    On 21.01.2013 21:10, Vadim Girlin wrote:
>>>>>    Provide the information about indirectly addressable arrays 
> (ranges of
>>>  temps) in
>>>>>    the shader to the drivers. TGSI representation itself 
> isn't
>>>  modified, array
>>>>>    information is passed as an additional data in the 
> pipe_shader_state,
>>>  so the
>>>>>    drivers can use it as a hint for optimization.
>>>>>    ---
>>>>> 
>>>>>    It's far from being an ideal solution, but I saw the 
> discussions
>>>  about that
>>>>>    problem starting from 2009 IIRC, and we still have no 
> solution (neither
>>>  good
>>>>>    nor bad) despite the years passed. I hope we can use this not 
> very
>>>  intrusive
>>>>>    approach until we get something better.
>>>>> 
>>>> 
>>>>    I'd rather not have any hacks in the interface, let alone 
> ones that
>>>>    solve the problem only partially (you still won't know which 
> array is
>>>>    accessed by a particular instruction, which is important for
>>>>    optimization and essential in some cases for making INPUT/OUTPUT 
> arrays
>>>>    work), and not just because it reduces the pressure on people to
>>>>    implement a proper solution.
>>>> 
>>>>    With this, you just get to know which range of TEMPs are 
> indirectly
>>>>    addressed and which ones are not, and you can do the same by 
> simply
>>>>    creating multiple declarations of TEMPs, one for each array, and 
> adding
>>>>    a single bit of info to tgsi_declaration (which has 7 bits of 
> padding
>>>>    anyway, so ample space), which is a lot less ugly, and 
> doesn't suffer
>>>>    from an arbitrary limit, and doesn't require any modification 
> of
>>>  drivers
>>>>    either.
>>>> 
>>> 
>>>  Array accessed by any indirect operand can be identified by the
>>>  immediate offset, e.g. TEMP[ADDR[0].x+1] implies the array starting 
> from
>>>  1, thus we can find it's entry in the information provided by this 
> patch
>>>  to get the addressable range for every indirect operand. If I'm not
>>>  missing something, glsl_to_tgsi accumulates all other parts of the
>>>  offset in the address register before the indirect access. If I'm 
> wrong,
>>>  we can fix it to ensure such behavior.
>> 
>>  I'm not sure about that ; when I worked on indirect addressing of const 
> memory,
>>  I discovered when tracking vp/fo regression that the immediate offset is 
> the result of
>>    glsl_to_tgsi constant propagation and not related to the underlying 
> array.
>>  This means that the dynamic index can be negative, which is not always
>>  desirable depending on the hw. (In R600 case, const fetch instruction does 
> not
>>  support negative index. MOVA inst does).
>> 
>>  For instance, the following pseudo code snippet is fine for an index value 
> of -4 :
>> 
>>  uniform int index;
>> 
>>  float array[4];
>>  float data = array[6 + index];
>> 
>>  and is lowered to
>>  MOV TEMP[0] TEMP[ADDR[0].x + 6];
>> 
> 
> I tried the following shader:
> 
>>  uniform int index;
>> 
>>  void main()
>>  {
>>      float array[4] = float[4](0.1, 0.2, 0.3, 0.4);
>>      float data = array[6 + index];
>>      gl_FragColor = vec4(data, 1.0, 0.0, 1.0);
>>  }
> 
> Resulting TGSI:
> 
>>  --------------------------------------------------------------
>>  FRAG
>>  PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
>>  DCL OUT[0], COLOR
>>  DCL CONST[0]
>>  DCL TEMP[0], LOCAL
>>  DCL TEMP[1], LOCAL
>>  DCL TEMP[2], LOCAL
>>  DCL TEMP[3], LOCAL
>>  DCL TEMP[4], LOCAL
>>  DCL TEMP[5], LOCAL
>>  DCL TEMP[6], LOCAL
>>  DCL TEMP[7], LOCAL
>>  DCL ADDR[0]
>>  IMM[0] FLT32 {    0.1000,     0.2000,     0.3000,     0.4000}
>>  IMM[1] FLT32 {    1.0000,     0.0000,     0.0000,     0.0000}
>>  IMM[2] INT32 {6, 0, 0, 0}
>>    0: MOV TEMP[1].yzw, IMM[1].yxyx
>>    1: MOV TEMP[2], IMM[0].xxxx
>>    2: MOV TEMP[3], IMM[0].yyyy
>>    3: MOV TEMP[4], IMM[0].zzzz
>>    4: MOV TEMP[5], IMM[0].wwww
>>    5: UADD TEMP[6].x, IMM[2].xxxx, CONST[0].xxxx
>>    6: UARL ADDR[0].x, TEMP[6].xxxx
>>    7: MOV TEMP[1].x, TEMP[ADDR[0].x+2].xxxx
>>    8: MOV_SAT OUT[0], TEMP[1]
>>    9: END
>>  --------------------------------------------------------------
> 
> Also I tried the following:
> 
>>  uniform float array[4];
>>  uniform int index;
>> 
>>  void main()
>>  {
>>      float data = array[6 + index];
>>      gl_FragColor = vec4(data, 1.0, 0.0, 1.0);
>>  }
> 
> Resulting TGSI:
> 
>>  --------------------------------------------------------------
>>  FRAG
>>  PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
>>  DCL OUT[0], COLOR
>>  DCL CONST[0..4]
>>  DCL TEMP[0], LOCAL
>>  DCL TEMP[1], LOCAL
>>  DCL ADDR[0]
>>  IMM[0] FLT32 {    1.0000,     0.0000,     0.0000,     0.0000}
>>  IMM[1] INT32 {6, 0, 0, 0}
>>    0: MOV TEMP[0].yzw, IMM[0].yxyx
>>    1: UADD TEMP[1].x, IMM[1].xxxx, CONST[0].xxxx
>>    2: UARL ADDR[0].x, TEMP[1].xxxx
>>    3: MOV TEMP[0].x, CONST[ADDR[0].x+1].xxxx
>>    4: MOV_SAT OUT[0], TEMP[0]
>>    5: END
>>  --------------------------------------------------------------
> 
> So far immediate offset in the indirect operand is always equal to the 
> start offset of the array. Could you provide some more complete example 
> that demonstrates the problem, please.

In fact I was wrong, glsl_to_tgsi generates good offset, the behavior only concerns
vpfp.

See the piglit tests : 
/home/vlj/piglit/framework/../bin/vpfp-generic -auto tests/shaders/generic/vp-arl-neg-array.vpfp
and 
/home/vlj/piglit/framework/../bin/vpfp-generic -auto tests/shaders/generic/vp-arl-neg-array-2.vpfp

> 
> Vadim
> 
>>  I didn't test your patch atm, but I think you may have to fix 
> glsl_to_tgsi.
>>  Otherwise I'm in favor of implementing something not optimal but far
>>  better that what we have currently.
>> 
>>  Vincent
>> 
>>> 
>>>  I'll be perfectly OK with any other solution, as long as it's a 
> really
>>>  working (already implemented) solution that I can use today, not just
>>>  some abstract ideas in the discussions. This patch isn't perfect 
> and can
>>>  be improved, but it already works for me. I'll be very happy to use 
> any
>>>  other solution from you or anyone else.
>>> 
>>>  Vadim
>>> 
>>>>>      src/gallium/auxiliary/tgsi/tgsi_ureg.c     |  2 ++
>>>>>      src/gallium/include/pipe/p_state.h         | 13 +++++++
>>>>>      src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 57
>>>  +++++++++++++++++++++++++-----
>>>>>      src/mesa/state_tracker/st_program.c        |  4 +++
>>>>>      src/mesa/state_tracker/st_program.h        |  1 +
>>>>>      5 files changed, 69 insertions(+), 8 deletions(-)
>>>>> 
>>>>>    diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
>>>  b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
>>>>>    index 3c2a923..61db431 100644
>>>>>    --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
>>>>>    +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
>>>>>    @@ -1658,6 +1658,8 @@ void *ureg_create_shader( struct 
> ureg_program
>>>  *ureg,
>>>>>         else
>>>>>            memset(&state.stream_output, 0,
>>>  sizeof(state.stream_output));
>>>>> 
>>>>>    +   memset(&state.array_info, 0, 
> sizeof(state.array_info));
>>>>>    +
>>>>>         if (ureg->processor == TGSI_PROCESSOR_VERTEX)
>>>>>            return pipe->create_vs_state( pipe, &state );
>>>>>         else
>>>>>    diff --git a/src/gallium/include/pipe/p_state.h
>>>  b/src/gallium/include/pipe/p_state.h
>>>>>    index ab49cab..4490f2e 100644
>>>>>    --- a/src/gallium/include/pipe/p_state.h
>>>>>    +++ b/src/gallium/include/pipe/p_state.h
>>>>>    @@ -65,6 +65,8 @@ extern "C" {
>>>>>      #define PIPE_MAX_TEXTURE_LEVELS   16
>>>>>      #define PIPE_MAX_SO_BUFFERS        4
>>>>> 
>>>>>    +#define PIPE_MAX_INDIRECT_ARRAYS  16
>>>>>    +
>>>>> 
>>>>>      struct pipe_reference
>>>>>      {
>>>>>    @@ -205,11 +207,22 @@ struct pipe_stream_output_info
>>>>>         } output[PIPE_MAX_SHADER_OUTPUTS];
>>>>>      };
>>>>> 
>>>>>    +struct pipe_shader_indirect_array {
>>>>>    +   unsigned index:16;
>>>>>    +   unsigned size:16;
>>>>>    +};
>>>>>    +
>>>>>    +struct pipe_shader_array_info
>>>>>    +{
>>>>>    +   struct pipe_shader_indirect_array 
> arrays[PIPE_MAX_INDIRECT_ARRAYS];
>>>>>    +   unsigned num_arrays;
>>>>>    +};
>>>>> 
>>>>>      struct pipe_shader_state
>>>>>      {
>>>>>         const struct tgsi_token *tokens;
>>>>>         struct pipe_stream_output_info stream_output;
>>>>>    +   struct pipe_shader_array_info array_info;
>>>>>      };
>>>>> 
>>>>> 
>>>>>    diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>  b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>>>    index 1d96e90..4ded1be 100644
>>>>>    --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>>>    +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>>>    @@ -110,6 +110,7 @@ public:
>>>>>            this->index2D = 0;
>>>>>            this->type = type ? type->base_type : 
> GLSL_TYPE_ERROR;
>>>>>            this->reladdr = NULL;
>>>>>    +      this->size = 0;
>>>>>         }
>>>>> 
>>>>>         st_src_reg(gl_register_file file, int index, int type)
>>>>>    @@ -121,6 +122,7 @@ public:
>>>>>            this->swizzle = SWIZZLE_XYZW;
>>>>>            this->negate = 0;
>>>>>            this->reladdr = NULL;
>>>>>    +      this->size = 0;
>>>>>         }
>>>>> 
>>>>>         st_src_reg(gl_register_file file, int index, int type, 
> int
>>>  index2D)
>>>>>    @@ -132,6 +134,7 @@ public:
>>>>>            this->swizzle = SWIZZLE_XYZW;
>>>>>            this->negate = 0;
>>>>>            this->reladdr = NULL;
>>>>>    +      this->size = 0;
>>>>>         }
>>>>> 
>>>>>         st_src_reg()
>>>>>    @@ -143,6 +146,7 @@ public:
>>>>>            this->swizzle = 0;
>>>>>            this->negate = 0;
>>>>>            this->reladdr = NULL;
>>>>>    +      this->size = 0;
>>>>>         }
>>>>> 
>>>>>         explicit st_src_reg(st_dst_reg reg);
>>>>>    @@ -155,6 +159,7 @@ public:
>>>>>         int type; /** GLSL_TYPE_* from GLSL IR (enum 
> glsl_base_type) */
>>>>>         /** Register index should be offset by the integer in 
> this reg. */
>>>>>         st_src_reg *reladdr;
>>>>>    +   int size;
>>>>>      };
>>>>> 
>>>>>      class st_dst_reg {
>>>>>    @@ -244,8 +249,9 @@ public:
>>>>> 
>>>>>      class variable_storage : public exec_node {
>>>>>      public:
>>>>>    -   variable_storage(ir_variable *var, gl_register_file file, 
> int
>>>  index)
>>>>>    -      : file(file), index(index), var(var)
>>>>>    +   variable_storage(ir_variable *var, gl_register_file file, 
> int
>>>  index,
>>>>>    +                    int size = 0)
>>>>>    +      : file(file), index(index), var(var), size(size)
>>>>>         {
>>>>>            /* empty */
>>>>>         }
>>>>>    @@ -253,6 +259,7 @@ public:
>>>>>         gl_register_file file;
>>>>>         int index;
>>>>>         ir_variable *var; /* variable that maps to this, if any 
> */
>>>>>    +   int size;
>>>>>      };
>>>>> 
>>>>>      class immediate_storage : public exec_node {
>>>>>    @@ -312,6 +319,7 @@ public:
>>>>>         struct gl_program *prog;
>>>>>         struct gl_shader_program *shader_program;
>>>>>         struct gl_shader_compiler_options *options;
>>>>>    +   struct pipe_shader_array_info array_info;
>>>>> 
>>>>>         int next_temp;
>>>>> 
>>>>>    @@ -445,6 +453,8 @@ public:
>>>>>         void emit_block_mov(ir_assignment *ir, const struct 
> glsl_type
>>>  *type,
>>>>>                             st_dst_reg *l, st_src_reg *r);
>>>>> 
>>>>>    +   void add_array_info(const st_src_reg *src);
>>>>>    +
>>>>>         void *mem_ctx;
>>>>>      };
>>>>> 
>>>>>    @@ -1004,7 +1014,8 @@ glsl_to_tgsi_visitor::get_temp(const 
> glsl_type
>>>  *type)
>>>>>         src.file = PROGRAM_TEMPORARY;
>>>>>         src.index = next_temp;
>>>>>         src.reladdr = NULL;
>>>>>    -   next_temp += type_size(type);
>>>>>    +   src.size = type_size(type);
>>>>>    +   next_temp += src.size;
>>>>> 
>>>>>         if (type->is_array() || type->is_record()) {
>>>>>            src.swizzle = SWIZZLE_NOOP;
>>>>>    @@ -1075,9 +1086,10 @@ 
> glsl_to_tgsi_visitor::visit(ir_variable *ir)
>>>>>               assert((int) ir->num_state_slots ==
>>>  type_size(ir->type));
>>>>> 
>>>>>               storage = new(mem_ctx) variable_storage(ir,
>>>  PROGRAM_TEMPORARY,
>>>>>    -                             this->next_temp);
>>>>>    +                                                
> this->next_temp,
>>>>>    +
>>>  type_size(ir->type));
>>>>>               this->variables.push_tail(storage);
>>>>>    -         this->next_temp += type_size(ir->type);
>>>>>    +         this->next_temp += storage->size;
>>>>> 
>>>>>               dst = st_dst_reg(st_src_reg(PROGRAM_TEMPORARY,
>>>  storage->index,
>>>>>                     native_integers ? ir->type->base_type 
> :
>>>  GLSL_TYPE_FLOAT));
>>>>>    @@ -2029,10 +2041,10 @@
>>>  glsl_to_tgsi_visitor::visit(ir_dereference_variable *ir)
>>>>>            case ir_var_auto:
>>>>>            case ir_var_temporary:
>>>>>               entry = new(mem_ctx) variable_storage(var,
>>>  PROGRAM_TEMPORARY,
>>>>>    -                               this->next_temp);
>>>>>    +                                              
> this->next_temp,
>>>>>    +
>>>  type_size(var->type));
>>>>>               this->variables.push_tail(entry);
>>>>>    -
>>>>>    -         next_temp += type_size(var->type);
>>>>>    +         next_temp += entry->size;
>>>>>               break;
>>>>>            }
>>>>> 
>>>>>    @@ -2043,6 +2055,8 @@
>>>  glsl_to_tgsi_visitor::visit(ir_dereference_variable *ir)
>>>>>         }
>>>>> 
>>>>>         this->result = st_src_reg(entry->file, 
> entry->index,
>>>  var->type);
>>>>>    +   this->result.size = entry->size;
>>>>>    +
>>>>>         if (!native_integers)
>>>>>            this->result.type = GLSL_TYPE_FLOAT;
>>>>>      }
>>>>>    @@ -2062,12 +2076,16 @@
>>>  glsl_to_tgsi_visitor::visit(ir_dereference_array *ir)
>>>>>         if (index) {
>>>>>            src.index += index->value.i[0] * element_size;
>>>>>         } else {
>>>>>    +
>>>>>            /* Variable index array dereference.  It eats the
>>>  "vec4" of the
>>>>>             * base of the array and an index that offsets the 
> TGSI
>>>  register
>>>>>             * index.
>>>>>             */
>>>>>            ir->array_index->accept(this);
>>>>> 
>>>>>    +      if (src.size > 1 && src.file == 
> PROGRAM_TEMPORARY)
>>>>>    +         add_array_info(&src);
>>>>>    +
>>>>>            st_src_reg index_reg;
>>>>> 
>>>>>            if (element_size == 1) {
>>>>>    @@ -2969,6 +2987,7 @@ 
> glsl_to_tgsi_visitor::glsl_to_tgsi_visitor()
>>>>>         prog = NULL;
>>>>>         shader_program = NULL;
>>>>>         options = NULL;
>>>>>    +   array_info.num_arrays = 0;
>>>>>      }
>>>>> 
>>>>>      glsl_to_tgsi_visitor::~glsl_to_tgsi_visitor()
>>>>>    @@ -3771,6 +3790,23 @@ 
> glsl_to_tgsi_visitor::renumber_registers(void)
>>>>>         this->next_temp = new_index;
>>>>>      }
>>>>> 
>>>>>    +void
>>>>>    +glsl_to_tgsi_visitor::add_array_info(const st_src_reg *src)
>>>>>    +{
>>>>>    +   unsigned i;
>>>>>    +
>>>>>    +   if (array_info.num_arrays == PIPE_MAX_INDIRECT_ARRAYS)
>>>>>    +      return;
>>>>>    +
>>>>>    +   for (i = 0; i < array_info.num_arrays; ++i) {
>>>>>    +      if (array_info.arrays[i].index == 
> (unsigned)src->index)
>>>>>    +         return;
>>>>>    +   }
>>>>>    +   array_info.arrays[i].index = src->index;
>>>>>    +   array_info.arrays[i].size = src->size;
>>>>>    +   ++array_info.num_arrays;
>>>>>    +}
>>>>>    +
>>>>>      /**
>>>>>       * Returns a fragment program which implements the current 
> pixel
>>>  transfer ops.
>>>>>       * Based on get_pixel_transfer_program in 
> st_atom_pixeltransfer.c.
>>>>>    @@ -4798,6 +4834,11 @@ st_translate_program(
>>>>>         }
>>>>> 
>>>>>         if (program->indirect_addr_temps) {
>>>>>    +      struct pipe_shader_array_info *ai =
>>>>>    +         &((struct st_fragment_program*) 
> proginfo)->array_info;
>>>>>    +
>>>>>    +      memcpy(ai, &program->array_info, sizeof(struct
>>>  pipe_shader_array_info));
>>>>>    +
>>>>>            /* If temps are accessed with indirect addressing, 
> declare
>>>  temporaries
>>>>>             * in sequential order.  Else, we declare them on 
> demand
>>>  elsewhere.
>>>>>             * (Note: the number of temporaries is equal to
>>>  program->next_temp)
>>>>>    diff --git a/src/mesa/state_tracker/st_program.c
>>>  b/src/mesa/state_tracker/st_program.c
>>>>>    index a9111b5..1370c76 100644
>>>>>    --- a/src/mesa/state_tracker/st_program.c
>>>>>    +++ b/src/mesa/state_tracker/st_program.c
>>>>>    @@ -739,6 +739,10 @@ st_translate_fragment_program(struct 
> st_context
>>>  *st,
>>>>>                                      fs_output_semantic_index, 
> FALSE,
>>>>>                                      key->clamp_color);
>>>>> 
>>>>>    +   if (stfp->glsl_to_tgsi)
>>>>>    +      memcpy(&variant->tgsi.array_info,
>>>  &stfp->array_info,
>>>>>    +             sizeof(struct pipe_shader_array_info));
>>>>>    +
>>>>>         variant->tgsi.tokens = ureg_get_tokens( ureg, NULL );
>>>>>         ureg_destroy( ureg );
>>>>> 
>>>>>    diff --git a/src/mesa/state_tracker/st_program.h
>>>  b/src/mesa/state_tracker/st_program.h
>>>>>    index 23a262c..d6b2b0f 100644
>>>>>    --- a/src/mesa/state_tracker/st_program.h
>>>>>    +++ b/src/mesa/state_tracker/st_program.h
>>>>>    @@ -90,6 +90,7 @@ struct st_fragment_program
>>>>>      {
>>>>>         struct gl_fragment_program Base;
>>>>>         struct glsl_to_tgsi_visitor* glsl_to_tgsi;
>>>>>    +   struct pipe_shader_array_info array_info;
>>>>> 
>>>>>         struct st_fp_variant *variants;
>>>>>      };
>>>>> 
>>>> 
>>> 
>>>  _______________________________________________
>>>  mesa-dev mailing list
>>>  mesa-dev at lists.freedesktop.org
>>>  http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>> 
>