[Mesa-dev] [PATCH] glsl/gallium: add a remove_output lowering pass

Mon Jan 2 02:33:39 PST 2012

On 12/31/2011 03:52 PM, Vincent Lejeune wrote:
> Current glsl_to_tgsi::remove_output_read pass did not work properly when
> indirect addressing was involved ; this commit replaces it with
> a lowering pass that occurs before glsl_to_tgsi visitor is called.
> This patch fix varying-array related piglit test.
> ---
>  src/glsl/Makefile.sources                  |    1 +
>  src/glsl/lower_remove_output_read.cpp      |   97 ++++++++++++++++++++++++++++
>  src/glsl/lower_remove_output_read.h        |   62 ++++++++++++++++++
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp |   20 ++++--
>  4 files changed, 172 insertions(+), 8 deletions(-)
>  create mode 100644 src/glsl/lower_remove_output_read.cpp
>  create mode 100644 src/glsl/lower_remove_output_read.h

Vincent,

I like this!  I have a couple of comments below.  To save some trouble,
I've actually gone ahead and made the changes, and will send out a
proposed v2 of this patch shortly.

> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> index c65bfe4..6c80089 100644
> --- a/src/glsl/Makefile.sources
> +++ b/src/glsl/Makefile.sources
> @@ -60,6 +60,7 @@ LIBGLSL_CXX_SOURCES := \
>  	lower_vec_index_to_cond_assign.cpp \
>  	lower_vec_index_to_swizzle.cpp \
>  	lower_vector.cpp \
> +	lower_remove_output_read.cpp \
>  	opt_algebraic.cpp \
>  	opt_constant_folding.cpp \
>  	opt_constant_propagation.cpp \
> diff --git a/src/glsl/lower_remove_output_read.cpp b/src/glsl/lower_remove_output_read.cpp
> new file mode 100644
> index 0000000..8150580
> --- /dev/null
> +++ b/src/glsl/lower_remove_output_read.cpp
> @@ -0,0 +1,97 @@
> +/*
> + * Copyright © 2010 Intel Corporation
> + * Copyright © 2012 Vincent Lejeune
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include "lower_remove_output_read.h"
> +#include "ir.h"
> +
> +
> +void
> +output_read_remover::add_replacement_pair(ir_variable *output, ir_variable *temp)
> +{
> +   if (replacements_count == 0) {
> +      replacements_array = (struct replacement_pair *) ralloc_array_size(mem_ctx, sizeof(struct replacement_pair),1);
> +   }
> +   else {
> +      replacements_array = (struct replacement_pair *) reralloc_array_size(mem_ctx,replacements_array, sizeof(struct replacement_pair), replacements_count + 1);
> +   }

reralloc_array_size actually does an initial allocation if the pointer
you pass is NULL, so you don't need a special case for zero.  Plus, you
could use the handy "reralloc" macro which does the typecast and sizeof
for you.  This could just be:

   replacements_array = reralloc(mem_ctx, replacements_array,
                                 struct replacement_pair,
                                 replacements_count + 1);

However, you probably ought to avoid reallocating the array each time
you encounter a new shader output.  (It's probably not _critical_ since
there aren't typically many outputs, but still worth fixing.)  The
typical solution is to maintain the array size as a second counter and
double the size of the array each time it fills up.
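
A minimal sketch of that doubling scheme, written so it compiles outside
Mesa.  It assumes plain std::realloc as a stand-in for reralloc, and the
names pair_list, capacity, reallocs, and add_pair are hypothetical, not
part of the patch:

```cpp
#include <cassert>
#include <cstdlib>

/* Hypothetical stand-in for the output/temp pair in the patch;
 * void pointers replace ir_variable pointers. */
struct replacement_pair {
   const void *output;
   const void *temp;
};

struct pair_list {
   replacement_pair *pairs = nullptr;
   unsigned count = 0;      /* entries in use */
   unsigned capacity = 0;   /* entries allocated */
   unsigned reallocs = 0;   /* bookkeeping for illustration only */
};

/* Append one pair, growing the allocation only when it is full.
 * std::realloc(NULL, n) behaves like malloc(n), so no special case
 * is needed for the first allocation. */
static bool
add_pair(pair_list *list, const void *output, const void *temp)
{
   if (list->count == list->capacity) {
      unsigned new_capacity = list->capacity ? list->capacity * 2 : 4;
      void *p = std::realloc(list->pairs,
                             new_capacity * sizeof(replacement_pair));
      if (p == nullptr)
         return false;
      list->pairs = static_cast<replacement_pair *>(p);
      list->capacity = new_capacity;
      list->reallocs++;
   }

   list->pairs[list->count].output = output;
   list->pairs[list->count].temp = temp;
   list->count++;
   return true;
}
```

With a starting capacity of 4 (an arbitrary choice here), ten appends
reallocate three times instead of ten.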

> +   hash_table_insert(replacements,temp,output);
> +   replacements_array[replacements_count].output = output;
> +   replacements_array[replacements_count].temp = temp;
> +   replacements_count++;
> +}
> +
> +output_read_remover::output_read_remover():ir_hierarchical_visitor(), replacements_count(0)

No need to call the parent class constructor explicitly; C++ just does
that for you.
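
A small standalone illustration of that rule, using hypothetical
stand-in classes (base_visitor and remover are not Mesa types):

```cpp
#include <cassert>

/* Hypothetical stand-in for ir_hierarchical_visitor. */
struct base_visitor {
   bool base_initialized;
   base_visitor() : base_initialized(true) {}
};

/* Hypothetical stand-in for the new pass. */
struct remover : public base_visitor {
   unsigned replacements_count;

   /* base_visitor() is not named in the initializer list, yet its
    * default constructor still runs before this body. */
   remover() : replacements_count(0) {}
};
```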

> +{
> +   replacements = hash_table_ctor(0,hash_table_pointer_hash,hash_table_pointer_compare);

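Probably want to initialize replacements_array to NULL here, especially
if you drop the zero-count special case and rely on reralloc to do the
first allocation.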

> +   mem_ctx = ralloc_context(NULL);
> +}
> +
> +output_read_remover::~output_read_remover()
> +{
> +   hash_table_dtor(replacements);
> +   ralloc_free(mem_ctx);
> +}
> +
> +ir_visitor_status
> +output_read_remover::visit(ir_dereference_variable *ir)
> +{
> +   ir_variable* temp = (ir_variable*) hash_table_find(replacements, ir->var);
> +   if (temp) {
> +      ir->var = temp;
> +   }
> +   else if(ir->var->mode == ir_var_out) {

I'd suggest checking for ir_var_out first.  That way, you don't have to
even bother with the hash table lookup for most variables.  Faster. :)
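
Roughly what I mean, as a standalone sketch rather than real IR code.
It assumes std::unordered_map in place of the Mesa hash table, and
var_mode, variable, rewrite, and lookups are hypothetical names:

```cpp
#include <cassert>
#include <unordered_map>

/* Hypothetical modes, loosely mirroring ir_var_*. */
enum var_mode { var_temporary, var_in, var_out };

struct variable {
   var_mode mode;
};

struct remover {
   /* Maps each output variable to its temporary. */
   std::unordered_map<variable *, variable *> replacements;
   unsigned lookups = 0;   /* counts hash lookups, for illustration */

   /* Returns the variable a dereference should point at. */
   variable *rewrite(variable *var)
   {
      /* Cheap mode check first: most variables stop here. */
      if (var->mode != var_out)
         return var;

      lookups++;
      auto it = replacements.find(var);
      if (it != replacements.end())
         return it->second;

      /* First time this output is seen: create its temporary. */
      variable *temp = new variable{var_temporary};
      replacements[var] = temp;
      return temp;
   }

   ~remover()
   {
      for (auto &pair : replacements)
         delete pair.second;
   }
};
```

Only variables that are actually outputs ever reach the lookup.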

A more serious issue, however, is that ir_var_out has a double meaning:
- Shader output variables
- Function "out" parameters

I think you're safe here since you're calling this pass at codegen time,
presumably after all the functions have been inlined.  Calling this on a
shader that still had functions with out-params could break it horribly:
any such function without an explicit return statement would never copy
the temps back to the real out params.

That said, Eric, Ian, and I all agree that using ir_var_out to mean two
things is stupid, and I believe Ian has patches floating around to fix
that.  Once those land, you won't have to worry about this at all.

I'll ask Ian what the status on those is.

> +      ir_variable* temp = new (ir->var) ir_variable(ir->var->type,ir->var->name,ir_var_temporary);

Hmm.  I guess using ir->var as the context works, since the shader
output isn't going to get removed.  I'd still feel a bit safer if we
used the same allocation context as the original variable, though:
ralloc_parent(ir->var).

> +      add_replacement_pair(ir->var,temp);
> +      ir->var = temp;
> +   }
> +   return visit_continue;
> +}
> +
> +ir_visitor_status
> +output_read_remover::visit_enter(ir_return *ir)

I'd use visit_leave here just to be safe.  Again, I think you're safe
due to inlining, but...paranoia.

> +{
> +    for (unsigned i = 0; i < replacements_count; i++) {
> +      ir_dereference_variable *lhs = new (ir) ir_dereference_variable(replacements_array[i].output);
> +      ir_dereference_variable *rhs = new (ir) ir_dereference_variable(replacements_array[i].temp);
> +      ir_assignment* assign = new (ir) ir_assignment(lhs, rhs);
> +      ir->insert_before(assign);
> +   }
> +   return visit_continue;
> +}
> +
> +ir_visitor_status
> +output_read_remover::visit_leave(ir_function *f)
> +{
> +   if (strcmp(f->name,"main") != 0)
> +      return visit_continue;
> +   exec_list empty;
> +   ir_function_signature* sig = f->matching_signature(&empty);

Blargh :( I guess this works, but I'm not a fan of creating a blank list
of function parameters and pattern matching on signatures.

You can change this to visit_leave(ir_function_signature *) and just
check sig->function_name() against "main".  Easier.
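
The shape of that check, sketched standalone.  The function_signature
struct below is a hypothetical stand-in (strings play the role of the
copy-back assignments), and append_output_copies is a made-up name:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

/* Hypothetical stand-in for ir_function_signature. */
struct function_signature {
   const char *name;
   std::vector<std::string> body;

   const char *function_name() const { return name; }
};

/* Append the copy-back assignments to the end of main() only.
 * No empty parameter list or signature matching is needed. */
static void
append_output_copies(function_signature *sig,
                     const std::vector<std::string> &copies)
{
   if (std::strcmp(sig->function_name(), "main") != 0)
      return;

   for (const std::string &copy : copies)
      sig->body.push_back(copy);
}
```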

> +   for (unsigned i = 0; i < replacements_count; i++) {
> +      ir_dereference_variable *lhs = new (f) ir_dereference_variable(replacements_array[i].output);
> +      ir_dereference_variable *rhs = new (f) ir_dereference_variable(replacements_array[i].temp);
> +      ir_assignment* assign = new (f) ir_assignment(lhs, rhs);
> +      sig->body.push_tail(assign);
> +   }
> +   return visit_continue;
> +}
> diff --git a/src/glsl/lower_remove_output_read.h b/src/glsl/lower_remove_output_read.h
> new file mode 100644
> index 0000000..825047d
> --- /dev/null
> +++ b/src/glsl/lower_remove_output_read.h
> @@ -0,0 +1,62 @@
> +/*
> + * Copyright (C) 2005-2007  Brian Paul   All Rights Reserved.
> + * Copyright (C) 2008  VMware, Inc.   All Rights Reserved.
> + * Copyright © 2010 Intel Corporation
> + * Copyright © 2011 Bryan Cain
> + * Copyright © 2012 Vincent Lejeune

I'm pretty sure that your new header file doesn't contain any code by
Brian Paul, VMware, or Bryan Cain. :)

It doesn't matter though; in the v2 I'm about to send out, I removed
this file.

> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef LOWER_REMOVE_OUTPUT_READ_H
> +#define LOWER_REMOVE_OUTPUT_READ_H
> +
> +#include "ir_hierarchical_visitor.h"
> +#include "program/hash_table.h"
> +
> +/**
> + * In GLSL shaders, varying vars can be read and written.
> + * On some hardware, trying to read an output register causes trouble.
> + * This pass replaces every output access with a temporary variable.
> + * It then adds required assignement to fill outputs.
> + *
> + */
> +
> +class output_read_remover : public ir_hierarchical_visitor {
> +protected:
> +   hash_table* replacements;
> +   struct replacement_pair {
> +      ir_variable *output;
> +      ir_variable *temp;
> +   };
> +   struct replacement_pair *replacements_array;
> +   unsigned replacements_count;
> +
> +   void add_replacement_pair(class ir_variable *, class ir_variable *);
> +   void *mem_ctx;
> +public:
> +   output_read_remover();
> +   ~output_read_remover();
> +   virtual ir_visitor_status visit(class ir_dereference_variable *);
> +   virtual ir_visitor_status visit_leave(class ir_function *);
> +   virtual ir_visitor_status visit_enter(class ir_return *);
> +};
> +
> +#endif // LOWER_REMOVE_OUTPUT_READ_H
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index 28b8c2a..c3df807 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -41,6 +41,7 @@
>  #include "../glsl/program.h"
>  #include "ir_optimization.h"
>  #include "ast.h"
> +#include "lower_remove_output_read.h"
>  
>  #include "main/mtypes.h"
>  #include "main/shaderobj.h"
> @@ -5023,6 +5024,17 @@ get_mesa_program(struct gl_context *ctx,
>     _mesa_generate_parameters_list_for_uniforms(shader_program, shader,
>  					       prog->Parameters);
>  
> +   if (!screen->get_shader_param(screen, pipe_shader_type,
> +                                 PIPE_SHADER_CAP_OUTPUT_READ)) {
> +      /* Remove reads to output registers, and to varyings in vertex shaders. */
> +      output_read_remover orr_v;
> +      foreach_list(node, shader->ir) {
> +         ir_instruction *inst = (ir_instruction *) node;
> +         inst->accept(&orr_v);
> +      }
> +   }

You can actually just use visit_list_elements(&orr_v, shader->ir).
Also, in most other places, we just provide a wrapper function
(lower_output_reads(exec_list *)) that does the lowering for you.  It's
simpler to use, and also hides the visitor class completely, so you
don't need to put it in a public header file.
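
The pattern looks something like this.  It is a standalone sketch with
hypothetical stand-ins: instruction and instruction_list replace the
real IR types, and the unsigned return value is only there to make the
example checkable:

```cpp
#include <cassert>
#include <vector>

/* Hypothetical stand-ins for ir_instruction and exec_list. */
struct instruction {
   bool reads_output;
};
typedef std::vector<instruction> instruction_list;

namespace {

/* Lives only in the .cpp file; no header ever mentions it. */
class output_read_remover {
public:
   unsigned lowered;

   output_read_remover() : lowered(0) {}

   void visit(instruction &inst)
   {
      if (inst.reads_output) {
         inst.reads_output = false;
         lowered++;
      }
   }
};

} /* anonymous namespace */

/* The only symbol a public header would need to declare. */
unsigned
lower_output_reads(instruction_list *instructions)
{
   output_read_remover v;
   for (instruction &inst : *instructions)
      v.visit(inst);
   return v.lowered;
}
```

Callers then need one function call and no knowledge of the visitor.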

> +
>     /* Emit intermediate IR for main(). */
>     visit_exec_list(shader->ir, v);
>  
> @@ -5069,14 +5081,6 @@ get_mesa_program(struct gl_context *ctx,
>     }
>  #endif
>  
> -   if (!screen->get_shader_param(screen, pipe_shader_type,
> -                                 PIPE_SHADER_CAP_OUTPUT_READ)) {
> -      /* Remove reads to output registers, and to varyings in vertex shaders. */
> -      v->remove_output_reads(PROGRAM_OUTPUT);
> -      if (target == GL_VERTEX_PROGRAM_ARB)
> -         v->remove_output_reads(PROGRAM_VARYING);
> -   }

I'm pretty sure that glsl_to_tgsi_visitor::remove_output_reads is dead
after this change, so you probably want to delete it.

Also, I'd split the patch up a bit: (1) add the new pass, (2) switch to
the new pass, (3) delete the old pass.

>     /* Perform optimizations on the instructions in the glsl_to_tgsi_visitor. */
>     v->simplify_cmp();
>     v->copy_propagate();