[Mesa-dev] [PATCH] glsl/gallium: add a remove_output lowering pass
Kenneth Graunke
kenneth at whitecape.org
Mon Jan 2 02:33:39 PST 2012
On 12/31/2011 03:52 PM, Vincent Lejeune wrote:
> Current glsl_to_tgsi::remove_output_read pass did not work properly when
> indirect addressing was involved ; this commit replaces it with
> a lowering pass that occurs before glsl_to_tgsi visitor is called.
> This patch fix varying-array related piglit test.
> ---
> src/glsl/Makefile.sources | 1 +
> src/glsl/lower_remove_output_read.cpp | 97 ++++++++++++++++++++++++++++
> src/glsl/lower_remove_output_read.h | 62 ++++++++++++++++++
> src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 20 ++++--
> 4 files changed, 172 insertions(+), 8 deletions(-)
> create mode 100644 src/glsl/lower_remove_output_read.cpp
> create mode 100644 src/glsl/lower_remove_output_read.h
Vincent,
I like this! I have a couple of comments below. To save some trouble,
I've actually gone ahead and made the changes, and will send out a
proposed v2 of this patch shortly.
> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> index c65bfe4..6c80089 100644
> --- a/src/glsl/Makefile.sources
> +++ b/src/glsl/Makefile.sources
> @@ -60,6 +60,7 @@ LIBGLSL_CXX_SOURCES := \
> lower_vec_index_to_cond_assign.cpp \
> lower_vec_index_to_swizzle.cpp \
> lower_vector.cpp \
> + lower_remove_output_read.cpp \
> opt_algebraic.cpp \
> opt_constant_folding.cpp \
> opt_constant_propagation.cpp \
> diff --git a/src/glsl/lower_remove_output_read.cpp b/src/glsl/lower_remove_output_read.cpp
> new file mode 100644
> index 0000000..8150580
> --- /dev/null
> +++ b/src/glsl/lower_remove_output_read.cpp
> @@ -0,0 +1,97 @@
> +/*
> + * Copyright © 2010 Intel Corporation
> + * Copyright © 2012 Vincent Lejeune
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include "lower_remove_output_read.h"
> +#include "ir.h"
> +
> +
> +void
> +output_read_remover::add_replacement_pair(ir_variable *output, ir_variable *temp)
> +{
> + if (replacements_count == 0) {
> + replacements_array = (struct replacement_pair *) ralloc_array_size(mem_ctx, sizeof(struct replacement_pair),1);
> + }
> + else {
> + replacements_array = (struct replacement_pair *) reralloc_array_size(mem_ctx,replacements_array, sizeof(struct replacement_pair), replacements_count + 1);
> + }
reralloc_array_size actually does an initial allocation if the pointer
you pass is NULL, so you don't need a special case for zero. Plus, you
could use the handy "reralloc" macro which does the typecast and sizeof
for you. This could just be:
    replacements_array = reralloc(mem_ctx, replacements_array,
                                  struct replacement_pair, replacements_count + 1);
However, you probably ought to avoid reallocating the array each time
you encounter a new shader output. (It's probably not _critical_, since
there aren't typically many outputs, but it's still worth fixing.) The
typical solution is to track the allocated size in a second counter and
double it whenever the array fills up.
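For illustration, here is a minimal sketch of the doubling strategy. It is
not Mesa code: std::realloc stands in for ralloc's reralloc(), and
replacement_list, capacity and reallocations are hypothetical names. Like
reralloc_array_size, std::realloc treats a NULL pointer as a request for a
fresh allocation, so no zero-count special case is needed.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for the patch's replacement_pair; the real one
// holds two ir_variable pointers.
struct replacement_pair {
   const void *output;
   const void *temp;
};

// Sketch of amortized growth: the capacity is tracked separately from the
// count and doubled only when the array is full. std::realloc is used in
// place of reralloc(); both allocate fresh storage when given NULL.
struct replacement_list {
   replacement_pair *pairs = nullptr;
   unsigned count = 0;
   unsigned capacity = 0;
   unsigned reallocations = 0;  // only for illustration

   replacement_list() = default;
   replacement_list(const replacement_list &) = delete;
   replacement_list &operator=(const replacement_list &) = delete;
   ~replacement_list() { std::free(pairs); }

   bool add(const void *output, const void *temp)
   {
      if (count == capacity) {
         unsigned new_capacity = capacity ? capacity * 2 : 4;
         void *p = std::realloc(pairs, new_capacity * sizeof(replacement_pair));
         if (!p)
            return false;  // old array is still valid on failure
         pairs = static_cast<replacement_pair *>(p);
         capacity = new_capacity;
         reallocations++;
      }
      assert(count < capacity);
      pairs[count].output = output;
      pairs[count].temp = temp;
      count++;
      return true;
   }
};
```

With this scheme, ten insertions trigger three reallocations (capacity 4,
then 8, then 16) rather than ten.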
> + hash_table_insert(replacements,temp,output);
> + replacements_array[replacements_count].output = output;
> + replacements_array[replacements_count].temp = temp;
> + replacements_count++;
> +}
> +
> +output_read_remover::output_read_remover():ir_hierarchical_visitor(), replacements_count(0)
No need to call the parent class constructor explicitly; C++ just does
that for you.
> +{
> + replacements = hash_table_ctor(0,hash_table_pointer_hash,hash_table_pointer_compare);
You probably want to initialize replacements_array (to NULL) here, too.
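A small sketch of that constructor shape, using hypothetical stand-in types
(base_visitor plays the role of ir_hierarchical_visitor): the base class's
default constructor runs without being named, and every member, including
the array pointer, starts from a known value.

```cpp
#include <cstddef>

// Hypothetical stand-in for ir_hierarchical_visitor.
struct base_visitor {
   bool base_initialized;
   base_visitor() : base_initialized(true) {}
};

// The base class is not listed in the mem-initializer list; C++ calls its
// default constructor automatically before the members are initialized.
struct output_read_remover_sketch : public base_visitor {
   void *replacements_array;
   unsigned replacements_count;
   unsigned replacements_capacity;

   output_read_remover_sketch()
      : replacements_array(NULL), replacements_count(0),
        replacements_capacity(0)
   {
   }
};
```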
> + mem_ctx = ralloc_context(NULL);
> +}
> +
> +output_read_remover::~output_read_remover()
> +{
> + hash_table_dtor(replacements);
> + ralloc_free(mem_ctx);
> +}
> +
> +ir_visitor_status
> +output_read_remover::visit(ir_dereference_variable *ir)
> +{
> + ir_variable* temp = (ir_variable*) hash_table_find(replacements, ir->var);
> + if (temp) {
> + ir->var = temp;
> + }
> + else if(ir->var->mode == ir_var_out) {
I'd suggest checking for ir_var_out first. That way, you don't have to
even bother with the hash table lookup for most variables. Faster. :)
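Roughly, the reordered check could look like the sketch below. The types are
simplified, hypothetical stand-ins rather than Mesa's ir_variable or the
program/hash_table.h API, and std::unordered_map takes the place of the
hash table.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical, heavily simplified stand-ins for ir_variable and its mode.
enum var_mode { var_auto, var_in, var_out, var_temporary };

struct variable {
   std::string name;
   var_mode mode;
};

// Bail out early for anything that is not an output, so the table is only
// consulted for the (few) outputs. The map owns the temporaries.
struct output_renamer {
   std::unordered_map<const variable *, std::unique_ptr<variable>> temps;
   unsigned lookups = 0;  // only for illustration

   // Returns the variable a dereference should point at after lowering.
   variable *rewrite(variable *var)
   {
      if (var->mode != var_out)
         return var;  // common case: no table lookup at all

      lookups++;
      std::unique_ptr<variable> &temp = temps[var];
      if (!temp)
         temp.reset(new variable{var->name, var_temporary});
      return temp.get();
   }
};
```

Since only outputs are ever inserted into the table, skipping the lookup for
non-outputs does not change the result.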
A more serious issue, however, is that ir_var_out has a double meaning:
- Shader output variables
- Function "out" parameters
I think you're safe here since you're calling this pass at codegen time,
presumably after all the functions have been inlined. Calling this on a
shader that still had functions with out-params could break it horribly
(since, unless the function has a return statement, you'd never copy the
temps back to the real out params).
That said, Eric, Ian, and I all agree that using ir_var_out to mean two
things is stupid, and I believe Ian has patches floating around to fix
that. Once those land, you won't have to worry about this at all.
I'll ask Ian what the status on those is.
> + ir_variable* temp = new (ir->var) ir_variable(ir->var->type,ir->var->name,ir_var_temporary);
Hmm. I guess using ir->var as the context works, since the shader
output isn't going to get removed. I'd still feel a bit safer if we
used the same allocation context as the original variable, though:
ralloc_parent(ir->var).
> + add_replacement_pair(ir->var,temp);
> + ir->var = temp;
> + }
> + return visit_continue;
> +}
> +
> +ir_visitor_status
> +output_read_remover::visit_enter(ir_return *ir)
I'd use visit_leave here just to be safe. Again, I think you're safe
due to inlining, but...paranoia.
> +{
> + for (unsigned i = 0; i < replacements_count; i++) {
> + ir_dereference_variable *lhs = new (ir) ir_dereference_variable(replacements_array[i].output);
> + ir_dereference_variable *rhs = new (ir) ir_dereference_variable(replacements_array[i].temp);
> + ir_assignment* assign = new (ir) ir_assignment(lhs, rhs);
> + ir->insert_before(assign);
> + }
> + return visit_continue;
> +}
> +
> +ir_visitor_status
> +output_read_remover::visit_leave(ir_function *f)
> +{
> + if (strcmp(f->name,"main") != 0)
> + return visit_continue;
> + exec_list empty;
> + ir_function_signature* sig = f->matching_signature(&empty);
Blargh :( I guess this works, but I'm not a fan of creating a blank list
of function parameters and pattern matching on signatures.
You can change this to visit_leave(ir_function_signature *) and just
check sig->function_name() against "main". Easier.
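A sketch of that shape, again with hypothetical stand-in types. In the real
visitor the test would be strcmp(sig->function_name(), "main") and the
copies would be ir_assignment nodes pushed onto sig->body.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a copy_back writes `temp` back into `output`,
// and a signature knows its function's name and owns a body.
struct copy_back {
   std::string output;
   std::string temp;
};

struct function_signature {
   std::string function_name;
   std::vector<copy_back> body;
};

// Match on the signature's function name instead of building an empty
// parameter list and searching for a matching overload. Only main()
// gets the final copies appended.
void append_output_copies(
   function_signature &sig,
   const std::vector<std::pair<std::string, std::string>> &replacements)
{
   if (sig.function_name != "main")
      return;
   for (const auto &r : replacements)
      sig.body.push_back(copy_back{r.first, r.second});
}
```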
> + for (unsigned i = 0; i < replacements_count; i++) {
> + ir_dereference_variable *lhs = new (f) ir_dereference_variable(replacements_array[i].output);
> + ir_dereference_variable *rhs = new (f) ir_dereference_variable(replacements_array[i].temp);
> + ir_assignment* assign = new (f) ir_assignment(lhs, rhs);
> + sig->body.push_tail(assign);
> + }
> + return visit_continue;
> +}
> diff --git a/src/glsl/lower_remove_output_read.h b/src/glsl/lower_remove_output_read.h
> new file mode 100644
> index 0000000..825047d
> --- /dev/null
> +++ b/src/glsl/lower_remove_output_read.h
> @@ -0,0 +1,62 @@
> +/*
> + * Copyright (C) 2005-2007 Brian Paul All Rights Reserved.
> + * Copyright (C) 2008 VMware, Inc. All Rights Reserved.
> + * Copyright © 2010 Intel Corporation
> + * Copyright © 2011 Bryan Cain
> + * Copyright © 2012 Vincent Lejeune
I'm pretty sure that your new header file doesn't contain any code by
Brian Paul, VMware, or Bryan Cain. :)
It doesn't matter though; in the v2 I'm about to send out, I removed
this file.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef LOWER_REMOVE_OUTPUT_READ_H
> +#define LOWER_REMOVE_OUTPUT_READ_H
> +
> +#include "ir_hierarchical_visitor.h"
> +#include "program/hash_table.h"
> +
> +/**
> + * In GLSL shaders, varying vars can be read and written.
> + * On some hardware, trying to read an output register causes trouble.
> + * This pass replaces every output access with a temporary variable.
> + * It then adds required assignement to fill outputs.
> + *
> + */
> +
> +class output_read_remover : public ir_hierarchical_visitor {
> +protected:
> + hash_table* replacements;
> + struct replacement_pair {
> + ir_variable *output;
> + ir_variable *temp;
> + };
> + struct replacement_pair *replacements_array;
> + unsigned replacements_count;
> +
> + void add_replacement_pair(class ir_variable *, class ir_variable *);
> + void *mem_ctx;
> +public:
> + output_read_remover();
> + ~output_read_remover();
> + virtual ir_visitor_status visit(class ir_dereference_variable *);
> + virtual ir_visitor_status visit_leave(class ir_function *);
> + virtual ir_visitor_status visit_enter(class ir_return *);
> +};
> +
> +#endif // LOWER_REMOVE_OUTPUT_READ_H
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index 28b8c2a..c3df807 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -41,6 +41,7 @@
> #include "../glsl/program.h"
> #include "ir_optimization.h"
> #include "ast.h"
> +#include "lower_remove_output_read.h"
>
> #include "main/mtypes.h"
> #include "main/shaderobj.h"
> @@ -5023,6 +5024,17 @@ get_mesa_program(struct gl_context *ctx,
> _mesa_generate_parameters_list_for_uniforms(shader_program, shader,
> prog->Parameters);
>
> + if (!screen->get_shader_param(screen, pipe_shader_type,
> + PIPE_SHADER_CAP_OUTPUT_READ)) {
> + /* Remove reads to output registers, and to varyings in vertex shaders. */
> + output_read_remover orr_v;
> + foreach_list(node, shader->ir) {
> + ir_instruction *inst = (ir_instruction *) node;
> + inst->accept(&orr_v);
> + }
> + }
You can actually just use visit_list_elements(&orr_v, shader->ir).
Also, in most other places, we just provide a wrapper function
(lower_output_reads(exec_list *)) that does the lowering for you. It's
simpler to use, and also hides the visitor class completely, so you
don't need to put it in a public header file.
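A rough sketch of that pattern with stand-in types (the instruction struct,
the lowerer class and the rewrite it performs are hypothetical, and the
real entry point would take an exec_list *): the visitor class is confined
to an anonymous namespace in the .cpp file, and only the wrapper function
is exported.

```cpp
#include <vector>

// Hypothetical stand-in for an IR instruction list element.
struct instruction {
   bool reads_output;
};

namespace {

// Lives only in this translation unit, so no public header declares it.
class output_read_lowerer {
public:
   unsigned rewritten = 0;

   void visit(instruction &inst)
   {
      if (inst.reads_output) {
         inst.reads_output = false;  // stands in for redirecting to a temp
         rewritten++;
      }
   }
};

} // anonymous namespace

// The only symbol callers see, analogous to lower_output_reads().
unsigned lower_output_reads(std::vector<instruction> &instructions)
{
   output_read_lowerer v;
   for (instruction &inst : instructions)
      v.visit(inst);
   return v.rewritten;
}
```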
> +
> /* Emit intermediate IR for main(). */
> visit_exec_list(shader->ir, v);
>
> @@ -5069,14 +5081,6 @@ get_mesa_program(struct gl_context *ctx,
> }
> #endif
>
> - if (!screen->get_shader_param(screen, pipe_shader_type,
> - PIPE_SHADER_CAP_OUTPUT_READ)) {
> - /* Remove reads to output registers, and to varyings in vertex shaders. */
> - v->remove_output_reads(PROGRAM_OUTPUT);
> - if (target == GL_VERTEX_PROGRAM_ARB)
> - v->remove_output_reads(PROGRAM_VARYING);
> - }
I'm pretty sure that glsl_to_tgsi_visitor::remove_output_reads is dead
after this change, so you probably want to delete it.
Also, I'd split the patch up a bit: (1) add the new pass, (2) switch to
the new pass, (3) delete the old pass.
> /* Perform optimizations on the instructions in the glsl_to_tgsi_visitor. */
> v->simplify_cmp();
> v->copy_propagate();