[Mesa-dev] [PATCH 11/11] nir: Copy propagation between blocks
Caio Marcelo de Oliveira Filho
caio.oliveira at intel.com
Sat Sep 15 05:45:33 UTC 2018
Extend the pass to propagate the copy information along the control
flow graph. It performs two walks: the first collects the vars
that were written inside each node; the second applies copy
propagation using a list of previously available copies. At each node
the list is invalidated according to the results of the first walk.
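As a rough illustration of the two walks, here is a minimal standalone sketch. It is not the code from this patch: the `struct node` layout, the one-bit-per-variable masks and the function names are all hypothetical simplifications, and the real pass tracks derefs and write masks rather than a single bitmask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: each variable is one bit in a mask. */
enum { VAR_A = 1 << 0, VAR_B = 1 << 1 };
enum node_type { NODE_BLOCK, NODE_IF, NODE_LOOP };

struct node {
   enum node_type type;
   uint32_t block_writes;        /* NODE_BLOCK: variables stored here */
   struct node *then_list;       /* NODE_IF: first node of "then" */
   struct node *else_list;       /* NODE_IF: first node of "else" */
   struct node *body;            /* NODE_LOOP: first node of the body */
   struct node *next;            /* next sibling in the enclosing list */
   uint32_t written;             /* filled in by walk 1 */
   uint32_t available_at_entry;  /* recorded by walk 2, for inspection */
};

/* Walk 1: compute, bottom-up, everything written inside each node. */
static uint32_t
gather_written(struct node *list)
{
   uint32_t all = 0;
   for (struct node *n = list; n != NULL; n = n->next) {
      if (n->type == NODE_BLOCK)
         n->written = n->block_writes;
      else if (n->type == NODE_IF)
         n->written = gather_written(n->then_list) |
                      gather_written(n->else_list);
      else
         n->written = gather_written(n->body);
      all |= n->written;
   }
   return all;
}

/* Walk 2: carry the set of variables with a known value forward,
 * invalidating it with the results of walk 1.
 */
static uint32_t
propagate(struct node *list, uint32_t available)
{
   for (struct node *n = list; n != NULL; n = n->next) {
      switch (n->type) {
      case NODE_BLOCK:
         n->available_at_entry = available;
         available |= n->block_writes;   /* a store makes the value known */
         break;
      case NODE_IF:
         n->available_at_entry = available;
         /* Each branch works on its own copy of the state. */
         (void)propagate(n->then_list, available);
         (void)propagate(n->else_list, available);
         /* Branch results are dropped; whatever either side wrote is gone. */
         available &= ~n->written;
         break;
      case NODE_LOOP:
         /* Invalidate first, since the body may run more than once. */
         available &= ~n->written;
         n->available_at_entry = available;
         (void)propagate(n->body, available);
         break;
      }
   }
   return available;
}

/* head writes A and B; the "then" branch rewrites B; what reaches tail? */
static uint32_t
demo_available_after_if(void)
{
   struct node then_blk = { .type = NODE_BLOCK, .block_writes = VAR_B };
   struct node else_blk = { .type = NODE_BLOCK };
   struct node tail = { .type = NODE_BLOCK };
   struct node if_node = { .type = NODE_IF, .then_list = &then_blk,
                           .else_list = &else_blk, .next = &tail };
   struct node head = { .type = NODE_BLOCK,
                        .block_writes = VAR_A | VAR_B, .next = &if_node };
   gather_written(&head);
   propagate(&head, 0);
   return tail.available_at_entry;
}

/* head writes A and B; the loop body rewrites A; what does the body see? */
static uint32_t
demo_available_in_loop(void)
{
   struct node body_blk = { .type = NODE_BLOCK, .block_writes = VAR_A };
   struct node loop = { .type = NODE_LOOP, .body = &body_blk };
   struct node head = { .type = NODE_BLOCK,
                        .block_writes = VAR_A | VAR_B, .next = &loop };
   gather_written(&head);
   propagate(&head, 0);
   return body_blk.available_at_entry;
}
```

In this toy model only A survives past the if (B was rewritten in one branch), and only B is visible inside the loop body (A is rewritten by the body itself, so it is invalidated before entry).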
This approach is simpler than a full data-flow analysis, but covers
various cases. If derefs are used for operating on more memory
resources (e.g. SSBOs), the difference from a regular pass is expected
to be more visible -- as the SSA copy propagation pass won't apply to
those.
A full data-flow analysis would handle more scenarios: conditional
breaks in the control flow and merge equivalent effects from multiple
branches (e.g. using a phi node to merge the source for writes to the
same deref). However, as previous commentary in the code stated, its
complexity 'rapidly get out of hand'. The current patch is a good
intermediate step towards more complex analysis.
The 'copies' linked list was modified to use util_dynarray to make it
more convenient to clone it (to handle ifs/loops).
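To show why a flat array is convenient here, the following is a small standalone sketch. It assumes a hypothetical `struct int_array` standing in for util_dynarray, with `int` elements standing in for copy entries. Cloning is a single allocation plus a memcpy, and removal can be done in O(1) by moving the last element into the freed slot, as long as iteration runs from the back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for util_dynarray, holding ints. */
struct int_array {
   int *data;
   size_t size;
   size_t capacity;
};

static bool
int_array_append(struct int_array *arr, int value)
{
   if (arr->size == arr->capacity) {
      size_t new_cap = arr->capacity ? arr->capacity * 2 : 8;
      int *new_data = realloc(arr->data, new_cap * sizeof(int));
      if (!new_data)
         return false;
      arr->data = new_data;
      arr->capacity = new_cap;
   }
   arr->data[arr->size++] = value;
   return true;
}

/* Forking the state for an if branch or a loop body is one memcpy. */
static bool
int_array_clone(struct int_array *dst, const struct int_array *src)
{
   dst->data = NULL;
   dst->size = dst->capacity = 0;
   if (src->size == 0)
      return true;
   dst->data = malloc(src->size * sizeof(int));
   if (!dst->data)
      return false;
   memcpy(dst->data, src->data, src->size * sizeof(int));
   dst->size = dst->capacity = src->size;
   return true;
}

/* O(1) removal: overwrite the slot with the last element, then shrink.
 * Element order is not preserved.
 */
static void
int_array_swap_remove(struct int_array *arr, size_t index)
{
   assert(index < arr->size);
   arr->size--;
   arr->data[index] = arr->data[arr->size];
}

/* Iterating from the back means the element moved into slot i has
 * already been visited, so nothing is skipped.
 */
static void
int_array_remove_negatives(struct int_array *arr)
{
   for (size_t i = arr->size; i-- > 0;) {
      if (arr->data[i] < 0)
         int_array_swap_remove(arr, i);
   }
}

/* Clone an array, filter the clone, and check the original is untouched. */
static int
demo_sum_after_filter(void)
{
   const int values[] = { 3, -1, 4, -1, -5, 9 };
   struct int_array original = { 0 };
   for (size_t i = 0; i < sizeof(values) / sizeof(values[0]); i++) {
      bool ok = int_array_append(&original, values[i]);
      assert(ok);
      (void)ok;
   }

   struct int_array fork;
   bool cloned = int_array_clone(&fork, &original);
   assert(cloned);
   (void)cloned;
   int_array_remove_negatives(&fork);

   assert(original.size == 6);   /* the clone does not affect the source */
   assert(fork.size == 3);

   int sum = 0;
   for (size_t i = 0; i < fork.size; i++)
      sum += fork.data[i];

   free(fork.data);
   free(original.data);
   return sum;
}
```

The trade-off of swap-removal is that entry order changes, which is acceptable when lookups scan the whole array anyway.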
Annotated shader-db results for Skylake:
total instructions in shared programs: 15105796 -> 15105451 (<.01%)
instructions in affected programs: 152293 -> 151948 (-0.23%)
helped: 96
HURT: 17
All the HURTs and many HELPs are one instruction. Looking
at pass by pass outputs, the copy prop kicks in removing a
bunch of loads correctly, which ends up altering which
other optimizations kick in. In those cases the copies would be
propagated after lowering to SSA.
In a few HELPs we are actually doing more than was
possible previously, e.g. consolidating load_uniforms from
different blocks. Most of those are from
shaders/dolphin/ubershaders/.
total cycles in shared programs: 566048861 -> 565954876 (-0.02%)
cycles in affected programs: 151461830 -> 151367845 (-0.06%)
helped: 2933
HURT: 2950
A lot of noise on both sides.
total loops in shared programs: 4603 -> 4603 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 11085 -> 11073 (-0.11%)
spills in affected programs: 23 -> 11 (-52.17%)
helped: 1
HURT: 0
The shaders/dolphin/ubershaders/12.shader_test was able to
pull a couple of loads from inside if statements and reuse
them.
total fills in shared programs: 23143 -> 23089 (-0.23%)
fills in affected programs: 2718 -> 2664 (-1.99%)
helped: 27
HURT: 0
All from shaders/dolphin/ubershaders/.
LOST: 0
GAINED: 0
The other generations follow the same overall shape. The spills and
fills HURTs are all from the same game.
shader-db results for Broadwell:
total instructions in shared programs: 15402037 -> 15401841 (<.01%)
instructions in affected programs: 144386 -> 144190 (-0.14%)
helped: 86
HURT: 9
total cycles in shared programs: 600912755 -> 600902486 (<.01%)
cycles in affected programs: 185662820 -> 185652551 (<.01%)
helped: 2598
HURT: 3053
total loops in shared programs: 4579 -> 4579 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 80929 -> 80924 (<.01%)
spills in affected programs: 720 -> 715 (-0.69%)
helped: 1
HURT: 5
total fills in shared programs: 93057 -> 93013 (-0.05%)
fills in affected programs: 3398 -> 3354 (-1.29%)
helped: 27
HURT: 5
LOST: 0
GAINED: 2
shader-db results for Haswell:
total instructions in shared programs: 9231975 -> 9230357 (-0.02%)
instructions in affected programs: 44992 -> 43374 (-3.60%)
helped: 27
HURT: 69
total cycles in shared programs: 87760587 -> 87727502 (-0.04%)
cycles in affected programs: 7720673 -> 7687588 (-0.43%)
helped: 1609
HURT: 1416
total loops in shared programs: 1830 -> 1830 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 1988 -> 1692 (-14.89%)
spills in affected programs: 296 -> 0
helped: 1
HURT: 0
total fills in shared programs: 2103 -> 1668 (-20.68%)
fills in affected programs: 438 -> 3 (-99.32%)
helped: 4
HURT: 0
LOST: 0
GAINED: 1
---
src/compiler/nir/nir_opt_copy_prop_vars.c | 394 +++++++++++++++++-----
1 file changed, 317 insertions(+), 77 deletions(-)
diff --git a/src/compiler/nir/nir_opt_copy_prop_vars.c b/src/compiler/nir/nir_opt_copy_prop_vars.c
index f58abfbb69f..966ccbdec53 100644
--- a/src/compiler/nir/nir_opt_copy_prop_vars.c
+++ b/src/compiler/nir/nir_opt_copy_prop_vars.c
@@ -26,6 +26,7 @@
#include "nir_deref.h"
#include "util/bitscan.h"
+#include "util/u_dynarray.h"
/**
* Variable-based copy propagation
@@ -42,16 +43,21 @@
* to do this because it isn't aware of variable writes that may alias the
* value and make the former load invalid.
*
- * Unfortunately, properly handling all of those cases makes this path rather
- * complex. In order to avoid additional complexity, this pass is entirely
- * block-local. If we tried to make it global, the data-flow analysis would
- * rapidly get out of hand. Fortunately, for anything that is only ever
- * accessed directly, we get SSA based copy-propagation which is extremely
- * powerful so this isn't that great a loss.
+ * This pass uses an intermediate solution between being local / "per-block"
+ * and a complete data-flow analysis. It follows the control flow graph, and
+ * propagates the available copy information forward, invalidating data at each
+ * cf_node.
*
* Removal of dead writes to variables is handled by another pass.
*/
+struct vars_written {
+ nir_variable_mode modes;
+
+ /* Key is deref and value is the uintptr_t with the write mask. */
+ struct hash_table *derefs;
+};
+
struct value {
bool is_ssa;
union {
@@ -61,61 +67,170 @@ struct value {
};
struct copy_entry {
- struct list_head link;
-
struct value src;
nir_deref_instr *dst;
};
struct copy_prop_var_state {
- nir_shader *shader;
+ nir_function_impl *impl;
void *mem_ctx;
+ void *lin_ctx;
- struct list_head copies;
-
- /* We're going to be allocating and deleting a lot of copy entries so we'll
- * keep a free list to avoid thrashing malloc too badly.
+ /* Maps nodes to vars_written. Used to invalidate copy entries when
+ * visiting each node.
*/
- struct list_head copy_free_list;
+ struct hash_table *vars_written_map;
bool progress;
};
-static struct copy_entry *
-copy_entry_create(struct copy_prop_var_state *state,
- nir_deref_instr *dst_deref)
+static struct vars_written *
+create_vars_written(struct copy_prop_var_state *state)
{
- struct copy_entry *entry;
- if (!list_empty(&state->copy_free_list)) {
- struct list_head *item = state->copy_free_list.next;
- list_del(item);
- entry = LIST_ENTRY(struct copy_entry, item, link);
- memset(entry, 0, sizeof(*entry));
- } else {
- entry = rzalloc(state->mem_ctx, struct copy_entry);
+ struct vars_written *written =
+ linear_zalloc_child(state->lin_ctx, sizeof(struct vars_written));
+ written->derefs = _mesa_hash_table_create(state->mem_ctx, _mesa_hash_pointer,
+ _mesa_key_pointer_equal);
+ return written;
+}
+
+static void
+gather_vars_written(struct copy_prop_var_state *state,
+ struct vars_written *written,
+ nir_cf_node *cf_node)
+{
+ struct vars_written *new_written = NULL;
+
+ switch (cf_node->type) {
+ case nir_cf_node_function: {
+ nir_function_impl *impl = nir_cf_node_as_function(cf_node);
+ foreach_list_typed_safe(nir_cf_node, cf_node, node, &impl->body)
+ gather_vars_written(state, NULL, cf_node);
+ break;
}
- entry->dst = dst_deref;
- list_add(&entry->link, &state->copies);
+ case nir_cf_node_block: {
+ if (!written)
+ break;
- return entry;
+ nir_block *block = nir_cf_node_as_block(cf_node);
+ nir_foreach_instr(instr, block) {
+ if (instr->type == nir_instr_type_call) {
+ written->modes |= nir_var_shader_out |
+ nir_var_global |
+ nir_var_shader_storage |
+ nir_var_shared;
+ continue;
+ }
+
+ if (instr->type != nir_instr_type_intrinsic)
+ continue;
+
+ nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
+ switch (intrin->intrinsic) {
+ case nir_intrinsic_barrier:
+ case nir_intrinsic_memory_barrier:
+ written->modes |= nir_var_shader_out |
+ nir_var_shader_storage |
+ nir_var_shared;
+ break;
+
+ case nir_intrinsic_emit_vertex:
+ case nir_intrinsic_emit_vertex_with_counter:
+ written->modes |= nir_var_shader_out;
+ break;
+
+ case nir_intrinsic_store_deref:
+ case nir_intrinsic_copy_deref: {
+ /* Destination in _both_ store_deref and copy_deref is src[0]. */
+ nir_deref_instr *dst = nir_src_as_deref(intrin->src[0]);
+
+ uintptr_t mask = intrin->intrinsic == nir_intrinsic_store_deref ?
+ nir_intrinsic_write_mask(intrin) : (1 << glsl_get_vector_elements(dst->type)) - 1;
+
+ struct hash_entry *ht_entry = _mesa_hash_table_search(written->derefs, dst);
+ if (ht_entry)
+ ht_entry->data = (void *)(mask | (uintptr_t)ht_entry->data);
+ else
+ _mesa_hash_table_insert(written->derefs, dst, (void *)mask);
+
+ break;
+ }
+
+ default:
+ break;
+ }
+ }
+
+ break;
+ }
+
+ case nir_cf_node_if: {
+ nir_if *if_stmt = nir_cf_node_as_if(cf_node);
+
+ new_written = create_vars_written(state);
+
+ foreach_list_typed_safe(nir_cf_node, cf_node, node, &if_stmt->then_list)
+ gather_vars_written(state, new_written, cf_node);
+
+ foreach_list_typed_safe(nir_cf_node, cf_node, node, &if_stmt->else_list)
+ gather_vars_written(state, new_written, cf_node);
+
+ break;
+ }
+
+ case nir_cf_node_loop: {
+ nir_loop *loop = nir_cf_node_as_loop(cf_node);
+
+ new_written = create_vars_written(state);
+
+ foreach_list_typed_safe(nir_cf_node, cf_node, node, &loop->body)
+ gather_vars_written(state, new_written, cf_node);
+
+ break;
+ }
+ }
+
+ if (new_written) {
+ /* Merge new information into the parent control flow node. */
+ if (written) {
+ written->modes |= new_written->modes;
+ struct hash_entry *ht_entry;
+ hash_table_foreach(new_written->derefs, ht_entry) {
+ _mesa_hash_table_insert_pre_hashed(written->derefs, ht_entry->hash,
+ ht_entry->key, ht_entry->data);
+ }
+ }
+ _mesa_hash_table_insert(state->vars_written_map, cf_node, new_written);
+ }
+}
+
+static struct copy_entry *
+copy_entry_create(struct util_dynarray *copies,
+ nir_deref_instr *dst_deref)
+{
+ struct copy_entry new_entry = {
+ .dst = dst_deref,
+ };
+ util_dynarray_append(copies, struct copy_entry, new_entry);
+ return util_dynarray_top_ptr(copies, struct copy_entry);
}
static void
-copy_entry_remove(struct copy_prop_var_state *state, struct copy_entry *entry)
+copy_entry_remove(struct util_dynarray *copies,
+ struct copy_entry *entry)
{
- list_del(&entry->link);
- list_add(&entry->link, &state->copy_free_list);
+ *entry = util_dynarray_pop(copies, struct copy_entry);
}
static struct copy_entry *
-lookup_entry_for_deref(struct copy_prop_var_state *state,
+lookup_entry_for_deref(struct util_dynarray *copies,
nir_deref_instr *deref,
nir_deref_compare_result allowed_comparisons)
{
- list_for_each_entry(struct copy_entry, iter, &state->copies, link) {
+ util_dynarray_foreach(copies, struct copy_entry, iter) {
if (nir_compare_derefs(iter->dst, deref) & allowed_comparisons)
return iter;
}
@@ -124,16 +239,16 @@ lookup_entry_for_deref(struct copy_prop_var_state *state,
}
static struct copy_entry *
-get_entry_and_kill_aliases(struct copy_prop_var_state *state,
- nir_deref_instr *deref,
- unsigned write_mask)
+lookup_entry_and_kill_aliases(struct util_dynarray *copies,
+ nir_deref_instr *deref,
+ unsigned write_mask)
{
struct copy_entry *entry = NULL;
- list_for_each_entry_safe(struct copy_entry, iter, &state->copies, link) {
+ util_dynarray_foreach_reverse(copies, struct copy_entry, iter) {
if (!iter->src.is_ssa) {
/* If this write aliases the source of some entry, get rid of it */
if (nir_compare_derefs(iter->src.deref, deref) & nir_derefs_may_alias_bit) {
- copy_entry_remove(state, iter);
+ copy_entry_remove(copies, iter);
continue;
}
}
@@ -144,28 +259,50 @@ get_entry_and_kill_aliases(struct copy_prop_var_state *state,
assert(entry == NULL);
entry = iter;
} else if (comp & nir_derefs_may_alias_bit) {
- copy_entry_remove(state, iter);
+ copy_entry_remove(copies, iter);
}
}
+ return entry;
+}
+
+static void
+kill_aliases(struct util_dynarray *copies,
+ nir_deref_instr *deref,
+ unsigned write_mask)
+{
+ struct copy_entry *entry =
+ lookup_entry_and_kill_aliases(copies, deref, write_mask);
+ if (entry)
+ copy_entry_remove(copies, entry);
+}
+
+static struct copy_entry *
+get_entry_and_kill_aliases(struct util_dynarray *copies,
+ nir_deref_instr *deref,
+ unsigned write_mask)
+{
+ struct copy_entry *entry =
+ lookup_entry_and_kill_aliases(copies, deref, write_mask);
+
if (entry == NULL)
- entry = copy_entry_create(state, deref);
+ entry = copy_entry_create(copies, deref);
return entry;
}
static void
-apply_barrier_for_modes(struct copy_prop_var_state *state,
+apply_barrier_for_modes(struct util_dynarray *copies,
nir_variable_mode modes)
{
- list_for_each_entry_safe(struct copy_entry, iter, &state->copies, link) {
+ util_dynarray_foreach_reverse(copies, struct copy_entry, iter) {
nir_variable *dst_var = nir_deref_instr_get_variable(iter->dst);
nir_variable *src_var = iter->src.is_ssa ? NULL :
nir_deref_instr_get_variable(iter->src.deref);
if ((dst_var->data.mode & modes) ||
(src_var && (src_var->data.mode & modes)))
- copy_entry_remove(state, iter);
+ copy_entry_remove(copies, iter);
}
}
@@ -396,13 +533,34 @@ try_load_from_entry(struct copy_prop_var_state *state, struct copy_entry *entry,
}
static void
-copy_prop_vars_block(struct copy_prop_var_state *state,
- nir_builder *b, nir_block *block)
+invalidate_copies_for_node(struct copy_prop_var_state *state,
+ struct util_dynarray *copies,
+ nir_cf_node *cf_node)
{
- /* Start each block with a blank slate */
- list_for_each_entry_safe(struct copy_entry, iter, &state->copies, link)
- copy_entry_remove(state, iter);
+ struct hash_entry *ht_entry = _mesa_hash_table_search(state->vars_written_map, cf_node);
+ assert(ht_entry);
+
+ struct vars_written *written = ht_entry->data;
+ if (written->modes) {
+ util_dynarray_foreach_reverse(copies, struct copy_entry, entry) {
+ nir_variable *var = nir_deref_instr_get_variable(entry->dst);
+ if (var->data.mode & written->modes)
+ copy_entry_remove(copies, entry);
+ }
+ }
+ struct hash_entry *entry;
+ hash_table_foreach (written->derefs, entry) {
+ nir_deref_instr *deref_written = (nir_deref_instr *)entry->key;
+ kill_aliases(copies, deref_written, (uintptr_t)entry->data);
+ }
+}
+
+static void
+copy_prop_vars_block(struct copy_prop_var_state *state,
+ nir_builder *b, nir_block *block,
+ struct util_dynarray *copies)
+{
nir_foreach_instr_safe(instr, block) {
if (instr->type == nir_instr_type_call) {
apply_barrier_for_modes(copies, nir_var_shader_out |
@@ -426,14 +584,14 @@ copy_prop_vars_block(struct copy_prop_var_state *state,
case nir_intrinsic_emit_vertex:
case nir_intrinsic_emit_vertex_with_counter:
- apply_barrier_for_modes(state, nir_var_shader_out);
+ apply_barrier_for_modes(copies, nir_var_shader_out);
break;
case nir_intrinsic_load_deref: {
nir_deref_instr *src = nir_src_as_deref(intrin->src[0]);
struct copy_entry *src_entry =
- lookup_entry_for_deref(state, src, nir_derefs_a_contains_b_bit);
+ lookup_entry_for_deref(copies, src, nir_derefs_a_contains_b_bit);
struct value value;
if (try_load_from_entry(state, src_entry, b, intrin, src, &value)) {
if (value.is_ssa) {
@@ -478,9 +636,9 @@ copy_prop_vars_block(struct copy_prop_var_state *state,
* contains what we're looking for.
*/
struct copy_entry *store_entry =
- lookup_entry_for_deref(state, src, nir_derefs_equal_bit);
+ lookup_entry_for_deref(copies, src, nir_derefs_equal_bit);
if (!store_entry)
- store_entry = copy_entry_create(state, src);
+ store_entry = copy_entry_create(copies, src);
/* Set up a store to this entry with the value of the load. This way
* we can potentially remove subsequent loads. However, we use a
@@ -503,7 +661,7 @@ copy_prop_vars_block(struct copy_prop_var_state *state,
nir_deref_instr *dst = nir_src_as_deref(intrin->src[0]);
unsigned wrmask = nir_intrinsic_write_mask(intrin);
struct copy_entry *entry =
- get_entry_and_kill_aliases(state, dst, wrmask);
+ get_entry_and_kill_aliases(copies, dst, wrmask);
store_to_entry(state, entry, &value, wrmask);
break;
}
@@ -519,7 +677,7 @@ copy_prop_vars_block(struct copy_prop_var_state *state,
}
struct copy_entry *src_entry =
- lookup_entry_for_deref(state, src, nir_derefs_a_contains_b_bit);
+ lookup_entry_for_deref(copies, src, nir_derefs_a_contains_b_bit);
struct value value;
if (try_load_from_entry(state, src_entry, b, intrin, src, &value)) {
if (value.is_ssa) {
@@ -546,7 +704,7 @@ copy_prop_vars_block(struct copy_prop_var_state *state,
}
struct copy_entry *dst_entry =
- get_entry_and_kill_aliases(state, dst, 0xf);
+ get_entry_and_kill_aliases(copies, dst, 0xf);
store_to_entry(state, dst_entry, &value, 0xf);
break;
}
@@ -557,36 +715,118 @@ copy_prop_vars_block(struct copy_prop_var_state *state,
}
}
-bool
-nir_opt_copy_prop_vars(nir_shader *shader)
+static void
+copy_prop_vars_node(struct copy_prop_var_state *state,
+ struct util_dynarray *copies,
+ nir_cf_node *cf_node)
{
- struct copy_prop_var_state state;
+ switch (cf_node->type) {
+ case nir_cf_node_function: {
+ nir_function_impl *impl = nir_cf_node_as_function(cf_node);
- state.shader = shader;
- state.mem_ctx = ralloc_context(NULL);
- list_inithead(&state.copies);
- list_inithead(&state.copy_free_list);
+ struct util_dynarray impl_copies;
+ util_dynarray_init(&impl_copies, state->mem_ctx);
- bool global_progress = false;
- nir_foreach_function(function, shader) {
- if (!function->impl)
- continue;
+ foreach_list_typed_safe(nir_cf_node, cf_node, node, &impl->body)
+ copy_prop_vars_node(state, &impl_copies, cf_node);
+
+ break;
+ }
+ case nir_cf_node_block: {
+ nir_block *block = nir_cf_node_as_block(cf_node);
nir_builder b;
- nir_builder_init(&b, function->impl);
+ nir_builder_init(&b, state->impl);
+ copy_prop_vars_block(state, &b, block, copies);
+ break;
+ }
- state.progress = false;
- nir_foreach_block(block, function->impl)
- copy_prop_vars_block(&state, &b, block);
+ case nir_cf_node_if: {
+ nir_if *if_stmt = nir_cf_node_as_if(cf_node);
- if (state.progress) {
- nir_metadata_preserve(function->impl, nir_metadata_block_index |
- nir_metadata_dominance);
- global_progress = true;
- }
+ /* Clone the copies for each branch of the if statement. The idea is
+ * that they both see the same state of available copies, but do not
+ * interfere with each other.
+ */
+
+ struct util_dynarray then_copies;
+ util_dynarray_clone(&then_copies, state->mem_ctx, copies);
+
+ struct util_dynarray else_copies;
+ util_dynarray_clone(&else_copies, state->mem_ctx, copies);
+
+ foreach_list_typed_safe(nir_cf_node, cf_node, node, &if_stmt->then_list)
+ copy_prop_vars_node(state, &then_copies, cf_node);
+
+ foreach_list_typed_safe(nir_cf_node, cf_node, node, &if_stmt->else_list)
+ copy_prop_vars_node(state, &else_copies, cf_node);
+
+ /* Both branches' copies can be ignored, since the effect of running both
+ * branches was captured in the first pass that collects vars_written.
+ */
+
+ invalidate_copies_for_node(state, copies, cf_node);
+
+ break;
}
- ralloc_free(state.mem_ctx);
+ case nir_cf_node_loop: {
+ nir_loop *loop = nir_cf_node_as_loop(cf_node);
+
+ /* Invalidate before cloning the copies for the loop, since the loop
+ * body can be executed more than once.
+ */
+
+ invalidate_copies_for_node(state, copies, cf_node);
+
+ struct util_dynarray loop_copies;
+ util_dynarray_clone(&loop_copies, state->mem_ctx, copies);
+
+ foreach_list_typed_safe(nir_cf_node, cf_node, node, &loop->body)
+ copy_prop_vars_node(state, &loop_copies, cf_node);
+
+ break;
+ }
+ }
+}
+
+static bool
+nir_copy_prop_vars_impl(nir_function_impl *impl)
+{
+ void *mem_ctx = ralloc_context(NULL);
+
+ struct copy_prop_var_state state = {
+ .impl = impl,
+ .mem_ctx = mem_ctx,
+ .lin_ctx = linear_zalloc_parent(mem_ctx, 0),
+
+ .vars_written_map = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
+ _mesa_key_pointer_equal),
+ };
+
+ gather_vars_written(&state, NULL, &impl->cf_node);
+
+ copy_prop_vars_node(&state, NULL, &impl->cf_node);
+
+ if (state.progress) {
+ nir_metadata_preserve(impl, nir_metadata_block_index |
+ nir_metadata_dominance);
+ }
+
+ ralloc_free(mem_ctx);
+ return state.progress;
+}
+
+bool
+nir_opt_copy_prop_vars(nir_shader *shader)
+{
+ bool progress = false;
+
+ nir_foreach_function(function, shader) {
+ if (!function->impl)
+ continue;
+ progress |= nir_copy_prop_vars_impl(function->impl);
+ }
- return global_progress;
+ return progress;
}
--
2.19.0