[Mesa-dev] [PATCH 1/2] glsl: Implement a SSBO load optimization pass

Iago Toral Quiroga itoral at igalia.com
Tue Oct 20 00:11:16 PDT 2015


This allows us to reuse the results of previous SSBO loads when it is safe
to do so (i.e. when there are no stores, atomic operations or memory
barriers in between).

This is particularly useful for things like matrix multiplication, where
for a mat4 buffer variable we cut the number of loads from 16 (4 reads of
each column) down to 4 (1 read of each column), as in the example below.
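
For illustration, a hypothetical shader along these lines (the names are
made up, not taken from any test) hits exactly that case:

   layout(std430, binding = 0) buffer Transform {
      mat4 mvp;
      vec4 position;
   };

   void main() {
      /* Lowering this multiply emits 16 __intrinsic_load_ssbo calls
       * (each column of mvp is read 4 times); with the cache the first
       * load of each column is reused and only 4 loads remain.
       */
      position = mvp * position;
   }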

The pass can only cache SSBO loads that use constant block indices and
offsets, but it could be extended to compare the sub-expressions used for
these as well, similar to a CSE pass.

The way the cache works is simple: SSBO loads with a constant block/offset
are added to the cache as they are seen. Stores invalidate cache entries:
stores with a non-constant offset invalidate all cached loads for their
block, and stores with a non-constant block index invalidate all cache
entries (see the example below). There is room to improve this by using
the actual variable name we are accessing to limit the entries that should
be invalidated. We also need to invalidate cache entries when we exit the
block in which they have been defined (i.e. entries created inside if/else
branches or loops).
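
As a made-up example of these rules (assuming the block index and the
constant offsets survive lowering and constant folding):

   layout(std430, binding = 0) buffer Data {
      float a[4];
      float b;
   };

   void main() {
      float x = a[0] + a[0]; // 2nd read of a[0] is reused from the cache
      b = 1.0;               // constant block/offset store: only that
                             // offset's entry is invalidated
      float y = a[0];        // still cached, no new load is emitted
      a[int(y)] = x;         // non-constant offset: every entry for this
                             // block is invalidated
      float z = a[0];        // has to be loaded from the SSBO again
      b = x + y + z;
   }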

The cache optimization is built as a separate pass instead of being merged
into the lower_ubo_reference pass for a number of reasons:

1) The way we process assignments in visitors is that the LHS is
processed before the RHS. This creates a problem for an optimization
such as this when we do things like a = a + 1, since we would see the
store before the load even though the load is actually executed first.
This could be fixed by re-implementing the logic in the visit_enter
method for ir_assignment in lower_ubo_reference and then returning
visit_continue_with_parent.

2) Some writes/reads need to be split into multiple smaller
writes/reads, and we need to handle caching for each one. This happens
deep inside the code that handles the lowering, where some of the
information we need is not available. This could also be fixed by passing
more data into the corresponding functions or by making this data
available as class members, but the current implementation is already
complex enough and this would only add to that complexity.

3) We can have SSBO loads in the LHS too (e.g. a[a[0]] = ..). In these cases
the current code in lower_ubo_reference would see the store before the read.
Probably fixable, but again it would add more complexity to the lowering.

On the other hand, a separate pass that runs after the lowering sees
all the individual loads and stores in the correct order (so we don't need
to do any tricks) and it allows us to separate the lowering logic (which
is already complex) from the caching logic. It also gives us a chance to
run it after other optimization passes have turned the block/offset
expressions into constants, enabling more opportunities for caching.
---
 src/glsl/Makefile.sources  |   1 +
 src/glsl/ir_optimization.h |   1 +
 src/glsl/opt_ssbo_load.cpp | 338 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 340 insertions(+)
 create mode 100644 src/glsl/opt_ssbo_load.cpp

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index ca87036..73c7514 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -201,6 +201,7 @@ LIBGLSL_FILES = \
 	opt_noop_swizzle.cpp \
 	opt_rebalance_tree.cpp \
 	opt_redundant_jumps.cpp \
+	opt_ssbo_load.cpp \
 	opt_structure_splitting.cpp \
 	opt_swizzle_swizzle.cpp \
 	opt_tree_grafting.cpp \
diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
index ce5c492..26677d7 100644
--- a/src/glsl/ir_optimization.h
+++ b/src/glsl/ir_optimization.h
@@ -125,6 +125,7 @@ bool lower_clip_distance(gl_shader *shader);
 void lower_output_reads(unsigned stage, exec_list *instructions);
 bool lower_packing_builtins(exec_list *instructions, int op_mask);
 void lower_ubo_reference(struct gl_shader *shader, exec_list *instructions);
+bool opt_ssbo_loads(struct gl_shader *shader, exec_list *instructions);
 void lower_packed_varyings(void *mem_ctx,
                            unsigned locations_used, ir_variable_mode mode,
                            unsigned gs_input_vertices, gl_shader *shader);
diff --git a/src/glsl/opt_ssbo_load.cpp b/src/glsl/opt_ssbo_load.cpp
new file mode 100644
index 0000000..5404907
--- /dev/null
+++ b/src/glsl/opt_ssbo_load.cpp
@@ -0,0 +1,338 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_ssbo_load.cpp
+ *
+ * IR optimization pass to reduce the number of SSBO loads by re-using previous
+ * loads when it is safe to do so (i.e. no stores have invalidated the load,
+ * no memory barriers in between, etc)
+ */
+
+#include "ir.h"
+#include "ir_builder.h"
+#include "program/hash_table.h"
+
+using namespace ir_builder;
+
+namespace {
+struct ssbo_load_cache_remove_closure {
+   struct hash_table *ht;
+   const char *key_prefix;
+   unsigned depth;
+};
+
+struct ssbo_load_cache_entry {
+   ir_dereference *load;
+   unsigned depth;
+};
+
+class ssbo_load_cache_visitor : public ir_hierarchical_visitor {
+public:
+   ssbo_load_cache_visitor(gl_shader *shader)
+   : current_depth(0), progress(false)
+   {
+      mem_ctx = ralloc_parent(shader->ir);
+      ssbo_load_cache =
+         hash_table_ctor(0, hash_table_string_hash, hash_table_string_compare);
+   }
+
+   ~ssbo_load_cache_visitor()
+   {
+      hash_table_dtor(ssbo_load_cache);
+   }
+
+   virtual ir_visitor_status visit_enter(ir_call *);
+
+   char *build_ssbo_load_cache_key(unsigned block_index,
+                                   unsigned offset);
+   ir_dereference *ssbo_load_cache_find(ir_rvalue *block_index,
+                                        ir_rvalue *offset);
+   void ssbo_load_cache_add(ir_dereference *deref,
+                            ir_rvalue *block_index,
+                            ir_rvalue *offset,
+                            unsigned depth);
+   void ssbo_load_cache_remove(ir_rvalue *block_index,
+                               ir_rvalue *offset);
+   void ssbo_load_cache_remove_by_key_prefix(const char *prefix);
+   void ssbo_load_cache_remove_all();
+   void ssbo_load_cache_remove_depth(unsigned depth);
+
+   ir_visitor_status visit_enter(ir_if *ir);
+   ir_visitor_status visit_enter(ir_loop *ir);
+   ir_visitor_status visit_leave(ir_loop *ir);
+
+   bool get_progress() { return this->progress; }
+
+private:
+   void *mem_ctx;
+   unsigned current_depth;
+   struct hash_table *ssbo_load_cache;
+   bool progress;
+};
+
+ir_visitor_status
+ssbo_load_cache_visitor::visit_enter(ir_call *ir)
+{
+   if (!ir->callee->is_intrinsic)
+      return visit_continue_with_parent;
+
+   if (!strcmp(ir->callee_name(), "__intrinsic_load_ssbo")) {
+      exec_node *param = ir->actual_parameters.get_head();
+      ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
+
+      param = param->get_next();
+      ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
+
+      ir_dereference *cached_load = ssbo_load_cache_find(block, offset);
+      if (cached_load) {
+         ir_variable *var = ir->return_deref->variable_referenced();
+         base_ir->insert_before(assign(var, cached_load->clone(mem_ctx, NULL)));
+         ir->remove();
+         this->progress = true;
+      } else {
+         ssbo_load_cache_add(ir->return_deref, block, offset,
+                             this->current_depth);
+      }
+   } else if (!strcmp(ir->callee_name(), "__intrinsic_store_ssbo")) {
+      exec_node *param = ir->actual_parameters.get_head();
+      ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
+
+      param = param->get_next();
+      ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
+
+      ssbo_load_cache_remove(block, offset);
+   } else if (strstr(ir->callee_name(), "__intrinsic_ssbo_atomic") ==
+       ir->callee_name()) {
+      exec_node *param = ir->actual_parameters.get_head();
+      ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
+
+      param = param->get_next();
+      ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
+
+      ssbo_load_cache_remove(block, offset);
+   } else if (!strcmp(ir->callee_name(), "__intrinsic_memory_barrier")) {
+      ssbo_load_cache_remove_all();
+   }
+
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+ssbo_load_cache_visitor::visit_enter(ir_if *ir)
+{
+   ir->condition->accept(this);
+
+   this->current_depth++;
+
+   if (!ir->then_instructions.is_empty()) {
+      visit_list_elements(this, &ir->then_instructions);
+      ssbo_load_cache_remove_depth(this->current_depth);
+   }
+
+   if (!ir->else_instructions.is_empty()) {
+      visit_list_elements(this, &ir->else_instructions);
+      ssbo_load_cache_remove_depth(this->current_depth);
+   }
+
+   this->current_depth--;
+
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+ssbo_load_cache_visitor::visit_enter(ir_loop *ir)
+{
+   this->current_depth++;
+   return visit_continue;
+}
+
+ir_visitor_status
+ssbo_load_cache_visitor::visit_leave(ir_loop *ir)
+{
+   ssbo_load_cache_remove_depth(this->current_depth);
+   this->current_depth--;
+   return visit_continue;
+}
+
+char *
+ssbo_load_cache_visitor::build_ssbo_load_cache_key(unsigned block_index,
+                                                       unsigned offset)
+{
+   return ralloc_asprintf(mem_ctx, "%u-%u", block_index, offset);
+}
+
+ir_dereference *
+ssbo_load_cache_visitor::ssbo_load_cache_find(ir_rvalue *block_index,
+                                              ir_rvalue *offset)
+{
+   ir_constant *const_block_index = block_index->as_constant();
+   if (!const_block_index)
+      return NULL;
+
+   ir_constant *const_offset = offset->as_constant();
+   if (!const_offset)
+      return NULL;
+
+   char *cache_key =
+      build_ssbo_load_cache_key(const_block_index->value.u[0],
+                                const_offset->value.u[0]);
+
+   struct ssbo_load_cache_entry *entry = (struct ssbo_load_cache_entry *)
+      hash_table_find(ssbo_load_cache, cache_key);
+   return entry ? entry->load : NULL;
+}
+
+void
+ssbo_load_cache_visitor::ssbo_load_cache_add(ir_dereference *deref,
+                                             ir_rvalue *block_index,
+                                             ir_rvalue *offset,
+                                             unsigned depth)
+{
+   /* We only support caching SSBO loads with constant block and offset */
+   ir_constant *const_block_index = block_index->as_constant();
+   if (!const_block_index)
+      return;
+
+   ir_constant *const_offset = offset->as_constant();
+   if (!const_offset)
+      return;
+
+   char *cache_key =
+      build_ssbo_load_cache_key(const_block_index->value.u[0],
+                                const_offset->value.u[0]);
+
+   struct ssbo_load_cache_entry *entry = (struct ssbo_load_cache_entry *)
+      ralloc_size(this->mem_ctx, sizeof(struct ssbo_load_cache_entry));
+   entry->load = deref;
+   entry->depth = depth;
+
+   hash_table_replace(ssbo_load_cache, entry, cache_key);
+}
+
+static void
+ssbo_load_cache_remove_callback(const void *key, void *data, void *closure)
+{
+   struct ssbo_load_cache_remove_closure *c =
+      (struct ssbo_load_cache_remove_closure *) closure;
+
+   /* If we have a key_prefix, then we only want to delete entries
+    * with that key prefix. Otherwise, if we have depth > 0 we want
+    * to delete keys with that depth (or larger). Otherwise (no prefix
+    * and depth == 0) we want to delete all keys.
+    */
+   assert((c->key_prefix && c->depth == 0) ||
+          (!c->key_prefix && c->depth > 0) ||
+          (!c->key_prefix && c->depth == 0));
+
+   if (c->key_prefix && strstr((char *)key, (char *)c->key_prefix) != key)
+      return;
+
+   if (c->depth > 0) {
+      struct ssbo_load_cache_entry *entry =
+         (struct ssbo_load_cache_entry *) data;
+      if (entry->depth < c->depth)
+         return;
+   }
+
+   hash_table_remove(c->ht, key);
+}
+
+void
+ssbo_load_cache_visitor::ssbo_load_cache_remove_by_key_prefix(const char *prefix)
+{
+   struct ssbo_load_cache_remove_closure c;
+   c.ht = ssbo_load_cache;
+   c.key_prefix = prefix;
+   c.depth = 0;
+
+   hash_table_call_foreach(ssbo_load_cache,
+                           ssbo_load_cache_remove_callback, &c);
+}
+
+void
+ssbo_load_cache_visitor::ssbo_load_cache_remove_all()
+{
+   struct ssbo_load_cache_remove_closure c;
+   c.ht = ssbo_load_cache;
+   c.key_prefix = NULL;
+   c.depth = 0;
+
+   hash_table_call_foreach(ssbo_load_cache,
+                           ssbo_load_cache_remove_callback, &c);
+}
+
+void
+ssbo_load_cache_visitor::ssbo_load_cache_remove_depth(unsigned depth)
+{
+   struct ssbo_load_cache_remove_closure c;
+   c.ht = ssbo_load_cache;
+   c.key_prefix = NULL;
+   c.depth = depth;
+
+   hash_table_call_foreach(ssbo_load_cache,
+                           ssbo_load_cache_remove_callback, &c);
+}
+
+void
+ssbo_load_cache_visitor::ssbo_load_cache_remove(ir_rvalue *block_index,
+                                                ir_rvalue *offset)
+{
+   ir_constant *const_block_index = block_index->as_constant();
+   if (!const_block_index) {
+      /* If we don't know the block index, then invalidate the entire cache.
+       * We could try to do better, for example, considering the actual
+       * field name we are accessing in the SSBO in the keys so we only
+       * invalidate those. This requires some work though.
+       */
+      ssbo_load_cache_remove_all();
+      return;
+   }
+
+   ir_constant *const_offset = offset->as_constant();
+   if (!const_offset) {
+      /* We know the block but not the offset, so invalidate all entries
+       * for the given block
+       */
+      ssbo_load_cache_remove_by_key_prefix(
+         ralloc_asprintf(mem_ctx, "%u-", const_block_index->value.u[0]));
+      return;
+   }
+
+   /* We know block and offset, so invalidate that particular load only */
+   char *cache_key =
+      build_ssbo_load_cache_key(const_block_index->value.u[0],
+                                const_offset->value.u[0]);
+
+   hash_table_remove(ssbo_load_cache, cache_key);
+}
+
+} /* Unnamed namespace */
+
+bool
+opt_ssbo_loads(struct gl_shader *shader, exec_list *instructions)
+{
+   ssbo_load_cache_visitor v(shader);
+   visit_list_elements(&v, instructions);
+   return v.get_progress();
+}
-- 
1.9.1


