[Mesa-dev] [PATCH 1/2] glsl: Implement a SSBO load optimization pass

Francisco Jerez currojerez at riseup.net
Tue Oct 20 03:22:49 PDT 2015


Iago Toral Quiroga <itoral at igalia.com> writes:

> This allows us to re-use the results of previous ssbo loads in situations
> that are safe (i.e. when there are no stores, atomic operations or
> memory barriers in between).
>
> This is particularly useful for things like matrix multiplications, where
> for a mat4 buffer variable we cut the number of loads from 16 (4 reads of
> each column) down to 4 (1 read of each column).
>
> The pass can only cache ssbo loads that involve constant blocks and
> offsets, but could be extended to compare sub-expressions for these
> as well, similar to a CSE pass.
>
> The way the cache works is simple: ssbo loads with constant block/offset
> are included in a cache as they are seen. Stores invalidate cache entries.
> Stores with non-constant offset invalidate all cached loads for the block
> and stores with non-constant block invalidate all cache entries. There is
> room to improve this by using the actual variable name we are accessing to
> limit the entries that should be invalidated. We also need to invalidate
> cache entries when we exit the block in which they have been defined
> (i.e. inside if/else blocks or loops).
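
The invalidation rules described above can be sketched outside of Mesa as a small standalone model. This is only an illustration, not the code in the patch: LoadCache, its member names and the 16-byte column stride in count_mat4_loads() are assumptions made for the example, and an empty std::optional stands in for a non-constant block index or offset.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <utility>

// Hypothetical model of the load cache; not the Mesa implementation.
// An empty optional stands for a non-constant block index or offset.
class LoadCache {
public:
   using Operand = std::optional<uint32_t>;

   // Returns true if the load can reuse a cached result. Otherwise the load
   // is recorded (when both operands are constant) and false is returned.
   bool load(Operand block, Operand offset) {
      if (!block || !offset)
         return false;
      const auto key = std::make_pair(*block, *offset);
      if (entries.count(key))
         return true;
      entries[key] = depth;
      return false;
   }

   // Stores (and atomics) invalidate as narrowly as the operands allow.
   void store(Operand block, Operand offset) {
      if (!block) {
         entries.clear();   // unknown block: drop everything
      } else if (!offset) {
         // Known block, unknown offset: drop every entry of that block.
         erase_if([&](const Entry &e) { return e.first.first == *block; });
      } else {
         entries.erase(std::make_pair(*block, *offset));
      }
   }

   void barrier() { entries.clear(); }

   void enter_scope() { ++depth; }

   // Loads recorded inside an if/else branch or a loop body are dropped
   // when that scope is left.
   void leave_scope() {
      if (depth == 0)
         return;
      erase_if([&](const Entry &e) { return e.second >= depth; });
      --depth;
   }

private:
   using Key = std::pair<uint32_t, uint32_t>;
   using Map = std::map<Key, unsigned>;
   using Entry = Map::value_type;

   template <typename Pred>
   void erase_if(Pred pred) {
      for (auto it = entries.begin(); it != entries.end();) {
         if (pred(*it))
            it = entries.erase(it);
         else
            ++it;
      }
   }

   Map entries;
   unsigned depth = 0;
};

// Models the mat4 case: every column is requested four times.
// The 16-byte column stride is an assumption for this example.
unsigned count_mat4_loads() {
   LoadCache cache;
   unsigned issued = 0;
   for (int repeat = 0; repeat < 4; ++repeat)
      for (uint32_t column = 0; column < 4; ++column)
         if (!cache.load(0u, column * 16u))
            ++issued;
   return issued;
}
```

In this model count_mat4_loads() makes 16 load requests for four distinct columns, and only the first request for each column misses the cache.
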
>
> The cache optimization is built as a separate pass, instead of merging it
> inside the lower_ubo_reference pass for a number of reasons:
>
> 1) The way we process assignments in visitors is that the LHS is
> processed before the RHS. This creates a problem for an optimization
> such as this when we do things like a = a + 1, since we would see the
> store before the read when the actual execution order is reversed.
> This could be fixed by re-implementing the logic in the visit_enter
> method for ir_assignment in lower_ubo_reference and then returning
> visit_continue_with_parent.
>
> 2) Some writes/reads need to be split into multiple smaller
> writes/reads, and we need to handle caching for each one. This happens
> deep inside the code that handles the lowering and some
> of the information we need to do this is not available. This could also
> be fixed by passing more data into the corresponding functions or by
> making this data available as class members, but the current implementation
> is already complex enough and this would only contribute to the complexity.
>
> 3) We can have ssbo loads in the LHS too (e.g. a[a[0]] = ..). In these cases
> the current code in lower_ubo_reference would see the store before the read.
> Probably fixable, but again would add more complexity to the lowering.
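
The ordering problem in reasons 1) and 3) can be illustrated with a tiny standalone model. It is a sketch under stated assumptions and not code from the patch: Op, replay() and the two order functions are hypothetical names, and the model tracks a single SSBO location 'a' for the statements "a = a + 1; y = a;".

```cpp
#include <vector>

enum class Op { Load, Store };

// Replays accesses to one SSBO location through a one-entry cache and
// reports, for each load, whether it would reuse a previously cached value.
std::vector<bool> replay(const std::vector<Op> &ops) {
   bool cached = false;
   std::vector<bool> reused;
   for (Op op : ops) {
      if (op == Op::Load) {
         reused.push_back(cached);
         cached = true;    // the loaded value is now cached
      } else {
         cached = false;   // a store invalidates the cached value
      }
   }
   return reused;
}

// "a = a + 1; y = a;" in actual execution order.
std::vector<Op> execution_order() {
   return {Op::Load, Op::Store, Op::Load};
}

// The same statements as seen by a traversal that visits the LHS of an
// assignment before its RHS.
std::vector<Op> lhs_first_order() {
   return {Op::Store, Op::Load, Op::Load};
}
```

Replaying execution_order() reuses nothing, which is correct because the store sits between the two loads. Replaying lhs_first_order() marks the load for "y = a" as reusable, so it would pick up the value read before the store actually executed.
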
>
> On the other hand, a separate pass that runs after the lowering sees
> all the individual loads and stores in the correct order (so we don't need
> to do any tricks) and it allows us to separate the lowering logic (which
> is already complex) from the caching logic. It also gives us a chance to
> run it after other optimization passes have run and turned constant
> expressions for block/offset into constants, enabling more opportunities
> for caching.

Seems like a restricted form of CSE that only handles SSBO loads, and
only the ones with constant arguments.  Why don't we CSE these? (and
other memory access operations like image loads)
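
For illustration, a CSE-style cache would key loads on the operand expressions themselves instead of requiring constants. The following is a minimal sketch of that idea, not a description of any existing Mesa pass: Expr, key_of() and LoadCse are hypothetical names invented for the example, and invalidation is kept deliberately conservative.

```cpp
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>

struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

// Hypothetical expression node, just rich enough for address arithmetic.
struct Expr {
   enum Kind { Constant, Variable, Add, Mul };
   Kind kind = Constant;
   unsigned value = 0;   // used by Constant
   std::string name;     // used by Variable
   ExprPtr lhs, rhs;     // used by Add and Mul
};

inline ExprPtr make_expr(Expr::Kind kind, unsigned value, std::string name,
                         ExprPtr lhs, ExprPtr rhs) {
   auto e = std::make_shared<Expr>();
   e->kind = kind;
   e->value = value;
   e->name = std::move(name);
   e->lhs = std::move(lhs);
   e->rhs = std::move(rhs);
   return e;
}

inline ExprPtr constant(unsigned v) {
   return make_expr(Expr::Constant, v, "", nullptr, nullptr);
}
inline ExprPtr variable(std::string n) {
   return make_expr(Expr::Variable, 0, std::move(n), nullptr, nullptr);
}
inline ExprPtr add(ExprPtr a, ExprPtr b) {
   return make_expr(Expr::Add, 0, "", std::move(a), std::move(b));
}
inline ExprPtr mul(ExprPtr a, ExprPtr b) {
   return make_expr(Expr::Mul, 0, "", std::move(a), std::move(b));
}

// Structural key: two expressions get the same key only if their trees match.
inline std::string key_of(const ExprPtr &e) {
   switch (e->kind) {
   case Expr::Constant: return "#" + std::to_string(e->value);
   case Expr::Variable: return "$" + e->name;
   case Expr::Add: return "(+ " + key_of(e->lhs) + " " + key_of(e->rhs) + ")";
   case Expr::Mul: return "(* " + key_of(e->lhs) + " " + key_of(e->rhs) + ")";
   }
   return std::string();
}

class LoadCse {
public:
   // Returns true if a load with structurally equal operands was already seen.
   bool load(const ExprPtr &block, const ExprPtr &offset) {
      return !seen.insert(key_of(block) + "|" + key_of(offset)).second;
   }

   // Conservative: call on any store, atomic, barrier, or write to a
   // variable that appears in a cached operand expression.
   void invalidate_all() { seen.clear(); }

private:
   std::unordered_set<std::string> seen;
};
```

With this sketch, two loads at block 0 and offset i * 16 + 4 match even though the offset is not a constant. Operands are compared structurally only, so 4 + i * 16 is treated as a different expression.
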


> ---
>  src/glsl/Makefile.sources  |   1 +
>  src/glsl/ir_optimization.h |   1 +
>  src/glsl/opt_ssbo_load.cpp | 338 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 340 insertions(+)
>  create mode 100644 src/glsl/opt_ssbo_load.cpp
>
> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> index ca87036..73c7514 100644
> --- a/src/glsl/Makefile.sources
> +++ b/src/glsl/Makefile.sources
> @@ -201,6 +201,7 @@ LIBGLSL_FILES = \
>  	opt_noop_swizzle.cpp \
>  	opt_rebalance_tree.cpp \
>  	opt_redundant_jumps.cpp \
> +	opt_ssbo_load.cpp \
>  	opt_structure_splitting.cpp \
>  	opt_swizzle_swizzle.cpp \
>  	opt_tree_grafting.cpp \
> diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
> index ce5c492..26677d7 100644
> --- a/src/glsl/ir_optimization.h
> +++ b/src/glsl/ir_optimization.h
> @@ -125,6 +125,7 @@ bool lower_clip_distance(gl_shader *shader);
>  void lower_output_reads(unsigned stage, exec_list *instructions);
>  bool lower_packing_builtins(exec_list *instructions, int op_mask);
>  void lower_ubo_reference(struct gl_shader *shader, exec_list *instructions);
> +bool opt_ssbo_loads(struct gl_shader *shader, exec_list *instructions);
>  void lower_packed_varyings(void *mem_ctx,
>                             unsigned locations_used, ir_variable_mode mode,
>                             unsigned gs_input_vertices, gl_shader *shader);
> diff --git a/src/glsl/opt_ssbo_load.cpp b/src/glsl/opt_ssbo_load.cpp
> new file mode 100644
> index 0000000..5404907
> --- /dev/null
> +++ b/src/glsl/opt_ssbo_load.cpp
> @@ -0,0 +1,338 @@
> +/*
> + * Copyright © 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +/**
> + * \file opt_ssbo_load.cpp
> + *
> + * IR optimization pass to reduce the number of SSBO loads by re-using previous
> + * loads when it is safe to do so (i.e. no stores have invalidated the load,
> + * no memory barriers in between, etc)
> + */
> +
> +#include "ir.h"
> +#include "ir_builder.h"
> +#include "program/hash_table.h"
> +
> +using namespace ir_builder;
> +
> +namespace {
> +struct ssbo_load_cache_remove_closure {
> +   struct hash_table *ht;
> +   const char *key_prefix;
> +   unsigned depth;
> +};
> +
> +struct ssbo_load_cache_entry {
> +   ir_dereference *load;
> +   unsigned depth;
> +};
> +
> +class ssbo_load_cache_visitor : public ir_hierarchical_visitor {
> +public:
> +   ssbo_load_cache_visitor(gl_shader *shader)
> +   : current_depth(0), progress(false)
> +   {
> +      mem_ctx = ralloc_parent(shader->ir);
> +      ssbo_load_cache =
> +         hash_table_ctor(0, hash_table_string_hash, hash_table_string_compare);
> +   }
> +
> +   ~ssbo_load_cache_visitor()
> +   {
> +      hash_table_dtor(ssbo_load_cache);
> +   }
> +
> +   virtual ir_visitor_status visit_enter(ir_call *);
> +
> +   char *build_ssbo_load_cache_key(unsigned block_index,
> +                                   unsigned offset);
> +   ir_dereference *ssbo_load_cache_find(ir_rvalue *block_index,
> +                                        ir_rvalue *offset);
> +   void ssbo_load_cache_add(ir_dereference *deref,
> +                            ir_rvalue *block_index,
> +                            ir_rvalue *offset,
> +                            unsigned depth);
> +   void ssbo_load_cache_remove(ir_rvalue *block_index,
> +                               ir_rvalue *offset);
> +   void ssbo_load_cache_remove_by_key_prefix(const char *prefix);
> +   void ssbo_load_cache_remove_all();
> +   void ssbo_load_cache_remove_depth(unsigned depth);
> +
> +   ir_visitor_status visit_enter(ir_if *ir);
> +   ir_visitor_status visit_enter(ir_loop *ir);
> +   ir_visitor_status visit_leave(ir_loop *ir);
> +
> +   bool get_progress() { return this->progress; }
> +
> +private:
> +   void *mem_ctx;
> +   unsigned current_depth;
> +   struct hash_table *ssbo_load_cache;
> +   bool progress;
> +};
> +
> +ir_visitor_status
> +ssbo_load_cache_visitor::visit_enter(ir_call *ir)
> +{
> +   if (!ir->callee->is_intrinsic)
> +      return visit_continue_with_parent;
> +
> +   if (!strcmp(ir->callee_name(), "__intrinsic_load_ssbo")) {
> +      exec_node *param = ir->actual_parameters.get_head();
> +      ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
> +
> +      param = param->get_next();
> +      ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
> +
> +      ir_dereference *cached_load = ssbo_load_cache_find(block, offset);
> +      if (cached_load) {
> +         ir_variable *var = ir->return_deref->variable_referenced();
> +         base_ir->insert_before(assign(var, cached_load->clone(mem_ctx, NULL)));
> +         ir->remove();
> +         this->progress = true;
> +      } else {
> +         ssbo_load_cache_add(ir->return_deref, block, offset,
> +                             this->current_depth);
> +      }
> +   } else if (!strcmp(ir->callee_name(), "__intrinsic_store_ssbo")) {
> +      exec_node *param = ir->actual_parameters.get_head();
> +      ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
> +
> +      param = param->get_next();
> +      ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
> +
> +      ssbo_load_cache_remove(block, offset);
> +   } else if (strstr(ir->callee_name(), "__intrinsic_ssbo_atomic") ==
> +       ir->callee_name()) {
> +      exec_node *param = ir->actual_parameters.get_head();
> +      ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
> +
> +      param = param->get_next();
> +      ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
> +
> +      ssbo_load_cache_remove(block, offset);
> +   } else if (!strcmp(ir->callee_name(), "__intrinsic_memory_barrier")) {
> +      ssbo_load_cache_remove_all();
> +   }
> +
> +   return visit_continue_with_parent;
> +}
> +
> +ir_visitor_status
> +ssbo_load_cache_visitor::visit_enter(ir_if *ir)
> +{
> +   ir->condition->accept(this);
> +
> +   this->current_depth++;
> +
> +   if (!ir->then_instructions.is_empty()) {
> +      visit_list_elements(this, &ir->then_instructions);
> +      ssbo_load_cache_remove_depth(this->current_depth);
> +   }
> +
> +   if (!ir->else_instructions.is_empty()) {
> +      visit_list_elements(this, &ir->else_instructions);
> +      ssbo_load_cache_remove_depth(this->current_depth);
> +   }
> +
> +   this->current_depth--;
> +
> +   return visit_continue_with_parent;
> +}
> +
> +ir_visitor_status
> +ssbo_load_cache_visitor::visit_enter(ir_loop *ir)
> +{
> +   this->current_depth++;
> +   return visit_continue;
> +}
> +
> +ir_visitor_status
> +ssbo_load_cache_visitor::visit_leave(ir_loop *ir)
> +{
> +   ssbo_load_cache_remove_depth(this->current_depth);
> +   this->current_depth--;
> +   return visit_continue;
> +}
> +
> +char *
> +ssbo_load_cache_visitor::build_ssbo_load_cache_key(unsigned block_index,
> +                                                       unsigned offset)
> +{
> +   return ralloc_asprintf(mem_ctx, "%u-%u", block_index, offset);
> +}
> +
> +ir_dereference *
> +ssbo_load_cache_visitor::ssbo_load_cache_find(ir_rvalue *block_index,
> +                                              ir_rvalue *offset)
> +{
> +   ir_constant *const_block_index = block_index->as_constant();
> +   if (!const_block_index)
> +      return NULL;
> +
> +   ir_constant *const_offset = offset->as_constant();
> +   if (!const_offset)
> +      return NULL;
> +
> +   char *cache_key =
> +      build_ssbo_load_cache_key(const_block_index->value.u[0],
> +                                const_offset->value.u[0]);
> +
> +   struct ssbo_load_cache_entry *entry = (struct ssbo_load_cache_entry *)
> +      hash_table_find(ssbo_load_cache, cache_key);
> +   return entry ? entry->load : NULL;
> +}
> +
> +void
> +ssbo_load_cache_visitor::ssbo_load_cache_add(ir_dereference *deref,
> +                                             ir_rvalue *block_index,
> +                                             ir_rvalue *offset,
> +                                             unsigned depth)
> +{
> +   /* We only support caching SSBO loads with constant block and offset */
> +   ir_constant *const_block_index = block_index->as_constant();
> +   if (!const_block_index)
> +      return;
> +
> +   ir_constant *const_offset = offset->as_constant();
> +   if (!const_offset)
> +      return;
> +
> +   char *cache_key =
> +      build_ssbo_load_cache_key(const_block_index->value.u[0],
> +                                const_offset->value.u[0]);
> +
> +   struct ssbo_load_cache_entry *entry = (struct ssbo_load_cache_entry *)
> +      ralloc_size(this->mem_ctx, sizeof(struct ssbo_load_cache_entry));
> +   entry->load = deref;
> +   entry->depth = depth;
> +
> +   hash_table_replace(ssbo_load_cache, entry, cache_key);
> +}
> +
> +static void
> +ssbo_load_cache_remove_callback(const void *key, void *data, void *closure)
> +{
> +   struct ssbo_load_cache_remove_closure *c =
> +      (struct ssbo_load_cache_remove_closure *) closure;
> +
> +   /* If we have a key_prefix, then we only want to delete entries
> +    * with that key prefix. Otherwise, if we have depth > 0 we want
> +    * to delete keys with that depth (or larger). Otherwise (no prefix
> +    * and depth == 0) we want to delete all keys.
> +    */
> +   assert((c->key_prefix && c->depth == 0) ||
> +          (!c->key_prefix && c->depth > 0) ||
> +          (!c->key_prefix && c->depth == 0));
> +
> +   if (c->key_prefix && strstr((char *)key, (char *)c->key_prefix) != key)
> +      return;
> +
> +   if (c->depth > 0) {
> +      struct ssbo_load_cache_entry *entry =
> +         (struct ssbo_load_cache_entry *) data;
> +      if (entry->depth < c->depth)
> +         return;
> +   }
> +
> +   hash_table_remove(c->ht, key);
> +}
> +
> +void
> +ssbo_load_cache_visitor::ssbo_load_cache_remove_by_key_prefix(const char *prefix)
> +{
> +   struct ssbo_load_cache_remove_closure c;
> +   c.ht = ssbo_load_cache;
> +   c.key_prefix = prefix;
> +   c.depth = 0;
> +
> +   hash_table_call_foreach(ssbo_load_cache,
> +                           ssbo_load_cache_remove_callback, &c);
> +}
> +
> +void
> +ssbo_load_cache_visitor::ssbo_load_cache_remove_all()
> +{
> +   struct ssbo_load_cache_remove_closure c;
> +   c.ht = ssbo_load_cache;
> +   c.key_prefix = NULL;
> +   c.depth = 0;
> +
> +   hash_table_call_foreach(ssbo_load_cache,
> +                           ssbo_load_cache_remove_callback, &c);
> +}
> +
> +void
> +ssbo_load_cache_visitor::ssbo_load_cache_remove_depth(unsigned depth)
> +{
> +   struct ssbo_load_cache_remove_closure c;
> +   c.ht = ssbo_load_cache;
> +   c.key_prefix = NULL;
> +   c.depth = depth;
> +
> +   hash_table_call_foreach(ssbo_load_cache,
> +                           ssbo_load_cache_remove_callback, &c);
> +}
> +
> +void
> +ssbo_load_cache_visitor::ssbo_load_cache_remove(ir_rvalue *block_index,
> +                                                ir_rvalue *offset)
> +{
> +   ir_constant *const_block_index = block_index->as_constant();
> +   if (!const_block_index) {
> +      /* If we don't know the block index, then invalidate the entire cache.
> +       * We could try to do better, for example, considering the actual
> +       * field name we are accessing in the SSBO in the keys so we only
> +       * invalidate those. This requires some work though.
> +       */
> +      ssbo_load_cache_remove_all();
> +      return;
> +   }
> +
> +   ir_constant *const_offset = offset->as_constant();
> +   if (!const_offset) {
> +      /* We know the block but not the offset, so invalidate all entries
> +       * for the given block
> +       */
> +      ssbo_load_cache_remove_by_key_prefix(
> +         ralloc_asprintf(mem_ctx, "%u-", const_block_index->value.u[0]));
> +      return;
> +   }
> +
> +   /* We know block and offset, so invalidate that particular load only */
> +   char *cache_key =
> +      build_ssbo_load_cache_key(const_block_index->value.u[0],
> +                                const_offset->value.u[0]);
> +
> +   hash_table_remove(ssbo_load_cache, cache_key);
> +}
> +
> +} /* Unnamed namespace */
> +
> +bool
> +opt_ssbo_loads(struct gl_shader *shader, exec_list *instructions)
> +{
> +   ssbo_load_cache_visitor v(shader);
> +   visit_list_elements(&v, instructions);
> +   return v.get_progress();
> +}
> -- 
> 1.9.1
>