[Mesa-dev] [PATCH v2 10/11] nir: Add an array copy optimization

Caio Marcelo de Oliveira Filho caio.oliveira at intel.com
Wed Aug 22 00:49:43 UTC 2018


On Sat, Jul 28, 2018 at 10:44:41PM -0700, Jason Ekstrand wrote:
> This peephole optimization looks for a series of load/store_deref or
> copy_deref instructions that copy an array from one variable to another
> and turns it into a copy_deref that copies the entire array.  The
> pattern it looks for is extremely specific but it's good enough to pick
> up on the input array copies in DXVK and should also be able to pick up
> the sequence generated by spirv_to_nir for a OpLoad of a large composite
> followed by OpStore.  It can always be improved later if needed.

I like this one :-)

I'd add a comment to the file with a subset of this description.  In
particular I'd consider mentioning that the pass expects the
loads/stores/copies to happen in order, and that any other
stores/copies to different variables will spoil the detection.
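For the comment, maybe even a small before/after sketch of the
pattern, something like this (hand-written, roughly NIR syntax, not
actual compiler output):

```
/* before: element-by-element, in order, nothing else in between */
ssa_1 = deref_var &src          /* float[4] */
ssa_2 = deref_array &(*ssa_1)[0]
ssa_3 = load_deref ssa_2
ssa_4 = deref_var &dst          /* float[4] */
ssa_5 = deref_array &(*ssa_4)[0]
store_deref ssa_5, ssa_3
... same for indices 1, 2, 3 ...

/* after: a single whole-array copy with wildcard derefs */
copy_deref &dst[*], &src[*]
```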

There are a few minor comments below and one about the store write
mask.  Assuming those are resolved this patch is:

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira at intel.com>

Note: the "deref_map" (well, an evolution of it) from the dead write
elimination series could be used to keep track of multiple such
copies, with fewer restrictions.  That said, it seems that for the
case we care about it is not necessary.

(...)

> +static bool
> +opt_find_array_copies_block(nir_builder *b, nir_block *block,
> +                            unsigned *num_ssa_defs, void *mem_ctx)
> +{
> +   bool progress = false;
> +
> +   struct match_state s;
> +   match_state_init(&s);
> +
> +   nir_variable *dst_var = NULL;
> +   unsigned prev_dst_var_last_write = *num_ssa_defs;
> +   unsigned dst_var_last_write = *num_ssa_defs;
> +
> +   nir_foreach_instr(instr, block) {
> +      /* Index the SSA defs before we do anything else. */
> +      nir_foreach_ssa_def(instr, index_ssa_def_cb, num_ssa_defs);
> +
> +      if (instr->type != nir_instr_type_intrinsic)
> +         continue;
> +
> +      nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
> +      if (intrin->intrinsic != nir_intrinsic_copy_deref &&
> +          intrin->intrinsic != nir_intrinsic_store_deref)
> +         continue;
> +
> +      nir_deref_instr *dst_deref = nir_src_as_deref(intrin->src[0]);

Given it is cheap enough, would it make sense to skip early if
dst_deref is a bare variable, at least before we make it an active
copy?  Similar case for src_deref.  The code is correct as-is
(try_match_deref will fail for a bare variable since it can't find an
array index), so I'll leave this up to you.
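Untested sketch of what I had in mind, inside the next_array_idx == 0
branch (where the existing type-mismatch check already continues):

```
/* A bare variable deref has no array index for try_match_deref
 * to find, so it can never start a match; bail before paying for
 * the deref path setup.
 */
if (dst_deref->deref_type == nir_deref_type_var ||
    src_deref->deref_type == nir_deref_type_var)
   continue;
```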


> +
> +      /* The destination must be local.  If we see a non-local store, we
> +       * continue on because it won't affect local stores or read-only
> +       * variables.
> +       */
> +      if (dst_deref->mode != nir_var_local)
> +         continue;

For correctness, shouldn't we reset the match state if this store
doesn't have a full write mask?
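To make that concrete, something like this (GLSL-ish, hypothetical):

```
/* a and b are vec4 a[4], b[4] */
b[0].x = a[0].x;   /* store_deref with wrmask=x only */
b[1]   = a[1];     /* full writes                    */
b[2]   = a[2];
b[3]   = a[3];
/* if this matched and became "b = a", it would also copy
 * a[0].yzw into b[0].yzw, which the original never wrote */
```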


> +      /* We keep track of the SSA indicies where the two last-written

Typo here and elsewhere: "indices".


> +       * variables are written.  The prev_dst_var_last_write tells us when
> +       * the last store_deref to something other than dst happened.  If the
> +       * SSA def index from a load is greater than or equal to this number
> +       * then we know it happened afterwards and no writes to anything other
> +       * than dst occur between the load and the current instruction.
> +       */
> +      if (nir_deref_instr_get_variable(dst_deref) != dst_var) {
> +         prev_dst_var_last_write = dst_var_last_write;
> +         dst_var = nir_deref_instr_get_variable(dst_deref);
> +      }
> +      dst_var_last_write = *num_ssa_defs;
> +
> +      nir_deref_instr *src_deref;
> +      if (intrin->intrinsic == nir_intrinsic_copy_deref) {
> +         src_deref = nir_src_as_deref(intrin->src[1]);
> +      } else {
> +         assert(intrin->intrinsic == nir_intrinsic_store_deref);
> +         src_deref = get_deref_for_load_src(intrin->src[1],
> +                                            prev_dst_var_last_write);
> +      }
> +
> +      /* If we didn't find a valid src, then we have an unknown store and it
> +       * could mess things up.
> +       */
> +      if (src_deref == NULL)
> +         goto fail;
> +
> +      /* The destination must be local and the source must be either local or
> +       * something that's guaranteed to be read-only.
> +       */
> +      const nir_variable_mode read_only_modes =
> +         nir_var_shader_in | nir_var_uniform | nir_var_system_value;
> +      if (!(src_deref->mode & (nir_var_local | read_only_modes)))
> +         goto fail;
> +
> +      /* If we don't yet have an active copy, then make this instruction the
> +       * active copy.
> +       */

A note explaining that at the first store we still don't know which
array index in the path is being copied would have helped me when
reading this.
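Something like: on a first store of the form

```
m[0].col[0] = ...;
/* path: var m -> array[0] -> struct .col -> array[0]
 * Both array indices are 0, so we can't yet tell whether the
 * copy iterates the outer array or the inner one.  Only the
 * second store (m[0].col[1] vs m[1].col[0]) disambiguates,
 * which is what try_match_deref resolves at next_array_idx == 1.
 */
```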


> +      if (s.next_array_idx == 0) {
> +         /* We can't combine a copy if there is any chance the source and
> +          * destination will end up aliasing.  Just bail if they're the same
> +          * variable.
> +          */
> +         if (nir_deref_instr_get_variable(dst_deref) ==
> +             nir_deref_instr_get_variable(src_deref))

Could reuse dst_var instead of recalculating it.


> +            goto fail;
> +
> +         /* The load/store pair is enough to guarantee the same bit size and
> +          * number of components but a copy_var requires the actual types to
> +          * match.
> +          */
> +         if (dst_deref->type != src_deref->type)
> +            continue;
> +
> +         nir_deref_path_init(&s.first_dst_path, dst_deref, mem_ctx);
> +         nir_deref_path_init(&s.first_src_path, src_deref, mem_ctx);
> +         s.next_array_idx = 1;
> +         continue;
> +      }
> +
> +      if (!try_match_deref(&s.first_dst_path, &s.dst_deref_array_idx,
> +                           dst_deref, s.next_array_idx, mem_ctx) ||
> +          !try_match_deref(&s.first_src_path, &s.src_deref_array_idx,
> +                           src_deref, s.next_array_idx, mem_ctx))
> +         goto fail;
> +
> +      if (s.next_array_idx == 1) {
> +         /* This is our first non-trivial match.  We now have indicies into
> +          * the search paths so we can do a couple more checks.
> +          */
> +         assert(s.dst_deref_array_idx > 0 && s.src_deref_array_idx > 0);
> +         const struct glsl_type *dst_arr_type =
> +            s.first_dst_path.path[s.dst_deref_array_idx - 1]->type;
> +         const struct glsl_type *src_arr_type =
> +            s.first_src_path.path[s.src_deref_array_idx - 1]->type;
> +
> +         assert(glsl_type_is_array(dst_arr_type) ||
> +                glsl_type_is_matrix(dst_arr_type));
> +         assert(glsl_type_is_array(src_arr_type) ||
> +                glsl_type_is_matrix(src_arr_type));
> +
> +         /* They must be the same length */
> +         s.array_len = glsl_get_length(dst_arr_type);
> +         if (s.array_len != glsl_get_length(src_arr_type))
> +            goto fail;
> +      }
> +
> +      s.next_array_idx++;
> +
> +      if (s.next_array_idx == s.array_len) {
> +         /* Hooray, We found a copy! */
> +         b->cursor = nir_after_instr(instr);
> +         nir_copy_deref(b, build_wildcard_deref(b, &s.first_dst_path,
> +                                                s.dst_deref_array_idx),
> +                           build_wildcard_deref(b, &s.first_src_path,
> +                                                s.src_deref_array_idx));
> +         match_state_reset(&s);
> +         progress = true;
> +      }
> +
> +      continue;
> +
> +   fail:
> +      match_state_reset(&s);
> +   }

Maybe call this label "reset:" instead of "fail:".

(...)


Thanks,
Caio

