[Mesa-dev] [PATCH v2 10/11] nir: Add an array copy optimization

Jason Ekstrand jason at jlekstrand.net
Wed Aug 22 16:52:29 UTC 2018


On Tue, Aug 21, 2018 at 7:49 PM Caio Marcelo de Oliveira Filho <
caio.oliveira at intel.com> wrote:

> On Sat, Jul 28, 2018 at 10:44:41PM -0700, Jason Ekstrand wrote:
> > This peephole optimization looks for a series of load/store_deref or
> > copy_deref instructions that copy an array from one variable to another
> > and turns it into a copy_deref that copies the entire array.  The
> > pattern it looks for is extremely specific but it's good enough to pick
> > up on the input array copies in DXVK and should also be able to pick up
> > the sequence generated by spirv_to_nir for a OpLoad of a large composite
> > followed by OpStore.  It can always be improved later if needed.
>
> I've liked this one :-)
>
> I'd add a comment to the file with a subset of this description.  In
> particular I'd consider mentioning that the pass expects the
> loads/stores/copies to happen in order and also any other
> stores/copies to different variables will spoil the detecting.
>
> There are a few minor comments and a comment about store write mask.
> Assuming those are resolved this patch is:
>
> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira at intel.com>
>

I've applied your suggestions.  Please double-check:

https://gitlab.freedesktop.org/jekstrand/mesa/commit/28c01e9a0ce84ff53fa8805e4a35a691ea3fc744


> Note: the "deref_map" (well, an evolution of it) from the dead write
> elimination series, could be used to keep track of multiple of such
> copies, with less restrictions.  That said, seems for the case we
> about it is not necessary.
>
> (...)
>
> > +static bool
> > +opt_find_array_copies_block(nir_builder *b, nir_block *block,
> > +                            unsigned *num_ssa_defs, void *mem_ctx)
> > +{
> > +   bool progress = false;
> > +
> > +   struct match_state s;
> > +   match_state_init(&s);
> > +
> > +   nir_variable *dst_var = NULL;
> > +   unsigned prev_dst_var_last_write = *num_ssa_defs;
> > +   unsigned dst_var_last_write = *num_ssa_defs;
> > +
> > +   nir_foreach_instr(instr, block) {
> > +      /* Index the SSA defs before we do anything else. */
> > +      nir_foreach_ssa_def(instr, index_ssa_def_cb, num_ssa_defs);
> > +
> > +      if (instr->type != nir_instr_type_intrinsic)
> > +         continue;
> > +
> > +      nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
> > +      if (intrin->intrinsic != nir_intrinsic_copy_deref &&
> > +          intrin->intrinsic != nir_intrinsic_store_deref)
> > +         continue;
> > +
> > +      nir_deref_instr *dst_deref = nir_src_as_deref(intrin->src[0]);
>
> Given it is cheap enough, would make sense to early return if
> dst_deref is a var, at least before we make it an active copy?
> Similar case for src_deref.  The code is correct (try_match_deref will
> fail for variable since it can find an index), so let this up to you.
>

I think we probably could.  I'll add something.


> > +
> > +      /* The destination must be local.  If we see a non-local store, we
> > +       * continue on because it won't affect local stores or read-only
> > +       * variables.
> > +       */
> > +      if (dst_deref->mode != nir_var_local)
> > +         continue;
>
> For correctness, should we reset if this is not a full write mask
> store?
>

Yes, we should.  I'm not sure if that can happen in practice right now but
we should put it in there.


> > +      /* We keep track of the SSA indicies where the two last-written
>
> Typo here and elsewhere: "indices".
>

Fixed. Thanks!


> > +       * variables are written.  The prev_dst_var_last_write tells us
> when
> > +       * the last store_deref to something other than dst happened.  If
> the
> > +       * SSA def index from a load is greater than or equal to this
> number
> > +       * then we know it happened afterwards and no writes to anything
> other
> > +       * than dst occur between the load and the current instruction.
> > +       */
> > +      if (nir_deref_instr_get_variable(dst_deref) != dst_var) {
> > +         prev_dst_var_last_write = dst_var_last_write;
> > +         dst_var = nir_deref_instr_get_variable(dst_deref);
> > +      }
> > +      dst_var_last_write = *num_ssa_defs;
> > +
> > +      nir_deref_instr *src_deref;
> > +      if (intrin->intrinsic == nir_intrinsic_copy_deref) {
> > +         src_deref = nir_src_as_deref(intrin->src[1]);
> > +      } else {
> > +         assert(intrin->intrinsic == nir_intrinsic_store_deref);
> > +         src_deref = get_deref_for_load_src(intrin->src[1],
> > +                                            prev_dst_var_last_write);
> > +      }
> > +
> > +      /* If we didn't find a valid src, then we have an unknown store
> and it
> > +       * could mess things up.
> > +       */
> > +      if (src_deref == NULL)
> > +         goto fail;
> > +
> > +      /* The destination must be local and the source must be either
> local or
> > +       * something that's guaranteed to be read-only.
> > +       */
> > +      const nir_variable_mode read_only_modes =
> > +         nir_var_shader_in | nir_var_uniform | nir_var_system_value;
> > +      if (!(src_deref->mode & (nir_var_local | read_only_modes)))
> > +         goto fail;
> > +
> > +      /* If we don't yet have an active copy, then make this
> instruction the
> > +       * active copy.
> > +       */
>
> A note explaining that in the first store we still don't know which
> array in the path is being copied would have helped me when reading
> this.
>

Done.


> > +      if (s.next_array_idx == 0) {
> > +         /* We can't combine a copy if there is any chance the source
> and
> > +          * destination will end up aliasing.  Just bail if they're the
> same
> > +          * variable.
> > +          */
> > +         if (nir_deref_instr_get_variable(dst_deref) ==
> > +             nir_deref_instr_get_variable(src_deref))
>
> Could reuse dst_var instead of recalculate.
>

Sure.


> > +            goto fail;
> > +
> > +         /* The load/store pair is enough to guarantee the same bit
> size and
> > +          * number of components but a copy_var requires the actual
> types to
> > +          * match.
> > +          */
> > +         if (dst_deref->type != src_deref->type)
> > +            continue;
> > +
> > +         nir_deref_path_init(&s.first_dst_path, dst_deref, mem_ctx);
> > +         nir_deref_path_init(&s.first_src_path, src_deref, mem_ctx);
> > +         s.next_array_idx = 1;
> > +         continue;
> > +      }
> > +
> > +      if (!try_match_deref(&s.first_dst_path, &s.dst_deref_array_idx,
> > +                           dst_deref, s.next_array_idx, mem_ctx) ||
> > +          !try_match_deref(&s.first_src_path, &s.src_deref_array_idx,
> > +                           src_deref, s.next_array_idx, mem_ctx))
> > +         goto fail;
> > +
> > +      if (s.next_array_idx == 1) {
> > +         /* This is our first non-trivial match.  We now have indicies
> into
> > +          * the search paths so we can do a couple more checks.
> > +          */
> > +         assert(s.dst_deref_array_idx > 0 && s.src_deref_array_idx > 0);
> > +         const struct glsl_type *dst_arr_type =
> > +            s.first_dst_path.path[s.dst_deref_array_idx - 1]->type;
> > +         const struct glsl_type *src_arr_type =
> > +            s.first_src_path.path[s.src_deref_array_idx - 1]->type;
> > +
> > +         assert(glsl_type_is_array(dst_arr_type) ||
> > +                glsl_type_is_matrix(dst_arr_type));
> > +         assert(glsl_type_is_array(src_arr_type) ||
> > +                glsl_type_is_matrix(src_arr_type));
> > +
> > +         /* They must be the same length */
> > +         s.array_len = glsl_get_length(dst_arr_type);
> > +         if (s.array_len != glsl_get_length(src_arr_type))
> > +            goto fail;
> > +      }
> > +
> > +      s.next_array_idx++;
> > +
> > +      if (s.next_array_idx == s.array_len) {
> > +         /* Hooray, We found a copy! */
> > +         b->cursor = nir_after_instr(instr);
> > +         nir_copy_deref(b, build_wildcard_deref(b, &s.first_dst_path,
> > +                                                s.dst_deref_array_idx),
> > +                           build_wildcard_deref(b, &s.first_src_path,
> > +                                                s.src_deref_array_idx));
> > +         match_state_reset(&s);
> > +         progress = true;
> > +      }
> > +
> > +      continue;
> > +
> > +   fail:
> > +      match_state_reset(&s);
> > +   }
>
> Maybe call this label "reset:" instead of "fail:".
>

Sure.  That's reasonable.

--Jason
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20180822/c01f95ca/attachment-0001.html>


More information about the mesa-dev mailing list