[Mesa-dev] [PATCH v2 10/11] nir: Add an array copy optimization
Jason Ekstrand
jason at jlekstrand.net
Wed Aug 22 16:52:29 UTC 2018
On Tue, Aug 21, 2018 at 7:49 PM Caio Marcelo de Oliveira Filho <
caio.oliveira at intel.com> wrote:
> On Sat, Jul 28, 2018 at 10:44:41PM -0700, Jason Ekstrand wrote:
> > This peephole optimization looks for a series of load/store_deref or
> > copy_deref instructions that copy an array from one variable to another
> > and turns it into a copy_deref that copies the entire array. The
> > pattern it looks for is extremely specific but it's good enough to pick
> > up on the input array copies in DXVK and should also be able to pick up
> > the sequence generated by spirv_to_nir for a OpLoad of a large composite
> > followed by OpStore. It can always be improved later if needed.
>
> I've liked this one :-)
>
> I'd add a comment to the file with a subset of this description. In
> particular I'd consider mentioning that the pass expects the
> loads/stores/copies to happen in order and also any other
> stores/copies to different variables will spoil the detecting.
>
> There are a few minor comments and a comment about store write mask.
> Assuming those are resolved this patch is:
>
> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira at intel.com>
>
I've applied your suggestions. Please double-check:
https://gitlab.freedesktop.org/jekstrand/mesa/commit/28c01e9a0ce84ff53fa8805e4a35a691ea3fc744
> Note: the "deref_map" (well, an evolution of it) from the dead write
> elimination series, could be used to keep track of multiple of such
> copies, with less restrictions. That said, seems for the case we
> about it is not necessary.
>
> (...)
>
> > +static bool
> > +opt_find_array_copies_block(nir_builder *b, nir_block *block,
> > + unsigned *num_ssa_defs, void *mem_ctx)
> > +{
> > + bool progress = false;
> > +
> > + struct match_state s;
> > + match_state_init(&s);
> > +
> > + nir_variable *dst_var = NULL;
> > + unsigned prev_dst_var_last_write = *num_ssa_defs;
> > + unsigned dst_var_last_write = *num_ssa_defs;
> > +
> > + nir_foreach_instr(instr, block) {
> > + /* Index the SSA defs before we do anything else. */
> > + nir_foreach_ssa_def(instr, index_ssa_def_cb, num_ssa_defs);
> > +
> > + if (instr->type != nir_instr_type_intrinsic)
> > + continue;
> > +
> > + nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
> > + if (intrin->intrinsic != nir_intrinsic_copy_deref &&
> > + intrin->intrinsic != nir_intrinsic_store_deref)
> > + continue;
> > +
> > + nir_deref_instr *dst_deref = nir_src_as_deref(intrin->src[0]);
>
> Given it is cheap enough, would make sense to early return if
> dst_deref is a var, at least before we make it an active copy?
> Similar case for src_deref. The code is correct (try_match_deref will
> fail for variable since it can find an index), so let this up to you.
>
I think we probably could. I'll add something.
> > +
> > + /* The destination must be local. If we see a non-local store, we
> > + * continue on because it won't affect local stores or read-only
> > + * variables.
> > + */
> > + if (dst_deref->mode != nir_var_local)
> > + continue;
>
> For correctness, should we reset if this is not a full write mask
> store?
>
Yes, we should. I'm not sure if that can happen in practice right now but
we should put it in there.
> > + /* We keep track of the SSA indicies where the two last-written
>
> Typo here and elsewhere: "indices".
>
Fixed. Thanks!
> > + * variables are written. The prev_dst_var_last_write tells us
> when
> > + * the last store_deref to something other than dst happened. If
> the
> > + * SSA def index from a load is greater than or equal to this
> number
> > + * then we know it happened afterwards and no writes to anything
> other
> > + * than dst occur between the load and the current instruction.
> > + */
> > + if (nir_deref_instr_get_variable(dst_deref) != dst_var) {
> > + prev_dst_var_last_write = dst_var_last_write;
> > + dst_var = nir_deref_instr_get_variable(dst_deref);
> > + }
> > + dst_var_last_write = *num_ssa_defs;
> > +
> > + nir_deref_instr *src_deref;
> > + if (intrin->intrinsic == nir_intrinsic_copy_deref) {
> > + src_deref = nir_src_as_deref(intrin->src[1]);
> > + } else {
> > + assert(intrin->intrinsic == nir_intrinsic_store_deref);
> > + src_deref = get_deref_for_load_src(intrin->src[1],
> > + prev_dst_var_last_write);
> > + }
> > +
> > + /* If we didn't find a valid src, then we have an unknown store
> and it
> > + * could mess things up.
> > + */
> > + if (src_deref == NULL)
> > + goto fail;
> > +
> > + /* The destination must be local and the source must be either
> local or
> > + * something that's guaranteed to be read-only.
> > + */
> > + const nir_variable_mode read_only_modes =
> > + nir_var_shader_in | nir_var_uniform | nir_var_system_value;
> > + if (!(src_deref->mode & (nir_var_local | read_only_modes)))
> > + goto fail;
> > +
> > + /* If we don't yet have an active copy, then make this
> instruction the
> > + * active copy.
> > + */
>
> A note explaining that in the first store we still don't know which
> array in the path is being copied would have helped me when reading
> this.
>
Done.
> > + if (s.next_array_idx == 0) {
> > + /* We can't combine a copy if there is any chance the source
> and
> > + * destination will end up aliasing. Just bail if they're the
> same
> > + * variable.
> > + */
> > + if (nir_deref_instr_get_variable(dst_deref) ==
> > + nir_deref_instr_get_variable(src_deref))
>
> Could reuse dst_var instead of recalculate.
>
Sure.
> > + goto fail;
> > +
> > + /* The load/store pair is enough to guarantee the same bit
> size and
> > + * number of components but a copy_var requires the actual
> types to
> > + * match.
> > + */
> > + if (dst_deref->type != src_deref->type)
> > + continue;
> > +
> > + nir_deref_path_init(&s.first_dst_path, dst_deref, mem_ctx);
> > + nir_deref_path_init(&s.first_src_path, src_deref, mem_ctx);
> > + s.next_array_idx = 1;
> > + continue;
> > + }
> > +
> > + if (!try_match_deref(&s.first_dst_path, &s.dst_deref_array_idx,
> > + dst_deref, s.next_array_idx, mem_ctx) ||
> > + !try_match_deref(&s.first_src_path, &s.src_deref_array_idx,
> > + src_deref, s.next_array_idx, mem_ctx))
> > + goto fail;
> > +
> > + if (s.next_array_idx == 1) {
> > + /* This is our first non-trivial match. We now have indicies
> into
> > + * the search paths so we can do a couple more checks.
> > + */
> > + assert(s.dst_deref_array_idx > 0 && s.src_deref_array_idx > 0);
> > + const struct glsl_type *dst_arr_type =
> > + s.first_dst_path.path[s.dst_deref_array_idx - 1]->type;
> > + const struct glsl_type *src_arr_type =
> > + s.first_src_path.path[s.src_deref_array_idx - 1]->type;
> > +
> > + assert(glsl_type_is_array(dst_arr_type) ||
> > + glsl_type_is_matrix(dst_arr_type));
> > + assert(glsl_type_is_array(src_arr_type) ||
> > + glsl_type_is_matrix(src_arr_type));
> > +
> > + /* They must be the same length */
> > + s.array_len = glsl_get_length(dst_arr_type);
> > + if (s.array_len != glsl_get_length(src_arr_type))
> > + goto fail;
> > + }
> > +
> > + s.next_array_idx++;
> > +
> > + if (s.next_array_idx == s.array_len) {
> > + /* Hooray, We found a copy! */
> > + b->cursor = nir_after_instr(instr);
> > + nir_copy_deref(b, build_wildcard_deref(b, &s.first_dst_path,
> > + s.dst_deref_array_idx),
> > + build_wildcard_deref(b, &s.first_src_path,
> > + s.src_deref_array_idx));
> > + match_state_reset(&s);
> > + progress = true;
> > + }
> > +
> > + continue;
> > +
> > + fail:
> > + match_state_reset(&s);
> > + }
>
> Maybe call this label "reset:" instead of "fail:".
>
Sure. That's reasonable.
--Jason
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20180822/c01f95ca/attachment-0001.html>
More information about the mesa-dev
mailing list