<div dir="ltr"><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Feb 14, 2019 at 7:25 PM Timothy Arceri <<a href="mailto:tarceri@itsqueeze.com">tarceri@itsqueeze.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 15/2/19 7:51 am, Jason Ekstrand wrote:<br>
> ping?<br>
<br>
You seem to be asking me to rewrite the entire meat of the pass without <br>
any suggestion of how to achieve it, I didn't know how to reply.<br></blockquote><div><br></div><div>Right... Sorry, that wasn't incredibly helpful.  Perhaps I should have taken a bit more time to explain my thoughts more fully.  Yes, write-combining is a hard problem in general except that we have one big advantage.  Because we always run nir_lower_io_to_temporaries on outputs in all stages except TCS, we should be guaranteed two things:</div><div><br></div><div> 1. That variables with the nir_var_shader_out mode are only ever written and not read (except in TCS)<br></div><div> 2. That all writes to variables with the nir_var_shader_out mode happen in the last block of the shader.</div><div><br></div><div>This should make handling writes substantially easier.  We can just walk the last block forwards and take the last write we see of each component.  Because they all happen in one block, we don't have to worry about dataflow or anything like that.  It's also possible that I'm wrong, we don't have those guarantees, it's terribly complicated, and I need to sit down and try to think through all this stuff. :-/<br></div><div><br></div><div>For reads, as I said below, we can just emit them and hope CSE cleans them up.</div><div><br></div><div>Does that make more sense?  Sorry, I've been a bit curt with the e-mails and other correspondence lately and I need to try to slow down and explain myself better.  I didn't mean to be impatient with the ping I just was wondering why I hadn't heard anything.</div><div><br></div><div>--Jason<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Recombining stores is tricky, this pass is based on Conners series for <br>
vectorizing alu instructions[1].<br>
<br>
Since we need to do stores this way we just do loads properly while we <br>
are at it.<br>
<br>
[1] <a href="https://gitlab.freedesktop.org/mesa/mesa/merge_requests/94/commits" rel="noreferrer" target="_blank">https://gitlab.freedesktop.org/mesa/mesa/merge_requests/94/commits</a><br>
<br>
> <br>
> On Mon, Feb 4, 2019 at 3:37 PM Jason Ekstrand <<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a> <br>
> <mailto:<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>>> wrote:<br>
> <br>
>     On Mon, Nov 5, 2018 at 8:58 PM Timothy Arceri <<a href="mailto:tarceri@itsqueeze.com" target="_blank">tarceri@itsqueeze.com</a><br>
>     <mailto:<a href="mailto:tarceri@itsqueeze.com" target="_blank">tarceri@itsqueeze.com</a>>> wrote:<br>
> <br>
>         Once linking opts are done this pass recombines varying components.<br>
> <br>
>         This patch is loosely based on Connor's vectorize alu pass.<br>
> <br>
>         V2: skip fragment shaders<br>
> <br>
>         V3:<br>
>         - dont accidentally vectorise local vars<br>
>         - pass correct component to create_new_store()<br>
>         ---<br>
>           src/compiler/Makefile.sources           |   1 +<br>
>           src/compiler/nir/meson.build            |   1 +<br>
>           src/compiler/nir/nir.h                  |   2 +<br>
>           src/compiler/nir/nir_opt_vectorize_io.c | 527<br>
>         ++++++++++++++++++++++++<br>
>           4 files changed, 531 insertions(+)<br>
>           create mode 100644 src/compiler/nir/nir_opt_vectorize_io.c<br>
> <br>
>         diff --git a/src/compiler/Makefile.sources<br>
>         b/src/compiler/Makefile.sources<br>
>         index ae170f02e82..5991df5a61c 100644<br>
>         --- a/src/compiler/Makefile.sources<br>
>         +++ b/src/compiler/Makefile.sources<br>
>         @@ -290,6 +290,7 @@ NIR_FILES = \<br>
>                  nir/nir_opt_shrink_load.c \<br>
>                  nir/nir_opt_trivial_continues.c \<br>
>                  nir/nir_opt_undef.c \<br>
>         +       nir/nir_opt_vectorize_io.c \<br>
>                  nir/nir_phi_builder.c \<br>
>                  nir/nir_phi_builder.h \<br>
>                  nir/nir_print.c \<br>
>         diff --git a/src/compiler/nir/meson.build<br>
>         b/src/compiler/nir/meson.build<br>
>         index b0c3a7feb31..9555cc40e21 100644<br>
>         --- a/src/compiler/nir/meson.build<br>
>         +++ b/src/compiler/nir/meson.build<br>
>         @@ -174,6 +174,7 @@ files_libnir = files(<br>
>             'nir_opt_shrink_load.c',<br>
>             'nir_opt_trivial_continues.c',<br>
>             'nir_opt_undef.c',<br>
>         +  'nir_opt_vectorize_io.c',<br>
>             'nir_phi_builder.c',<br>
>             'nir_phi_builder.h',<br>
>             'nir_print.c',<br>
>         diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h<br>
>         index a0ae9a4430e..79bbdedaf00 100644<br>
>         --- a/src/compiler/nir/nir.h<br>
>         +++ b/src/compiler/nir/nir.h<br>
>         @@ -3105,6 +3105,8 @@ bool nir_opt_trivial_continues(nir_shader<br>
>         *shader);<br>
> <br>
>           bool nir_opt_undef(nir_shader *shader);<br>
> <br>
>         +bool nir_opt_vectorize_io(nir_shader *shader, bool skip_fs_inputs);<br>
>         +<br>
>           bool nir_opt_conditional_discard(nir_shader *shader);<br>
> <br>
>           void nir_sweep(nir_shader *shader);<br>
>         diff --git a/src/compiler/nir/nir_opt_vectorize_io.c<br>
>         b/src/compiler/nir/nir_opt_vectorize_io.c<br>
>         new file mode 100644<br>
>         index 00000000000..c2ab30d307b<br>
>         --- /dev/null<br>
>         +++ b/src/compiler/nir/nir_opt_vectorize_io.c<br>
>         @@ -0,0 +1,527 @@<br>
>         +/*<br>
>         + * Copyright © 2018 Timothy Arceri<br>
>         + *<br>
>         + * Permission is hereby granted, free of charge, to any person<br>
>         obtaining a<br>
>         + * copy of this software and associated documentation files<br>
>         (the "Software"),<br>
>         + * to deal in the Software without restriction, including<br>
>         without limitation<br>
>         + * the rights to use, copy, modify, merge, publish, distribute,<br>
>         sublicense,<br>
>         + * and/or sell copies of the Software, and to permit persons to<br>
>         whom the<br>
>         + * Software is furnished to do so, subject to the following<br>
>         conditions:<br>
>         + *<br>
>         + * The above copyright notice and this permission notice<br>
>         (including the next<br>
>         + * paragraph) shall be included in all copies or substantial<br>
>         portions of the<br>
>         + * Software.<br>
>         + *<br>
>         + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY<br>
>         KIND, EXPRESS OR<br>
>         + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF<br>
>         MERCHANTABILITY,<br>
>         + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO<br>
>         EVENT SHALL<br>
>         + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,<br>
>         DAMAGES OR OTHER<br>
>         + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR<br>
>         OTHERWISE, ARISING<br>
>         + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE<br>
>         OR OTHER DEALINGS<br>
>         + * IN THE SOFTWARE.<br>
>         + */<br>
>         +<br>
>         +#include "nir.h"<br>
>         +#include "nir_builder.h"<br>
>         +#include "nir_deref.h"<br>
>         +#include "util/u_dynarray.h"<br>
>         +#include "util/u_math.h"<br>
>         +<br>
>         +/** @file nir_opt_vectorize_io.c<br>
>         + *<br>
>         + * Replaces scalar nir_load_input/nir_store_output operations with<br>
>         + * vectorized instructions.<br>
>         + */<br>
>         +<br>
>         +static bool<br>
>         +is_io_load(nir_intrinsic_instr *intr)<br>
>         +{<br>
>         +   switch (intr->intrinsic) {<br>
>         +   case nir_intrinsic_interp_deref_at_centroid:<br>
>         +   case nir_intrinsic_interp_deref_at_sample:<br>
>         +   case nir_intrinsic_interp_deref_at_offset:<br>
>         +   case nir_intrinsic_load_deref:<br>
>         +      return true;<br>
>         +   default:<br>
>         +      return false;<br>
>         +   }<br>
>         +}<br>
>         +<br>
>         +static nir_deref_instr *<br>
>         +clone_deref_array(nir_builder *b, nir_deref_instr *dst_tail,<br>
>         +                  const nir_deref_instr *src_head)<br>
>         +{<br>
>         +   const nir_deref_instr *parent =<br>
>         nir_deref_instr_parent(src_head);<br>
>         +<br>
>         +   if (!parent)<br>
>         +      return dst_tail;<br>
>         +<br>
>         +   assert(src_head->deref_type == nir_deref_type_array);<br>
>         +<br>
>         +   dst_tail = clone_deref_array(b, dst_tail, parent);<br>
>         +<br>
>         +   return nir_build_deref_array(b, dst_tail,<br>
>         +                                nir_ssa_for_src(b,<br>
>         src_head->arr.index, 1));<br>
>         +}<br>
>         +<br>
>         +static bool<br>
>         +instr_can_rewrite(nir_instr *instr)<br>
>         +{<br>
>         +   if (instr->type != nir_instr_type_intrinsic)<br>
>         +      return false;<br>
>         +<br>
>         +   nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);<br>
>         +<br>
>         +   if (intr->num_components != 1)<br>
>         +      return false;<br>
>         +<br>
>         +   if (!is_io_load(intr) &&<br>
>         +       intr->intrinsic != nir_intrinsic_store_deref)<br>
>         +      return false;<br>
>         +<br>
>         +   nir_variable *var =<br>
>         +      nir_deref_instr_get_variable(nir_src_as_deref(intr->src[0]));<br>
>         +<br>
>         +   if (var->data.mode != nir_var_shader_in &&<br>
>         +       var->data.mode != nir_var_shader_out)<br>
>         +      return false;<br>
> <br>
> <br>
>     Please check deref->mode before calling nir_deref_instr_get_variable<br>
>     so the pass doesn't trip up on incomplete deref chains.  We know<br>
>     shader_in/out are currently safe but we can't get the variable in<br>
>     general.<br>
> <br>
>         +<br>
>         +   /* TODO: add doubles support ? */<br>
>         +   if (glsl_type_is_64bit(glsl_without_array(var->type)))<br>
> <br>
> <br>
>     Maybe<br>
> <br>
>     if (glsl_get_bit_size(glsl_without_array(var->type)) != 32)<br>
> <br>
>     instead to be more future-proof?<br>
> <br>
>         +      return false;<br>
>         +<br>
>         +   /* Only touch user defined varyings as these are the only<br>
>         ones we split */<br>
>         +   if (var->data.location < VARYING_SLOT_VAR0 &&<br>
>         var->data.location >= 0)<br>
> <br>
> <br>
>     Why are we allowing negative locations?<br>
> <br>
>         +      return false;<br>
>         +<br>
>         +   /* Skip complex types we don't split in the first place */<br>
>         +   if (glsl_type_is_matrix(glsl_without_array(var->type)) ||<br>
>         +       glsl_type_is_struct(glsl_without_array(var->type)))<br>
>         +      return false;<br>
> <br>
> <br>
>     if (!glsl_type_is_vector_or_scalar(glsl_without_array(var->type)))<br>
> <br>
>     instead?  For that matter, glsl_type_is_scalar?<br>
> <br>
>         +<br>
>         +   return true;<br>
>         +}<br>
>         +<br>
>         +static bool<br>
>         +io_access_same_var(const nir_instr *instr1, const nir_instr<br>
>         *instr2)<br>
>         +{<br>
>         +   assert(instr1->type == nir_instr_type_intrinsic &&<br>
>         +          instr2->type == nir_instr_type_intrinsic);<br>
>         +<br>
>         +   nir_intrinsic_instr *intr1 = nir_instr_as_intrinsic(instr1);<br>
>         +   nir_intrinsic_instr *intr2 = nir_instr_as_intrinsic(instr2);<br>
>         +<br>
>         +   nir_variable *var1 =<br>
>         +     <br>
>         nir_deref_instr_get_variable(nir_src_as_deref(intr1->src[0]));<br>
>         +   nir_variable *var2 =<br>
>         +     <br>
>         nir_deref_instr_get_variable(nir_src_as_deref(intr2->src[0]));<br>
>         +<br>
>         +   /* We don't handle combining vars of different type e.g.<br>
>         different array<br>
>         +    * lengths so just skip if the type doesn't match.<br>
>         +    */<br>
>         +   if (var1->type != var2->type)<br>
> <br>
> <br>
>     Above, we allow vectors (maybe a mistake?) but here, we would fail<br>
>     to combine vec2 and float.<br>
> <br>
>         +      return false;<br>
>         +<br>
>         +   if (is_io_load(intr1) != is_io_load(intr2))<br>
>         +      return false;<br>
>         +<br>
>         +   if (var1->data.location != var2->data.location)<br>
>         +      return false;<br>
>         +<br>
>         +   return true;<br>
>         +}<br>
>         +<br>
>         +static struct util_dynarray *<br>
>         +vec_instr_stack_create(void *mem_ctx)<br>
>         +{<br>
>         +   struct util_dynarray *stack = ralloc(mem_ctx, struct<br>
>         util_dynarray);<br>
>         +   util_dynarray_init(stack, mem_ctx);<br>
>         +   return stack;<br>
>         +}<br>
>         +<br>
>         +static void<br>
>         +vec_instr_stack_push(struct util_dynarray *stack, nir_instr *instr)<br>
>         +{<br>
>         +   util_dynarray_append(stack, nir_instr *, instr);<br>
>         +}<br>
>         +<br>
>         +static void<br>
>         +create_new_load(nir_builder *b, nir_intrinsic_instr *intr,<br>
>         nir_variable *var,<br>
>         +                unsigned comp, unsigned num_comps)<br>
>         +{<br>
>         +   b->cursor = nir_before_instr(&intr->instr);<br>
>         +<br>
>         +   assert(intr->dest.is_ssa);<br>
>         +<br>
>         +   nir_intrinsic_instr *new_intr =<br>
>         +      nir_intrinsic_instr_create(b->shader, intr->intrinsic);<br>
>         +   nir_ssa_dest_init(&new_intr->instr, &new_intr->dest, num_comps,<br>
>         +                     intr->dest.ssa.bit_size, NULL);<br>
>         +   new_intr->num_components = num_comps;<br>
>         +<br>
>         +   nir_deref_instr *deref = nir_build_deref_var(b, var);<br>
>         +   deref = clone_deref_array(b, deref,<br>
>         nir_src_as_deref(intr->src[0]));<br>
>         +<br>
>         +   new_intr->src[0] = nir_src_for_ssa(&deref->dest.ssa);<br>
>         +<br>
>         +   if (intr->intrinsic == nir_intrinsic_interp_deref_at_offset ||<br>
>         +       intr->intrinsic == nir_intrinsic_interp_deref_at_sample)<br>
>         +      nir_src_copy(&new_intr->src[1], &intr->src[1],<br>
>         &new_intr->instr);<br>
>         +<br>
>         +   nir_builder_instr_insert(b, &new_intr->instr);<br>
>         +<br>
>         +   unsigned channel = comp - var->data.location_frac;<br>
>         +   nir_ssa_def *load = nir_channel(b, &new_intr->dest.ssa,<br>
>         channel);<br>
>         +   nir_ssa_def_rewrite_uses(&intr->dest.ssa,<br>
>         nir_src_for_ssa(load));<br>
>         +<br>
>         +   /* Remove the old load intrinsic */<br>
>         +   nir_instr_remove(&intr->instr);<br>
>         +}<br>
>         +<br>
>         +static void<br>
>         +create_new_store(nir_builder *b, nir_intrinsic_instr *intr,<br>
>         nir_variable *var,<br>
>         +                 nir_ssa_def **srcs, unsigned first_comp,<br>
>         unsigned num_comps)<br>
>         +{<br>
>         +   b->cursor = nir_before_instr(&intr->instr);<br>
>         +<br>
>         +   nir_intrinsic_instr *new_intr =<br>
>         +      nir_intrinsic_instr_create(b->shader, intr->intrinsic);<br>
>         +   new_intr->num_components = num_comps;<br>
>         +<br>
>         +   nir_intrinsic_set_write_mask(new_intr, (1 << num_comps) - 1);<br>
>         +<br>
>         +   nir_deref_instr *deref = nir_build_deref_var(b, var);<br>
>         +   deref = clone_deref_array(b, deref,<br>
>         nir_src_as_deref(intr->src[0]));<br>
>         +<br>
>         +   new_intr->src[0] = nir_src_for_ssa(&deref->dest.ssa);<br>
>         +<br>
>         +   nir_ssa_def *stores[4];<br>
>         +   for (unsigned i = 0; i < num_comps; i++) {<br>
>         +      stores[i] = srcs[first_comp + i];<br>
>         +   }<br>
>         +<br>
>         +   new_intr->src[1] = nir_src_for_ssa(nir_vec(b, stores,<br>
>         num_comps));<br>
>         +<br>
>         +   nir_builder_instr_insert(b, &new_intr->instr);<br>
>         +<br>
>         +   /* Remove the old store intrinsic */<br>
>         +   nir_instr_remove(&intr->instr);<br>
>         +}<br>
>         +<br>
>         +static bool<br>
>         +vec_instr_stack_pop(nir_builder *b, struct util_dynarray *stack,<br>
>         +                    nir_instr *instr,<br>
>         +                    nir_variable<br>
>         *input_vars[MAX_VARYINGS_INCL_PATCH][4],<br>
>         +                    nir_variable<br>
>         *output_vars[MAX_VARYINGS_INCL_PATCH][4])<br>
>         +{<br>
>         +   nir_instr *last = util_dynarray_pop(stack, nir_instr *);<br>
>         +<br>
>         +   assert(last == instr);<br>
>         +   assert(last->type == nir_instr_type_intrinsic);<br>
>         +<br>
>         +   nir_intrinsic_instr *intr = nir_instr_as_intrinsic(last);<br>
>         +   nir_variable *var =<br>
>         +      nir_deref_instr_get_variable(nir_src_as_deref(intr->src[0]));<br>
>         +   unsigned loc = var->data.location - VARYING_SLOT_VAR0;<br>
>         +<br>
>         +   nir_variable *new_var;<br>
>         +   if (var->data.mode == nir_var_shader_in)<br>
>         +      new_var = input_vars[loc][var->data.location_frac];<br>
>         +   else<br>
>         +      new_var = output_vars[loc][var->data.location_frac];<br>
>         +<br>
>         +   unsigned num_comps =<br>
>         +      glsl_get_vector_elements(glsl_without_array(new_var->type));<br>
>         +<br>
>         +   /* Don't bother walking the stack if this component can't be<br>
>         vectorised. */<br>
>         +   if (num_comps == 1)<br>
>         +      return false;<br>
>         +<br>
>         +   if (new_var == var)<br>
>         +      return false;<br>
>         +<br>
>         +   if (is_io_load(intr)) {<br>
>         +      create_new_load(b, intr, new_var,<br>
>         var->data.location_frac, num_comps);<br>
>         +      return true;<br>
>         +   }<br>
>         +<br>
>         +   b->cursor = nir_before_instr(last);<br>
>         +   nir_ssa_undef_instr *instr_undef =<br>
>         +      nir_ssa_undef_instr_create(b->shader, 1, 32);<br>
>         +   nir_builder_instr_insert(b, &instr_undef->instr);<br>
>         +<br>
>         +   nir_ssa_def *srcs[4];<br>
>         +   for (int i = 0; i < 4; i++) {<br>
>         +      srcs[i] = &instr_undef->def;<br>
>         +   }<br>
>         +   srcs[var->data.location_frac] = intr->src[1].ssa;<br>
>         +<br>
>         +   util_dynarray_foreach_reverse(stack, nir_instr *, stack_instr) {<br>
>         +      nir_intrinsic_instr *intr2 =<br>
>         nir_instr_as_intrinsic(*stack_instr);<br>
>         +      nir_variable *var2 =<br>
>         +       <br>
>           nir_deref_instr_get_variable(nir_src_as_deref(intr2->src[0]));<br>
>         +      unsigned loc2 = var2->data.location - VARYING_SLOT_VAR0;<br>
>         +<br>
>         +      if (output_vars[loc][var->data.location_frac] !=<br>
>         +          output_vars[loc2][var2->data.location_frac])<br>
>         +         continue;<br>
>         +<br>
>         +     <br>
>         assert(glsl_get_vector_elements(glsl_without_array(var2->type))<br>
>         == 1);<br>
>         +<br>
>         +      if (srcs[var2->data.location_frac] == &instr_undef->def) {<br>
>         +         assert(intr2->src[1].is_ssa);<br>
>         +         assert(intr2->src[1].ssa);<br>
>         +<br>
>         +         srcs[var2->data.location_frac] = intr2->src[1].ssa;<br>
>         +      }<br>
>         +   }<br>
>         +<br>
>         +   create_new_store(b, intr, new_var, srcs,<br>
>         new_var->data.location_frac,<br>
>         +                    num_comps);<br>
>         +<br>
>         +   return true;<br>
>         +}<br>
>         +<br>
>         +static bool<br>
>         +cmp_func(const void *data1, const void *data2)<br>
>         +{<br>
>         +   const struct util_dynarray *arr1 = data1;<br>
>         +   const struct util_dynarray *arr2 = data2;<br>
>         +<br>
>         +   const nir_instr *instr1 = *(nir_instr<br>
>         **)util_dynarray_begin(arr1);<br>
>         +   const nir_instr *instr2 = *(nir_instr<br>
>         **)util_dynarray_begin(arr2);<br>
>         +<br>
>         +   return io_access_same_var(instr1, instr2);<br>
>         +}<br>
>         +<br>
>         +#define HASH(hash, data) _mesa_fnv32_1a_accumulate((hash), (data))<br>
>         +<br>
>         +static uint32_t<br>
>         +hash_instr(const nir_instr *instr)<br>
>         +{<br>
>         +   assert(instr->type == nir_instr_type_intrinsic);<br>
>         +<br>
>         +   nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);<br>
>         +   nir_variable *var =<br>
>         +      nir_deref_instr_get_variable(nir_src_as_deref(intr->src[0]));<br>
>         +<br>
>         +   uint32_t hash = _mesa_fnv32_1a_offset_bias;<br>
>         +   bool is_load = is_io_load(intr);<br>
>         +<br>
>         +   hash = HASH(hash, var->type);<br>
>         +   hash = HASH(hash, is_load);<br>
>         +   return HASH(hash, var->data.location);<br>
>         +}<br>
>         +<br>
>         +static uint32_t<br>
>         +hash_stack(const void *data)<br>
>         +{<br>
>         +   const struct util_dynarray *stack = data;<br>
>         +   const nir_instr *first = *(nir_instr<br>
>         **)util_dynarray_begin(stack);<br>
>         +   return hash_instr(first);<br>
>         +}<br>
>         +<br>
>         +static struct set *<br>
>         +vec_instr_set_create(void)<br>
>         +{<br>
>         +   return _mesa_set_create(NULL, hash_stack, cmp_func);<br>
>         +}<br>
>         +<br>
>         +static void<br>
>         +vec_instr_set_destroy(struct set *instr_set)<br>
>         +{<br>
>         +   _mesa_set_destroy(instr_set, NULL);<br>
>         +}<br>
>         +<br>
>         +static void<br>
>         +vec_instr_set_add(struct set *instr_set, nir_instr *instr)<br>
>         +{<br>
>         +   if (!instr_can_rewrite(instr))<br>
>         +      return;<br>
>         +<br>
>         +   struct util_dynarray *new_stack =<br>
>         vec_instr_stack_create(instr_set);<br>
>         +   vec_instr_stack_push(new_stack, instr);<br>
>         +<br>
>         +   struct set_entry *entry = _mesa_set_search(instr_set,<br>
>         new_stack);<br>
>         +<br>
>         +   if (entry) {<br>
>         +      ralloc_free(new_stack);<br>
>         +      struct util_dynarray *stack = (struct util_dynarray *)<br>
>         entry->key;<br>
>         +      vec_instr_stack_push(stack, instr);<br>
>         +      return;<br>
>         +   }<br>
>         +<br>
>         +   _mesa_set_add(instr_set, new_stack);<br>
>         +   return;<br>
>         +}<br>
>         +<br>
>         +static bool<br>
>         +vec_instr_set_remove(nir_builder *b, struct set *instr_set,<br>
>         nir_instr *instr,<br>
>         +                     nir_variable<br>
>         *input_vars[MAX_VARYINGS_INCL_PATCH][4],<br>
>         +                     nir_variable<br>
>         *output_vars[MAX_VARYINGS_INCL_PATCH][4])<br>
>         +{<br>
>         +   if (!instr_can_rewrite(instr))<br>
>         +      return false;<br>
>         +<br>
>         +   /*<br>
>         +    * It's pretty unfortunate that we have to do this, but it's<br>
>         a side effect<br>
>         +    * of the hash set interfaces. The hash set assumes that<br>
>         we're only<br>
>         +    * interested in storing one equivalent element at a time,<br>
>         and if we try to<br>
>         +    * insert a duplicate element it will remove the original.<br>
>         We could hack up<br>
>         +    * the comparison function to "know" which input is an<br>
>         instruction we<br>
>         +    * passed in and which is an array that's part of the entry,<br>
>         but that<br>
>         +    * wouldn't work because we need to pass an array to<br>
>         _mesa_set_add() in<br>
>         +    * vec_instr_add() above, and _mesa_set_add() will call our<br>
>         comparison<br>
>         +    * function as well.<br>
>         +    */<br>
>         +   struct util_dynarray *temp = vec_instr_stack_create(instr_set);<br>
>         +   vec_instr_stack_push(temp, instr);<br>
>         +   struct set_entry *entry = _mesa_set_search(instr_set, temp);<br>
>         +   ralloc_free(temp);<br>
>         +<br>
>         +   if (entry) {<br>
>         +      struct util_dynarray *stack = (struct util_dynarray *)<br>
>         entry->key;<br>
>         +      bool progress = vec_instr_stack_pop(b, stack, instr,<br>
>         input_vars,<br>
>         +                                          output_vars);<br>
>         +<br>
>         +      if (!util_dynarray_num_elements(stack, nir_instr *))<br>
>         +         _mesa_set_remove(instr_set, entry);<br>
>         +<br>
>         +      return progress;<br>
>         +   }<br>
>         +<br>
>         +   return false;<br>
>         +}<br>
>         +<br>
>         +static bool<br>
>         +vectorize_block(nir_builder *b, nir_block *block, struct set<br>
>         *instr_set,<br>
>         +                nir_variable<br>
>         *input_vars[MAX_VARYINGS_INCL_PATCH][4],<br>
>         +                nir_variable<br>
>         *output_vars[MAX_VARYINGS_INCL_PATCH][4])<br>
> <br>
> <br>
>     No comments on the actal meat of it yet except one:  Why is it so<br>
>     complicated???  For inputs, just change loads to load the whole<br>
>     vector and insert a swizzle after it to grab the component you<br>
>     want.   CSE will come along and clean up the mess by and large.  For<br>
>     stores, it's a bit more complicated but it still doesn't seem like<br>
>     it needs to be this bad.<br>
> <br>
>         +{<br>
>         +   bool progress = false;<br>
>         +<br>
>         +   nir_foreach_instr_safe(instr, block) {<br>
>         +      vec_instr_set_add(instr_set, instr);<br>
>         +   }<br>
>         +<br>
>         +   for (unsigned i = 0; i < block->num_dom_children; i++) {<br>
>         +      nir_block *child = block->dom_children[i];<br>
>         +      progress |= vectorize_block(b, child, instr_set, input_vars,<br>
>         +                                  output_vars);<br>
>         +   }<br>
>         +<br>
>         +   nir_foreach_instr_reverse_safe(instr, block) {<br>
>         +      progress |= vec_instr_set_remove(b, instr_set, instr,<br>
>         input_vars,<br>
>         +                                       output_vars);<br>
>         +   }<br>
>         +<br>
>         +   return progress;<br>
>         +}<br>
>         +<br>
>         +static void<br>
>         +create_new_io_var(nir_shader *shader,<br>
>         +                  nir_variable *vars[MAX_VARYINGS_INCL_PATCH][4],<br>
>         +                  unsigned location, unsigned comps)<br>
>         +{<br>
>         +   unsigned num_comps = util_bitcount(comps);<br>
> <br>
> <br>
>     Do we need to do anything if num_comps == 1?  Or can we just leave<br>
>     it alone?<br>
> <br>
>         +   unsigned first_comp = u_bit_scan(&comps);<br>
> <br>
> <br>
>     Might be worth a quick comment here that we're taking a component<br>
>     off.  Using u_bit_scan here is very nice from the perspective that<br>
>     it helps the loop below but it's not obviuos.<br>
> <br>
>         +<br>
>         +   nir_variable *var =<br>
>         nir_variable_clone(vars[location][first_comp], shader);<br>
>         +   var->data.location_frac = first_comp;<br>
>         +   var->type = glsl_replace_vector_type(var->type, num_comps);<br>
>         +<br>
>         +   nir_shader_add_variable(shader, var);<br>
>         +<br>
>         +   vars[location][first_comp] = var;<br>
>         +<br>
>         +   while (comps) {<br>
>         +      const int comp = u_bit_scan(&comps);<br>
>         +      vars[location][comp] = var;<br>
>         +   }<br>
>         +}<br>
>         +<br>
>         +static void<br>
>         +create_new_io_vars(nir_shader *shader, struct exec_list *io_list,<br>
>         +                   nir_variable *vars[MAX_VARYINGS_INCL_PATCH][4])<br>
>         +{<br>
>         +   if (exec_list_is_empty(io_list))<br>
>         +      return;<br>
>         +<br>
>         +   nir_foreach_variable(var, io_list) {<br>
>         +      if (var->data.location >= VARYING_SLOT_VAR0) {<br>
> <br>
> <br>
>     Do we want to check that it's a (possibly array of) scalar or<br>
>     anything like that?<br>
> <br>
>         +         unsigned loc = var->data.location - VARYING_SLOT_VAR0;<br>
>         +         vars[loc][var->data.location_frac] = var;<br>
>         +      }<br>
>         +   }<br>
>         +<br>
>         +   /* We don't handle combining vars of different type e.g.<br>
>         different array<br>
>         +    * lengths.<br>
>         +    */<br>
>         +   for (unsigned i = 0; i < MAX_VARYINGS_INCL_PATCH; i++) {<br>
>         +      unsigned comps = 0;<br>
>         +      for (unsigned j = 0; j < 3; j++) {<br>
>         +         if (vars[i][j] && vars[i][j+1] && vars[i][j]->type ==<br>
>         vars[i][j+1]->type) {<br>
>         +            if (j == 2) {<br>
>         +               /* last component so create new variable */<br>
>         +               comps |= 3 << vars[i][j]->data.location_frac;<br>
> <br>
> <br>
>     Why not move setting comps out of the if?  Might make things clearer.<br>
> <br>
>         +               create_new_io_var(shader, vars, i, comps);<br>
>         +            } else {<br>
>         +               /* Set comps */<br>
>         +               comps |= 3 << vars[i][j]->data.location_frac;<br>
>         +            }<br>
>         +         } else {<br>
>         +            if (comps) {<br>
>         +               /* Types didn't match but we have already seen<br>
>         matching types<br>
>         +                * at this location so create a new variable for<br>
>         those<br>
>         +                * components.<br>
>         +                */<br>
>         +               create_new_io_var(shader, vars, i, comps);<br>
>         +               comps = 0;<br>
>         +            }<br>
>         +         }<br>
>         +      }<br>
>         +   }<br>
>         +}<br>
>         +<br>
>         +static bool<br>
>         +nir_opt_vectorize_io_impl(nir_function_impl *impl)<br>
>         +{<br>
>         +   nir_builder b;<br>
>         +   nir_builder_init(&b, impl);<br>
>         +<br>
>         +   nir_metadata_require(impl, nir_metadata_dominance);<br>
>         +<br>
>         +   nir_shader *shader = impl->function->shader;<br>
>         +   nir_variable *input_vars[MAX_VARYINGS_INCL_PATCH][4] = {0};<br>
>         +   nir_variable *output_vars[MAX_VARYINGS_INCL_PATCH][4] = {0};<br>
>         +<br>
>         +   create_new_io_vars(shader, &shader->inputs, input_vars);<br>
>         +   create_new_io_vars(shader, &shader->outputs, output_vars);<br>
>         +<br>
>         +   struct set *instr_set = vec_instr_set_create();<br>
>         +   bool progress = vectorize_block(&b, nir_start_block(impl),<br>
>         instr_set,<br>
>         +                                   input_vars, output_vars);<br>
>         +<br>
>         +   if (progress)<br>
>         +      nir_metadata_preserve(impl, nir_metadata_block_index |<br>
>         +                                  nir_metadata_dominance);<br>
> <br>
> <br>
>     Two lines.  Please use { }.<br>
> <br>
>         +<br>
>         +   vec_instr_set_destroy(instr_set);<br>
>         +   return false;<br>
>         +}<br>
>         +<br>
>         +bool<br>
>         +nir_opt_vectorize_io(nir_shader *shader, bool skip_fs_inputs)<br>
>         +{<br>
>         +   bool progress = false;<br>
>         +<br>
>         +   if (skip_fs_inputs && shader->info.stage ==<br>
>         MESA_SHADER_FRAGMENT)<br>
>         +      return false;<br>
>         +<br>
>         +   nir_foreach_function(function, shader) {<br>
>         +      if (function->impl)<br>
>         +         progress |= nir_opt_vectorize_io_impl(function->impl);<br>
>         +   }<br>
>         +<br>
>         +   return progress;<br>
>         +}<br>
>         -- <br>
>         2.19.1<br>
> <br>
>         _______________________________________________<br>
>         mesa-dev mailing list<br>
>         <a href="mailto:mesa-dev@lists.freedesktop.org" target="_blank">mesa-dev@lists.freedesktop.org</a><br>
>         <mailto:<a href="mailto:mesa-dev@lists.freedesktop.org" target="_blank">mesa-dev@lists.freedesktop.org</a>><br>
>         <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/mailman/listinfo/mesa-dev</a><br>
> <br>
</blockquote></div></div>