<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Tue, Aug 21, 2018 at 5:55 PM Caio Marcelo de Oliveira Filho <<a href="mailto:caio.oliveira@intel.com">caio.oliveira@intel.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br> <br> On Sat, Jul 28, 2018 at 10:44:39PM -0700, Jason Ekstrand wrote:<br> > This pass looks for variables with vector or array-of-vector types and<br> > narrows the type to only the components used.<br> > ---<br> > src/compiler/nir/nir.h | 1 +<br> > src/compiler/nir/nir_split_vars.c | 694 ++++++++++++++++++++++++++++++<br> > 2 files changed, 695 insertions(+)<br> <br> My general impression was: could we do it as two separate simpler<br> passes, one for array lengths and other for vector types? I felt the<br> intermix of book-keeping for both cases made things harder than usual<br> to read, but maybe is only because I'm not familiar enough.<br></blockquote><div><br></div><div>We probably could. However, the logic used in the two types of shrinking is basically exactly the same so it made sense to put them together and not repeat everything. At one point, I had array shrinking mixed in with array splitting and that was a terrible idea as they're very different. These two, however, are basically the same only with a max length vs. a component mask.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> > +static void<br> > +find_used_components_impl(nir_function_impl *impl,<br> > + struct hash_table *var_usage_map,<br> > + nir_variable_mode modes,<br> > + void *mem_ctx)<br> > +{<br> > + nir_foreach_block(block, impl) {<br> > + nir_foreach_instr_safe(instr, block) {<br> > + if (instr->type != nir_instr_type_intrinsic)<br> > + continue;<br> > +<br> > + nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);<br> > + switch (intrin->intrinsic) {<br> > + case nir_intrinsic_load_deref:<br> > + mark_deref_used(nir_src_as_deref(intrin->src[0]),<br> > + nir_ssa_def_components_read(&intrin->dest.ssa), 0,<br> > + NULL, var_usage_map, modes, mem_ctx);<br> > + break;<br> > +<br> > + case nir_intrinsic_store_deref:<br> > + mark_deref_used(nir_src_as_deref(intrin->src[0]),<br> > + 0, get_non_self_referential_store_comps(intrin),<br> > + NULL, var_usage_map, modes, mem_ctx);<br> > + break;<br> > +<br> > + case nir_intrinsic_copy_deref: {<br> > + /* Just mark everything used for copies. */<br> > + nir_deref_instr *dst = nir_src_as_deref(intrin->src[0]);<br> > + nir_deref_instr *src = nir_src_as_deref(intrin->src[1]);<br> <br> Should self-copies be ignored?<br></blockquote><div><br></div><div>Maybe? If it's really a self-copy where src == dst, that would be dumb and we should delete it. I'm not sure if any optimization pass already does this or not. If it's a copy from one array element to a different one, we could probably do something better here at least for vector components. I'd have to think about it a bit more to figure out what the optimal thing there would be. In any case, doing the conservative thing is still correct.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> > + mark_deref_used(dst, 0, ~0, src, var_usage_map, modes, mem_ctx);<br> > + mark_deref_used(src, ~0, 0, dst, var_usage_map, modes, mem_ctx);<br> > + break;<br> > + }<br> <br> (...)<br> <br> > +static bool<br> > +shrink_vec_var_list(struct exec_list *vars,<br> > + struct hash_table *var_usage_map)<br> > +{<br> > + /* Initialize the components kept field of each variable. This is the<br> > + * AND of the components written and components read. If a component is<br> > + * written but never read, it's dead. If it is read but never written,<br> > + * then all values read are undefined garbage and we may as well not read<br> > + * them.<br> > + *<br> > + * The same logic applies to the array length. We make the array length<br> > + * the minimum needed required length between read and write and plan to<br> > + * discard any OOB access. The one exception here is indirect writes<br> > + * because we don't know where they will land and we can't shrink an array<br> > + * with indirect writes because previously in-bounds writes may become<br> > + * out-of-bounds and have undefined behavior.<br> > + *<br> > + * Also, if we have a copy that to/from something we can't shrink, we need<br> > + * to leave components and array_len of any wildcards alone.<br> > + */<br> > + nir_foreach_variable(var, vars) {<br> > + struct vec_var_usage *usage =<br> > + get_vec_var_usage(var, var_usage_map, false, NULL);<br> > + if (!usage)<br> > + continue;<br> > +<br> > + const unsigned var_num_components =<br> > + glsl_get_components(glsl_without_array_or_matrix(var->type));<br> > +<br> > + assert(usage->comps_kept == 0);<br> > + if (usage->has_external_copy)<br> > + usage->comps_kept = (1u << var_num_components) - 1;<br> > + else<br> > + usage->comps_kept = usage->comps_read & usage->comps_written;<br> > +<br> > + for (unsigned i = 0; i < usage->num_levels; i++) {<br> > + struct array_level_usage *level = &usage->levels[i];<br> > + assert(level->array_len > 0);<br> > +<br> > + if (level->max_written == UINT_MAX || level->has_external_copy)<br> > + continue; /* Can't shrink */<br> > +<br> > + unsigned max_used = MIN2(level->max_read, level->max_written);<br> <br> Should this be MAX2? What I'm thinking is if we read 3 and write 5,<br> we used up to 5, and the length should be at least 6.<br></blockquote><div><br></div><div>No, it shouldn't. :-) If we only read up to 3 but we write up to 5 then those last two writes are never read so they can be discarded. Similarly, if we read up to 5 but only write up to 3 then those top two reads read garbage and can be replaced with undefs. Similarly, for components, we AND instead of OR for the same reason. See also the comment above this loop.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> > + level->array_len = MIN2(max_used, level->array_len - 1) + 1;<br> > + }<br> > + }<br> <br> (...)<br> <br> > +static void<br> > +shrink_vec_var_access_impl(nir_function_impl *impl,<br> > + struct hash_table *var_usage_map,<br> > + nir_variable_mode modes)<br> > +{<br> > + nir_builder b;<br> > + nir_builder_init(&b, impl);<br> > +<br> > + nir_foreach_block(block, impl) {<br> > + nir_foreach_instr_safe(instr, block) {<br> > + switch (instr->type) {<br> > + case nir_instr_type_deref: {<br> > + nir_deref_instr *deref = nir_instr_as_deref(instr);<br> > + if (!(deref->mode & modes))<br> > + break;<br> > +<br> > + /* Clean up any dead derefs we find lying around. They may refer<br> > + * to variables we've deleted.<br> > + */<br> > + if (nir_deref_instr_remove_if_unused(deref))<br> > + break;<br> > +<br> > + /* We don't need to check if this is one of the derefs we're<br> > + * shrinking because this is a no-op if it isn't. The worst that<br> > + * could happen is that we accidentally fix an invalid deref.<br> > + */<br> <br> On one hand I think this is fine for the reasons you mention (and the<br> fact it avoids us looking up the deref's var into our structures). On<br> the other hand, if we get some pass generating invalid derefs, this<br> will hide the bug.<br></blockquote><div><br></div><div>True. However, it makes the pass simpler and more efficient. We could do extra work to check for things but it wouldn't help anything other than the possible bug hiding you mentioned.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> > + if (deref->deref_type == nir_deref_type_var) {<br> > + deref->type = deref->var->type;<br> > + } else if (deref->deref_type == nir_deref_type_array ||<br> > + deref->deref_type == nir_deref_type_array_wildcard) {<br> > + nir_deref_instr *parent = nir_deref_instr_parent(deref);<br> > + assert(glsl_type_is_array(parent->type) ||<br> > + glsl_type_is_matrix(parent->type));<br> > + deref->type = glsl_get_array_element(parent->type);<br> > + }<br> > + break;<br> > + }<br> <br> (...)<br> <br> > +bool<br> > +nir_shrink_vec_array_vars(nir_shader *shader, nir_variable_mode modes)<br> > +{<br> > + assert((modes & (nir_var_global | nir_var_local)) == modes);<br> > +<br> > + void *mem_ctx = ralloc_context(NULL);<br> > +<br> > + struct hash_table *var_usage_map =<br> > + _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,<br> > + _mesa_key_pointer_equal);<br> > +<br> > + bool has_vars_to_shrink = false;<br> > + nir_foreach_function(function, shader) {<br> > + if (!function->impl)<br> > + continue;<br> > +<br> > + /* Don't even bother crawling the IR if we don't have any variables.<br> > + * Given that this pass deletes any unused variables, it's likely that<br> > + * we will be in this scenario eventually.<br> > + */<br> <br> The fact that we don't have unused variables doesn't mean we won't<br> have used variables. Maybe I'm missing some context to fully<br> understand this comment block.<br></blockquote><div><br></div><div>Hunh? This is just an early exit for the eventual case where we've managed to delete all the variables. We know if there are no variables that the pass will do nothing so we might as well not bother walking the IR. It's a silly optimization but should be very safe.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> > + if (!exec_list_is_empty(&shader->globals) ||<br> > + !exec_list_is_empty(&function->impl->locals)) {<br> <br> Condition each of those with the modes being enabled? I.e. if we do<br> have globals, but we are only working with locals in the pass, we<br> should skip it.<br></blockquote><div><br></div><div>We could but we eventually lower globals to locals anyway.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> > + has_vars_to_shrink = true;<br> > + find_used_components_impl(function->impl, var_usage_map,<br> > + modes, mem_ctx);<br> > + }<br> > + }<br></blockquote><div><br></div><div>--Jason <br></div></div></div>