[Mesa-dev] [RFC][PATCH] nir: Add a value range propagation pass
Thomas Helland
thomashelland90 at gmail.com
Thu Aug 6 10:22:30 PDT 2015
It is now mostly working, and I've incorporated a bunch
of the feedback I got on the last posting. I promised to
have it out on the list by American office hours today,
so here goes.
There is a massive issue with the compare_entries function.
I'm hacking on that atm, trying to come up with a solution
I'm comfortable with that does not cause bugs. There is
also an issue with the way I insert new consts. There's
probably something there that I have not understood.
Below follows a list of changes from the last version:
Add pessimization of branching on undef (based on LLVM's approach)
Incorporate some of Connor's feedback
Incorporate Matt's feedback
Rework set_float_range API
Reworked to not do a separate initialization of everything
Some cleanups of unneeded variables
Handle loops differently
Add a visit-counter to prevent endless looping
(This has proven to not do anything, could be dropped)
Squash a bug where we were adding ourselves as the user instead of the actual user
(Same accidental mistake I had with uses in the LCSSA pass)
Calculate a lot of the things we can trivially compute.
Special case calculation of abs.
We are now detecting bcsel(x, 0, 0) (four cases)
abs(0) (two cases)
Remove support for phis:
It was buggy and hard to get right, as type-information
was kinda hard to get hold of. A quick statistics gathering
showed that phi-nodes amount to only 0.24% of the ssa-defs,
and so one can assume that getting them right would not give
us any substantial benefit.
Phi-nodes might have some say in whether we are able to remove
complete nodes/blocks? If so, then we should probably revisit
this decision, but for now it seems the return-on-investment
and risk-reward ratio do not justify dealing with them.
Signed-off-by: Thomas Helland <thomashelland90 at gmail.com>
---
src/glsl/Makefile.sources | 1 +
src/glsl/nir/nir.h | 2 +
src/glsl/nir/nir_opt_value_range.c | 1570 ++++++++++++++++++++++++++++++++++++
3 files changed, 1573 insertions(+)
create mode 100644 src/glsl/nir/nir_opt_value_range.c
diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index 4f79fee..0b971c9 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -56,6 +56,7 @@ NIR_FILES = \
nir/nir_opt_peephole_ffma.c \
nir/nir_opt_peephole_select.c \
nir/nir_opt_remove_phis.c \
+ nir/nir_opt_value_range.c \
nir/nir_print.c \
nir/nir_remove_dead_variables.c \
nir/nir_search.c \
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index aba653e..19202a7 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1707,6 +1707,8 @@ bool nir_opt_peephole_ffma(nir_shader *shader);
bool nir_opt_remove_phis(nir_shader *shader);
+bool nir_opt_value_range(nir_shader *shader);
+
void nir_sweep(nir_shader *shader);
#ifdef __cplusplus
diff --git a/src/glsl/nir/nir_opt_value_range.c b/src/glsl/nir/nir_opt_value_range.c
new file mode 100644
index 0000000..616ec1c
--- /dev/null
+++ b/src/glsl/nir/nir_opt_value_range.c
@@ -0,0 +1,1570 @@
+/*
+ * Copyright © 2015 Thomas Helland
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "nir.h"
+#include "nir_ssa_def_worklist.h"
+#include "nir_block_worklist.h"
+#include "nir_constant_expressions.h"
+
+/* This pass implements an extension of
+ * "Constant Propagation with Conditional Branches" by Wegman and Zadeck
+ * that also handles value ranges. This is useful as a lot of shaders have
+ * min/max expressions that can be eliminated, or conditionals that we can
+ * prove to be false or true due to previously applied restrictions on range.
+ * Value range propagation is a superset of tracking constant values, so this
+ * pass eliminates the need for a separate constant propagation pass. The
+ * pass is optimistic, meaning we assume all variables are constant (or have
+ * a restricted range) and try to disprove it. A pessimistic algorithm would
+ * assume all values are undeterminable, and then propagate expressions we
+ * know to be constant through the program. An optimistic algorithm gets
+ * better results than a pessimistic one, with the downside that it cannot
+ * be aborted halfway through, as the information gathered may be wrong.
+ *
+ * The lattice types are:
+ * undefined: Variable may be constant or range-restricted
+ * (We have not yet gotten to them in the pass)
+ * constant: Value is determined to be constant
+ * range: Value is determined to be range-restricted
+ * overdefined: We cannot guarantee the value is constant or range-restricted
+ *
+ * We extend the lattice from the SCCP pass so that constant entries are
+ * changed to inclusive ranges for each vector component. The join rules are:
+ *
+ * undefined join undefined = undefined
+ * undefined join overdefined = overdefined
+ * overdefined join overdefined = overdefined
+ * [low, high] join overdefined = overdefined
+ * [low, high] join undefined = [low, high]
+ * [low1, high1] join [low2, high2] = [min(low1, low2), max(high1, high2)]
+ *
+ * These rules are generally pessimistic. There may be situations where
+ * we can still determine parts of the range of the variable, even though it
+ * has an overdefined input (e.g. max, min, sat, abs). This is also true for
+ * boolean operations like AND and OR, which can be determined even if we
+ * know only one of the operands.
+ *
+ * We don't preserve the range perfectly for a variable, as we combine two
+ * ranges for a variable into a range of:
+ * [min(low1, low2), max(high1, high2)]
+ * Preserving the non-continuous range information would greatly complicate
+ * the pass, and is therefore not implemented.
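 + * For example, the ranges [0, 1] and [5, 6] are joined into the single
 + * range [0, 6], even though no value between 1 and 5 can actually occur.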
+ *
+ * There is one interesting situation that is hard to deal with:
+ * When we find that something is dead code, but it does not become a
+ * constant value. Examples are things like max(sin(x), y) where y > 1.
+ * We know sin(x) is dead code, but the result is not a constant but instead
+ * an ssa-def with a range. We mark this in the range-state so that we can
+ * eliminate it after the pass is done. This means that the pass should be
+ * rerun if we resolve one of these, as we will then have simplified the
+ * program, and new ranges may be resolved.
+ *
+ * When the pass is done all "undefined" values should be determined as
+ * either const, range, or overdefined. (Except for in blocks that are
+ * marked as unreachable, as they may have not been processed)
+ */
+
+/* XXX: Review feedback from Matt
+ * Food for thought: a range isn't really what you want for bools. Maybe we
+ * can think about extending this analysis to know about disjoint ranges,
+ * or maybe even just a flag to say "it's either high or low".
+ */
+
+/* XXX:
+ * We need a mechanism for overflow protection for integer and
+ * unsigned integer calculations, as we may end up adding two values,
+ * overflowing the datatype, and wreaking havoc since low may now be high,
+ * and vice versa.
+ */
+
+#define IS_FLOAT_CONSTANT(const_value, operator, operand, num_components) \
+ ((num_components >= 1) ? \
+ const_value.f[0] operator operand && \
+ ((num_components >= 2) ? \
+ const_value.f[1] operator operand && \
+ ((num_components >= 3) ? \
+ const_value.f[2] operator operand && \
+ ((num_components == 4) ? \
+ const_value.f[3] operator operand : \
+ true) : \
+ true) : \
+ true) : \
+ false)
+
+#define IS_INT_CONSTANT(const_value, operator, operand, num_components) \
+ ((num_components >= 1) ? \
+ const_value.i[0] operator operand && \
+ ((num_components >= 2) ? \
+ const_value.i[1] operator operand && \
+ ((num_components >= 3) ? \
+ const_value.i[2] operator operand && \
+ ((num_components == 4) ? \
+ const_value.i[3] operator operand : \
+ true) : \
+ true) : \
+ true) : \
+ false)
+
+#define IS_UNSIGNED_CONSTANT(const_value, operator, operand, num_components) \
+ ((num_components >= 1) ? \
+ const_value.u[0] operator operand && \
+ ((num_components >= 2) ? \
+ const_value.u[1] operator operand && \
+ ((num_components >= 3) ? \
+ const_value.u[2] operator operand && \
+ ((num_components == 4) ? \
+ const_value.u[3] operator operand : \
+ true) : \
+ true) : \
+ true) : \
+ false)
+
+typedef enum {
+ undefined,
+ range,
+ constant,
+ overdefined
+} lattice_type;
+
+typedef enum {
+ low,
+ high,
+ both
+} range_to_set;
+
+typedef struct {
+ /* Is this entry float, unsigned or something else? */
+ nir_alu_type range_type;
+
+ nir_const_value lo;
+ nir_const_value hi;
+
+ /* What type of lattice is this */
+ lattice_type type;
+
+ nir_ssa_def *ssa_def;
+ bool in_loop;
+
+ /* Visit-counter to prevent infinite looping */
+ unsigned visits;
+} lattice_entry;
+
+typedef struct {
+ /* Corresponds to SSA Work in the original paper */
+ nir_ssa_def_worklist ssa_worklist;
+
+ /* Work list of blocks, corresponding to the papers Flow work list */
+ nir_block_worklist block_worklist;
+
+ /* Keep track of which blocks are reachable */
+ BITSET_WORD *reachable_blocks;
+
+ /* Keep track of which blocks are in loops */
+ BITSET_WORD *blocks_in_loop;
+
+ /* An array of lattice_entries for all the ssa_defs */
+ lattice_entry *entries;
+} value_range_state;
+
+static lattice_entry *
+get_lattice_entry(nir_ssa_def *def, value_range_state *state)
+{
+ lattice_entry *entry = &state->entries[def->index];
+
+ /* On the first run of this function the def may not have been added to
+ * the lattice_entry, so do that here.
+ */
+ if (!entry->ssa_def)
+ entry->ssa_def = def;
+
+ return entry;
+}
+
+static bool
+is_instr_supported(nir_instr_type type)
+{
+ switch (type) {
+ case nir_instr_type_alu:
+ case nir_instr_type_phi:
+ case nir_instr_type_load_const:
+ return true;
+ default:
+ return false;
+ }
+}
+
+/* Returns true if this is a change in status of the entry. This simplifies
+ * checking if users of this entry should be added to the worklist.
+ */
+static bool
+set_as_overdefined(lattice_entry *entry, nir_alu_type type)
+{
+ if (entry->type == overdefined && entry->range_type == type)
+ return false;
+
+ entry->type = overdefined;
+ entry->range_type = type;
+
+ for (unsigned i = 0; i < entry->ssa_def->num_components; i++) {
+ switch (type) {
+ case nir_type_float:
+ entry->hi.f[i] = INFINITY;
+ entry->lo.f[i] = -INFINITY;
+ break;
+
+ case nir_type_int:
+ entry->hi.i[i] = INT32_MAX;
+ entry->lo.i[i] = INT32_MIN;
+ break;
+
+ case nir_type_bool:
+ case nir_type_unsigned:
+ entry->hi.u[i] = UINT32_MAX;
+ entry->lo.u[i] = 0;
+ break;
+ case nir_type_invalid:
+ break;
+ }
+ }
+
+ return true;
+}
+
+static inline void
+set_range_float(lattice_entry *entry, range_to_set to_set, float_t value)
+{
+ for (unsigned i = 0; i < entry->ssa_def->num_components; i++) {
+
+ if (to_set == low || to_set == both)
+ entry->lo.f[i] = value;
+
+ if (to_set == high || to_set == both)
+ entry->hi.f[i] = value;
+
+ if (to_set == both) {
+ entry->type = constant;
+ } else {
+ entry->type = range;
+ }
+ }
+}
+
+static inline void
+set_range_int(lattice_entry *entry, range_to_set to_set, int32_t value)
+{
+ for (unsigned i = 0; i < entry->ssa_def->num_components; i++) {
+
+ if (to_set == low || to_set == both)
+ entry->lo.i[i] = value;
+
+ if (to_set == high || to_set == both)
+ entry->hi.i[i] = value;
+
+ if (to_set == both) {
+ entry->type = constant;
+ } else {
+ entry->type = range;
+ }
+ }
+}
+
+static inline void
+set_range_uint(lattice_entry *entry, range_to_set to_set, uint32_t value)
+{
+ for (unsigned i = 0; i < entry->ssa_def->num_components; i++) {
+
+ if (to_set == low || to_set == both)
+ entry->lo.u[i] = value;
+
+ if (to_set == high || to_set == both)
+ entry->hi.u[i] = value;
+
+ if (to_set == both) {
+ entry->type = constant;
+ } else {
+ entry->type = range;
+ }
+ }
+}
+
+static bool
+is_type_max(nir_const_value *value, nir_alu_type type,
+ unsigned num_components)
+{
+ for (unsigned i = 0; i < num_components; i++) {
+ switch (type) {
+ case nir_type_float:
+ if (value->f[i] != INFINITY)
+ return false;
+ break;
+
+ case nir_type_int:
+ if (value->i[i] != INT32_MAX)
+ return false;
+ break;
+
+ case nir_type_bool:
+ case nir_type_unsigned:
+ if (value->u[i] != UINT32_MAX)
+ return false;
+ break;
+
+ case nir_type_invalid:
+ unreachable("not reached");
+ }
+ }
+
+ return true;
+}
+
+static bool
+is_type_min(nir_const_value *value, nir_alu_type type,
+ unsigned num_components)
+{
+ for (unsigned i = 0; i < num_components; i++) {
+ switch (type) {
+ case nir_type_float:
+ if (value->f[i] != -INFINITY)
+ return false;
+ break;
+
+ case nir_type_int:
+ if (value->i[i] != INT32_MIN)
+ return false;
+ break;
+
+ case nir_type_bool:
+ case nir_type_unsigned:
+ if (value->u[i] != 0)
+ return false;
+ break;
+
+ case nir_type_invalid:
+ unreachable("not reached");
+ }
+ }
+
+ return true;
+}
+
+static nir_const_value
+per_component_max(nir_const_value src0, nir_const_value src1,
+ nir_alu_type type, unsigned num_components)
+{
+ nir_const_value value;
+ for (unsigned i = 0; i < num_components; i++) {
+ switch (type) {
+ case nir_type_float:
+ value.f[i] = MAX2(src0.f[i], src1.f[i]);
+ break;
+ case nir_type_int:
+ value.i[i] = MAX2(src0.i[i], src1.i[i]);
+ break;
+ case nir_type_bool:
+ case nir_type_unsigned:
+ value.u[i] = MAX2(src0.u[i], src1.u[i]);
+ break;
+ case nir_type_invalid:
+ unreachable("not reached");
+ }
+ }
+
+ return value;
+}
+
+static nir_const_value
+per_component_min(nir_const_value src0, nir_const_value src1,
+ nir_alu_type type, unsigned num_components)
+{
+ nir_const_value value;
+ for (unsigned i = 0; i < num_components; i++) {
+ switch (type) {
+ case nir_type_float:
+ value.f[i] = MIN2(src0.f[i], src1.f[i]);
+ break;
+ case nir_type_int:
+ value.i[i] = MIN2(src0.i[i], src1.i[i]);
+ break;
+ case nir_type_bool:
+ case nir_type_unsigned:
+ value.u[i] = MIN2(src0.u[i], src1.u[i]);
+ break;
+ case nir_type_invalid:
+ unreachable("not reached");
+ }
+ }
+
+ return value;
+}
+
+typedef enum {
+ LESS,
+ LESS_EQUAL,
+ EQUAL,
+ GREATER_EQUAL,
+ GREATER,
+ MIXED,
+ COMPARE_FAIL
+} compare_range;
+
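+/* Compare the ranges of two (possibly swizzled) ALU sources component-wise.
+ * Intended semantics: LESS/GREATER when a's range lies entirely below/above
+ * b's range, LESS_EQUAL/GREATER_EQUAL when the ranges at most touch, EQUAL
+ * when both ranges collapse to the same constant, and COMPARE_FAIL when
+ * the ranges overlap so that no ordering can be proven.
+ */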
+static compare_range
+compare_entries(nir_alu_src *a, nir_alu_src *b, nir_alu_type type,
+ unsigned num_components, value_range_state *state)
+{
+ lattice_entry *entry_a = get_lattice_entry(a->src.ssa, state);
+ lattice_entry *entry_b = get_lattice_entry(b->src.ssa, state);
+
+ bool foundless = false;
+ bool found_low_equal = false;
+ bool foundequal = false;
+ bool found_high_equal = false;
+ bool foundgreater = false;
+
+ for (unsigned i = 0; i < num_components; i++) {
+ switch (type) {
+ case nir_type_float:
+ if (entry_a->hi.f[a->swizzle[i]] < entry_b->lo.f[b->swizzle[i]])
+ foundless = true;
+
+ if (entry_a->hi.f[a->swizzle[i]] == entry_b->lo.f[b->swizzle[i]])
+ found_low_equal = true;
+
+ if (entry_a->lo.f[a->swizzle[i]] > entry_b->hi.f[b->swizzle[i]])
+ foundgreater = true;
+
+ if (entry_a->lo.f[a->swizzle[i]] == entry_b->hi.f[b->swizzle[i]])
+ found_high_equal = true;
+
+ if (entry_a->hi.f[a->swizzle[i]] == entry_b->lo.f[b->swizzle[i]] &&
+ entry_a->lo.f[a->swizzle[i]] == entry_b->hi.f[b->swizzle[i]])
+ foundequal = true;
+ break;
+ case nir_type_int:
+ if (entry_a->hi.i[a->swizzle[i]] < entry_b->lo.i[b->swizzle[i]])
+ foundless = true;
+ if (entry_a->lo.i[a->swizzle[i]] > entry_b->hi.i[b->swizzle[i]])
+ foundgreater = true;
+ if (entry_a->lo.i[a->swizzle[i]] == entry_b->hi.i[b->swizzle[i]] &&
+ entry_a->hi.i[a->swizzle[i]] == entry_b->lo.i[b->swizzle[i]])
+ foundequal = true;
+ break;
+ case nir_type_unsigned:
+ case nir_type_bool:
+ if (entry_a->hi.u[a->swizzle[i]] < entry_b->lo.u[b->swizzle[i]])
+ foundless = true;
+ if (entry_a->lo.u[a->swizzle[i]] > entry_b->hi.u[b->swizzle[i]])
+ foundgreater = true;
+ if (entry_a->lo.u[a->swizzle[i]] == entry_b->hi.u[b->swizzle[i]] &&
+ entry_a->hi.u[a->swizzle[i]] == entry_b->lo.u[b->swizzle[i]])
+ foundequal = true;
+ break;
+ }
+ }
+
+ if (foundequal) {
+ if (foundless)
+ return LESS_EQUAL;
+ if (foundgreater)
+ return GREATER_EQUAL;
+ return EQUAL;
+ }
+
+ if (foundless)
+ return LESS;
+ if (foundgreater)
+ return GREATER;
+
+ return COMPARE_FAIL;
+}
+
+static bool
+is_entry_overdefined(lattice_entry *entry)
+{
+ if (entry->type == overdefined)
+ return true;
+
+ if (entry->type == constant)
+ return false;
+
+ if (entry->type == undefined)
+ return false;
+
+ /* This checks high and low to find out if the range is indeed the maximum
+ * and minimum of the type, and therefore in fact is overdefined. This
+ * can happen in a very trivial case like phi(a, b) where a = abs(x) and
+ * b = neg(abs(y)) and we don't know the range of either x or y. However,
+ * if we don't have a valid range type then we bail early.
+ */
+ if (entry->range_type == nir_type_invalid)
+ return true;
+
+ if (is_type_max(&entry->hi, entry->range_type,
+ entry->ssa_def->num_components) &&
+ is_type_min(&entry->lo, entry->range_type,
+ entry->ssa_def->num_components))
+ return true;
+
+ return false;
+}
+
+static bool
+is_component_false(lattice_entry *entry, unsigned component)
+{
+ /* This can be simplified to check for 0x00000000 */
+ return entry->lo.i[component] == 0x00000000 &&
+ entry->hi.i[component] == 0x00000000; //XXX: This is not correct. Floats also have negative zero. Is this also false?
+/*
+ switch (entry->range_type) {
+ case nir_type_int:
+ return entry->lo.i[component] == 0 &&
+ entry->hi.i[component] == 0;
+ case nir_type_unsigned:
+ case nir_type_bool:
+ return entry->lo.u[component] == 0 &&
+ entry->hi.u[component] == 0;
+ case nir_type_float:
+ return entry->lo.f[component] == 0.0f &&
+ entry->hi.f[component] == 0.0f;
+ case nir_type_invalid:
+ unreachable("not reached");
+ }
+*/
+ return false;
+}
+
+static bool
+is_entry_false(lattice_entry *entry)
+{
+ bool is_false = true;
+
+ if (is_entry_overdefined(entry))
+ return false;
+
+ for (uint8_t i = 0; i < entry->ssa_def->num_components; i++)
+ is_false = is_false && is_component_false(entry, i);
+
+ return is_false;
+}
+
+static bool
+is_component_true(lattice_entry *entry, unsigned component)
+{
+
+ switch (entry->range_type) {
+ case nir_type_int:
+ return entry->lo.i[component] > 0 ||
+ entry->hi.i[component] < 0;
+ case nir_type_unsigned:
+ case nir_type_bool:
+ return entry->lo.u[component] > 0;
+ case nir_type_float:
+ return entry->lo.f[component] > 0.0f ||
+ entry->hi.f[component] < 0.0f;
+ case nir_type_invalid:
+ return false; // XXX: This is an issue. Should probably check the user for its type, and use that here.
+ }
+
+ return false;
+}
+
+static bool
+is_entry_true(lattice_entry *entry)
+{
+ bool is_true = true;
+
+ if (is_entry_overdefined(entry))
+ return false;
+
+ for (uint8_t i = 0; i < entry->ssa_def->num_components; i++)
+ is_true = is_true && is_component_true(entry, i);
+
+ return is_true;
+}
+
+static bool
+is_entry_constant(lattice_entry *entry, unsigned num_components)
+{
+ if (entry->type == constant)
+ return true;
+
+ if (entry->type == overdefined)
+ return false;
+
+ for (uint8_t i = 0; i < num_components; i++) {
+ if (entry->lo.u[i] != entry->hi.u[i])
+ return false;
+ }
+
+ entry->type = constant;
+ return true;
+}
+
+static void
+mark_block_reachable(nir_block *block, value_range_state *state)
+{
+ BITSET_SET(state->reachable_blocks, block->index);
+}
+
+static bool
+is_block_reachable(nir_block *block, value_range_state *state)
+{
+ return BITSET_TEST(state->reachable_blocks, block->index);
+}
+
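+/* Set a conservative range for opcodes whose result range is known from
+ * the opcode alone (e.g. fsat always produces a value in [0.0, 1.0]),
+ * independently of the ranges of the sources. Returns false if we have no
+ * such special case for the opcode.
+ */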
+static bool
+special_case_alu(nir_ssa_def *def, value_range_state *state)
+{
+ lattice_entry *entry = get_lattice_entry(def, state);
+
+ switch(nir_instr_as_alu(def->parent_instr)->op) {
+ case nir_op_fsat:
+ set_range_float(entry, low, 0.0f);
+ set_range_float(entry, high, 1.0f);
+ return true;
+ case nir_op_fsin:
+ case nir_op_fcos:
+ case nir_op_fsign:
+ set_range_float(entry, low, -1.0f);
+ set_range_float(entry, high, 1.0f);
+ return true;
+ case nir_op_fabs:
+ case nir_op_fexp2:
+ case nir_op_fpow:
+ case nir_op_frsq:
+ case nir_op_fsqrt:
+ set_range_float(entry, low, 0.0f);
+ set_range_float(entry, high, INFINITY);
+ return true;
+ case nir_op_iabs:
+ set_range_int(entry, low, 0);
+ set_range_int(entry, high, INT32_MAX);
+ return true;
+ case nir_op_isign:
+ set_range_int(entry, low, -1);
+ set_range_int(entry, high, 1);
+ return true;
+
+ /* We have limited type-information for these, and they "falsely"
+ * advertise that they are integer operations (?), causing us to set
+ * wrong values for the upper and lower bound. Set them as invalid
+ * so that we can infer the type-information from their uses later
+ * on if necessary.
+ */
+ case nir_op_vec2:
+ case nir_op_vec3:
+ case nir_op_vec4:
+ set_as_overdefined(entry, nir_type_invalid);
+ return true;
+ default:
+ return false;
+ }
+}
+
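+/* Opcodes for which the result range can be computed by evaluating the
+ * opcode on every combination of the sources' low and high bounds, because
+ * they only select, clamp or flip their operands. Opcodes like add and mul
+ * are excluded since their corner evaluation can overflow or produce NaN.
+ * See the comments below for the cases that need extra special-casing.
+ */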
+static bool
+is_trivially_computable(nir_op op)
+{
+ switch (op) {
+
+ /* These just extract values, so are safe to compute */
+ case nir_op_fmax: case nir_op_imax: case nir_op_umax:
+ case nir_op_fmin: case nir_op_imin: case nir_op_umin:
+
+ /* We can have the following situation:
+ *
+ * vec4 ssa_59 = txl ssa_58 (coord), ssa_0 (lod), sampler0 (sampler)
+ * vec1 ssa_60 = imov ssa_59
+ * vec1 ssa_61 = imov ssa_59.y
+ * vec1 ssa_62 = imov ssa_59.z
+ * vec1 ssa_63 = fmul ssa_59.y, ssa_7
+ *
+ * We don't know the type of the txl operation, but we know that imov
+ * is an integer operation, and so we set ssa_59 and ssa_60 to
+ * [i_min, i_max]. That is not correct, however, as we find out in ssa_63.
+ * The values are instead floats. So we convert the range of ssa_62 to
+ * float-min and max, as we know it is overdefined, and so it works out.
+ * This is a bit fragile, but should work.
+ */
+ case nir_op_fmov: case nir_op_imov:
+
+ /* Conditional selects also just extract values */
+ case nir_op_bcsel: case nir_op_fcsel:
+
+ /* These also just extract values, should be safe. However, we have
+ * troubles with type information, and so we compute wrong values */
+// case nir_op_vec2: case nir_op_vec3: case nir_op_vec4:
+
+ /* These just limit the range, or flip it, and so are safe to compute */
+ case nir_op_fsat:
+ case nir_op_fsign: case nir_op_isign:
+ case nir_op_fneg: case nir_op_ineg:
+
+ /* These are not safe, as you can have a case where the input is
+ * [-inf, inf] which becomes [inf, inf], which is clearly wrong. This
+ * is dealt with by special-casing in the compute-range section.
+ */
+ case nir_op_fabs: case nir_op_iabs:
+
+ /* These are also not safe. If we have X = [-inf, inf] and Y = [0, 0]
+ * it will never match, and so an is-equal comparison will always return
+ * false, although that is obviously wrong since it might very well match.
+ * This should also be handled separately. This can be checked by utilizing
+ * the compare_entries function. XXX: Fix this!
+ */
+// case nir_op_feq: case nir_op_ieq: case nir_op_seq:
+// case nir_op_fge: case nir_op_ige: case nir_op_sge: case nir_op_uge:
+// case nir_op_flt: case nir_op_ilt: case nir_op_slt: case nir_op_ult:
+// case nir_op_fne: case nir_op_ine: case nir_op_sne:
+
+ /* Logical not is safe. Most likely we can add "and" and "or" too */
+ case nir_op_fnot: case nir_op_inot:
+
+ return true;
+
+ default:
+ return false;
+
+ }
+}
+
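+/* Compute the lattice information for the destination of an ALU instruction
+ * from the lattice information of its sources: fold it to a constant if all
+ * sources are constant, evaluate corner cases for the "trivially computable"
+ * opcodes, and fall back to the opcode-specific special cases or to
+ * overdefined otherwise.
+ */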
+static void
+evaluate_alu_instr(nir_alu_instr *alu, value_range_state *state)
+{
+ lattice_entry *entry = get_lattice_entry(&alu->dest.dest.ssa, state);
+ lattice_entry *src[4];
+ bool all_constant = true;
+
+ /* XXX: How many times should we maximally visit an alu? This seems to never occur */
+ if (entry->visits > 3) {
+ set_as_overdefined(entry, nir_op_infos[alu->op].output_type);
+ return;
+ }
+
+ for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) {
+
+ src[i] = get_lattice_entry(alu->src[i].src.ssa, state);
+
+ /* Check if any of the sources are overdefined. If one of them is, then
+ * the result of this entry will also be overdefined. We may however
+ * have cases where we only select one of the operands (max, csel), or we
+ * are limiting the range of the operand (abs, sat). These can still be
+ * safely computed, as we will not get undefined results.
+ */
+ if (is_entry_overdefined(src[i]) && !is_trivially_computable(alu->op)) {
+
+ if (!special_case_alu(entry->ssa_def, state))
+ set_as_overdefined(entry, nir_op_infos[alu->op].output_type);
+
+ entry->visits++;
+ return;
+ }
+
+ if (src[i]->type == undefined)
+ return;
+
+ unsigned num_components = nir_op_infos[alu->op].input_sizes[i];
+
+ if (num_components == 0)
+ num_components = alu->dest.dest.ssa.num_components;
+
+ all_constant = all_constant && is_entry_constant(src[i], num_components);
+ }
+
+ /**
+ * XXX: Snippet from the comment in nir.h
+ *
+ * For each input component, says which component of the register it is
+ * chosen from. Note that which elements of the swizzle are used and which
+ * are ignored are based on the write mask for most opcodes - for example,
+ * a statement like "foo.xzw = bar.zyx" would have a writemask of 1101b and
+ * a swizzle of {2, x, 1, 0} where x means "don't care."
+ *
+ * XXX: We handle swizzles at least partly correctly now. However, we do not
+ * deal with writemasks yet, and so we are probably doing wrong things here.
+ */
+
+ if (all_constant) {
+ nir_const_value const_src[4];
+
+ for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) {
+ for (unsigned j = 0; j < entry->ssa_def->num_components; j++) {
+ const_src[i].f[j] = src[i]->lo.f[alu->src[i].swizzle[j]];
+ }
+ }
+
+ nir_const_value dest =
+ nir_eval_const_opcode(alu->op,
+ alu->dest.dest.ssa.num_components,
+ const_src);
+
+ entry->visits++;
+ entry->type = constant;
+ entry->lo = dest;
+ entry->hi = dest;
+ entry->range_type = nir_op_infos[alu->op].output_type;
+ return;
+ }
+
+ /* Check if the ssa-def is in a loop */
+ if (BITSET_TEST(state->blocks_in_loop,
+ entry->ssa_def->parent_instr->block->index)) {
+
+ /* We have for the first time detected that the ssa-def is in a loop.
+ * Initialize to the best of our knowledge and mark it as in loop
+ */
+ if (!special_case_alu(entry->ssa_def, state))
+ set_as_overdefined(entry, nir_op_infos[alu->op].output_type);
+
+ /* Mark it so that we don't visit it anymore */
+ entry->in_loop = true;
+ return;
+ }
+
+ /* If all inputs are of range or constant type we can calculate the range
+ * of the operation instead of hand-rolling this ourselves. We can only
+ * do this for a select set of operations (basically those that only
+ * select/filter the operands, or that restrict the range). In cases
+ * like mul, add, etc. we cannot compute the range trivially, as we can
+ * get situations where we multiply -inf and inf and produce NaN, or we
+ * overflow integer types.
+ *
+ * This will allow things like sat to possibly give us constants. It will
+ * calculate the constant corresponding to the upper and lower value. If
+ * these are found to be the same in the "is-def-constant" function then
+ * it will get marked as constant, so there is no need to "special-case"
+ * things like sat(a) where a < 0.0
+ */
+ if (is_trivially_computable(alu->op)) {
+ nir_const_value lo_consts[4];
+ nir_const_value hi_consts[4];
+
+ for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) {
+ /* We may have set one of the operands to overdefined without knowing
+ * what type it was going to be. Now we know that the operand is used
+ * in an operation with a given alu-type, and so we can set the
+ * operand as overdef again, but with the right alu-type.
+ */
+ if (is_entry_overdefined(src[i]) &&
+ src[i]->range_type != nir_op_infos[alu->op].input_types[i])
+ set_as_overdefined(src[i], nir_op_infos[alu->op].input_types[i]);
+
+ for (unsigned j = 0; j < entry->ssa_def->num_components; j++) {
+ lo_consts[i].f[j] = src[i]->lo.f[alu->src[i].swizzle[j]];
+ hi_consts[i].f[j] = src[i]->hi.f[alu->src[i].swizzle[j]];
+ }
+ }
+
+ bool first = true;
+ nir_const_value temp_const[4];
+ nir_const_value computed_low;
+ nir_const_value computed_high;
+
+ /* Possible combinations we can make with high and low for num_inputs */
+ unsigned combinations = 1 << nir_op_infos[alu->op].num_inputs;
+
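+ /* For a two-input opcode this evaluates the four corners
+ * (lo0,lo1), (hi0,lo1), (lo0,hi1) and (hi0,hi1); the component-wise
+ * min/max over all the results then bounds the output range.
+ */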
+ for (unsigned i = 0; i < combinations; i++) {
+
+ /* Copy the low or high value of each source according to the mask */
+ for (unsigned j = 0; j < nir_op_infos[alu->op].num_inputs; j++)
+ temp_const[j] = i & (1 << j) ? hi_consts[j] : lo_consts[j];
+
+ nir_const_value temp =
+ nir_eval_const_opcode(alu->op,
+ alu->dest.dest.ssa.num_components,
+ temp_const);
+
+ if (first) {
+ /* If this is the first run then set an initial value */
+ computed_low = computed_high = temp;
+ first = false;
+ } else {
+ /* Do a per_component min/max to get the worst case scenario */
+ computed_low = per_component_min(computed_low, temp,
+ nir_op_infos[alu->op].output_type,
+ entry->ssa_def->num_components);
+ computed_high = per_component_max(computed_high, temp,
+ nir_op_infos[alu->op].output_type,
+ entry->ssa_def->num_components);
+ }
+ }
+
+ /* If the operation was an abs we will have wrongfully computed the
+ * lower value whenever the operand's range spans zero. Check the low
+ * bound of the (single) operand: if it is above zero for all components
+ * then keep it as the lower bound, otherwise clamp the lower bound to
+ * zero.
+ */
+ if (alu->op == nir_op_fabs) {
+ nir_const_value temp = lo_consts[0];
+
+ if (IS_FLOAT_CONSTANT(temp, >, 0.0f,
+ entry->ssa_def->num_components)) {
+ computed_low = temp;
+ } else {
+ computed_low.f[0] = 0.0f; computed_low.f[1] = 0.0f;
+ computed_low.f[2] = 0.0f; computed_low.f[3] = 0.0f;
+ }
+ }
+
+ if (alu->op == nir_op_iabs) {
+ nir_const_value temp = lo_consts[0];
+
+ if (IS_INT_CONSTANT(temp, >, 0,
+ entry->ssa_def->num_components)) {
+ computed_low = temp;
+ } else {
+ computed_low.i[0] = 0; computed_low.i[1] = 0;
+ computed_low.i[2] = 0; computed_low.i[3] = 0;
+ }
+ }
+
+ entry->visits++;
+ entry->type = range;
+ entry->lo = computed_low;
+ entry->hi = computed_high;
+ entry->range_type = nir_op_infos[alu->op].output_type;
+ return;
+ }
+
+ if (special_case_alu(&alu->dest.dest.ssa, state))
+ return;
+
+ entry->visits++;
+ set_as_overdefined(entry, nir_op_infos[alu->op].output_type);
+ return;
+}
+
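+/* (Re)evaluate the lattice entry for an ssa-def based on its defining
+ * instruction. Returns true if the lattice information changed, in which
+ * case the users of the def need to be reevaluated.
+ */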
+static bool
+evaluate_entry(lattice_entry *entry, value_range_state *state)
+{
+ lattice_type old_type = entry->type;
+ nir_const_value old_hi;
+ nir_const_value old_lo;
+
+ /* If it is already overdefined then that cannot change */
+ if (entry->type == overdefined)
+ return false;
+
+ /* We have already marked load_consts as constant, no use evaluating
+ *
+ * XXX: This is most likely not correct according to the papers. We can
+ * still go from constant and down to overdefined in some cases, so we
+ * should allow "analyzing" of constants
+ */
+// if (is_entry_constant(entry))
+// return false;
+
+ for (unsigned i = 0; i < 4; i++) {
+ old_hi.f[i] = entry->hi.f[i];
+ old_lo.f[i] = entry->lo.f[i];
+ }
+
+ switch (entry->ssa_def->parent_instr->type) {
+ case nir_instr_type_load_const: {
+ /* First time we encounter a load_const, mark as constant */
+ nir_load_const_instr *instr =
+ nir_instr_as_load_const(entry->ssa_def->parent_instr);
+ entry->type = constant;
+ entry->lo = instr->value;
+ entry->hi = instr->value;
+ entry->range_type = nir_type_invalid;
+ break;
+ }
+
+ case nir_instr_type_alu: {
+ nir_alu_instr *alu = nir_instr_as_alu(entry->ssa_def->parent_instr);
+ evaluate_alu_instr(alu, state);
+ break;
+ }
+
+ default:
+ return set_as_overdefined(entry, nir_type_invalid);
+ }
+
+ /* If we find that the entry has become overdefined without being marked
+ * as such, that means we have calculated the range to +/- inf, and thus
+ * the status of the entry has changed.
+ */
+ if (is_entry_overdefined(entry)) {
+ set_as_overdefined(entry, entry->range_type);
+ return true;
+ }
+
+ /* Now we check if the information for the instruction has changed.
+ * If it has then we return true, so that we can evaluate the users.
+ */
+ if (entry->type != old_type)
+ return true;
+
+ for (unsigned i = 0; i < 4; i++) {
+ if (old_hi.f[i] != entry->hi.f[i] ||
+ old_lo.f[i] != entry->lo.f[i])
+ return true;
+ }
+
+ return false;
+}
+
+/* Coordinates finding the uses of the ssa_def corresponding to the entry
+ * and sticking them in the ssa_worklist.
+ * Should be run on every lattice_entry that we change the information of.
+ */
+static void
+coordinate_uses(lattice_entry *entry, value_range_state *state)
+{
+ nir_foreach_use_safe(entry->ssa_def, src) {
+ nir_instr *user = src->parent_instr;
+
+ /* No point in checking the use if it is not yet found reachable */
+ if (!is_block_reachable(user->block, state))
+ continue;
+
+ /* We don't want to deal with unsupported instructions. This should
+ * already be marked overdefined on the first visit of the block it
+ * is in, so no point in doing anything special
+ */
+ if (!is_instr_supported(user->type))
+ continue;
+
+ nir_ssa_def *def = nir_instr_get_dest_ssa_def(user);
+
+ /* If it is overdefined we want to push it to the head of the list.
+ * That way we will propagate those faster, avoiding visiting
+ * ssa-defs with overdefined sources multiple times. */
+ if (is_entry_overdefined(entry)) {
+ nir_ssa_def_worklist_push_head(&state->ssa_worklist, def);
+ } else {
+ nir_ssa_def_worklist_push_tail(&state->ssa_worklist, def);
+ }
+ }
+
+ nir_foreach_if_use_safe(entry->ssa_def, src) {
+ /* If the condition was used for an if, then ideally we would "push our
+ * range" into the then and else branches by inserting a copy in each of
+ * the blocks and "applying" the range implied by the if-statement.
+ */
+ nir_if *if_statement = src->parent_if;
+
+ nir_cf_node *then_node = nir_if_first_then_node(if_statement);
+ nir_cf_node *else_node = nir_if_first_else_node(if_statement);
+
+ nir_block *then_block = nir_cf_node_as_block(then_node);
+ nir_block *else_block = nir_cf_node_as_block(else_node);
+
+ if (is_entry_true(entry)) {
+ nir_block_worklist_push_tail(&state->block_worklist, then_block);
+ continue;
+ }
+
+ if (is_entry_false(entry)) {
+ nir_block_worklist_push_tail(&state->block_worklist, else_block);
+ continue;
+ }
+
+ if (entry->type == undefined)
+ continue;
+
+ nir_block_worklist_push_tail(&state->block_worklist, then_block);
+ nir_block_worklist_push_tail(&state->block_worklist, else_block);
+ }
+}
+
+static bool
+handle_unsupported_def(nir_ssa_def *def, void *void_state)
+{
+ value_range_state *state = void_state;
+ lattice_entry *entry = get_lattice_entry(def, state);
+ set_as_overdefined(entry, nir_type_invalid);
+ coordinate_uses(entry, state);
+ return true;
+}
+
+/* On the first run of a block we want to always check all the uses
+ * of the instructions that we process.
+ */
+static void
+evaluate_block(nir_block *block, value_range_state *state)
+{
+ nir_foreach_instr_safe(block, instr) {
+
+ /* Skip doing anything to unsupported instructions */
+ if (!is_instr_supported(instr->type)) {
+ nir_foreach_ssa_def(instr, handle_unsupported_def, state);
+ continue;
+ }
+
+ nir_ssa_def *def = nir_instr_get_dest_ssa_def(instr);
+ lattice_entry *entry = get_lattice_entry(def, state);
+
+ evaluate_entry(entry, state);
+ coordinate_uses(entry, state);
+ }
+}
+
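+/* Recursively mark every block that sits inside a loop in the
+ * blocks_in_loop bitset, so that defs in loops can be handled
+ * conservatively.
+ */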
+static void
+mark_cf_node_loop(nir_cf_node *node, value_range_state *state, bool in_loop)
+{
+ switch (node->type) {
+ case nir_cf_node_block:
+
+ if (in_loop)
+ BITSET_SET(state->blocks_in_loop, nir_cf_node_as_block(node)->index);
+
+ return;
+ case nir_cf_node_if: {
+ nir_if *if_stmt = nir_cf_node_as_if(node);
+ foreach_list_typed(nir_cf_node, nested_node, node, &if_stmt->then_list)
+ mark_cf_node_loop(nested_node, state, in_loop);
+ foreach_list_typed(nir_cf_node, nested_node, node, &if_stmt->else_list)
+ mark_cf_node_loop(nested_node, state, in_loop);
+ return;
+ }
+ case nir_cf_node_loop: {
+ nir_loop *loop = nir_cf_node_as_loop(node);
+ foreach_list_typed(nir_cf_node, nested_node, node, &loop->body)
+ mark_cf_node_loop(nested_node, state, true);
+ break;
+ }
+ default:
+ unreachable("unknown cf node type");
+ }
+}
+
+static bool
+nir_opt_value_range_impl(nir_function_impl *impl)
+{
+ /* We might want to run a pass to insert "pi-nodes" into the
+ * ssa-tree before running the pass. This is essentially just
+ * a mov x2 = x1 that we use to have something to "store" the
+ * range implied by things like if's.
+ *
+ * This will also lead to a need to insert more phi-nodes,
+ * as one gets variables that diverge and then converge.
+ *
+ * x1 = ....; [-unlimited, unlimited]
+ * if (x1 < 7)
+ * x2 = x1; [-unlimited, 7]
+ * |
+ * |
+ * else
+ * x3 = x1; [7, unlimited]
+ * |
+ * |
+ * x4 = phi(x2, x3);
+ */
+
+ /* We want to recalculate the ssa-indexes first, to reduce our memory
+ * consumption and amount of "empty" value-range-variables
+ */
+ nir_index_ssa_defs(impl);
+ nir_index_blocks(impl);
+
+ /* No use in running the pass if there are no ssa-defs */
+ if (impl->ssa_alloc == 0)
+ return false;
+
+ bool progress = false;
+
+ void *mem_ctx = ralloc_context(NULL);
+ value_range_state state;
+
+ state.entries = rzalloc_array(mem_ctx, lattice_entry,
+ impl->ssa_alloc);
+
+ nir_block_worklist_init(&state.block_worklist, impl->num_blocks,
+ mem_ctx);
+
+ nir_ssa_def_worklist_init(&state.ssa_worklist, impl->ssa_alloc,
+ mem_ctx);
+
+ state.reachable_blocks = rzalloc_array(mem_ctx, BITSET_WORD,
+ BITSET_WORDS(impl->num_blocks));
+
+ state.blocks_in_loop = rzalloc_array(mem_ctx, BITSET_WORD,
+ BITSET_WORDS(impl->num_blocks));
+
+ /* Mark the blocks that are inside a loop so we can avoid them */
+ foreach_list_typed(nir_cf_node, node, node, &impl->body) {
+ if (node->type == nir_cf_node_loop)
+ mark_cf_node_loop(node, &state, true);
+ else
+ mark_cf_node_loop(node, &state, false);
+ }
+
+ nir_block_worklist_push_head(&state.block_worklist, impl->start_block);
+
+ /* Process the work lists until they are empty */
+ while (state.block_worklist.count > 0 ||
+ state.ssa_worklist.count > 0) {
+
+ /* Process the basic block work list */
+ while (state.block_worklist.count > 0) {
+ nir_block *block = nir_block_worklist_pop_head(&state.block_worklist);
+
+ /* Since we have our "coordinate_uses" function that also handles
+ * phi nodes we can skip the block if it is already set reachable,
+ * as the phi's will get automagically added to the ssa-worklist
+ * as they are users of the defs.
+ */
+ if (is_block_reachable(block, &state))
+ continue;
+
+ /* Block has been determined to be reachable, mark it */
+ mark_block_reachable(block, &state);
+
+ /* Evaluate all phi's and expressions of the block. */
+ evaluate_block(block, &state);
+
+ /* If the block has only one successor then add it to the worklist */
+ if ((block->successors[0] != NULL) &&
+ (block->successors[1] == NULL)) {
+ nir_block_worklist_push_tail(&state.block_worklist,
+ block->successors[0]);
+ continue;
+ }
+ }
+
+ /* Process the instruction work list
+ * This doesn't do things exactly like in the paper.
+ * Instead of storing the instructions that have changed and processing
+ * each of their users, we add all the users of the changed instructions
+ * to the list. In practice there is no difference, apart from the
+ * handling of uses being moved out to a separate function.
+ */
+ while (state.ssa_worklist.count > 0) {
+ /* All instructions in the list are here because
+ * we got new information about the range of an operand.
+ */
+ nir_ssa_def *def = nir_ssa_def_worklist_pop_head(&state.ssa_worklist);
+ lattice_entry *entry = get_lattice_entry(def, &state);
+
+ /* If the def is in a loop we don't want to do anything.
+ * (The instruction is initialized as best we can when we mark it.)
+ * When the block it's in is added to the worklist the entry will
+ * get processed, and so we will evaluate its users.
+ */
+ if (entry->in_loop)
+ continue;
+
+ /* Evaluate the ssa_def. If it has changed then add users to list */
+ if (evaluate_entry(entry, &state))
+ coordinate_uses(entry, &state);
+ }
+
+ /* If both lists are empty that means we are done. However, we assume
+ * branches on undef values cannot reach any of their successors.
+ * This is not a safe assumption, and can lead to eliminating things
+ * we shouldn't. Do a pass over all the ssa-defs to find unresolved
+ * branches. If one is found then add the "then"-branch to the block
+ * worklist. That way we will find the rest of the CFG, and this only
+ * slightly pessimizes the analysis results.
+ *
+ * XXX: It might be of interest to also check what operations
+ * use undefined values as sources, and resolve those. Some operations
+ * should return undef when there's an undef operand, while others
+ * should return e.g. 0 to be conservative.
+ *
+ * This part is implemented in a "resolvedUndefsIn" function in the
+ * LLVM SCCP pass, and should give a good idea of what should be done.
+ */
+ if (state.block_worklist.count == 0 &&
+ state.ssa_worklist.count == 0) {
+
+ for (unsigned i = 0; i < impl->ssa_alloc; i++) {
+ lattice_entry *entry = &state.entries[i];
+
+ if (entry->type != undefined)
+ continue;
+
+ if (!entry->ssa_def)
+ continue;
+
+ if (!list_empty(&entry->ssa_def->if_uses)) {
+ nir_foreach_if_use_safe(entry->ssa_def, src) {
+ nir_cf_node *node = nir_if_first_then_node(src->parent_if);
+ /* First node in the branch should be a block */
+ assert(node->type == nir_cf_node_block);
+ nir_block *block = nir_cf_node_as_block(node);
+ nir_block_worklist_push_tail(&state.block_worklist, block);
+ }
+ }
+ }
+ }
+ }
+
+ /* We could now traverse the blocks and delete those that
+ * are still marked as unreachable. If we delete a basic block
+ * we first need to rewrite the phis that use the results from
+ * the BB.
+ *
+ * For now we want to leave the blocks in place, as once the
+ * conditional for the unreachable block is set to a constant,
+ * Connor's pass for removing dead control flow will come along
+ * and clean up the blocks that cannot be reached.
+ */
+
+ /* Check for all values that are proven to be constant,
+ * and replace them with their constant instruction counterpart. */
+ for (unsigned i = 0; i < (impl->ssa_alloc - 1); i++) {
+ lattice_entry *entry = &(state.entries[i]);
+
+ /* Entries in unreachable blocks may never have been visited */
+ if (!entry->ssa_def)
+ continue;
+
+ /* If it's a constant that's not a load_const then make a load_const
+ * instruction and replace the uses of the old instr with that.
+ */
+ if (is_entry_constant(entry, entry->ssa_def->num_components) &&
+ (entry->ssa_def->parent_instr->type != nir_instr_type_load_const)) {
+
+ /* XXX: Doing the below replacement causes crashes because there is
+ * something I have not yet understood about how to rewrite things.
+ */
+/*
+ nir_load_const_instr *new_instr =
+ nir_load_const_instr_create(impl->overload->function->shader,
+ entry->ssa_def->num_components);
+
+ new_instr->value = entry->hi;
+
+ nir_ssa_def_init(&new_instr->instr, &new_instr->def,
+ entry->ssa_def->num_components,
+ "Constant propagated const");
+
+ nir_instr_insert_before(entry->ssa_def->parent_instr, &new_instr->instr);
+
+ nir_ssa_def_rewrite_uses(entry->ssa_def, nir_src_for_ssa(&new_instr->def),
+ impl);
+
+ nir_instr_remove(entry->ssa_def->parent_instr);
+ ralloc_free(entry->ssa_def->parent_instr);
+*/
+
+ continue;
+ }
+
+ if (entry->ssa_def->parent_instr->type != nir_instr_type_alu)
+ continue;
+
+ nir_alu_instr *alu = nir_instr_as_alu(entry->ssa_def->parent_instr);
+
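+ /* If both sources of a min/max have known ranges and one range lies
+ * entirely on one side of the other, the min/max always selects the
+ * same source, so its uses can be rewritten to point at that source.
+ */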
+ if (nir_op_infos[alu->op].num_inputs == 2 &&
+ !(is_entry_overdefined(get_lattice_entry(alu->src[0].src.ssa, &state)) ||
+ is_entry_overdefined(get_lattice_entry(alu->src[1].src.ssa, &state)))) {
+
+ compare_range result = MIXED;
+
+ switch (alu->op) {
+ case nir_op_fmax: case nir_op_fmin:
+ case nir_op_imax: case nir_op_imin:
+ case nir_op_umax: case nir_op_umin:
+
+ result = compare_entries(&alu->src[0], &alu->src[1],
+ nir_op_infos[alu->op].output_type,
+ entry->ssa_def->num_components, &state);
+ break;
+ default:
+ break;
+ }
+
+ if (result == LESS || result == LESS_EQUAL) {
+ if (alu->op == nir_op_fmax ||
+ alu->op == nir_op_imax ||
+ alu->op == nir_op_umax)
+ nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[1].src, mem_ctx);
+ else
+ nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[0].src, mem_ctx);
+
+ progress = true;
+ continue;
+ }
+
+ if (result == GREATER || result == GREATER_EQUAL) {
+ if (alu->op == nir_op_fmax ||
+ alu->op == nir_op_imax ||
+ alu->op == nir_op_umax)
+ nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[0].src, mem_ctx);
+ else
+ nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[1].src, mem_ctx);
+
+ progress = true;
+ continue;
+ }
+ }
+
+ /* XXX: Here we also want to add checks for undefined behaviour
+ *
+ * We also want to detect lo = 0, hi = 1 and replace with sat.
+ * We also want to use the compare_entries function to check these
+ */
+
+
+ /* Check if the result of a comparison can be determined */
+ if (nir_op_infos[alu->op].num_inputs == 2 &&
+ !(is_entry_overdefined(get_lattice_entry(alu->src[0].src.ssa, &state)) ||
+ is_entry_overdefined(get_lattice_entry(alu->src[1].src.ssa, &state)))) {
+
+ compare_range result = MIXED;
+
+ switch (alu->op) {
+ case nir_op_feq: case nir_op_ieq: case nir_op_seq:
+ case nir_op_fge: case nir_op_ige: case nir_op_sge: case nir_op_uge:
+ case nir_op_flt: case nir_op_ilt: case nir_op_slt: case nir_op_ult:
+ case nir_op_fne: case nir_op_ine: case nir_op_sne:
+
+ result = compare_entries(&alu->src[0], &alu->src[1],
+ nir_op_infos[alu->op].input_types[0],
+ entry->ssa_def->num_components, &state);
+ break;
+ default:
+ break;
+ }
+
+ if (alu->op == nir_op_feq ||
+ alu->op == nir_op_ieq ||
+ alu->op == nir_op_seq) {
+ if (result == EQUAL) {
+ /* We should insert a true */
+ }
+
+ if (result == LESS || result == GREATER) {
+ /* We should insert a false */
+ }
+ }
+
+ if (alu->op == nir_op_fne ||
+ alu->op == nir_op_ine ||
+ alu->op == nir_op_sne) {
+ if (result == EQUAL) {
+ /* We should insert a false */
+ }
+
+ if (result == LESS || result == GREATER) {
+ /* We should insert a true */
+ }
+ }
+
+ if (alu->op == nir_op_flt || alu->op == nir_op_ilt ||
+ alu->op == nir_op_slt || alu->op == nir_op_ult) {
+ if (result == LESS) {
+ /* We should insert a true */
+ }
+
+ if (result == GREATER) {
+ /* We should insert a false */
+ }
+ }
+
+ if (alu->op == nir_op_fge || alu->op == nir_op_ige ||
+ alu->op == nir_op_sge || alu->op == nir_op_uge) {
+ if (result == GREATER) {
+ /* We should insert a true */
+ }
+
+ if (result == LESS) {
+ /* We should insert a false */
+ }
+ }
+// progress = true;
+// continue;
+ }
+
+
+ bool set_false = false;
+ bool set_true = false;
+ /* Detect that "overdefined && false" == false, as we would have bailed
+ * on this earlier in the pass since we have an overdefined source
+ */
+ if (alu->op == nir_op_fand || alu->op == nir_op_iand) {
+ for (int i = 0; i < 2; i++) {
+ lattice_entry *e = get_lattice_entry(alu->src[i].src.ssa, &state);
+ if (is_entry_false(e)) {
+ set_false = true;
+ }
+ }
+ }
+
+ /* Detect that "overdefined || true" == true, as we would have bailed
+ * on this earlier in the pass since we have an overdefined source
+ */
+ if (alu->op == nir_op_for || alu->op == nir_op_ior) {
+ for (int i = 0; i < 2; i++) {
+ lattice_entry *e = get_lattice_entry(alu->src[i].src.ssa, &state);
+ if (is_entry_true(e)) {
+ set_true = true;
+ }
+ }
+ }
+
+ if (set_true || set_false) {
+ nir_const_value dest;
+
+ if (set_false) {
+ /* We can replace entry with "false". 0.0f == 0x00000000 which
+ * should be the same as signed integer zero, so this is safe
+ */
+ dest.f[0] = dest.f[1] = dest.f[2] = dest.f[3] = 0.0f;
+ }
+
+ if (set_true) {
+ /* We can replace entry with 1.0f which is also "true" for int */
+ dest.f[0] = dest.f[1] = dest.f[2] = dest.f[3] = 1.0f;
+ }
+/*
+ nir_load_const_instr *new_instr =
+ nir_load_const_instr_create(impl->overload->function->shader,
+ alu->dest.dest.ssa.num_components);
+
+ new_instr->value = dest;
+
+ nir_instr_insert_before(&alu->instr, &new_instr->instr);
+
+ nir_ssa_def_rewrite_uses(&alu->dest.dest.ssa, nir_src_for_ssa(&new_instr->def),
+ impl);
+
+ nir_instr_remove(&alu->instr);
+ ralloc_free(alu);
+*/
+ }
+
+ }
+
+ ralloc_free(mem_ctx);
+
+ return progress;
+}
+
+bool
+nir_opt_value_range(nir_shader *shader)
+{
+ bool progress = false;
+
+ nir_foreach_overload(shader, overload) {
+ if (overload->impl && nir_opt_value_range_impl(overload->impl))
+ progress = true;
+ }
+
+ return progress;
+}
--
2.4.4