[Mesa-dev] [RFC][PATCH] nir: Add a value range propagation pass

Thomas Helland thomashelland90 at gmail.com
Thu Aug 6 10:22:30 PDT 2015


It is now mostly working, and I've incorporated a bunch
of the feedback I got on the last posting. I promised to
have it out on the list by american office hours today,
so here it goes.

There is a massive issue with the compare_entries function.
I'm hacking on that atm, trying to come up with a solution
I'm comfortable with, that does not cause bugs. There is
also an issue with the way I insert new consts. There's
probably something there that I have not understood.

Below follows a list of changes from last version:
Add pessimation of branching on undef (based on llvm's aproach)
Incorporate some of Connors feedback
Incorporate Matt's feedback
Rework set_float_range API
Reworked to not do a separate initialization of everything
Some cleanups of unneeded variables
Handle loops differently
Add a visit-counter to prevent endless looping
(This has proven to not do anything, could be dropped)
Squash a bug where we where adding ourself as user instead of the user
(Same accidental mistake I had with uses in the LCSSA pass)
Calculate a lot of the things we can trivially compute.
Special case calculation of abs.

We are now detecting bcsel(x, 0, 0) (four cases)
                     abs(0) (two cases)

Remove support for phi's:

It was buggy and hard to get right, as type-information
was kinda hard to get hold of. A quick statistics gathering
showed phi-nodes to amount to only 0.24% of the ssa-defs,
and so one can assume that getting them right would not give
us any substantial benefit.

Phi-nodes might have some saying in if we are able to remove
complete nodes/blocks? If so, then we should probably revisit
this decision, but for now it seems the return-on-investment
and risk-reward ratio does not justify dealing with them.

Signed-off-by: Thomas Helland <thomashelland90 at gmail.com>
---
 src/glsl/Makefile.sources          |    1 +
 src/glsl/nir/nir.h                 |    2 +
 src/glsl/nir/nir_opt_value_range.c | 1570 ++++++++++++++++++++++++++++++++++++
 3 files changed, 1573 insertions(+)
 create mode 100644 src/glsl/nir/nir_opt_value_range.c

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index 4f79fee..0b971c9 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -56,6 +56,7 @@ NIR_FILES = \
 	nir/nir_opt_peephole_ffma.c \
 	nir/nir_opt_peephole_select.c \
 	nir/nir_opt_remove_phis.c \
+	nir/nir_opt_value_range.c \
 	nir/nir_print.c \
 	nir/nir_remove_dead_variables.c \
 	nir/nir_search.c \
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index aba653e..19202a7 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1707,6 +1707,8 @@ bool nir_opt_peephole_ffma(nir_shader *shader);
 
 bool nir_opt_remove_phis(nir_shader *shader);
 
+bool nir_opt_value_range(nir_shader *shader);
+
 void nir_sweep(nir_shader *shader);
 
 #ifdef __cplusplus
diff --git a/src/glsl/nir/nir_opt_value_range.c b/src/glsl/nir/nir_opt_value_range.c
new file mode 100644
index 0000000..616ec1c
--- /dev/null
+++ b/src/glsl/nir/nir_opt_value_range.c
@@ -0,0 +1,1570 @@
+/*
+ * Copyright © 2015 Thomas Helland
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "nir.h"
+#include "nir_ssa_def_worklist.h"
+#include "nir_block_worklist.h"
+#include "nir_constant_expressions.h"
+
+/* This pass implements an extension of
+ * "Constant Propagation with Conditional Branches" by Wegman and Zadeck
+ * that also handles value ranges. This is useful as a lot of shaders have
+ * min/max expressions that can be eliminated, or conditionals that we can
+ * prove to be false or true due to previously applied restrictions on range.
+ * Value range propagation is a superset of tracking constant values, so this
+ * pass eliminates the need for a separate constant propagation pass. The
+ * pass is optimistic, meaning we assume all variables are constant (or have
+ * restricted range) and disprove it. A pessimistic algorithm would assume
+ * all values where undeterminable, and then propagate expressions we know
+ * to be constant through the program. An optimistic algorithm gets better
+ * results than a pessimistic, with the downside that it can not be aborted
+ * halfway through as the information gathered may be wrong.
+ *
+ * The lattice types are:
+ * undefined: Variable may be constant or range-restricted
+ *             (We have not yet gotten to them in the pass)
+ * constant: Value is determined to be constant
+ * range: Value is determined to be range-restricted
+ * overdefined: We cannot guarantee the value is constant or range-restricted
+ *
+ * We extend the lattice from the SCCP pass so that constant entries are
+ * changed to inclusive ranges for each vector component. The join rules are:
+ *
+ * undefined join undefined = undefined
+ * undefined join overdefined = overdefined
+ * overdefined join overdefined = overdefined
+ * [low, high] join overdefined = overdefined
+ * [low, high] join undefined = [low, high]
+ * [low1, high1] join [low2, high2] = [min(low1, low2), max(high1, high2)]
+ *
+ * These rules are general pessimistic rules. There may be situations where
+ * we can still determine parts of the range of the variable, even though it
+ * has an overdefined input (i.e. max, min, sat, abs). This is also true for
+ * boolean operations like AND and OR. These can be determined even if we
+ * know only one of the operators.
+ *
+ * We don't preserve the range perfectly for a variable, as we combine two
+ * ranges for a variable into a range of:
+ *    [min(low1, low2), max(high1, high2)]
+ * Preserving the non-continuous range information would greatly complicate
+ * the pass, and is therefore not implemented.
+ *
+ * There is one interesting situation that is hard to deal with:
+ * When we find that something is dead code, but it does not become a
+ * constant value. Examples are things like max(sin(x), y) where y > 1.
+ * We know sin(x) is dead code, but the result is not a constant but instead
+ * an ssa-def with a range. We mark this in the range-state so that we can
+ * eliminate it after the pass is done. This means that the pass should be
+ * rerun if we resolve one of these, as we will then have simplified the
+ * program, and new ranges may be resolved.
+ *
+ * When the pass is done all "undefined" values should be determined as
+ * either const, range, or overdefined. (Except for in blocks that are
+ * marked as unreachable, as they may have not been processed)
+ */
+
+/* XXX: Review feedback from Matt
+ * Food for thought: a range isn't really what you want for bools. Maybe we
+ * can think about extending this analysis to know about disjount ranges,
+ * or maybe even just a flag to say "it's either high or low".
+ */
+
+/* XXX:
+ * We need a mechanism for overflow protection for integer and
+ * unsigned integer calculations as we may end up adding two values,
+ * overflowing the datatype, and wreck havoc as low may now be high,
+ * and vice versa.
+ */
+
+#define IS_FLOAT_CONSTANT(const_value, operator, operand, num_components)    \
+   ((num_components >= 1) ?                                                  \
+      const_value.f[0] operator operand &&                                   \
+      ((num_components >= 2) ?                                               \
+         const_value.f[1] operator operand &&                                \
+         ((num_components >= 3) ?                                            \
+            const_value.f[2] operator operand &&                             \
+            ((num_components == 4) ?                                         \
+               const_value.f[3] operator operand :                           \
+               true) :                                                       \
+            true) :                                                          \
+         true) :                                                             \
+      false)
+
+#define IS_INT_CONSTANT(const_value, operator, operand, num_components)      \
+   ((num_components >= 1) ?                                                  \
+      const_value.i[0] operator operand &&                                   \
+      ((num_components >= 2) ?                                               \
+         const_value.i[1] operator operand &&                                \
+         ((num_components >= 3) ?                                            \
+            const_value.i[2] operator operand &&                             \
+            ((num_components == 4) ?                                         \
+               const_value.i[3] operator operand :                           \
+               true) :                                                       \
+            true) :                                                          \
+         true) :                                                             \
+      false)
+
+#define IS_UNSIGNED_CONSTANT(const_value, operator, operand, num_components) \
+   ((num_components >= 1) ?                                                  \
+      const_value.u[0] operator operand &&                                   \
+      ((num_components >= 2) ?                                               \
+         const_value.u[1] operator operand &&                                \
+         ((num_components >= 3) ?                                            \
+            const_value.u[2] operator operand &&                             \
+            ((num_components == 4) ?                                         \
+               const_value.u[3] operator operand :                           \
+               true) :                                                       \
+            true) :                                                          \
+         true) :                                                             \
+      false)
+
+typedef enum {
+   undefined,
+   range,
+   constant,
+   overdefined
+} lattice_type;
+
+typedef enum {
+   low,
+   high,
+   both
+} range_to_set;
+
+typedef struct {
+   /* Is this entry float, unsigned or something else? */
+   nir_alu_type range_type;
+
+   nir_const_value lo;
+   nir_const_value hi;
+
+   /* What type of lattice is this */
+   lattice_type type;
+
+   nir_ssa_def *ssa_def;
+   bool in_loop;
+
+   /* Visit-counter to prevent infinite looping */
+   unsigned visits;
+} lattice_entry;
+
+typedef struct {
+   /* Corresponds to SSA Work in the original paper */
+   nir_ssa_def_worklist ssa_worklist;
+
+   /* Work list of blocks, corresponding to the papers Flow work list */
+   nir_block_worklist block_worklist;
+
+   /* Keep track of which blocks are reachable */
+   BITSET_WORD *reachable_blocks;
+
+   /* Keep track of which blocks are in loops */
+   BITSET_WORD *blocks_in_loop;
+
+   /* An array of lattice_antries for all the ssa_defs */
+   lattice_entry *entries;
+} value_range_state;
+
+static lattice_entry *
+get_lattice_entry(nir_ssa_def *def, value_range_state *state)
+{
+   lattice_entry *entry = &state->entries[def->index];
+
+   /* On the first run of this function the def may not have been added to
+    * the lattice_entry, so do that here.
+    */
+   if (!entry->ssa_def)
+      entry->ssa_def = def;
+
+   return entry;
+}
+
+static bool
+is_instr_supported(nir_instr_type type)
+{
+   switch (type) {
+   case nir_instr_type_alu:
+   case nir_instr_type_phi:
+   case nir_instr_type_load_const:
+      return true;
+   default:
+      return false;
+   }
+}
+
+/* Returns true if this is a change in status of the entry. This simplifies
+ * checking if users of this entry should be added to the worklist.
+ */
+static bool
+set_as_overdefined(lattice_entry *entry, nir_alu_type type)
+{
+   if (entry->type == overdefined && entry->range_type == type)
+      return false;
+
+   entry->type = overdefined;
+   entry->range_type = type;
+
+   for (unsigned i = 0; i < entry->ssa_def->num_components; i++) {
+      switch (type) {
+      case nir_type_float:
+         entry->hi.f[i] = INFINITY;
+         entry->lo.f[i] = -INFINITY;
+         break;
+
+      case nir_type_int:
+         entry->hi.i[i] = INT32_MAX;
+         entry->lo.i[i] = INT32_MIN;
+         break;
+
+      case nir_type_bool:
+      case nir_type_unsigned:
+         entry->hi.u[i] = UINT32_MAX;
+         entry->lo.u[i] = 0;
+         break;
+      case nir_type_invalid:
+         break;
+      }
+   }
+
+   return true;
+}
+
+static inline void
+set_range_float(lattice_entry *entry, range_to_set to_set, float_t value)
+{
+   for (unsigned i = 0; i < entry->ssa_def->num_components; i++) {
+
+      if (to_set == low || to_set == both)
+         entry->lo.f[i] = value;
+
+      if (to_set == high || to_set == both)
+         entry->hi.f[i] = value;
+
+      if (to_set == both) {
+         entry->type = constant;
+      } else {
+         entry->type = range;
+      }
+   }
+}
+
+static inline void
+set_range_int(lattice_entry *entry, range_to_set to_set, int32_t value)
+{
+   for (unsigned i = 0; i < entry->ssa_def->num_components; i++) {
+
+      if (to_set == low || to_set == both)
+         entry->lo.i[i] = value;
+
+      if (to_set == high || to_set == both)
+         entry->hi.i[i] = value;
+
+      if (to_set == both) {
+         entry->type = constant;
+      } else {
+         entry->type = range;
+      }
+   }
+}
+
+static inline void
+set_range_uint(lattice_entry *entry, range_to_set to_set, uint32_t value)
+{
+   for (unsigned i = 0; i < entry->ssa_def->num_components; i++) {
+
+      if (to_set == low || to_set == both)
+         entry->lo.u[i] = value;
+
+      if (to_set == high || to_set == both)
+         entry->hi.u[i] = value;
+
+      if (to_set == both) {
+         entry->type = constant;
+      } else {
+         entry->type = range;
+      }
+   }
+}
+
+static bool
+is_type_max(nir_const_value *value, nir_alu_type type,
+            unsigned num_components)
+{
+   for (unsigned i = 0; i < num_components; i++) {
+      switch (type) {
+      case nir_type_float:
+         if (value->f[i] != INFINITY)
+            return false;
+         break;
+
+      case nir_type_int:
+         if (value->i[i] != INT32_MAX)
+            return false;
+         break;
+
+      case nir_type_bool:
+      case nir_type_unsigned:
+         if (value->u[i] != UINT32_MAX)
+            return false;
+         break;
+
+      case nir_type_invalid:
+         unreachable("not reached");
+      }
+   }
+
+   return true;
+}
+
+static bool
+is_type_min(nir_const_value *value, nir_alu_type type,
+            unsigned num_components)
+{
+   for (unsigned i = 0; i < num_components; i++) {
+      switch (type) {
+      case nir_type_float:
+         if (value->f[i] != -INFINITY)
+            return false;
+         break;
+
+      case nir_type_int:
+         if (value->i[i] != INT32_MIN)
+            return false;
+         break;
+
+      case nir_type_bool:
+      case nir_type_unsigned:
+         if (value->u[i] != 0)
+            return false;
+         break;
+
+      case nir_type_invalid:
+         unreachable("not reached");
+      }
+   }
+
+   return true;
+}
+
+static nir_const_value
+per_component_max(nir_const_value src0, nir_const_value src1,
+                  nir_alu_type type, unsigned num_components)
+{
+   nir_const_value value;
+   for (unsigned i = 0; i < num_components; i++) {
+      switch (type) {
+      case nir_type_float:
+         value.f[i] = MAX2(src0.f[i], src1.f[i]);
+         break;
+      case nir_type_int:
+         value.i[i] = MAX2(src0.i[i], src1.i[i]);
+         break;
+      case nir_type_bool:
+      case nir_type_unsigned:
+         value.u[i] = MAX2(src0.u[i], src1.u[i]);
+         break;
+      case nir_type_invalid:
+         range_print("Nir type invalid in max \n");
+         unreachable("not reached");
+      }
+   }
+
+   return value;
+}
+
+static nir_const_value
+per_component_min(nir_const_value src0, nir_const_value src1,
+                  nir_alu_type type, unsigned num_components)
+{
+   nir_const_value value;
+   for (unsigned i = 0; i < num_components; i++) {
+      switch (type) {
+      case nir_type_float:
+         value.f[i] = MIN2(src0.f[i], src1.f[i]);
+         break;
+      case nir_type_int:
+         value.i[i] = MIN2(src0.i[i], src1.i[i]);
+         break;
+      case nir_type_bool:
+      case nir_type_unsigned:
+         value.u[i] = MIN2(src0.u[i], src1.u[i]);
+         break;
+      case nir_type_invalid:
+         unreachable("not reached");
+      }
+   }
+
+   return value;
+}
+
+typedef enum {
+   LESS,
+   LESS_EQUAL,
+   EQUAL,
+   GREATER_EQUAL,
+   GREATER,
+   MIXED,
+   COMPARE_FAIL
+} compare_range;
+
+static compare_range
+compare_entries(nir_alu_src *a, nir_alu_src *b, nir_alu_type type,
+                unsigned num_components, value_range_state *state)
+{
+   lattice_entry *entry_a = get_lattice_entry(a->src.ssa, state);
+   lattice_entry *entry_b = get_lattice_entry(b->src.ssa, state);
+
+   bool foundless = false;
+   bool found_low_equal = false;
+   bool foundequal = false;
+   bool found_high_equal = false;
+   bool foundgreater = false;
+
+   for (unsigned i = 0; i < num_components; i++) {
+      switch (type) {
+      case nir_type_float:
+         if (entry_a->hi.f[a->swizzle[i]] < entry_b->lo.f[b->swizzle[i]])
+            foundless = true;
+
+         if (entry_a->hi.f[a->swizzle[i]] == entry_b->lo.f[b->swizzle[i]])
+            found_low_equal = true;
+
+         if (entry_a->lo.f[a->swizzle[i]] > entry_b->hi.f[b->swizzle[i]])
+            foundgreater = true;
+
+         if (entry_a->lo.f[a->swizzle[i]] == entry_b->hi.f[b->swizzle[i]])
+            found_high_equal = true;
+
+         if (entry_a->hi.f[a->swizzle[i]] == entry_b->lo.f[b->swizzle[i]] &&
+             entry_a->lo.f[a->swizzle[i]] == entry_b->hi.f[b->swizzle[i]])
+            foundequal = true;
+         break;
+      case nir_type_int:
+         if (entry_a->hi.i[a->swizzle[i]] < entry_b->lo.i[b->swizzle[i]])
+            foundless = true;
+         if (entry_a->lo.i[a->swizzle[i]] > entry_b->hi.i[b->swizzle[i]])
+            foundgreater = true;
+         if (entry_a->lo.i[a->swizzle[i]] == entry_b->hi.i[b->swizzle[i]] &&
+             entry_a->hi.i[a->swizzle[i]] == entry_b->lo.i[b->swizzle[i]])
+            foundequal = true;
+         break;
+      case nir_type_unsigned:
+      case nir_type_bool:
+         if (entry_a->hi.u[a->swizzle[i]] < entry_b->lo.u[b->swizzle[i]])
+            foundless = true;
+         if (entry_a->lo.u[a->swizzle[i]] > entry_b->hi.u[b->swizzle[i]])
+            foundgreater = true;
+         if (entry_a->lo.u[a->swizzle[i]] == entry_b->hi.u[b->swizzle[i]] &&
+             entry_a->hi.u[a->swizzle[i]] == entry_b->lo.u[b->swizzle[i]])
+            foundequal = true;
+         break;
+      }
+   }
+
+   if (foundequal) {
+      if (foundless)
+         return LESS_EQUAL;
+      if (foundgreater)
+         return GREATER_EQUAL;
+      return EQUAL;
+   }
+
+   if (foundless)
+      return LESS;
+   if (foundgreater)
+      return GREATER;
+
+   return COMPARE_FAIL;
+}
+
+static bool
+is_entry_overdefined(lattice_entry *entry)
+{
+   if (entry->type == overdefined)
+      return true;
+
+   if (entry->type == constant)
+      return false;
+
+   if (entry->type == undefined)
+      return false;
+
+   /* This checks high and low to find out if the range is indeeed maximum
+    * and mininum of the type, and therefore in fact is overdefined. This
+    * can happen in a very trivial case like phi(a, b) where a = abs(x) and
+    * b = neg(abs(y)) and we don't know the range of either x or y. However,
+    * if we don't have a valid range type then we bail early.
+    */
+   if (entry->range_type == nir_type_invalid)
+      return true;
+
+   if (is_type_max(&entry->hi, entry->range_type,
+                   entry->ssa_def->num_components) &&
+       is_type_min(&entry->lo, entry->range_type,
+                   entry->ssa_def->num_components))
+      return true;
+
+   return false;
+}
+
+static bool
+is_component_false(lattice_entry *entry, unsigned component)
+{
+   /* This can be simplified to check for 0x00000000 */
+   return entry->lo.i[component] == 0x00000000 &&
+          entry->hi.i[component] == 0x00000000; //XXX: This is not correct. Floats also have negative zero. Is this also false?
+/*
+   switch (entry->range_type) {
+   case nir_type_int:
+      return entry->lo.i[component] == 0 &&
+             entry->hi.i[component] == 0;
+   case nir_type_unsigned:
+   case nir_type_bool:
+      return entry->lo.u[component] == 0 &&
+             entry->hi.u[component] == 0;
+   case nir_type_float:
+      return entry->lo.f[component] == 0.0f &&
+             entry->hi.f[component] == 0.0f;
+   case nir_type_invalid:
+      unreachable("not reached");
+   }
+*/
+   return false;
+}
+
+static bool
+is_entry_false(lattice_entry *entry)
+{
+   bool is_false = true;
+
+   if (is_entry_overdefined(entry))
+         return false;
+
+   for (uint8_t i = 0; i < entry->ssa_def->num_components; i++)
+      is_false = is_false && is_component_false(entry, i);
+
+   return is_false;
+}
+
+static bool
+is_component_true(lattice_entry *entry, unsigned component)
+{
+
+   switch (entry->range_type) {
+   case nir_type_int:
+      return entry->lo.i[component] > 0 ||
+             entry->hi.i[component] < 0;
+   case nir_type_unsigned:
+   case nir_type_bool:
+      return entry->lo.u[component] > 0;
+   case nir_type_float:
+      return entry->lo.f[component] > 0.0f ||
+             entry->hi.f[component] < 0.0f;
+   case nir_type_invalid:
+      return false; // XXX: This is an issue. Should probably check the user for it's type, and use that here.
+   }
+
+   return false;
+}
+
+static bool
+is_entry_true(lattice_entry *entry)
+{
+   bool is_true = true;
+
+   if (is_entry_overdefined(entry))
+      return false;
+
+   for (uint8_t i = 0; i < entry->ssa_def->num_components; i++)
+      is_true = is_true && is_component_true(entry, i);
+
+   return is_true;
+}
+
+static bool
+is_entry_constant(lattice_entry *entry, unsigned num_components)
+{
+   if (entry->type == constant)
+      return true;
+
+   if (entry->type == overdefined)
+      return false;
+
+   for (uint8_t i = 0; i < num_components; i++) {
+      if (entry->lo.u[i] != entry->hi.u[i])
+         return false;
+   }
+
+   entry->type = constant;
+   return true;
+}
+
+static void
+mark_block_reachable(nir_block *block, value_range_state *state)
+{
+   BITSET_SET(state->reachable_blocks, block->index);
+}
+
+static bool
+is_block_reachable(nir_block *block, value_range_state *state)
+{
+   return BITSET_TEST(state->reachable_blocks, block->index);
+}
+
+static bool
+special_case_alu(nir_ssa_def *def, value_range_state *state)
+{
+   lattice_entry *entry = get_lattice_entry(def, state);
+
+   switch(nir_instr_as_alu(def->parent_instr)->op) {
+   case nir_op_fsat:
+      set_range_float(entry, low, 0.0f);
+      set_range_float(entry, high, 1.0f);
+      return true;
+   case nir_op_fsin:
+   case nir_op_fcos:
+   case nir_op_fsign:
+      set_range_float(entry, low, -1.0f);
+      set_range_float(entry, high, 1.0f);
+      return true;
+   case nir_op_fabs:
+   case nir_op_fexp2:
+   case nir_op_fpow:
+   case nir_op_frsq:
+   case nir_op_fsqrt:
+      set_range_float(entry, low, 0.0f);
+      set_range_float(entry, high, INFINITY);
+      return true;
+   case nir_op_iabs:
+      set_range_int(entry, low, 0);
+      set_range_int(entry, high, INT32_MAX);
+      return true;
+   case nir_op_isign:
+      set_range_int(entry, low, -1);
+      set_range_int(entry, high, 1);
+      return true;
+
+   /* We have limited type-information for these, and they "falsely"
+    * advertise that they are integer oerations (?) causing us to set
+    * wrong values for the higher and lower bound. Set them as invalid
+    * so that we can infer the type-information from their uses later
+    * on if necessary.
+    */
+   case nir_op_vec2:
+   case nir_op_vec3:
+   case nir_op_vec4:
+      set_as_overdefined(entry, nir_type_invalid);
+      return true;
+   default:
+      return false;
+   }
+}
+
+static bool
+is_trivially_computable(nir_op op)
+{
+   switch (op) {
+
+   /* These just extract values, so are safe to compute */
+   case nir_op_fmax:   case nir_op_imax:   case nir_op_umax:
+   case nir_op_fmin:   case nir_op_imin:   case nir_op_umin:
+
+   /* We can have the following situation:
+    *
+    *  vec4 ssa_59 = txl ssa_58 (coord), ssa_0 (lod), sampler0 (sampler)
+    *  vec1 ssa_60 = imov ssa_59
+    *  vec1 ssa_61 = imov ssa_59.y
+    *  vec1 ssa_62 = imov ssa_59.z
+    *  vec1 ssa_63 = fmul ssa_59.y, ssa_7
+    *
+    *  We don't know the type of the txl operation, but we know that imow
+    *  is an integer operation, and so we set ssa_59 and ssa_60 to
+    *  [i_min, i_max]. That is not correct however, as we find out in ssa_63.
+    *  The values are instead floats. So we convert the range of ssa_62 to
+    *  float-min and max, as we know it is overdefined, and so it works out.
+    *  This is a bit fragile, but should work.
+    */
+   case nir_op_fmov:   case nir_op_imov:
+
+   /* Conditional selects also just extract values */
+   case nir_op_bcsel:  case nir_op_fcsel:
+
+   /* These also just extract values, should be safe. However, we have
+    * troubles with type information, and so we compute wrong values */
+//   case nir_op_vec2:   case nir_op_vec3:   case nir_op_vec4:
+
+   /* These just limit the range, or flip it, and so are safe to compute */
+   case nir_op_fsat:
+   case nir_op_fsign:  case nir_op_isign:
+   case nir_op_fneg:   case nir_op_ineg:
+
+   /* These are not safe, as you can have a case where the input is
+    * [-inf, inf] which becomes [inf, inf] which is clearly wrong. This
+    * is dealth with in a special-casing in the compute-range section.
+    */
+   case nir_op_fabs:   case nir_op_iabs:
+
+   /* These are also not safe. If we have X = [-inf, inf] and Y = [0, 0]
+    * it will never match, and so an is-equal comparison will always return
+    * false, although that is obviously wrong since it might very well match.
+    * This should also be handled separately. This can be checked by utilizing
+    * the compare_entries function. XXX: Fix this!
+    */
+//   case nir_op_feq:    case nir_op_ieq:   case nir_op_seq:
+//   case nir_op_fge:    case nir_op_ige:   case nir_op_sge:   case nir_op_uge:
+//   case nir_op_flt:    case nir_op_ilt:   case nir_op_slt:   case nir_op_ult:
+//   case nir_op_fne:    case nir_op_ine:   case nir_op_sne:
+
+   /* Logical not is safe. Most likely we can add "and" and "or" too */
+   case nir_op_fnot:   case nir_op_inot:
+
+      return true;
+
+   default:
+      return false;
+
+   }
+}
+
+static void
+evaluate_alu_instr(nir_alu_instr *alu, value_range_state *state)
+{
+   lattice_entry *entry = get_lattice_entry(&alu->dest.dest.ssa, state);
+   lattice_entry *src[4];
+   bool all_constant = true;
+
+   /* XXX: How many times should we maximally visit an alu? This seems to never occur */
+   if (entry->visits > 3) {
+      set_as_overdefined(entry, nir_op_infos[alu->op].output_type);
+      return;
+   }
+
+   for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) {
+
+      src[i] = get_lattice_entry(alu->src[i].src.ssa, state);
+
+      /* Check if any of the sources are overdefined. If one of them are then
+       * the result of this entry will also be overdefined. We may however
+       * have cases where we only select one of the operands(max,csel), or we
+       * are limiting the range of the operand (abs, sat). These can still be
+       * safely computed, as we will not get undefined results.
+       */
+      if (is_entry_overdefined(src[i]) && !is_trivially_computable(alu->op)) {
+
+         if (!special_case_alu(entry->ssa_def, state))
+            set_as_overdefined(entry, nir_op_infos[alu->op].output_type);
+
+         entry->visits++;
+         return;
+      }
+
+      if (src[i]->type == undefined)
+         return;
+
+      unsigned num_components = nir_op_infos[alu->op].input_sizes[i];
+
+      if (num_components == 0)
+         num_components = alu->dest.dest.ssa.num_components;
+
+      all_constant = all_constant && is_entry_constant(src[i], num_components);
+   }
+
+   /**
+    * XXX: Snippet from the comment in nir.h
+    *
+    * For each input component, says which component of the register it is
+    * chosen from. Note that which elements of the swizzle are used and which
+    * are ignored are based on the write mask for most opcodes - for example,
+    * a statement like "foo.xzw = bar.zyx" would have a writemask of 1101b and
+    * a swizzle of {2, x, 1, 0} where x means "don't care."
+    *
+    * XXX: We handle swizzles at least partly correct now. However, we do not
+    * deal with writemasks yet, and so we are probably doing wrong things here.
+    */
+
+   if (all_constant) {
+      nir_const_value const_src[4];
+
+      for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) {
+         for (unsigned j = 0; j < entry->ssa_def->num_components; j++) {
+            const_src[i].f[j] = src[i]->lo.f[alu->src[i].swizzle[j]];
+         }
+      }
+
+      nir_const_value dest =
+               nir_eval_const_opcode(alu->op,
+                                     alu->dest.dest.ssa.num_components,
+                                     const_src);
+
+      entry->visits++;
+      entry->type = constant;
+      entry->lo = dest;
+      entry->hi = dest;
+      entry->range_type = nir_op_infos[alu->op].output_type;
+      return;
+   }
+
+   /* Check if the ssa-def is in a loop */
+   if (BITSET_TEST(state->blocks_in_loop,
+                   entry->ssa_def->parent_instr->block->index)) {
+
+      /* We have for the first time detected that the ssa-def is in a loop.
+       * Initialize to the best of our knowledge and mark it as in loop
+       */
+      if (!special_case_alu(entry->ssa_def, state))
+         set_as_overdefined(entry, nir_op_infos[alu->op].output_type);
+
+      /* Mark it so that we don't visit it anymore */
+      entry->in_loop = true;
+      return;
+   }
+
+   /* If all inputs are of range or constant type we can calculate the range
+    * of the operation instead of hand-rolling this ourselves. We can only
+    * do this for a select amount of operations. (Basically those that only
+    * select/filter the operands, or that restricts the range). In cases
+    * like mul, add, etc we can not compute the range trivially, as we can
+    * get situations where we multiply -inf and inf and produce nan, or we
+    * overflow integer types.
+    *
+    * This will allow things like sat to possibly give us constants. It will
+    * calculate the constant corresponding to the upper and lower value. If
+    * these are found to be the same in the "is-def-constant" function then
+    * it will get marked as constant, so there is no need to "special-case"
+    * things like sat(a) where a < 0.0
+    */
+   if (is_trivially_computable(alu->op)) {
+      nir_const_value lo_consts[4];
+      nir_const_value hi_consts[4];
+
+      for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) {
+         /* We may have set one of the operands to overdefined without knowing
+          * what type it was going to be. Now we know that the operand is used
+          * in an operation with a give alu-type, and so we can set the
+          * operand as overdef again, but with the right alu-type.
+          */
+         if (is_entry_overdefined(src[i]) &&
+             src[i]->range_type != nir_op_infos[alu->op].input_types[i])
+            set_as_overdefined(src[i], nir_op_infos[alu->op].input_types[i]);
+
+         for (unsigned j = 0; j < entry->ssa_def->num_components; j++) {
+            lo_consts[i].f[j] = src[i]->lo.f[alu->src[i].swizzle[j]];
+            hi_consts[i].f[j] = src[i]->hi.f[alu->src[i].swizzle[j]];
+         }
+      }
+
+      bool first = true;
+      nir_const_value temp_const[4];
+      nir_const_value computed_low;
+      nir_const_value computed_high;
+
+      /* Possible combinations we can make with high and low for num_inputs */
+      unsigned combinations = 1 << nir_op_infos[alu->op].num_inputs;
+
+      for (unsigned i = 0; i < combinations; i++) {
+
+         /* Do a copy of low or high values according to mask */
+         for (unsigned j = 0; j < nir_op_infos[alu->op].num_inputs; j++)
+            temp_const[j] = i & (1 << j) ? hi_consts[j] : lo_consts[j];
+
+         nir_const_value temp =
+               nir_eval_const_opcode(alu->op,
+                                     alu->dest.dest.ssa.num_components,
+                                     temp_const);
+
+         if (first) {
+            /* If this is the first run then set an initial value */
+            computed_low = computed_high = temp;
+            first = false;
+         } else {
+            /* Do a per_component min/max to get the worst case scenario */
+            computed_low = per_component_min(computed_low, temp,
+                                             nir_op_infos[alu->op].output_type,
+                                             entry->ssa_def->num_components);
+            computed_high = per_component_max(computed_high, temp,
+                                              nir_op_infos[alu->op].output_type,
+                                              entry->ssa_def->num_components);
+         }
+      }
+
+      /* If the operations was an abs-operation we will have wrongfully
+       * computed the lower value for the abs. Take a component-wise min
+       * of the low-ranges of both operands. If that is higher than zero
+       * then set that as the range, if not, set zero as lowest value.
+       */
+      if (alu->op == nir_op_fabs) {
+         nir_const_value temp =
+               per_component_min(lo_consts[0], lo_consts[1],
+                                 nir_type_float,
+                                 entry->ssa_def->num_components);
+
+         if (IS_FLOAT_CONSTANT(temp, >, 0.0f,
+                               entry->ssa_def->num_components)) {
+            computed_low = temp;
+         } else {
+            computed_low.f[0] = 0.0f;    computed_low.f[1] = 0.0f;
+            computed_low.f[2] = 0.0f;    computed_low.f[3] = 0.0f;
+         }
+      }
+
+      if (alu->op == nir_op_iabs) {
+         nir_const_value temp =
+               per_component_min(lo_consts[0], lo_consts[1],
+                                 nir_type_int,
+                                 entry->ssa_def->num_components);
+
+         if (IS_INT_CONSTANT(temp, >, 0,
+                             entry->ssa_def->num_components)) {
+            computed_low = temp;
+         } else {
+            computed_low.i[0] = 0;    computed_low.i[1] = 0;
+            computed_low.i[2] = 0;    computed_low.i[3] = 0;
+         }
+      }
+
+      entry->visits++;
+      entry->type = range;
+      entry->lo = computed_low;
+      entry->hi = computed_high;
+      entry->range_type = nir_op_infos[alu->op].output_type;
+      return;
+   }
+
+   if (special_case_alu(&alu->dest.dest.ssa, state))
+      return;
+
+   entry->visits++;
+   set_as_overdefined(entry, nir_op_infos[alu->op].output_type);
+   return;
+}
+
+static bool
+evaluate_entry(lattice_entry *entry, value_range_state *state)
+{
+   lattice_type old_type = entry->type;
+   nir_const_value old_hi;
+   nir_const_value old_lo;
+
+   /* If it is already overdefined then that can not change */
+   if (entry->type == overdefined)
+      return false;
+
+   /* We have already marked load_consts as constant, no use evaluating
+    *
+    * XXX: This is most likely not correct according to the papers. We can
+    * still go from constant and down to overdefined in some cases, so we
+    * should allow "analyzing" of constants
+    */
+//   if (is_entry_constant(entry))
+//      return false;
+
+   for (unsigned i = 0; i < 4; i++) {
+      old_hi.f[i] = entry->hi.f[i];
+      old_lo.f[i] = entry->lo.f[i];
+   }
+
+   switch (entry->ssa_def->parent_instr->type) {
+   case nir_instr_type_load_const: {
+      /* First time we encounter a load_const, mark as constant */
+      nir_load_const_instr *instr =
+         nir_instr_as_load_const(entry->ssa_def->parent_instr);
+      entry->type = constant;
+      entry->lo = instr->value;
+      entry->hi = instr->value;
+      entry->range_type = nir_type_invalid;
+      break;
+   }
+
+   case nir_instr_type_alu: {
+      nir_alu_instr *alu = nir_instr_as_alu(entry->ssa_def->parent_instr);
+      evaluate_alu_instr(alu, state);
+      break;
+   }
+
+   default:
+      return set_as_overdefined(entry, nir_type_invalid);
+   }
+
+   /* If we find that the entry has become overdefined without marking it as
+    * such that means we have calculated the range to +/- inf. That means we
+    * have changed the status of the entry,
+    */
+   if (is_entry_overdefined(entry)) {
+      set_as_overdefined(entry, entry->range_type);
+      return true;
+   }
+
+   /* Now we check if the information for the instruction has changed.
+    * If it has then we return true, so that we can evaluate the users.
+    */
+   if (entry->type != old_type)
+      return true;
+
+   for (unsigned i = 0; i < 4; i++) {
+      if (old_hi.f[i] != entry->hi.f[i] ||
+          old_lo.f[i] != entry->lo.f[i])
+      return true;
+   }
+
+   return false;
+}
+
+/* Coordinates finding the uses of the ssa_def corresponding to the entry
+ * and sticking them in the ssa_worklist.
+ * Should be run on every lattice_entry that we change the information of.
+ */
+static void
+coordinate_uses(lattice_entry *entry, value_range_state *state)
+{
+   nir_foreach_use_safe(entry->ssa_def, src) {
+      nir_instr *user = src->parent_instr;
+
+      /* No point in checking the use if it is not yet found reachable */
+      if (!is_block_reachable(user->block, state))
+         continue;
+
+      /* We don't want to deal with unsupported instructions. This should
+       * already be marked overdefined on the first visit of the block it
+       * is in, so no point in doing anything special
+       */
+      if (!is_instr_supported(user->type))
+         continue;
+
+      nir_ssa_def *def = nir_instr_get_dest_ssa_def(user);
+
+      /* If it is overdefined we want to push it to head of the list.
+       * That way we will propagate those faster, avoiding visiting
+       * ssa-defs with overdefined sources multiple times. */
+      if (is_entry_overdefined(entry)) {
+         nir_ssa_def_worklist_push_head(&state->ssa_worklist, def);
+      } else {
+         nir_ssa_def_worklist_push_tail(&state->ssa_worklist, def);
+      }
+   }
+
+   nir_foreach_if_use_safe(entry->ssa_def, src) {
+      /* If the condition was used for an if then we should do something
+       * about the if to "push our range" into the then and else branch by
+       * inserting a copy in each of the blocks where we "apply" the range
+       * implied by the if-statement.
+       */
+      nir_if *if_statement = src->parent_if;
+
+      nir_cf_node *then_node = nir_if_first_then_node(if_statement);
+      nir_cf_node *else_node = nir_if_first_else_node(if_statement);
+
+      nir_block *then_block = nir_cf_node_as_block(then_node);
+      nir_block *else_block = nir_cf_node_as_block(else_node);
+
+      if (is_entry_true(entry)) {
+         nir_block_worklist_push_tail(&state->block_worklist, then_block);
+         continue;
+      }
+
+      if (is_entry_false(entry)) {
+         nir_block_worklist_push_tail(&state->block_worklist, else_block);
+         continue;
+      }
+
+      if (entry->type == undefined)
+         continue;
+
+      nir_block_worklist_push_tail(&state->block_worklist, then_block);
+      nir_block_worklist_push_tail(&state->block_worklist, else_block);
+   }
+}
+
+static bool
+handle_unsupported_def(nir_ssa_def *def, void *void_state)
+{
+   value_range_state *state = void_state;
+   lattice_entry *entry = get_lattice_entry(def, state);
+   set_as_overdefined(entry, nir_type_invalid);
+   coordinate_uses(entry, state);
+   return true;
+}
+
+/* On the first run of a block we want to always check all the uses
+ * of the instructions that we process.
+ */
+static void
+evaluate_block(nir_block *block, value_range_state *state)
+{
+   nir_foreach_instr_safe(block, instr) {
+
+      /* Skip doing anything to unsupported instructions */
+      if (!is_instr_supported(instr->type)) {
+         nir_foreach_ssa_def(instr, handle_unsupported_def, state);
+         continue;
+      }
+
+      nir_ssa_def *def = nir_instr_get_dest_ssa_def(instr);
+      lattice_entry *entry = get_lattice_entry(def, state);
+
+      evaluate_entry(entry, state);
+      coordinate_uses(entry, state);
+   }
+}
+
+static void
+mark_cf_node_loop(nir_cf_node *node, value_range_state *state, bool in_loop)
+{
+   switch (node->type) {
+   case nir_cf_node_block:
+
+      if (in_loop)
+         BITSET_SET(state->blocks_in_loop, nir_cf_node_as_block(node)->index);
+
+      return;
+   case nir_cf_node_if: {
+      nir_if *if_stmt = nir_cf_node_as_if(node);
+      foreach_list_typed(nir_cf_node, nested_node, node, &if_stmt->then_list)
+         mark_cf_node_loop(nested_node, state, in_loop);
+      foreach_list_typed(nir_cf_node, nested_node, node, &if_stmt->else_list)
+         mark_cf_node_loop(nested_node, state, in_loop);
+      return;
+   }
+   case nir_cf_node_loop: {
+      nir_loop *loop = nir_cf_node_as_loop(node);
+      foreach_list_typed(nir_cf_node, nested_node, node, &loop->body)
+         mark_cf_node_loop(nested_node, state, true);
+      break;
+   }
+   default:
+      unreachable("unknown cf node type");
+   }
+}
+
+static bool
+nir_opt_value_range_impl(nir_function_impl *impl)
+{
+   /* We might want to run a pass to insert "pi-nodes" into the
+    * ssa-tree before running the pass. This is essentially just
+    * a mov x2 = x1 that we use to have something to "store" the
+    * range implied by things like if's.
+    *
+    * This will also lead to a need of inserting more phi-nodes,
+    * as one gets variables that diverge and then converge.
+    *
+    * x1 = ....; [-unlimited, unlimited]
+    * if (x1 < 7)
+    *    x2 = x1; [-unlimited, 7]
+    *    |
+    *    |
+    * else
+    *    x3 = x1; [7, unlimited]
+    *    |
+    *    |
+    * x4 = phi(x2, x3);
+    */
+
+   /* We want to recalculate the ssa-indexes first, to reduce our memory
+    * consumption and amount of "empty" value-range-variables
+    */
+   nir_index_ssa_defs(impl);
+   nir_index_blocks(impl);
+
+   /* No use in running the pass if there are no ssa-defs */
+   if (impl->ssa_alloc == 0)
+      return false;
+
+   bool progress = false;
+
+   void *mem_ctx = ralloc_context(NULL);
+   value_range_state state;
+
+   state.entries = rzalloc_array(mem_ctx, lattice_entry,
+                                         impl->ssa_alloc);
+
+   nir_block_worklist_init(&state.block_worklist, impl->num_blocks,
+                           mem_ctx);
+
+   nir_ssa_def_worklist_init(&state.ssa_worklist, impl->ssa_alloc,
+                             mem_ctx);
+
+   state.reachable_blocks = rzalloc_array(mem_ctx, BITSET_WORD,
+                                          BITSET_WORDS(impl->num_blocks));
+
+   state.blocks_in_loop = rzalloc_array(mem_ctx, BITSET_WORD,
+                                        BITSET_WORDS(impl->num_blocks));
+
+   /* Mark the blocks that are inside a loop so we can avoid them */
+   foreach_list_typed(nir_cf_node, node, node, &impl->body) {
+      if (node->type == nir_cf_node_loop)
+         mark_cf_node_loop(node, &state, true);
+      else
+         mark_cf_node_loop(node, &state, false);
+   }
+
+   nir_block_worklist_push_head(&state.block_worklist, impl->start_block);
+
+   /* Process the work lists until they are empty */
+   while (state.block_worklist.count > 0 ||
+          state.ssa_worklist.count > 0) {
+
+      /* Process the basic block work list */
+      while (state.block_worklist.count > 0) {
+         nir_block *block = nir_block_worklist_pop_head(&state.block_worklist);
+
+         /* Since we have our "coordinate_uses" function that also handles
+          * phi nodes we can skip the block if it is already set reachable,
+          * as the phi's will get automagically added to the ssa-worklist
+          * as they are users of the defs.
+          */
+         if (is_block_reachable(block, &state))
+            continue;
+
+         /* Block has been determined to be reachable, mark it */
+         mark_block_reachable(block, &state);
+
+         /* Evaluate all phi's and expressions of the block. */
+         evaluate_block(block, &state);
+
+         /* If the block has only one successor then add it to the worklist */
+         if ((block->successors[0] != NULL) &&
+             (block->successors[1] == NULL)) {
+            nir_block_worklist_push_tail(&state.block_worklist,
+                                         block->successors[0]);
+            continue;
+         }
+      }
+
+      /* Process the instruction work list
+       * This doesn't do things exactly like in the paper.
+       * Instead of storing the instructions that have changed and processing
+       * each user we are instead adding to the list all the users of
+       * instructions that have changed. In practice there is no difference,
+       * apart from dealing with uses is moved out to a separate function.
+       */
+      while (state.ssa_worklist.count > 0) {
+         /* All instructions in the list are here because
+          * we got new information about the range of an operand.
+          */
+         nir_ssa_def *def = nir_ssa_def_worklist_pop_head(&state.ssa_worklist);
+         lattice_entry *entry = get_lattice_entry(def, &state);
+
+         /* If the def is in a loop we don't want to do anything.
+          * (The instruction is initialized as best we can when we mark it.)
+          * When the block it's in is added to the worklist the entry will
+          * get processed, and so we will evaluate its users.
+          */
+         if (entry->in_loop)
+            continue;
+
+         /* Evaluate the ssa_def. If it has changed then add users to list */
+         if (evaluate_entry(entry, &state))
+            coordinate_uses(entry, &state);
+      }
+
+      /* If both lists are empty that means we are done. However, we assume
+       * branches on undef values can not reach any of their successors.
+       * This is not a safe assumption, and can lead to eliminating things
+       * we shouldn't. Do a trip over all the ssa-defs to find unresolved
+       * branches. If one is found then add the "then"-branch to the block
+       * worklist. That way we will find the rest of the CFG, and only
+       * slightly pessimizes the analysis results.
+       *
+       * XXX: It might be of interest to also check the what operations
+       * use undefined values as sources, and resolve those. Some operations
+       * should return undef when there's a undef operand, while others
+       * should return i.e. 0 to be conservative.
+       *
+       * This part is implemented in a "resolvedUndefsIn" function in the
+       * LLVM SCCP pass, and should give a good idea of what should be done.
+       */
+      if (state.block_worklist.count == 0 &&
+          state.ssa_worklist.count == 0) {
+
+         for (unsigned i = 0; i < impl->ssa_alloc; i++) {
+            lattice_entry *entry = &state.entries[i];
+
+            if (entry->type != undefined)
+               continue;
+
+            if (!entry->ssa_def)
+               continue;
+
+            if (!list_empty(&entry->ssa_def->if_uses)) {
+               nir_foreach_if_use_safe(entry->ssa_def, src) {
+                  nir_cf_node *node = nir_if_first_then_node(src->parent_if);
+                  /* First node in the branch should be a block */
+                  assert(node->type == nir_cf_node_block);
+                  nir_block *block = nir_cf_node_as_block(node);
+                  nir_block_worklist_push_tail(&state.block_worklist, block);
+               }
+            }
+         }
+      }
+   }
+
+   /* We can now traverse over blocks and delete those that
+    * are still marked as unreachable. If we delete a basic block
+    * we need to first rewrite the phi's that use the results from
+    * the BB.
+    *
+    * For now we want to leave the blocks in place, as when the
+    * conditional for the block that is unreachable is set as a constant
+    * Connor's pass for removing dead control flow will come along
+    * and clean up the blocks that can not be reached.
+    */
+
+   /* Check for all values that are proved to be constant,
+    * and replace them with their constant instruction counterpart. */
+   for (unsigned i = 0; i < (impl->ssa_alloc - 1); i++) {
+      lattice_entry *entry = &(state.entries[i]);
+
+      /* If it's a constant thats not a load_const then make a load_const
+       * instruction and replace the uses of the old instr with that.
+       */
+      if (is_entry_constant(entry, entry->ssa_def->num_components) &&
+          (entry->ssa_def->parent_instr->type != nir_instr_type_load_const)) {
+
+         /* XXX: Doing the below replacement causes crashes because there is
+          * something I have not yet understood about how to rewrite things.
+          */
+/*
+         nir_load_const_instr *new_instr =
+            nir_load_const_instr_create(impl->overload->function->shader,
+                                        entry->ssa_def->num_components);
+
+         new_instr->value = entry->hi;
+
+         nir_ssa_def_init(&new_instr->instr, &new_instr->def,
+                          entry->ssa_def->num_components,
+                          "Constant propagated const");
+
+         nir_instr_insert_before(entry->ssa_def->parent_instr, &new_instr->instr);
+
+         nir_ssa_def_rewrite_uses(entry->ssa_def, nir_src_for_ssa(&new_instr->def),
+                                  impl);
+
+         nir_instr_remove(entry->ssa_def->parent_instr);
+         ralloc_free(entry->ssa_def->parent_instr);
+*/
+
+         continue;
+      }
+
+      if (entry->ssa_def->parent_instr->type != nir_instr_type_alu)
+         continue;
+
+      nir_alu_instr *alu = nir_instr_as_alu(entry->ssa_def->parent_instr);
+
+      if (nir_op_infos[alu->op].num_inputs == 2 &&
+          !(is_entry_overdefined(get_lattice_entry(alu->src[0].src.ssa, &state)) ||
+            is_entry_overdefined(get_lattice_entry(alu->src[1].src.ssa, &state)))) {
+
+         compare_range result = MIXED;
+
+         switch (alu->op) {
+         case nir_op_fmax:   case nir_op_fmin:
+         case nir_op_imax:   case nir_op_imin:
+         case nir_op_umax:   case nir_op_umin:
+
+            result = compare_entries(&alu->src[0], &alu->src[1],
+                                     nir_op_infos[alu->op].output_type,
+                                     entry->ssa_def->num_components, &state);
+            break;
+         default:
+            break;
+         }
+
+         if (result == LESS || result == LESS_EQUAL) {
+            if (alu->op == nir_op_fmax ||
+                alu->op == nir_op_imax ||
+                alu->op == nir_op_umax)
+               nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[1].src, mem_ctx);
+            else
+               nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[0].src, mem_ctx);
+
+            progress = true;
+            continue;
+         }
+
+         if (result == GREATER || result == GREATER_EQUAL) {
+            if (alu->op == nir_op_fmax ||
+                alu->op == nir_op_imax ||
+                alu->op == nir_op_umax)
+               nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[0].src, mem_ctx);
+            else
+               nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[1].src, mem_ctx);
+
+            progress = true;
+            continue;
+         }
+      }
+
+      /* XXX: Here we also want to add checks for undefined behaviour
+       *
+       * We also want to detect lo = 0, hi = 1 and replace with sat.
+       * We also want to use the compare_entries function to check these
+       */
+
+
+      /* Check if the result of a comparison can be determined */
+      if (nir_op_infos[alu->op].num_inputs == 2 &&
+          !(is_entry_overdefined(get_lattice_entry(alu->src[0].src.ssa, &state)) ||
+            is_entry_overdefined(get_lattice_entry(alu->src[1].src.ssa, &state)))) {
+
+         compare_range result = MIXED;
+
+         switch (alu->op) {
+         case nir_op_feq:  case nir_op_ieq:  case nir_op_seq:
+         case nir_op_fge:  case nir_op_ige:  case nir_op_sge:  case nir_op_uge:
+         case nir_op_flt:  case nir_op_ilt:  case nir_op_slt:  case nir_op_ult:
+         case nir_op_fne:  case nir_op_ine:  case nir_op_sne:
+
+            result = compare_entries(&alu->src[0], &alu->src[1],
+                                     nir_op_infos[alu->op].input_types[0],
+                                     entry->ssa_def->num_components, &state);
+            break;
+         default:
+            break;
+         }
+
+         if (alu->op == nir_op_feq ||
+             alu->op == nir_op_ieq ||
+             alu->op == nir_op_seq) {
+            if (result == EQUAL) {
+               /* We should insert a true */
+            }
+
+            if (result == LESS || result == GREATER) {
+               /* We should insert a false */
+            }
+         }
+
+         if (alu->op == nir_op_fne ||
+             alu->op == nir_op_ine ||
+             alu->op == nir_op_sne) {
+            if (result == EQUAL) {
+               /* We should insert a false */
+            }
+
+            if (result == LESS || result == GREATER) {
+               /* We should insert a true */
+            }
+         }
+
+         if (alu->op == nir_op_flt || alu->op == nir_op_ilt ||
+             alu->op == nir_op_slt || alu->op == nir_op_ult) {
+            if (result == LESS) {
+               /* We should insert a true */
+            }
+
+            if (result == GREATER) {
+               /* We should insert a false */
+            }
+         }
+
+         if (alu->op == nir_op_fge || alu->op == nir_op_ige ||
+             alu->op == nir_op_sge || alu->op == nir_op_uge) {
+            if (result == GREATER) {
+               /* We should insert a true */
+            }
+
+            if (result == LESS) {
+               /* We should insert a false */
+            }
+         }
+//         progress = true;
+//         continue;
+      }
+
+
+      bool set_false = false;
+      bool set_true = false;
+      /* Detect that "overdefined && false" == false, as we would have bailed
+       * on this earlier in the pass since we have an overdefined source
+       */
+      if (alu->op == nir_op_fand || alu->op == nir_op_iand) {
+         for (int i = 0; i < 2; i++) {
+            lattice_entry *e = get_lattice_entry(alu->src[i].src.ssa, &state);
+            if (is_entry_false(e)) {
+               set_false = true;
+            }
+         }
+      }
+
+      /* Detect that "overdefined || true" == true, as we would have bailed
+       * on this earlier in the pass since we have an overdefined source
+       */
+      if (alu->op == nir_op_for || alu->op == nir_op_ior) {
+         for (int i = 0; i < 2; i++) {
+            lattice_entry *e = get_lattice_entry(alu->src[i].src.ssa, &state);
+            if (is_entry_true(e)) {
+               set_true = true;
+            }
+         }
+      }
+
+      if (set_true || set_false) {
+         nir_const_value dest;
+
+         if (set_false) {
+            /* We can replace entry with "false". 0.0f == 0x00000000 which
+             * should be tha same as signed integer zero, so this is safe
+             */
+            dest.f[0] = dest.f[1] = dest.f[2] = dest.f[3] = 0.0f;
+         }
+
+         if (set_true) {
+            /* We can replace entry with 1.0f which is also "true" for int */
+            dest.f[0] = dest.f[1] = dest.f[2] = dest.f[3] = 1.0f;
+         }
+/*
+         nir_load_const_instr *new_instr =
+            nir_load_const_instr_create(impl->overload->function->shader,
+                                        alu->dest.dest.ssa.num_components);
+
+         new_instr->value = dest;
+
+         nir_instr_insert_before(&alu->instr, &new_instr->instr);
+
+         nir_ssa_def_rewrite_uses(&alu->dest.dest.ssa, nir_src_for_ssa(&new_instr->def),
+                                  impl);
+
+         nir_instr_remove(&alu->instr);
+         ralloc_free(alu);
+*/
+      }
+
+   }
+
+   ralloc_free(mem_ctx);
+
+   return progress;
+}
+
+bool
+nir_opt_value_range(nir_shader *shader)
+{
+   bool progress = false;
+
+   nir_foreach_overload(shader, overload) {
+      if (overload->impl && nir_opt_value_range_impl(overload->impl))
+         progress = true;
+   }
+
+   return progress;
+}
-- 
2.4.4



More information about the mesa-dev mailing list