[Mesa-dev] [PATCH] mesa/st: provide native integers implementation of ir_unop_any
Ilia Mirkin
imirkin at alum.mit.edu
Thu May 8 06:18:26 PDT 2014
Previously, ir_unop_any was implemented via a dot-product call, which
uses floating point multiplication and addition. The multiplication is
unnecessary, and the addition can just as well be done with an OR. With
native integers the boolean inputs are already in canonical 0/~0 form,
so the OR result is itself a canonical boolean and the final SNE can
also be avoided.
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
I need to take this through a full piglit run, but the basic tests seem to
work out as expected. This is the result of a compilation of
fs-op-eq-mat4-mat4:
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL OUT[0], COLOR
DCL CONST[0..7]
DCL TEMP[0..4], LOCAL
IMM[0] FLT32 { 0.0000, 1.0000, 0.0000, 0.0000}
0: MOV TEMP[0].yzw, IMM[0].xxxx
1: FSNE TEMP[1], CONST[4], CONST[0]
2: OR TEMP[1].x, TEMP[1].xxxx, TEMP[1].yyyy
3: OR TEMP[1].y, TEMP[1].zzzz, TEMP[1].wwww
4: OR TEMP[1].x, TEMP[1].xxxx, TEMP[1].yyyy
5: FSNE TEMP[2], CONST[5], CONST[1]
6: OR TEMP[2].x, TEMP[2].xxxx, TEMP[2].yyyy
7: OR TEMP[2].y, TEMP[2].zzzz, TEMP[2].wwww
8: OR TEMP[2].x, TEMP[2].xxxx, TEMP[2].yyyy
9: FSNE TEMP[3], CONST[6], CONST[2]
10: OR TEMP[3].x, TEMP[3].xxxx, TEMP[3].yyyy
11: OR TEMP[3].y, TEMP[3].zzzz, TEMP[3].wwww
12: OR TEMP[3].x, TEMP[3].xxxx, TEMP[3].yyyy
13: FSNE TEMP[4], CONST[7], CONST[3]
14: OR TEMP[4].x, TEMP[4].xxxx, TEMP[4].yyyy
15: OR TEMP[4].y, TEMP[4].zzzz, TEMP[4].wwww
16: OR TEMP[4].x, TEMP[4].xxxx, TEMP[4].yyyy
17: OR TEMP[1].x, TEMP[1].xxxx, TEMP[4].xxxx <---
18: OR TEMP[1].x, TEMP[1], TEMP[3].xxxx <---
19: OR TEMP[1].x, TEMP[1], TEMP[2].xxxx <---
20: NOT TEMP[1].x, TEMP[1].xxxx
21: AND TEMP[0].x, TEMP[1].xxxx, IMM[0].yyyy
22: MOV OUT[0], TEMP[0]
23: END
The three instructions marked with arrows are the result of my new logic. I
wonder if it's cause for concern that I'm not setting a swizzle on the src
(note the bare TEMP[1] in instructions 18 and 19)... probably a bit, but it
works out here. Is there a "writemask -> swizzle" converter somewhere? The
old instructions would have been
  DP4 TEMP[1], TEMP[1], TEMP[1]
  SNE TEMP[1], TEMP[1], IMM[0] ( == 0.0)
or something along those lines. While that is one instruction fewer in TGSI,
at least nv50/nvc0 are scalar and would have had to implement DP4 as
  mul
  mul-add
  mul-add
  mul-add
versus the much more scalar-friendly ORs (in addition to the final SNE being
gone).
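
For anyone who wants to sanity-check the equivalence outside the compiler,
here is a minimal standalone sketch. It is not part of the patch and does not
use any Mesa API; any_via_or and any_via_dot are hypothetical names made up
for illustration. It assumes booleans are stored as canonical 0/~0 in the
integer case and as 0.0/1.0 in the float case:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the new lowering: OR the first n components
// together. With canonical inputs (each 0 or ~0) the result is again
// 0 or ~0, so no trailing SNE-style fixup is needed.
static uint32_t any_via_or(const uint32_t v[4], int n)
{
    assert(n >= 2 && n <= 4);   // same sizes the patch's switch accepts
    uint32_t accum = v[0];
    for (int i = 1; i < n; i++)
        accum |= v[i];
    return accum;
}

// Hypothetical model of the old lowering on float booleans (0.0/1.0):
// dot(v, v) followed by a "!= 0" comparison.
static bool any_via_dot(const float v[4], int n)
{
    assert(n >= 2 && n <= 4);
    float dp = 0.0f;
    for (int i = 0; i < n; i++)
        dp += v[i] * v[i];      // one mul plus (n - 1) mul-adds when scalarized
    return dp != 0.0f;
}
```

For a vec4 the OR version costs three scalar ORs, against four multiply-type
operations plus the compare for the dot-product version.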
src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 75 ++++++++++++++++++++----------
1 file changed, 51 insertions(+), 24 deletions(-)
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index bdee1f4..2afd8fb 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -1671,30 +1671,57 @@ glsl_to_tgsi_visitor::visit(ir_expression *ir)
case ir_unop_any: {
assert(ir->operands[0]->type->is_vector());
- /* After the dot-product, the value will be an integer on the
- * range [0,4]. Zero stays zero, and positive values become 1.0.
- */
- glsl_to_tgsi_instruction *const dp =
- emit_dp(ir, result_dst, op[0], op[0],
- ir->operands[0]->type->vector_elements);
- if (this->prog->Target == GL_FRAGMENT_PROGRAM_ARB &&
- result_dst.type == GLSL_TYPE_FLOAT) {
- /* The clamping to [0,1] can be done for free in the fragment
- * shader with a saturate.
- */
- dp->saturate = true;
- } else if (result_dst.type == GLSL_TYPE_FLOAT) {
- /* Negating the result of the dot-product gives values on the range
- * [-4, 0]. Zero stays zero, and negative values become 1.0. This
- * is achieved using SLT.
- */
- st_src_reg slt_src = result_src;
- slt_src.negate = ~slt_src.negate;
- emit(ir, TGSI_OPCODE_SLT, result_dst, slt_src, st_src_reg_for_float(0.0));
- }
- else {
- /* Use SNE 0 if integers are being used as boolean values. */
- emit(ir, TGSI_OPCODE_SNE, result_dst, result_src, st_src_reg_for_int(0));
+ if (native_integers) {
+ st_src_reg accum = op[0];
+ accum.swizzle = SWIZZLE_XXXX;
+ /* OR all the components together, since they should be either 0 or ~0
+ */
+ assert(ir->operands[0]->type->is_boolean());
+ switch (ir->operands[0]->type->vector_elements) {
+ case 4:
+ op[0].swizzle = SWIZZLE_WWWW;
+ emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
+ accum = st_src_reg(result_dst);
+ /* fallthrough */
+ case 3:
+ op[0].swizzle = SWIZZLE_ZZZZ;
+ emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
+ accum = st_src_reg(result_dst);
+ /* fallthrough */
+ case 2:
+ op[0].swizzle = SWIZZLE_YYYY;
+ emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
+ break;
+ default:
+ assert(!"Unexpected vector size");
+ break;
+ }
+ } else {
+ /* After the dot-product, the value will be an integer on the
+ * range [0,4]. Zero stays zero, and positive values become 1.0.
+ */
+ glsl_to_tgsi_instruction *const dp =
+ emit_dp(ir, result_dst, op[0], op[0],
+ ir->operands[0]->type->vector_elements);
+ if (this->prog->Target == GL_FRAGMENT_PROGRAM_ARB &&
+ result_dst.type == GLSL_TYPE_FLOAT) {
+ /* The clamping to [0,1] can be done for free in the fragment
+ * shader with a saturate.
+ */
+ dp->saturate = true;
+ } else if (result_dst.type == GLSL_TYPE_FLOAT) {
+ /* Negating the result of the dot-product gives values on the range
+ * [-4, 0]. Zero stays zero, and negative values become 1.0. This
+ * is achieved using SLT.
+ */
+ st_src_reg slt_src = result_src;
+ slt_src.negate = ~slt_src.negate;
+ emit(ir, TGSI_OPCODE_SLT, result_dst, slt_src, st_src_reg_for_float(0.0));
+ }
+ else {
+ /* Use SNE 0 if integers are being used as boolean values. */
+ emit(ir, TGSI_OPCODE_SNE, result_dst, result_src, st_src_reg_for_int(0));
+ }
}
break;
}
--
1.8.3.2