[Mesa-dev] [RFC v1] i965/vec4: add a swizzle check on the uniform bail condition
Alejandro PiƱeiro
apinheiro at igalia.com
Wed Sep 23 12:29:16 PDT 2015
Without this commit, an uniform is bailed out if the instruction
has 3 sources. This commit allow to go on if the combined
swizzle is a single value.
WIP: it can be clearly cleaned, as the condition is somewhat
ugly right now.
Shader-db results for vec4 programs on Haswell:
total instructions in shared programs: 1666043 -> 1648378 (-1.06%)
instructions in affected programs: 812755 -> 795090 (-2.17%)
helped: 7930
HURT: 0
---
Some background: in order to find places to optimize the instruction
count, I'm running shader-db comparing the outcome using IR vs using
NIR. Currently one of the HURT shaders is
unity/57-DeferredDirectional.shader_test, that I simplified to the
following shader with the same issue:
uniform float myFloat;
void main ()
{
gl_Position = mix (gl_Color, gl_Vertex, vec4(myFloat));
}
Note that it is an uniform, and a float, used to recreate a vec4 on an
lrp. That example doesn't make too much sense, but again, it is a
synthetic shader that shows the same issue that the original one that
made more sense.
This shader requires one instruction more on NIR. There is one
major difference at the starting point. On IR we have this:
0: lrp vgrf0.0:F, u0.xxxx:F, attr0.xyzw:F, attr3.xyzw:F
1: mov m2:D, 0D
2: mov m3:F, vgrf0.xyzw:F
3: vs_urb_write (null):UD
While on NIR we have this:
0: mov vgrf0.0:F, attr3.xyzw:F
1: mov vgrf1.0:F, attr0.xyzw:F
2: mov vgrf2.0:UD, u0.xyzw:UD
3: lrp vgrf3.0:F, vgrf2.xxxx:F, vgrf1.xyzw:F, vgrf0.xyzw:F
4: mov m2:D, 0D
5: mov m3:F, vgrf3.xyzw:F
6: vs_urb_write (null):UD
So NIR initially emits three extra MOVs. Two of them are not a problem
because are properly propagated and dead-code eliminated. But the third
one no, as it get bailed out from the original condition, that bailed
out when it was an uniform on a 3-sourced instruction.
After that I need to confess that I got to the solution based on the
assumption that this MOV can be copy-propagated (as IR doesn't have
it) and the outcome when I just removed that condition. When I just
hacked out that condition, one assertion at
brw_eu_emit.c::get_3src_subreg_nr was triggered:
assert(brw_is_single_value_swizzle(reg.dw1.bits.swizzle));
That "inspired me" to add that check on the condition. But at this
point I can't clearly justify it, even if it doesn't show any piglit
regression, and shows a shader-db numbers so positive.
But when I was about to start to research this, I remembered that
meanwhile I have focused myself on trying to help on the backend side,
other people, like Matt Turner, Eduardo Lima, and Jason Ekstrand have
been working on the NIR side, including commits like
"nir/lower_vec_to_movs: Don't emit unneeded movs". So perhaps the way
to solve this issue is going to the root, and avoid those extra MOVs
as originally IR did, and someone is already working on that. So
before deeping on researching why this is working. Is someone else
working to solve this issue on the NIR side? Do someone thinks that
this should be solved there (or even on the MOV emission)?
src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index d3f0ddd..400a13a 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
@@ -325,7 +325,8 @@ try_copy_propagate(const struct brw_device_info *devinfo,
inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
return false;
- if (inst->is_3src() && value.file == UNIFORM)
+ if (inst->is_3src() && value.file == UNIFORM &&
+ !brw_is_single_value_swizzle(brw_compose_swizzle(inst->src[arg].swizzle, value.swizzle)))
return false;
if (inst->is_send_from_grf())
--
2.1.4
More information about the mesa-dev
mailing list