<p dir="ltr"><br>
On Sep 23, 2015 12:29, "Alejandro Piñeiro" <<a href="mailto:apinheiro@igalia.com">apinheiro@igalia.com</a>> wrote:<br>
><br>
> Without this commit, an uniform is bailed out if the instruction<br>
> has 3 sources. This commit allow to go on if the combined<br>
> swizzle is a single value.</p>
<p dir="ltr">This is interesting and, in general, I like it.</p>
<p dir="ltr">> WIP: it can be clearly cleaned, as the condition is somewhat<br>
> ugly right now.<br>
><br>
> Shader-db results for vec4 programs on Haswell:<br>
> total instructions in shared programs: 1666043 -> 1648378 (-1.06%)<br>
> instructions in affected programs: 812755 -> 795090 (-2.17%)<br>
> helped: 7930<br>
> HURT: 0</p>
<p dir="ltr">I assume these are NIR numbers. Does it help with the old IR?</p>
<p dir="ltr">> ---<br>
><br>
> Some background: in order to find places to optimize the instruction<br>
> count, I'm running shader-db comparing the outcome using IR vs using<br>
> NIR. Currently one of the HURT shaders is<br>
> unity/57-DeferredDirectional.shader_test, that I simplified to the<br>
> following shader with the same issue:<br>
><br>
> uniform float myFloat;<br>
><br>
> void main ()<br>
> {<br>
> gl_Position = mix (gl_Color, gl_Vertex, vec4(myFloat));<br>
> }<br>
><br>
> Note that it is an uniform, and a float, used to recreate a vec4 on an<br>
> lrp. That example doesn't make too much sense, but again, it is a<br>
> synthetic shader that shows the same issue that the original one that<br>
> made more sense.<br>
><br>
> This shader requires one instruction more on NIR. There is one<br>
> major difference at the starting point. On IR we have this:<br>
><br>
> 0: lrp vgrf0.0:F, u0.xxxx:F, attr0.xyzw:F, attr3.xyzw:F<br>
> 1: mov m2:D, 0D<br>
> 2: mov m3:F, vgrf0.xyzw:F<br>
> 3: vs_urb_write (null):UD<br>
><br>
> While on NIR we have this:<br>
><br>
> 0: mov vgrf0.0:F, attr3.xyzw:F<br>
> 1: mov vgrf1.0:F, attr0.xyzw:F<br>
> 2: mov vgrf2.0:UD, u0.xyzw:UD<br>
> 3: lrp vgrf3.0:F, vgrf2.xxxx:F, vgrf1.xyzw:F, vgrf0.xyzw:F<br>
> 4: mov m2:D, 0D<br>
> 5: mov m3:F, vgrf3.xyzw:F<br>
> 6: vs_urb_write (null):UD<br>
><br>
> So NIR initially emits three extra MOVs. Two of them are not a problem<br>
> because are properly propagated and dead-code eliminated. But the third<br>
> one no, as it get bailed out from the original condition, that bailed<br>
> out when it was an uniform on a 3-sourced instruction.<br>
><br>
> After that I need to confess that I got to the solution based on the<br>
> assumption that this MOV can be copy-propagated (as IR doesn't have<br>
> it) and the outcome when I just removed that condition. When I just<br>
> hacked out that condition, one assertion at<br>
> brw_eu_emit.c::get_3src_subreg_nr was triggered:<br>
><br>
> assert(brw_is_single_value_swizzle(reg.dw1.bits.swizzle));<br>
><br>
> That "inspired me" to add that check on the condition. But at this<br>
> point I can't clearly justify it, even if it doesn't show any piglit<br>
> regression, and shows a shader-db numbers so positive.<br>
><br>
> But when I was about to start to research this, I remembered that<br>
> meanwhile I have focused myself on trying to help on the backend side,<br>
> other people, like Matt Turner, Eduardo Lima, and Jason Ekstrand have<br>
> been working on the NIR side, including commits like<br>
> "nir/lower_vec_to_movs: Don't emit unneeded movs". So perhaps the way<br>
> to solve this issue is going to the root, and avoid those extra MOVs<br>
> as originally IR did, and someone is already working on that. So<br>
> before deeping on researching why this is working. Is someone else<br>
> working to solve this issue on the NIR side? Do someone thinks that<br>
> this should be solved there (or even on the MOV emission)?<br>
><br>
><br>
> src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 3 ++-<br>
> 1 file changed, 2 insertions(+), 1 deletion(-)<br>
><br>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp<br>
> index d3f0ddd..400a13a 100644<br>
> --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp<br>
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp<br>
> @@ -325,7 +325,8 @@ try_copy_propagate(const struct brw_device_info *devinfo,<br>
> inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)<br>
> return false;<br>
><br>
> - if (inst->is_3src() && value.file == UNIFORM)<br>
> + if (inst->is_3src() && value.file == UNIFORM &&<br>
> + !brw_is_single_value_swizzle(brw_compose_swizzle(inst->src[arg].swizzle, value.swizzle)))</p>
<p dir="ltr">This is a case where we probably want to do the check after we've created the "final" value. The question that's being asked is "is this final value allowed?"</p>
<p dir="ltr">Also, we emitted it before with the is_visitor code. What checks are we doing there to make sure we only emitted the uniform directly if it was OK to do so?</p>
<p dir="ltr">> return false;<br>
><br>
> if (inst->is_send_from_grf())<br>
> --<br>
> 2.1.4<br>
><br>
</p>