[Mesa-dev] [RFC v1] i965/vec4: add a swizzle check on the uniform bail condition

Wed Sep 23 13:12:40 PDT 2015

On Wed, Sep 23, 2015 at 12:29 PM, Alejandro Piñeiro
<apinheiro at igalia.com> wrote:
> Without this commit, an uniform is bailed out if the instruction

"an uniform" -> "a uniform" (because the 'u' actually makes a consonant sound)

I'm not sure what is meant by "bailed out"

> has 3 sources. This commit allow to go on if the combined
> swizzle is a single value.

Nice find! This is spot-on.

Indeed, 3-src instructions can access a scalar value and we were
mistakenly ignoring that capability.

> WIP: it can be clearly cleaned, as the condition is somewhat
>      ugly right now.
>
> Shader-db results for vec4 programs on Haswell:
> total instructions in shared programs: 1666043 -> 1648378 (-1.06%)
> instructions in affected programs:     812755 -> 795090 (-2.17%)
> helped:                                7930
> HURT:                                  0

I get

Without NIR:

total instructions in shared programs: 6717457 -> 6713423 (-0.06%)
instructions in affected programs:     296630 -> 292596 (-1.36%)
helped:                                2533

With NIR:

total instructions in shared programs: 6529329 -> 6511760 (-0.27%)
instructions in affected programs:     811929 -> 794360 (-2.16%)
helped:                                7865

I find providing filtered results (i.e., only results from vertex
shaders) to be confusing, especially when that fact is withheld.

> ---
>
> Some background: in order to find places to optimize the instruction
> count, I'm running shader-db comparing the outcome using IR vs using
> NIR. Currently one of the HURT shaders is
> unity/57-DeferredDirectional.shader_test, that I simplified to the
> following shader with the same issue:
>
>    uniform float myFloat;
>
>    void main ()
>    {
>      gl_Position = mix (gl_Color, gl_Vertex, vec4(myFloat));
>    }
>
> Note that it is an uniform, and a float, used to recreate a vec4 on an
> lrp.  That example doesn't make too much sense, but again, it is a
> synthetic shader that shows the same issue that the original one that
> made more sense.
>
> This shader requires one instruction more on NIR. There is one
> major difference at the starting point. On IR we have this:
>
>    0: lrp vgrf0.0:F, u0.xxxx:F, attr0.xyzw:F, attr3.xyzw:F
>    1: mov m2:D, 0D
>    2: mov m3:F, vgrf0.xyzw:F
>    3: vs_urb_write (null):UD
>
> While on NIR we have this:
>
>    0: mov vgrf0.0:F, attr3.xyzw:F
>    1: mov vgrf1.0:F, attr0.xyzw:F
>    2: mov vgrf2.0:UD, u0.xyzw:UD
>    3: lrp vgrf3.0:F, vgrf2.xxxx:F, vgrf1.xyzw:F, vgrf0.xyzw:F
>    4: mov m2:D, 0D
>    5: mov m3:F, vgrf3.xyzw:F
>    6: vs_urb_write (null):UD
>
> So NIR initially emits three extra MOVs. Two of them are not a problem
> because are properly propagated and dead-code eliminated. But the third
> one no, as it get bailed out from the original condition, that bailed
> out when it was an uniform on a 3-sourced instruction.
>
> After that I need to confess that I got to the solution based on the
> assumption that this MOV can be copy-propagated (as IR doesn't have
> it) and the outcome when I just removed that condition. When I just
> hacked out that condition, one assertion at
> brw_eu_emit.c::get_3src_subreg_nr was triggered:
>
>       assert(brw_is_single_value_swizzle(reg.dw1.bits.swizzle));
>
> That "inspired me" to add that check on the condition. But at this
> point I can't clearly justify it, even if it doesn't show any piglit
> regression, and shows a shader-db numbers so positive.
>
> But when I was about to start to research this, I remembered that
> meanwhile I have focused myself on trying to help on the backend side,
> other people, like Matt Turner, Eduardo Lima, and Jason Ekstrand have
> been working on the NIR side, including commits like
> "nir/lower_vec_to_movs: Don't emit unneeded movs". So perhaps the way
> to solve this issue is going to the root, and avoid those extra MOVs
> as originally IR did, and someone is already working on that. So
> before deeping on researching why this is working. Is someone else
> working to solve this issue on the NIR side? Do someone thinks that
> this should be solved there (or even on the MOV emission)?
>
>
>  src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> index d3f0ddd..400a13a 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> @@ -325,7 +325,8 @@ try_copy_propagate(const struct brw_device_info *devinfo,
>         inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
>        return false;
>
> -   if (inst->is_3src() && value.file == UNIFORM)
> +   if (inst->is_3src() && value.file == UNIFORM &&
> +       !brw_is_single_value_swizzle(brw_compose_swizzle(inst->src[arg].swizzle, value.swizzle)))

I think just linewrap this and remove the WIP from the commit message, and

Reviewed-by: Matt Turner <mattst88 at gmail.com>

Excellent work!