[Mesa-dev] [RFC v1] i965/vec4: add a swizzle check on the uniform bail condition

Jason Ekstrand jason at jlekstrand.net
Wed Sep 23 13:07:26 PDT 2015


On Sep 23, 2015 12:29, "Alejandro PiƱeiro" <apinheiro at igalia.com> wrote:
>
> Without this commit, an uniform is bailed out if the instruction
> has 3 sources. This commit allow to go on if the combined
> swizzle is a single value.

This is interesting and, in general, I like it.

> WIP: it can be clearly cleaned, as the condition is somewhat
>      ugly right now.
>
> Shader-db results for vec4 programs on Haswell:
> total instructions in shared programs: 1666043 -> 1648378 (-1.06%)
> instructions in affected programs:     812755 -> 795090 (-2.17%)
> helped:                                7930
> HURT:                                  0

I assume these are NIR numbers. Does it help with the old IR?

> ---
>
> Some background: in order to find places to optimize the instruction
> count, I'm running shader-db comparing the outcome using IR vs using
> NIR. Currently one of the HURT shaders is
> unity/57-DeferredDirectional.shader_test, that I simplified to the
> following shader with the same issue:
>
>    uniform float myFloat;
>
>    void main ()
>    {
>      gl_Position = mix (gl_Color, gl_Vertex, vec4(myFloat));
>    }
>
> Note that it is an uniform, and a float, used to recreate a vec4 on an
> lrp.  That example doesn't make too much sense, but again, it is a
> synthetic shader that shows the same issue that the original one that
> made more sense.
>
> This shader requires one instruction more on NIR. There is one
> major difference at the starting point. On IR we have this:
>
>    0: lrp vgrf0.0:F, u0.xxxx:F, attr0.xyzw:F, attr3.xyzw:F
>    1: mov m2:D, 0D
>    2: mov m3:F, vgrf0.xyzw:F
>    3: vs_urb_write (null):UD
>
> While on NIR we have this:
>
>    0: mov vgrf0.0:F, attr3.xyzw:F
>    1: mov vgrf1.0:F, attr0.xyzw:F
>    2: mov vgrf2.0:UD, u0.xyzw:UD
>    3: lrp vgrf3.0:F, vgrf2.xxxx:F, vgrf1.xyzw:F, vgrf0.xyzw:F
>    4: mov m2:D, 0D
>    5: mov m3:F, vgrf3.xyzw:F
>    6: vs_urb_write (null):UD
>
> So NIR initially emits three extra MOVs. Two of them are not a problem
> because are properly propagated and dead-code eliminated. But the third
> one no, as it get bailed out from the original condition, that bailed
> out when it was an uniform on a 3-sourced instruction.
>
> After that I need to confess that I got to the solution based on the
> assumption that this MOV can be copy-propagated (as IR doesn't have
> it) and the outcome when I just removed that condition. When I just
> hacked out that condition, one assertion at
> brw_eu_emit.c::get_3src_subreg_nr was triggered:
>
>       assert(brw_is_single_value_swizzle(reg.dw1.bits.swizzle));
>
> That "inspired me" to add that check on the condition. But at this
> point I can't clearly justify it, even if it doesn't show any piglit
> regression, and shows a shader-db numbers so positive.
>
> But when I was about to start to research this, I remembered that
> meanwhile I have focused myself on trying to help on the backend side,
> other people, like Matt Turner, Eduardo Lima, and Jason Ekstrand have
> been working on the NIR side, including commits like
> "nir/lower_vec_to_movs: Don't emit unneeded movs". So perhaps the way
> to solve this issue is going to the root, and avoid those extra MOVs
> as originally IR did, and someone is already working on that. So
> before deeping on researching why this is working. Is someone else
> working to solve this issue on the NIR side? Do someone thinks that
> this should be solved there (or even on the MOV emission)?
>
>
>  src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> index d3f0ddd..400a13a 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> @@ -325,7 +325,8 @@ try_copy_propagate(const struct brw_device_info
*devinfo,
>         inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
>        return false;
>
> -   if (inst->is_3src() && value.file == UNIFORM)
> +   if (inst->is_3src() && value.file == UNIFORM &&
> +
 !brw_is_single_value_swizzle(brw_compose_swizzle(inst->src[arg].swizzle,
value.swizzle)))

This is a case where we probably want to do the check after we've created
the "final" value.  The question that's being asked is "is this final value
allowed?"

Also, we emitted it before with the is_visitor code. What checks are we
doing there to make sure we only emitted the uniform directly if it was OK
to do so?

>        return false;
>
>     if (inst->is_send_from_grf())
> --
> 2.1.4
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150923/ed588d80/attachment-0001.html>


More information about the mesa-dev mailing list