[Mesa-dev] [RFC v1] i965/vec4: add a swizzle check on the uniform bail condition
Matt Turner
mattst88 at gmail.com
Wed Sep 23 13:12:40 PDT 2015
On Wed, Sep 23, 2015 at 12:29 PM, Alejandro PiƱeiro
<apinheiro at igalia.com> wrote:
> Without this commit, an uniform is bailed out if the instruction
"an uniform" -> "a uniform" (because the 'u' actually makes a consonant sound)
I'm not sure what is meant by "bailed out"
> has 3 sources. This commit allow to go on if the combined
> swizzle is a single value.
Nice find! This is spot-on.
Indeed, 3-src instructions can access a scalar value and we were
mistakenly ignoring that capability.
> WIP: it can be clearly cleaned, as the condition is somewhat
> ugly right now.
>
> Shader-db results for vec4 programs on Haswell:
> total instructions in shared programs: 1666043 -> 1648378 (-1.06%)
> instructions in affected programs: 812755 -> 795090 (-2.17%)
> helped: 7930
> HURT: 0
I get
Without NIR:
total instructions in shared programs: 6717457 -> 6713423 (-0.06%)
instructions in affected programs: 296630 -> 292596 (-1.36%)
helped: 2533
With NIR:
total instructions in shared programs: 6529329 -> 6511760 (-0.27%)
instructions in affected programs: 811929 -> 794360 (-2.16%)
helped: 7865
I find providing filtered results (i.e., only results from vertex
shaders) to be confusing, especially when that fact is withheld.
> ---
>
> Some background: in order to find places to optimize the instruction
> count, I'm running shader-db comparing the outcome using IR vs using
> NIR. Currently one of the HURT shaders is
> unity/57-DeferredDirectional.shader_test, that I simplified to the
> following shader with the same issue:
>
> uniform float myFloat;
>
> void main ()
> {
> gl_Position = mix (gl_Color, gl_Vertex, vec4(myFloat));
> }
>
> Note that it is an uniform, and a float, used to recreate a vec4 on an
> lrp. That example doesn't make too much sense, but again, it is a
> synthetic shader that shows the same issue that the original one that
> made more sense.
>
> This shader requires one instruction more on NIR. There is one
> major difference at the starting point. On IR we have this:
>
> 0: lrp vgrf0.0:F, u0.xxxx:F, attr0.xyzw:F, attr3.xyzw:F
> 1: mov m2:D, 0D
> 2: mov m3:F, vgrf0.xyzw:F
> 3: vs_urb_write (null):UD
>
> While on NIR we have this:
>
> 0: mov vgrf0.0:F, attr3.xyzw:F
> 1: mov vgrf1.0:F, attr0.xyzw:F
> 2: mov vgrf2.0:UD, u0.xyzw:UD
> 3: lrp vgrf3.0:F, vgrf2.xxxx:F, vgrf1.xyzw:F, vgrf0.xyzw:F
> 4: mov m2:D, 0D
> 5: mov m3:F, vgrf3.xyzw:F
> 6: vs_urb_write (null):UD
>
> So NIR initially emits three extra MOVs. Two of them are not a problem
> because are properly propagated and dead-code eliminated. But the third
> one no, as it get bailed out from the original condition, that bailed
> out when it was an uniform on a 3-sourced instruction.
>
> After that I need to confess that I got to the solution based on the
> assumption that this MOV can be copy-propagated (as IR doesn't have
> it) and the outcome when I just removed that condition. When I just
> hacked out that condition, one assertion at
> brw_eu_emit.c::get_3src_subreg_nr was triggered:
>
> assert(brw_is_single_value_swizzle(reg.dw1.bits.swizzle));
>
> That "inspired me" to add that check on the condition. But at this
> point I can't clearly justify it, even if it doesn't show any piglit
> regression, and shows a shader-db numbers so positive.
>
> But when I was about to start to research this, I remembered that
> meanwhile I have focused myself on trying to help on the backend side,
> other people, like Matt Turner, Eduardo Lima, and Jason Ekstrand have
> been working on the NIR side, including commits like
> "nir/lower_vec_to_movs: Don't emit unneeded movs". So perhaps the way
> to solve this issue is going to the root, and avoid those extra MOVs
> as originally IR did, and someone is already working on that. So
> before deeping on researching why this is working. Is someone else
> working to solve this issue on the NIR side? Do someone thinks that
> this should be solved there (or even on the MOV emission)?
>
>
> src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> index d3f0ddd..400a13a 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> @@ -325,7 +325,8 @@ try_copy_propagate(const struct brw_device_info *devinfo,
> inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
> return false;
>
> - if (inst->is_3src() && value.file == UNIFORM)
> + if (inst->is_3src() && value.file == UNIFORM &&
> + !brw_is_single_value_swizzle(brw_compose_swizzle(inst->src[arg].swizzle, value.swizzle)))
I think just linewrap this and remove the WIP from the commit message, and
Reviewed-by: Matt Turner <mattst88 at gmail.com>
Excellent work!
More information about the mesa-dev
mailing list