[Mesa-dev] [PATCH 37/59] i965/fs: generalize SIMD16 interference workaround

Wed May 4 23:20:02 UTC 2016

On Curro's suggestion, I tried reproducing the problem that the patch
fixes. Not only could I not reproduce it, but it seems like the
original workaround that Jason added back when he reworked the fs
backend ages ago is unnecessary at least on my ivybridge and broadwell
machines. Since reproducing it involves specific register-allocation
choices, I hacked the fs visitor to skip emit_nir_code() and instead
produce a specific program with some hardcoded register assignments,
then picked a random piglit test
(tests/shaders/glsl-fs-abs-neg.shader_test) and ran it with
INTEL_DEBUG=no8. The resulting assembly looked like:

Native code for unnamed fragment shader GLSL3
SIMD16 shader: 7 instructions. 0 loops. 58 cycles. 0:0 spills:fills.
Promoted 0 constants. Compacted 112 to 96 bytes (14%)
   START B0
mov(16)         g10<1>F         -1F                             { align1 1H };
mov(16)         g122<1>F        1F                              { align1 1H };
mov(16)         g124<1>F        [0F, 0F, 0F, 0F]VF              {
align1 1H compacted };
mov(16)         g126<1>F        1F                              { align1 1H };
add(16)         g11<1>F         g10<8,8,1>F     1F              { align1 1H };
mov(16)         g120<1>F        g11<8,8,1>F                     {
align1 1H compacted };
sendc(16)       null<1>UW       g120<8,8,1>F
                            render RT write SIMD16 LastRT Surface = 0
mlen 8 rlen 0 { align1 1H EOT };
   END B0

If the comment Jason added is correct, then

   add(16)         g11<1>F         g10<8,8,1>F     1F              {
align1 1H };

will get expanded into

   add(8)         g11<1>F         g10<8,8,1>F     1F              { align1 1Q };
   add(8)         g12<1>F         g11<8,8,1>F     1F              { align1 2Q };

And the add will happen twice for g11, making some of the pixels
yellow instead of green. But on the machines I tested, this resulted
in pure green. I also tried a similar thing for fp64, but there was
still no problem. I Also tested the similar problem described in
fs_inst::has_source_and_destination_hazard() with this program:

Native code for unnamed fragment shader GLSL3
SIMD16 shader: 7 instructions. 0 loops. 56 cycles. 0:0 spills:fills.
Promoted 0 constants. Compacted 112 to 96 bytes (14%)
   START B0
mov(1)          g10<1>F         -1F                             {
align1 WE_all };
mov(16)         g122<1>F        1F                              { align1 1H };
mov(16)         g124<1>F        [0F, 0F, 0F, 0F]VF              {
align1 1H compacted };
mov(16)         g126<1>F        1F                              { align1 1H };
add(16)         g10<1>F         g10<0,1,0>F     1F              { align1 1H };
mov(16)         g120<1>F        g10<8,8,1>F                     {
align1 1H compacted };
sendc(16)       null<1>UW       g120<8,8,1>F
                            render RT write SIMD16 LastRT Surface = 0
mlen 8 rlen 0 { align1 1H EOT };
   END B0

and again the comment seems to be wrong. Also, there were no piglit
regressions with the patch reverted, and the only tests broken with
the original workarounds reverted are ones that use math instructions
which we split to SIMD8 in the generator (but probably shouldn't be).
So for now, we can probably drop this patch, and if we can do more
testing, then we can probably drop the original workaround or restrict
it to older gens.

If anyone wants to test this on other platforms than broadwell and
ivybridge, the hacks I made are available on my fdo repo on the
test-compressed-wa and test-fp64-compressed-wa (the former has two
commits, go back to the first to produce the first assembly).

On Fri, Apr 29, 2016 at 7:29 AM, Samuel Iglesias Gonsálvez
<siglesias at igalia.com> wrote:
> From: Connor Abbott <connor.w.abbott at intel.com>
>
> Work based on registers read/written instead of dispatch_width, so that
> the interferences are added for 64-bit sources/destinations as well.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 67 ++++++++++++++++-------
>  1 file changed, 48 insertions(+), 19 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> index 2347cd5..f0e96f9 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> @@ -541,6 +541,34 @@ setup_mrf_hack_interference(fs_visitor *v, struct ra_graph *g,
>     }
>  }
>
> +static unsigned
> +get_reg_node(const fs_reg& reg, unsigned first_payload_node)
> +{
> +   switch (reg.file) {
> +   case VGRF:
> +      return reg.nr;
> +   case ARF:
> +   case FIXED_GRF:
> +      return first_payload_node + reg.nr;
> +   case MRF:
> +   default:
> +      unreachable("unhandled register file");
> +   }
> +
> +   return 0;
> +}
> +
> +static void
> +add_reg_interference(struct ra_graph *g, const fs_reg& reg1,
> +                     const fs_reg& reg2, unsigned first_payload_node)
> +{
> +   if ((reg1.file == VGRF || reg1.file == ARF || reg1.file == FIXED_GRF) &&
> +       (reg2.file == VGRF || reg2.file == ARF || reg2.file == FIXED_GRF)) {
> +      ra_add_node_interference(g, get_reg_node(reg1, first_payload_node),
> +                               get_reg_node(reg2, first_payload_node));
> +   }
> +}
> +
>  bool
>  fs_visitor::assign_regs(bool allow_spilling)
>  {
> @@ -643,26 +671,27 @@ fs_visitor::assign_regs(bool allow_spilling)
>        }
>     }
>
> -   if (dispatch_width > 8) {
> -      /* In 16-wide dispatch we have an issue where a compressed
> -       * instruction is actually two instructions executed simultaneiously.
> -       * It's actually ok to have the source and destination registers be
> -       * the same.  In this case, each instruction over-writes its own
> -       * source and there's no problem.  The real problem here is if the
> -       * source and destination registers are off by one.  Then you can end
> -       * up in a scenario where the first instruction over-writes the
> -       * source of the second instruction.  Since the compiler doesn't know
> -       * about this level of granularity, we simply make the source and
> -       * destination interfere.
> -       */
> -      foreach_block_and_inst(block, fs_inst, inst, cfg) {
> -         if (inst->dst.file != VGRF)
> -            continue;
> +   /* When instructions both read/write more than a single SIMD8 register, we
> +    * have an issue where an instruction is actually two instructions executed
> +    * simultaneiously.  It's actually ok to have the source and destination
> +    * registers be the same.  In this case, each instruction over-writes its
> +    * own source and there's no problem.  The real problem here is if the
> +    * source and destination registers are off by one.  Then you can end up in
> +    * a scenario where the first instruction over-writes the source of the
> +    * second instruction.  Since the compiler doesn't know about this level of
> +    * granularity, we simply make the source and destination interfere.
> +    */
> +   foreach_block_and_inst(block, fs_inst, inst, cfg) {
> +      if (inst->dst.file != VGRF &&
> +          inst->dst.file != ARF && inst->dst.file != FIXED_GRF)
> +         continue;
>
> -         for (int i = 0; i < inst->sources; ++i) {
> -            if (inst->src[i].file == VGRF) {
> -               ra_add_node_interference(g, inst->dst.nr, inst->src[i].nr);
> -            }
> +      if (inst->regs_written <= 1)
> +         continue;
> +
> +      for (int i = 0; i < inst->sources; ++i) {
> +         if (inst->regs_read(i) > 1) {
> +            add_reg_interference(g, inst->dst, inst->src[i], first_payload_node);
>           }
>        }
>     }
> --
> 2.5.0
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev