[Mesa-dev] [PATCH] i965/fs: New peephole optimization to flatten IF/BREAK/ENDIF.

Paul Berry stereotype441 at gmail.com
Tue Nov 5 22:34:38 PST 2013


On 5 November 2013 14:58, Paul Berry <stereotype441 at gmail.com> wrote:

> On 1 November 2013 17:31, Matt Turner <mattst88 at gmail.com> wrote:
>
>> total instructions in shared programs: 1649485 -> 1649157 (-0.02%)
>> instructions in affected programs:     7823 -> 7495 (-4.19%)
>> ---
>>  src/mesa/drivers/dri/i965/Makefile.sources         |  1 +
>>  src/mesa/drivers/dri/i965/brw_fs.cpp               |  1 +
>>  src/mesa/drivers/dri/i965/brw_fs.h                 |  1 +
>>  .../dri/i965/brw_fs_peephole_predicated_break.cpp  | 96
>> ++++++++++++++++++++++
>>  4 files changed, 99 insertions(+)
>>  create mode 100644
>> src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
>>
>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources
>> b/src/mesa/drivers/dri/i965/Makefile.sources
>> index 14b2f61..fcc6fdb 100644
>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>> @@ -59,6 +59,7 @@ i965_FILES = \
>>         brw_fs_fp.cpp \
>>         brw_fs_generator.cpp \
>>         brw_fs_live_variables.cpp \
>> +       brw_fs_peephole_predicated_break.cpp \
>>         brw_fs_reg_allocate.cpp \
>>         brw_fs_vector_splitting.cpp \
>>         brw_fs_visitor.cpp \
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> index 7064910..5c8443f 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> @@ -3149,6 +3149,7 @@ fs_visitor::run()
>>          progress = opt_algebraic() || progress;
>>          progress = opt_cse() || progress;
>>          progress = opt_copy_propagate() || progress;
>> +         progress = opt_peephole_predicated_break() || progress;
>>          progress = dead_code_eliminate() || progress;
>>          progress = dead_code_eliminate_local() || progress;
>>          progress = register_coalesce() || progress;
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h
>> b/src/mesa/drivers/dri/i965/brw_fs.h
>> index 43e4761..09e5174 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.h
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
>> @@ -362,6 +362,7 @@ public:
>>     bool try_emit_saturate(ir_expression *ir);
>>     bool try_emit_mad(ir_expression *ir, int mul_arg);
>>     void try_replace_with_sel();
>> +   bool opt_peephole_predicated_break();
>>     void emit_bool_to_cond_code(ir_rvalue *condition);
>>     void emit_if_gen6(ir_if *ir);
>>     void emit_unspill(fs_inst *inst, fs_reg reg, uint32_t spill_offset,
>> diff --git
>> a/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
>> b/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
>> new file mode 100644
>> index 0000000..7845113
>> --- /dev/null
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
>> @@ -0,0 +1,96 @@
>> +/*
>> + * Copyright © 2013 Intel Corporation
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining
>> a
>> + * copy of this software and associated documentation files (the
>> "Software"),
>> + * to deal in the Software without restriction, including without
>> limitation
>> + * the rights to use, copy, modify, merge, publish, distribute,
>> sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the
>> next
>> + * paragraph) shall be included in all copies or substantial portions of
>> the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
>> SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> DEALINGS
>> + * IN THE SOFTWARE.
>> + */
>> +
>> +#include "brw_fs.h"
>> +#include "brw_cfg.h"
>> +
>> +/** @file brw_fs_peephole_predicated_break.cpp
>> + *
>> + * Loops are often structured as
>> + *
>> + * loop:
>> + *    CMP.f0
>> + *    (+f0) IF
>> + *    BREAK
>> + *    ENDIF
>> + *    ...
>> + *    WHILE loop
>> + *
>> + * This peephole pass removes the IF and ENDIF instructions and
>> predicates the
>> + * BREAK, dropping two instructions from the loop body.
>>
>
> Instead of this special-purpose peephole optimization, I'm wondering if it
> would be more useful to create a more general optimization pass that just
> converts the entire IF block to predicated instructions, assuming certain
> conditions hold.  Those conditions would be something like:
>
> - If any instruction but the last writes to the flag register, then don't
> do the optimization.
> - If there's nested control flow, then don't do the optimization.
> - If any instruction inside the IF block is incompatible with predication
> (are there any such instructions?  I don't know) then don't do the
> optimization.
> - If the block is larger than a certain heuristically-determined size,
> then don't do the optimization.
>
> Such an optimization could in principle even handle IF/ELSE blocks, by
> inverting the predicate in the ELSE section.
>

Not to pressure you or anything, but I happened to be looking at the
geometry shader for orbital-explorer this evening, and there are a bunch of
examples of code like this:

0x00000140: cmp.l.f0(8)     null            g51<4,4,1>.wF   g1<0,4,1>.xF
{ align16 WE_normal 1Q switch };
0x00000150: (+f0) if(8) 0 0                 null            0x00000000UD
{ align16 WE_normal 1Q switch };
0x00000160: add(8)          g45<1>.xD       g45<4,4,1>.xD   1D
{ align16 WE_normal 1Q };
0x00000170: endif(8) 2                      null            0x00000002UD
{ align16 WE_normal 1Q switch };

which would be a perfect candidate for the more general-purpose
optimization I was proposing.

(granted it's a geometry shader, so it's less interesting to optimize since
most programs are fragment limited).


>
>
> Having said all that, I think this optimization pass does what you intend,
> so with the exception of one comment I've noted below, the patch is:
>
> Reviewed-by: Paul Berry <stereotype441 at gmail.com>
>
>
>> + */
>> +
>> +bool
>> +fs_visitor::opt_peephole_predicated_break()
>> +{
>> +   bool progress = false;
>> +
>> +   cfg_t cfg(this);
>> +
>> +   for (int b = 0; b < cfg.num_blocks; b++) {
>> +      bblock_t *block = cfg.blocks[b];
>> +
>> +      /* IF instructions, by definition, can only be found at the ends of
>> +       * basic blocks.
>> +       */
>> +      fs_inst *if_inst = (fs_inst *) block->end;
>> +      if (if_inst->opcode != BRW_OPCODE_IF)
>> +         continue;
>> +
>> +      fs_inst *inst = (fs_inst *) if_inst->next;
>> +      if (inst->opcode != BRW_OPCODE_BREAK && inst->opcode !=
>> BRW_OPCODE_CONTINUE)
>> +         continue;
>> +
>> +      fs_inst *endif_inst = (fs_inst *) inst->next;
>> +      if (endif_inst->opcode != BRW_OPCODE_ENDIF)
>> +         continue;
>> +
>> +      /* For Sandybridge with IF with embedded comparison we need to
>> emit an
>> +       * instruction to set the flag register.
>> +       */
>> +      if (brw->gen == 6 && if_inst->conditional_mod) {
>> +         fs_inst *cmp_inst = CMP(reg_null_d, if_inst->src[0],
>> if_inst->src[1],
>> +                                 if_inst->conditional_mod);
>> +         if_inst->insert_before(cmp_inst);
>> +         inst->predicate = BRW_PREDICATE_NORMAL;
>> +      } else {
>> +         inst->predicate = if_inst->predicate;
>> +         inst->predicate_inverse = if_inst->predicate_inverse;
>> +      }
>> +
>> +      if_inst->remove();
>> +      endif_inst->remove();
>> +
>> +      /* By removing the ENDIF instruction we removed a basic block.
>> Skip over
>> +       * it for the next iteration.
>> +       */
>> +      b++;
>>
>
> The correctness of this code depends on basic blocks being generated in
> order (something which could conceivably change in the future).  Just for
> the sake of safety, can we have an assertion or two here just to verify
> that the basic block being skipped is the same basic block that was removed?
>
>
>> +
>> +      progress = true;
>> +   }
>> +
>> +   if (progress)
>> +      invalidate_live_intervals();
>> +
>> +   return progress;
>> +}
>> --
>> 1.8.3.2
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20131105/5b54a48b/attachment-0001.html>


More information about the mesa-dev mailing list