[Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

Fri May 8 11:05:11 PDT 2015

On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote:
> On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> > Shader-db results for fragment shaders on Broadwell:
> >
> >    total instructions in shared programs: 4310987 -> 4310663 (-0.01%)
> >    instructions in affected programs:     43242 -> 42918 (-0.75%)
> >    helped:                                142
> >    HURT:                                  0
> >
> > Shader-db results for vertex shaders on Broadwell:
> >
> >    total instructions in shared programs: 2889581 -> 2844309 (-1.57%)
> >    instructions in affected programs:     1418720 -> 1373448 (-3.19%)
> >    helped:                                6139
> >    HURT:                                  0
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > index 555987d..161a262 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > @@ -21,6 +21,8 @@
> >   * IN THE SOFTWARE.
> >   */
> >
> > +#include <algorithm>
> > +
> >  #include "glsl/ir.h"
> >  #include "glsl/ir_optimization.h"
> >  #include "glsl/nir/glsl_to_nir.h"
> > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
> >        op[i] = offset(op[i], instr->src[i].swizzle[channel]);
> >     }
> >
> > +   /* Immediates can only be used as the second source of two-source
> > +    * instructions.  We have code in opt_algebraic to flip them as needed
> > +    * for most instructions.  However, it doesn't hurt anything to just do
> > +    * the right thing if we can detect it at the NIR level.
> > +    */
> > +   if ((nir_op_infos[instr->op].algebraic_properties & NIR_OP_IS_COMMUTATIVE) &&
> > +       nir_src_as_const_value(instr->src[0].src)) {
> > +      std::swap(op[0], op[1]);
> > +   }
> > +
> 
> The real problem is that we haven't lifted the restriction about
> non-commutative integer multiply on Broadwell:
> 
> brw_fs_copy_propagation.cpp:
> 
>    /* Fit this constant in by commuting the operands.
>     * Exception: we can't do this for 32-bit integer MUL/MACH
>     * because it's asymmetric.
>     */
>    if ((inst->opcode == BRW_OPCODE_MUL ||
>         inst->opcode == BRW_OPCODE_MACH) &&
>        (inst->src[1].type == BRW_REGISTER_TYPE_D ||
>         inst->src[1].type == BRW_REGISTER_TYPE_UD))
>       break;
>    inst->src[0] = inst->src[1];
>    inst->src[1] = val;
>    progress = true;

Yeah.  I also wrote a patch to do that, adding

   (brw->gen < 8 || brw->is_cherryview) &&

which also solves the problem.  But it won't help on Cherryview, which I
believe still has the asymmetrical multiply, while switching to shifts
will.  I suppose another alternative to NIR late optimizations is to
have brw_fs_nir's handling of imul check for power of two sizes and emit
a SHL.  That would be easy.

I do think we really need to make logical IMUL opcodes and lower them to
MUL/MACH as needed later, so we don't need to worry about breaking MACH
in cases like this.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150508/e1905727/attachment.sig>