[Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

Fri May 8 11:17:42 PDT 2015

On Fri, May 8, 2015 at 11:15 AM, Matt Turner <mattst88 at gmail.com> wrote:
> On Fri, May 8, 2015 at 11:13 AM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>> On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke <kenneth at whitecape.org> wrote:
>>> On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote:
>>>> On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>> > Shader-db results for fragment shaders on Broadwell:
>>>> >
>>>> >    total instructions in shared programs: 4310987 -> 4310663 (-0.01%)
>>>> >    instructions in affected programs:     43242 -> 42918 (-0.75%)
>>>> >    helped:                                142
>>>> >    HURT:                                  0
>>>> >
>>>> > Shader-db results for vertex shaders on Broadwell:
>>>> >
>>>> >    total instructions in shared programs: 2889581 -> 2844309 (-1.57%)
>>>> >    instructions in affected programs:     1418720 -> 1373448 (-3.19%)
>>>> >    helped:                                6139
>>>> >    HURT:                                  0
>>>> > ---
>>>> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 ++++++++++++
>>>> >  1 file changed, 12 insertions(+)
>>>> >
>>>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>> > index 555987d..161a262 100644
>>>> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>> > @@ -21,6 +21,8 @@
>>>> >   * IN THE SOFTWARE.
>>>> >   */
>>>> >
>>>> > +#include <algorithm>
>>>> > +
>>>> >  #include "glsl/ir.h"
>>>> >  #include "glsl/ir_optimization.h"
>>>> >  #include "glsl/nir/glsl_to_nir.h"
>>>> > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
>>>> >        op[i] = offset(op[i], instr->src[i].swizzle[channel]);
>>>> >     }
>>>> >
>>>> > +   /* Immediates can only be used as the second source of two-source
>>>> > +    * instructions.  We have code in opt_algebraic to flip them as needed
>>>> > +    * for most instructions.  However, it doesn't hurt anything to just do
>>>> > +    * the right thing if we can detect it at the NIR level.
>>>> > +    */
>>>> > +   if ((nir_op_infos[instr->op].algebraic_properties & NIR_OP_IS_COMMUTATIVE) &&
>>>> > +       nir_src_as_const_value(instr->src[0].src)) {
>>>> > +      std::swap(op[0], op[1]);
>>>> > +   }
>>>> > +
>>>>
>>>> The real problem is that we haven't lifted the restriction about
>>>> non-commutative integer multiply on Broadwell:
>>>>
>>>> brw_fs_copy_propagation.cpp:
>>>>
>>>>    /* Fit this constant in by commuting the operands.
>>>>     * Exception: we can't do this for 32-bit integer MUL/MACH
>>>>     * because it's asymmetric.
>>>>     */
>>>>    if ((inst->opcode == BRW_OPCODE_MUL ||
>>>>         inst->opcode == BRW_OPCODE_MACH) &&
>>>>        (inst->src[1].type == BRW_REGISTER_TYPE_D ||
>>>>         inst->src[1].type == BRW_REGISTER_TYPE_UD))
>>>>       break;
>>>>    inst->src[0] = inst->src[1];
>>>>    inst->src[1] = val;
>>>>    progress = true;
>>>
>>> Yeah.  I also wrote a patch to do that, adding
>>>
>>>    (brw->gen < 8 || brw->is_cherryview) &&
>>
>> In that case, shouldn't Cherry View take the same path as hsw when
>> emitting multiplies and look for 15-bit constants?
>
> Almost... that path needs to set one of the MUL's source types to W/UW
> and the stride to 2, like in commit 0c06d019. I'm planning to rip out
> all of the multiplication code from brw_fs_visitor.cpp/brw_fs_nir.cpp
> and move it to a common lowering pass, so I'll fix that at the same
> time.

Awesome!  Thanks for working on that!