[Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops
Jason Ekstrand
jason at jlekstrand.net
Fri May 8 11:13:09 PDT 2015
On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote:
>> On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>> > Shader-db results for fragment shaders on Broadwell:
>> >
>> > total instructions in shared programs: 4310987 -> 4310663 (-0.01%)
>> > instructions in affected programs: 43242 -> 42918 (-0.75%)
>> > helped: 142
>> > HURT: 0
>> >
>> > Shader-db results for vertex shaders on Broadwell:
>> >
>> > total instructions in shared programs: 2889581 -> 2844309 (-1.57%)
>> > instructions in affected programs: 1418720 -> 1373448 (-3.19%)
>> > helped: 6139
>> > HURT: 0
>> > ---
>> > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 ++++++++++++
>> > 1 file changed, 12 insertions(+)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > index 555987d..161a262 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > @@ -21,6 +21,8 @@
>> > * IN THE SOFTWARE.
>> > */
>> >
>> > +#include <algorithm>
>> > +
>> > #include "glsl/ir.h"
>> > #include "glsl/ir_optimization.h"
>> > #include "glsl/nir/glsl_to_nir.h"
>> > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
>> > op[i] = offset(op[i], instr->src[i].swizzle[channel]);
>> > }
>> >
>> > + /* Immediates can only be used as the second source of two-source
>> > + * instructions. We have code in opt_algebraic to flip them as needed
>> > + * for most instructions. However, it doesn't hurt anything to just do
>> > + * the right thing if we can detect it at the NIR level.
>> > + */
>> > + if ((nir_op_infos[instr->op].algebraic_properties & NIR_OP_IS_COMMUTATIVE) &&
>> > + nir_src_as_const_value(instr->src[0].src)) {
>> > + std::swap(op[0], op[1]);
>> > + }
>> > +
>>
>> The real problem is that we haven't lifted the restriction about
>> non-commutative integer multiply on Broadwell:
>>
>> brw_fs_copy_propagation.cpp:
>>
>> /* Fit this constant in by commuting the operands.
>> * Exception: we can't do this for 32-bit integer MUL/MACH
>> * because it's asymmetric.
>> */
>> if ((inst->opcode == BRW_OPCODE_MUL ||
>> inst->opcode == BRW_OPCODE_MACH) &&
>> (inst->src[1].type == BRW_REGISTER_TYPE_D ||
>> inst->src[1].type == BRW_REGISTER_TYPE_UD))
>> break;
>> inst->src[0] = inst->src[1];
>> inst->src[1] = val;
>> progress = true;
>
> Yeah. I also wrote a patch to do that, adding
>
> (brw->gen < 8 || brw->is_cherryview) &&
In that case, shouldn't Cherry View take the same path as hsw when
emitting multiplies and look for 15-bit constants?
> which also solves the problem. But it won't help on Cherryview, which I
> believe still has the asymmetrical multiply, while switching to shifts
> will. I suppose another alternative to NIR late optimizations is to
> have brw_fs_nir's handling of imul check for power of two sizes and emit
> a SHL. That would be easy.
I really don't think SHL is the issue here. It's that we're being
stupid about multiplies. SHL is a nice hack but unless it is actually
faster, I think it's hacking around the problem.
> I do think we really need to make logical IMUL opcodes and lower them to
> MUL/MACH as needed later, so we don't need to worry about breaking MACH
> in cases like this.
More information about the mesa-dev
mailing list