[Mesa-dev] [PATCH 4/4] R600: Fold immediates into ALU instructions when possible
Tom Stellard
tom at stellard.net
Tue Nov 13 07:58:35 PST 2012
On Tue, Nov 13, 2012 at 07:41:20AM -0800, Vincent Lejeune wrote:
>
> ----- Mail original -----
> > De : Tom Stellard <tom at stellard.net>
> > À : mesa-dev at lists.freedesktop.org
> > Cc : Tom Stellard <thomas.stellard at amd.com>
> > Envoyé le : Vendredi 9 novembre 2012 21h47
> > Objet : [Mesa-dev] [PATCH 4/4] R600: Fold immediates into ALU instructions when possible
> >
> > From: Tom Stellard <thomas.stellard at amd.com>
> >
> > ---
> > lib/Target/AMDGPU/R600ISelLowering.cpp | 65 ++++++++++++++++++++++---
> > lib/Target/AMDGPU/R600InstrInfo.cpp | 40 +++++++++++++++
> > lib/Target/AMDGPU/R600InstrInfo.h | 12 +++++
> > test/CodeGen/R600/fcmp-cnd.ll | 4 +-
> > test/CodeGen/R600/fcmp-cnde-int-args.ll | 2 +-
> > test/CodeGen/R600/selectcc-icmp-select-float.ll | 2 +-
> > test/CodeGen/R600/selectcc_cnde.ll | 2 +-
> > test/CodeGen/R600/selectcc_cnde_int.ll | 2 +-
> > 8 files changed, 117 insertions(+), 12 deletions(-)
> >
> > diff --git a/lib/Target/AMDGPU/R600ISelLowering.cpp
> > b/lib/Target/AMDGPU/R600ISelLowering.cpp
> > index 75a2a90..dc2247a 100644
> > --- a/lib/Target/AMDGPU/R600ISelLowering.cpp
> > +++ b/lib/Target/AMDGPU/R600ISelLowering.cpp
> > @@ -131,15 +131,66 @@ MachineBasicBlock *
> > R600TargetLowering::EmitInstrWithCustomInserter(
> > }
> >
> > case AMDGPU::MOV_IMM_F32:
> > - TII->buildMovImm(*BB, I, MI->getOperand(0).getReg(),
> > - MI->getOperand(1).getFPImm()->getValueAPF()
> > - .bitcastToAPInt().getZExtValue());
> > - break;
> > case AMDGPU::MOV_IMM_I32:
> > - TII->buildMovImm(*BB, I, MI->getOperand(0).getReg(),
> > - MI->getOperand(1).getImm());
> > - break;
> > + {
> > + unsigned DstReg = MI->getOperand(0).getReg();
> > + bool CanReplace = true;
> > + int64_t Imm;
> > + unsigned ImmReg = AMDGPU::ALU_LITERAL_X;
> > + bool IsInlineConstant = false;
> > +
> > + if (MI->getOpcode() == AMDGPU::MOV_IMM_F32) {
> > + APFloat Value = MI->getOperand(1).getFPImm()->getValueAPF();
> > + float FloatValue = Value.convertToFloat();
> > + if (FloatValue == 0.0f) {
> > + ImmReg = AMDGPU::ZERO;
> > + IsInlineConstant = true;
> > + } else if (FloatValue == 1.0f) {
> > + ImmReg = AMDGPU::ONE;
> > + IsInlineConstant = true;
> > + }
> > + Imm = Value.bitcastToAPInt().getZExtValue();
> > + } else {
> > + Imm = MI->getOperand(1).getImm();
> > + switch (Imm) {
> > + case 0:
> > + ImmReg = AMDGPU::ZERO;
> > + IsInlineConstant = true;
> > + break;
> > + case 1:
> > + ImmReg = AMDGPU::ONE_INT;
> > + IsInlineConstant = true;
> > + break;
> > + }
> > + }
> > +
> > + // Check all the uses of DstReg to make sure it is safe to fold the
> > + // immediate.
> > + for (MachineRegisterInfo::use_iterator UI = MRI.use_begin(DstReg);
> > + UI != MachineRegisterInfo::use_end(); ++UI)
>
> I think use_iterators are a bit tricky to use; I was told by people on #llvm that the
> use_iterator has no guarantees on order, and is not a def-use chain per se.
> The use_iterator returns every machine operand marked as "use" or "implicit use" by a
> machine instruction. For a virtual register, which is defined only once, this matches the
> definition of a def-use chain, but not for a physical register. For instance, in the following block:
>
> vreg1 = MOV T2_X (1)
> ...
> T2_X = MOV vreg2 (2)
> ...
> vreg3 = MOV T2_X (3)
>
> the use iterator will return the operands in instructions (1) and (3), even though the register
> was redefined in (2). This situation can occur in our backend in the InstrEmitterPass, because
> we generate physical register reads for some input-loading intrinsics, and register writes for
> some output-storing intrinsics.
>
Ok, so I guess use_iterator won't work for us here.
> I think using a COPY instruction here would be better, because it would let the register
> coalescer fold the literals for us.
>
I think a COPY will work for AMDGPU::ZERO and AMDGPU::ONE, but probably
not when we use AMDGPU::ALU_LITERAL_X, because we can't attach the
literal value to the copy instruction.
I've thought about this more, and I think it might work to handle the
constant folding in AMDGPUISelDAGToDAG::Select(), so I'm going to give that a
try. That way we won't need the use_iterator at all, and it should help
the instruction scheduler do a better job of scheduling.
-Tom
> > {
> >
> > + if (!TII->canUseImmediate(UI.getOperand()) &&
> > + !(IsInlineConstant &&
> > TII->isALUInstr((*UI).getOpcode()))) {
> > + CanReplace = false;
> > + break;
> > + }
> > + }
> > +
> > + if (!CanReplace) {
> > + TII->buildMovImm(*BB, I, DstReg, Imm);
> > + } else {
> > + if (!IsInlineConstant) {
> > + // Add the immediate to the use instructions
> > + for (MachineRegisterInfo::use_iterator UI = MRI.use_begin(DstReg);
> > + UI != MachineRegisterInfo::use_end();
> > ++UI) {
> > + TII->setImmOperand(&*UI, R600Operands::IMM, Imm);
> > + }
> > + }
> > + // Update the use's register
> > + MRI.replaceRegWith(DstReg, ImmReg);
> > + }
> > + break;
> > + }
> >
> > case AMDGPU::RAT_WRITE_CACHELESS_32_eg:
> > case AMDGPU::RAT_WRITE_CACHELESS_128_eg:
> > diff --git a/lib/Target/AMDGPU/R600InstrInfo.cpp
> > b/lib/Target/AMDGPU/R600InstrInfo.cpp
> > index e437313..2ca5e45 100644
> > --- a/lib/Target/AMDGPU/R600InstrInfo.cpp
> > +++ b/lib/Target/AMDGPU/R600InstrInfo.cpp
> > @@ -138,6 +138,15 @@ bool R600InstrInfo::isCubeOp(unsigned Opcode) const
> > }
> > }
> >
> > +bool R600InstrInfo::isALUInstr(unsigned Opcode) const
> > +{
> > + unsigned TargetFlags = get(Opcode).TSFlags;
> > +
> > + return ((TargetFlags & R600_InstFlag::OP1) |
> > + (TargetFlags & R600_InstFlag::OP2) |
> > + (TargetFlags & R600_InstFlag::OP3));
> > +}
> > +
> > DFAPacketizer *R600InstrInfo::CreateTargetScheduleState(const TargetMachine *TM,
> > const ScheduleDAG *DAG) const
> > {
> > @@ -582,6 +591,37 @@ void R600InstrInfo::setImmOperand(MachineInstr *MI,
> > R600Operands::Ops Op,
> > MI->getOperand(Idx).setImm(Imm);
> > }
> >
> > +bool R600InstrInfo::operandUsesImmediate(const MachineInstr &MI,
> > + R600Operands::Ops Op) const
> > +{
> > + int OpIndex = getOperandIdx(MI, Op);
> > + return OpIndex != -1 &&
> > + MI.getOperand(OpIndex).getReg() == AMDGPU::ALU_LITERAL_X;
> > +}
> > +
> > +bool R600InstrInfo::canUseImmediate(const MachineOperand &MO) const
> > +{
> > + const MachineInstr *MI = MO.getParent();
> > +
> > + if (!isALUInstr(MI->getOpcode())) {
> > + return false;
> > + }
> > +
> > + // XXX: Handle cases when an instruction can reference an immediate from
> > + // multiple operands:
> > + //
> > + // 1. Each operand uses the same immediate value
> > + // 2. There is room in the instruction group (MIBundle) for another
> > immediate.
> > + // We need to start using bundles first.
> > + if (operandUsesImmediate(*MI, R600Operands::SRC0) ||
> > + operandUsesImmediate(*MI, R600Operands::SRC1) ||
> > + operandUsesImmediate(*MI, R600Operands::SRC2)) {
> > + return false;
> > + }
> > +
> > + return true;
> > +}
> > +
> > //===----------------------------------------------------------------------===//
> > // Instruction flag getters/setters
> > //===----------------------------------------------------------------------===//
> > diff --git a/lib/Target/AMDGPU/R600InstrInfo.h
> > b/lib/Target/AMDGPU/R600InstrInfo.h
> > index cec1c3b..7c62a53 100644
> > --- a/lib/Target/AMDGPU/R600InstrInfo.h
> > +++ b/lib/Target/AMDGPU/R600InstrInfo.h
> > @@ -50,6 +50,9 @@ namespace llvm {
> > bool isReductionOp(unsigned opcode) const;
> > bool isCubeOp(unsigned opcode) const;
> >
> > + /// isALUInstr - Returns true if this Opcode represents an ALU instruction.
> > + bool isALUInstr(unsigned Opcode) const;
> > +
> > /// isVector - Vector instructions are instructions that must fill all
> > /// instruction slots within an instruction group.
> > bool isVector(const MachineInstr &MI) const;
> > @@ -133,6 +136,15 @@ namespace llvm {
> > /// setImmOperand - Helper function for setting instruction flag values.
> > void setImmOperand(MachineInstr *MI, R600Operands::Ops Op, int64_t Imm)
> > const;
> >
> > + /// operandUsesImmediate - Return true if it is legal for this instruction
> > + /// to reference an immediate from one of its operands and the given operand
> > + /// is using an immediate.
> > + bool operandUsesImmediate(const MachineInstr &MI, R600Operands::Ops Op)
> > const;
> > +
> > + /// canUseImmediate - Return true if this operand can be replaced with an
> > + /// immediate.
> > + bool canUseImmediate(const MachineOperand &MO) const;
> > +
> > ///hasFlagOperand - Returns true if this instruction has an operand for
> > /// storing target flags.
> > bool hasFlagOperand(const MachineInstr &MI) const;
> > diff --git a/test/CodeGen/R600/fcmp-cnd.ll b/test/CodeGen/R600/fcmp-cnd.ll
> > index c6b6236..a94cfb5 100644
> > --- a/test/CodeGen/R600/fcmp-cnd.ll
> > +++ b/test/CodeGen/R600/fcmp-cnd.ll
> > @@ -1,6 +1,8 @@
> > ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
> >
> > -;CHECK: CNDE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW],
> > T[0-9]+\.[XYZW]}}
> > +;Not checking arguments 2 and 3 to CNDE, because they may change between
> > +;registers and literal.x depending on what the optimizer does.
> > +;CHECK: CNDE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> >
> > define void @test(i32 addrspace(1)* %out, float addrspace(1)* %in) {
> > entry:
> > diff --git a/test/CodeGen/R600/fcmp-cnde-int-args.ll
> > b/test/CodeGen/R600/fcmp-cnde-int-args.ll
> > index 92f3b5f..5c981ef 100644
> > --- a/test/CodeGen/R600/fcmp-cnde-int-args.ll
> > +++ b/test/CodeGen/R600/fcmp-cnde-int-args.ll
> > @@ -4,7 +4,7 @@
> > ; chance to optimize the fcmp + select instructions to CNDE was missed
> > ; due to the fact that the operands to fcmp and select had different types
> >
> > -;CHECK: CNDE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW],
> > T[0-9]+\.[XYZW]}}
> > +;CHECK: CNDE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], literal.x, 0.0, -1}}
> >
> > define void @test(i32 addrspace(1)* %out, float addrspace(1)* %in) {
> > entry:
> > diff --git a/test/CodeGen/R600/selectcc-icmp-select-float.ll
> > b/test/CodeGen/R600/selectcc-icmp-select-float.ll
> > index f1f8ab1..f65a300 100644
> > --- a/test/CodeGen/R600/selectcc-icmp-select-float.ll
> > +++ b/test/CodeGen/R600/selectcc-icmp-select-float.ll
> > @@ -2,7 +2,7 @@
> >
> > ; Note additional optimizations may cause this SGT to be replaced with a
> > ; CND* instruction.
> > -; CHECK: SGT_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW],
> > T[0-9]+\.[XYZW]}}
> > +; CHECK: SGT_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], literal.x, -1}}
> > ; Test a selectcc with i32 LHS/RHS and float True/False
> >
> > define void @test(float addrspace(1)* %out, i32 addrspace(1)* %in) {
> > diff --git a/test/CodeGen/R600/selectcc_cnde.ll
> > b/test/CodeGen/R600/selectcc_cnde.ll
> > index e06a170..f0a0f51 100644
> > --- a/test/CodeGen/R600/selectcc_cnde.ll
> > +++ b/test/CodeGen/R600/selectcc_cnde.ll
> > @@ -1,7 +1,7 @@
> > ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
> >
> > ;CHECK-NOT: SETE
> > -;CHECK: CNDE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW],
> > T[0-9]+\.[XYZW]}}
> > +;CHECK: CNDE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], 1.0, literal.x,
> > [-0-9]+\(2.0}}
> > define void @test(float addrspace(1)* %out, float addrspace(1)* %in) {
> > %1 = load float addrspace(1)* %in
> > %2 = fcmp oeq float %1, 0.0
> > diff --git a/test/CodeGen/R600/selectcc_cnde_int.ll
> > b/test/CodeGen/R600/selectcc_cnde_int.ll
> > index 03d000f..b38078e 100644
> > --- a/test/CodeGen/R600/selectcc_cnde_int.ll
> > +++ b/test/CodeGen/R600/selectcc_cnde_int.ll
> > @@ -1,7 +1,7 @@
> > ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
> >
> > ;CHECK-NOT: SETE_INT
> > -;CHECK: CNDE_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW],
> > T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> > +;CHECK: CNDE_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], 1, literal.x, 2}}
> > define void @test(i32 addrspace(1)* %out, i32 addrspace(1)* %in) {
> > %1 = load i32 addrspace(1)* %in
> > %2 = icmp eq i32 %1, 0
> > --
> > 1.7.11.4
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >