[Mesa-dev] [PATCH] R600: Make sure OQAP defs and uses happen in the same clause
Vincent Lejeune
vljn at ovi.com
Thu Nov 14 15:45:24 PST 2013
This patch is : reviewed-by: Vincent Lejeune<vljn at ovi.com>
----- Mail original -----
> De : Tom Stellard <tom at stellard.net>
> À : Vincent Lejeune <vljn at ovi.com>
> Cc : "mesa-dev at lists.freedesktop.org" <mesa-dev at lists.freedesktop.org>; "llvm-commits at cs.uiuc.edu" <llvm-commits at cs.uiuc.edu>; Tom Stellard <thomas.stellard at amd.com>
> Envoyé le : Jeudi 14 novembre 2013 1h53
> Objet : Re: [PATCH] R600: Make sure OQAP defs and uses happen in the same clause
>
> Hi Vincent,
>
> I discovered a bug in the previous patch. Here is an updated versions.
>
> -Tom
>
> On Tue, Nov 12, 2013 at 03:01:42PM -0800, Tom Stellard wrote:
>> Hi Vincent,
>>
>> Here is an updated patch where I added a call to
>> SubstituteKCacheBank() in canClauseLocalKillFitInClause() This should
>> prevent OQAP uses and defs from being split because of constant bank
>> limitations.
>>
>> Maybe we can leave the ScheduleDAGMutation optimization as a future
>> TODO.
>>
>> -Tom
>>
>> On Sun, Nov 03, 2013 at 10:19:16AM -0800, Vincent Lejeune wrote:
>> > I have put some comments below but otherwise the patch is
>> > reviewed-by: Vincent Lejeune <vljn at ovi.com>
>> >
>> >
>> > >-------------- next part --------------
>> > >>From 2eb4673e3184af0e077cbe30a594602441e8d98e Mon Sep 17
> 00:00:00 2001 >From: Tom Stellard <thomas.stellard at amd.com>
>> > >Date: Thu, 5 Sep 2013 08:59:32 -0700
>> > >Subject: [PATCH] R600: Fix scheduling of instructions that use the
> LDS output
>> > > queue
>> > >
>> > >The LDS output queue is accessed via the OQAP register. The OQAP
>> > >register cannot be live across clauses, so if value is written to
> the
>> > >output queue, it must be retrieved before the end of the clause.
>> > >With the machine scheduler, we cannot statisfy this constraint,
> because
>> > >it lacks proper alias analysis and it will mark some LDS accesses
> as
>> > >having a chain dependency on vertex fetches. Since vertex fetches
>> >
>> > We can customize the dependency graph before machine scheduling takes
> place,
>> > using ScheduleDAGMutation.
>> > I already wrote some code to break artificial dependencies between
> vector
>> > subregister read/write here :
>> >
> http://cgit.freedesktop.org/~vlj/llvm/commit/?h=vliw5&id=e91b16a22845d0a80ed348f158ae7ab293e003a8
>> > While I'm expecting from Matthias Braun's Subregister patches
> to be upstreamed
>> > to obsolete most of this patch except tests, it can be reworked so
> that
>> > it'll parse all MEM dependency, and remove the ones between
> instructions
>> > touching different memory pool (like VTX_FETCH and LDS_READ).
>> >
>> > >require a new clauses, the dependency may end up spiltting OQAP
> uses and
>> > >defs so the end up in different clauses. See the
> lds-output-queue.ll
>> > >test for a more detailed explanation.
>> > >
>> > >To work around this issue, we now combine the LDS read and the
> OQAP
>> > >copy into one instruction and expand it after register allocation.
>> > >
>> > >This patch also adds some checks to the EmitClauseMarker pass, so
> that
>> > >it doesn't end a clause with a value still in the output queue
> and
>> > >removes AR.X and OQAP handling from the scheduler (AR.X uses and
> defs
>> > >were already being expanded post-RA, so the scheduler will never
> see
>> > >them).
>> > >---
>> > > lib/Target/R600/R600EmitClauseMarkers.cpp | 52 ++++++++++++++
>> > > lib/Target/R600/R600ExpandSpecialInstrs.cpp | 17 +++++
>> > > lib/Target/R600/R600ISelLowering.cpp | 20 +++---
>> > > lib/Target/R600/R600InstrInfo.cpp | 8 +++
>> > > lib/Target/R600/R600InstrInfo.h | 2 +
>> > > lib/Target/R600/R600MachineScheduler.cpp | 32 ---------
>> > > lib/Target/R600/R600MachineScheduler.h | 2 -
>> > > lib/Target/R600/R600RegisterInfo.cpp | 13 ++++
>> > > lib/Target/R600/R600RegisterInfo.h | 2 +
>> > > test/CodeGen/R600/lds-output-queue.ll | 99
> +++++++++++++++++++++++++++
>> > > test/CodeGen/R600/local-memory-two-objects.ll | 8 ++-
>> > > 11 files changed, 206 insertions(+), 49 deletions(-)
>> > > create mode 100644 test/CodeGen/R600/lds-output-queue.ll
>> > >
>> > >diff --git a/lib/Target/R600/R600EmitClauseMarkers.cpp
> b/lib/Target/R600/R600EmitClauseMarkers.cpp
>> >
>> >
>> > >+ bool canClauseLocalKillFitInClause(
>> > >+ unsigned AluInstCount,
>> > >+ MachineBasicBlock::iterator Def,
>> > >+ MachineBasicBlock::iterator BBEnd)
> {
>> > >+ const R600RegisterInfo &TRI = TII->getRegisterInfo();
>> > >+ for (MachineInstr::const_mop_iterator
>> > >+ MOI = Def->operands_begin(),
>> > >+ MOE = Def->operands_end(); MOI != MOE; ++MOI)
> {
>> > >+ if (!MOI->isReg() || !MOI->isDef() ||
>> > >+ TRI.isPhysRegLiveAcrossClauses(MOI->getReg()))
>> > >+ continue;
>> > >+
>> > >+ // Def defines a clause local register, so check that its
> use will fit
>> > >+ // in the clause.
>> > >+ unsigned LastUseCount = 0;
>> > >+ for (MachineBasicBlock::iterator UseI = Def; UseI != BBEnd;
> ++UseI) {
>> > >+ AluInstCount += OccupiedDwords(UseI);
>> > >+ // We have reached the maximum instruction limit before
> finding the
>> > >+ // use that kills this register, so we cannot use this
> def in the
>> > >+ // current clause.
>> > >+ if (AluInstCount >= TII->getMaxAlusPerClause())
>> > >+ return false;
>> > >+
>> > >+ // Register kill flags have been cleared by the time we
> get to this
>> > >+ // pass, but it is safe to assume that all uses of this
> register
>> > >+ // occur in the same basic block as its definition,
> because
>> > >+ // it is illegal for the scheduler to schedule them in
>> > >+ // different blocks.
>> > >+ if (UseI->findRegisterUseOperandIdx(MOI->getReg()))
>> > >+ LastUseCount = AluInstCount;
>> > >+
>> > >+ if (UseI != Def &&
> UseI->findRegisterDefOperandIdx(MOI->getReg()) != -1)
>> > >+ break;
>> > >+ }
>> > >+ if (LastUseCount)
>> > >+ return LastUseCount <= TII->getMaxAlusPerClause();
>> > >+ llvm_unreachable("Clause local register live at end of
> clause.");
>> > >+ }
>> > >+ return true;
>> > >+ }
>> >
>> > This function does not check if current clause can hold all constant
> bank.
>> > I think it's likely to be rare for a clause to be split because of
> constant bank limitations,
>> > but it would be better to have an assertion failure in such case to
> make debugging easier.
>> > For instance if the SubstituteKCacheBank return false, you can check
> that there is no lds
>> > use still pending.
>> >
>> >
>> > ----- Mail original -----
>> > > De : Tom Stellard <tom at stellard.net>
>> > > À : Vincent Lejeune <vljn at ovi.com>
>> > > Cc : "llvm-commits at cs.uiuc.edu"
> <llvm-commits at cs.uiuc.edu>; "mesa-dev at lists.freedesktop.org"
> <mesa-dev at lists.freedesktop.org>; Tom Stellard
> <thomas.stellard at amd.com>
>> > > Envoyé le : Mercredi 30 octobre 2013 17h46
>> > > Objet : Re: [PATCH] R600: Make sure OQAP defs and uses happen in
> the same clause
>> > >
>> > > Hi Vincent,
>> > >
>> > > It turns out that it's not possible to correctly schedule
> uses and defs
>> > > of the OQAP register without proper alias analysis in the
> MachineScheduler. See
>> > > the explanation in the lds-output-queue.ll test case.
>> > >
>> > > Here is an updated patch that fixes all the outstanding LDS
> scheduling
>> > > bugs that I am aware of.
>> > >
>> > > -Tom
>> > >
>> > > On Fri, Oct 25, 2013 at 05:42:13AM -0700, Vincent Lejeune wrote:
>> > >> This patch should work when checking than no OQAP is used
> before beeing
>> > > queued, assuming that a value in OQAP is consumed
>> > >> and cannot be read twice. However I'm not sure I cover
> all LDS
>> > > instructions that queues a value, I only use LDS_RET_READ in
> switch case.
>> > >>
>> > >> Vincent
>> > >>
>> > >>
>> > >>
>> > >> ----- Mail original -----
>> > >> > De : Tom Stellard <tom at stellard.net>
>> > >> > À : Vincent Lejeune <vljn at ovi.com>
>> > >> > Cc : "llvm-commits at cs.uiuc.edu"
>> > > <llvm-commits at cs.uiuc.edu>;
> "mesa-dev at lists.freedesktop.org"
>> > > <mesa-dev at lists.freedesktop.org>; Tom Stellard
>> > > <thomas.stellard at amd.com>
>> > >> > Envoyé le : Mardi 22 octobre 2013 23h20
>> > >> > Objet : Re: [PATCH] R600: Make sure OQAP defs and uses
> happen in the
>> > > same clause
>> > >> >
>> > >> > Hi Vincent,
>> > >> >
>> > >> > Here is an updated patch. I wasn't sure where to
> put the
>> > > assertion to
>> > >> > check that UnscheduledNoLiveOut{Defs,Uses} is
> empty when
>> > > switching to a
>> > >> > new clause. I tried adding it to
> R600SchedStartegy::schedNode()
>> > > behind
>> > >> > the if (NextInstKind != CurInstKind) condition, but it
> always failed.
>> > >> > Any suggestions on where I should but it?
>> > >> >
>> > >> > -Tom
>> > >> >
>> > >> >
>> > >> > On Mon, Oct 21, 2013 at 12:40:28PM -0700, Vincent
> Lejeune wrote:
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >> ----- Mail original -----
>> > >> >> > De : Tom Stellard <tom at stellard.net>
>> > >> >> > À : llvm-commits at cs.uiuc.edu
>> > >> >> > Cc : mesa-dev at lists.freedesktop.org; Tom
> Stellard
>> > >> > <thomas.stellard at amd.com>
>> > >> >> > Envoyé le : Vendredi 11 octobre 2013 20h10
>> > >> >> > Objet : [PATCH] R600: Make sure OQAP defs and
> uses happen in
>> > > the same
>> > >> > clause
>> > >> >> >
>> > >> >> > From: Tom Stellard
> <thomas.stellard at amd.com>
>> > >> >> >
>> > >> >> > Reading the special OQAP register pops the
> top value off the
>> > > LDS
>> > >> >> > input queue and returns it to the
> instruction. This queue
>> > > is
>> > >> >> > invalidated at the end of an ALU clause and
> leaving values
>> > > in the
>> > >> > queue
>> > >> >> > can lead to GPU hangs. This means that if we
> load a value
>> > > into the
>> > >> > queue,
>> > >> >> > we must use it before the end of the clause.
>> > >> >> >
>> > >> >> > This fixes some hangs in the OpenCV test
> suite.
>> > >> >> > ---
>> > >> >> > lib/Target/R600/R600MachineScheduler.cpp | 25
>
>> > >> > +++++++++++++------------
>> > >> >> > lib/Target/R600/R600MachineScheduler.h | 4
> ++--
>> > >> >> > test/CodeGen/R600/lds-input-queue.ll | 26
>
>> > >> > ++++++++++++++++++++++++++
>> > >> >> > 3 files changed, 41 insertions(+), 14
> deletions(-)
>> > >> >> > create mode 100644
> test/CodeGen/R600/lds-input-queue.ll
>> > >> >> >
>> > >> >> > diff --git
> a/lib/Target/R600/R600MachineScheduler.cpp
>> > >> >> > b/lib/Target/R600/R600MachineScheduler.cpp
>> > >> >> > index 6c26d9e..611b7f4 100644
>> > >> >> > ---
> a/lib/Target/R600/R600MachineScheduler.cpp
>> > >> >> > +++
> b/lib/Target/R600/R600MachineScheduler.cpp
>> > >> >> > @@ -93,11 +93,12 @@ SUnit*
> R600SchedStrategy::pickNode(bool
>> > >> > &IsTopNode)
>> > >> >> > {
>> > >> >> > }
>> > >> >> >
>> > >> >> >
>> > >> >> > - // We want to scheduled AR defs as soon as
> possible to
>> > > make sure
>> > >> > they
>> > >> >> > aren't
>> > >> >> > - // put in a different ALU clause from
> their uses.
>> > >> >> > - if (!SU &&
> !UnscheduledARDefs.empty()) {
>> > >> >> > - SU = UnscheduledARDefs[0];
>> > >> >> > -
> UnscheduledARDefs.erase(UnscheduledARDefs.begin());
>> > >> >> > + // We want to scheduled defs that cannot
> be live outside
>> > > of this
>> > >> > clause
>> > >> >> > + // as soon as possible to make sure they
> aren't put
>> > > in a
>> > >> > different
>> > >> >> > + // ALU clause from their uses.
>> > >> >> > + if (!SU &&
> !UnscheduledNoLiveOutDefs.empty())
>> > > {
>> > >> >> > + SU = UnscheduledNoLiveOutDefs[0];
>> > >> >> > +
>> > >> >
> UnscheduledNoLiveOutDefs.erase(UnscheduledNoLiveOutDefs.begin());
>> > >> >> > NextInstKind = IDAlu;
>> > >> >> > }
>> > >> >> >
>> > >> >> > @@ -132,9 +133,9 @@ SUnit*
> R600SchedStrategy::pickNode(bool
>> > >> > &IsTopNode)
>> > >> >> > {
>> > >> >> >
>> > >> >> > // We want to schedule the AR uses as late
> as possible to
>> > > make sure
>> > >> > that
>> > >> >> > // the AR defs have been released.
>> > >> >> > - if (!SU &&
> !UnscheduledARUses.empty()) {
>> > >> >> > - SU = UnscheduledARUses[0];
>> > >> >> > -
> UnscheduledARUses.erase(UnscheduledARUses.begin());
>> > >> >> > + if (!SU &&
> !UnscheduledNoLiveOutUses.empty())
>> > > {
>> > >> >> > + SU = UnscheduledNoLiveOutUses[0];
>> > >> >> > +
>> > >> >
> UnscheduledNoLiveOutUses.erase(UnscheduledNoLiveOutUses.begin());
>> > >> >>
>> > >> >> Can we use std::queue<SUnit*> instead of a
> std::vector for
>> > >> > UnscheduledNoLiveOutUses ?
>> > >> >> I had to use a vector because I needed to be able
> to pop non
>> > > topmost SUnit
>> > >> > in some case
>> > >> >> (to fit Instruction Group const read limitation)
> but I would
>> > > rather avoid
>> > >> > erase(iterator) call
>> > >> >> when possible.
>> > >> >>
>> > >> >>
>> > >> >> > NextInstKind = IDAlu;
>> > >> >> > }
>> > >> >> >
>> > >> >> > @@ -217,15 +218,15 @@ void
>> > > R600SchedStrategy::releaseBottomNode(SUnit
>> > >> > *SU)
>> > >> >> > {
>> > >> >> >
>> > >> >> > int IK = getInstKind(SU);
>> > >> >> >
>> > >> >> > - // Check for AR register defines
>> > >> >> > + // Check for registers that do not live
> across ALU
>> > > clauses.
>> > >> >> > for (MachineInstr::const_mop_iterator I =
>> > >> >> > SU->getInstr()->operands_begin(),
>> > >> >> > E =
>> > >> >> > SU->getInstr()->operands_end();
>> > >> >> > I !=
> E; ++I) {
>> > >> >> > - if (I->isReg() &&
> I->getReg() ==
>> > > AMDGPU::AR_X)
>> > >> > {
>> > >> >> > + if (I->isReg() &&
> (I->getReg() ==
>> > > AMDGPU::AR_X ||
>> > >> >> > I->getReg() == AMDGPU::OQAP)) {
>> > >> >> > if (I->isDef()) {
>> > >> >> > - UnscheduledARDefs.push_back(SU);
>> > >> >> > +
> UnscheduledNoLiveOutDefs.push_back(SU);
>> > >> >> > } else {
>> > >> >> > - UnscheduledARUses.push_back(SU);
>> > >> >> > +
> UnscheduledNoLiveOutUses.push_back(SU);
>> > >> >> > }
>> > >> >> > return;
>> > >> >> > }
>> > >> >> > diff --git
> a/lib/Target/R600/R600MachineScheduler.h
>> > >> >> > b/lib/Target/R600/R600MachineScheduler.h
>> > >> >> > index 0a6f120..db2e188 100644
>> > >> >> > --- a/lib/Target/R600/R600MachineScheduler.h
>> > >> >> > +++ b/lib/Target/R600/R600MachineScheduler.h
>> > >> >> > @@ -53,8 +53,8 @@ class R600SchedStrategy :
> public
>> > >> > MachineSchedStrategy {
>> > >> >> >
>> > >> >> > std::vector<SUnit *>
> Available[IDLast],
>> > > Pending[IDLast];
>> > >> >> > std::vector<SUnit *>
> AvailableAlus[AluLast];
>> > >> >> > - std::vector<SUnit *>
> UnscheduledARDefs;
>> > >> >> > - std::vector<SUnit *>
> UnscheduledARUses;
>> > >> >> > + std::vector<SUnit *>
> UnscheduledNoLiveOutDefs;
>> > >> >> > + std::vector<SUnit *>
> UnscheduledNoLiveOutUses;
>> > >> >> > std::vector<SUnit *>
> PhysicalRegCopy;
>> > >> >> >
>> > >> >> > InstKind CurInstKind;
>> > >> >> > diff --git
> a/test/CodeGen/R600/lds-input-queue.ll
>> > >> >> > b/test/CodeGen/R600/lds-input-queue.ll
>> > >> >> > new file mode 100644
>> > >> >> > index 0000000..548b41c
>> > >> >> > --- /dev/null
>> > >> >> > +++ b/test/CodeGen/R600/lds-input-queue.ll
>> > >> >> > @@ -0,0 +1,26 @@
>> > >> >> > +; RUN: llc < %s -march=r600 -mcpu=redwood
> | FileCheck %s
>> > >> >>
>> > >> >> Does the test work with -verify-machineinstrs flag
> set ?
>> > >> >>
>> > >> >> > +;
>> > >> >> > +; This test checks that the lds input queue
> will is empty
>> > > at the end
>> > >> > of
>> > >> >> > +; the ALU clause.
>> > >> >> > +
>> > >> >> > +; CHECK-LABEL: @lds_input_queue
>> > >> >> > +; CHECK: LDS_READ_RET * OQAP
>> > >> >> > +; CHECK-NOT: ALU clause
>> > >> >> > +; CHECK: MOV
> T{{[0-9]\.[XYZW]}}, OQAP
>> > >> >> > +
>> > >> >> > + at local_mem = internal addrspace(3)
> unnamed_addr global [2 x
>> > > i32] [i32
>> > >> > 1, i32
>> > >> >> > 2], align 4
>> > >> >> > +
>> > >> >> > +define void @lds_input_queue(i32
> addrspace(1)* %out, i32
>> > >> > addrspace(1)* %in, i32
>> > >> >> > %index) {
>> > >> >> > +entry:
>> > >> >> > + %0 = getelementptr inbounds [2 x i32]
> addrspace(3)*
>> > > @local_mem, i32
>> > >> > 0, i32
>> > >> >> > %index
>> > >> >> > + %1 = load i32 addrspace(3)* %0
>> > >> >> > + call void @llvm.AMDGPU.barrier.local()
>> > >> >> > +
>> > >> >> > + ; This will start a new clause for the
> vertex fetch
>> > >> >> > + %2 = load i32 addrspace(1)* %in
>> > >> >> > + %3 = add i32 %1, %2
>> > >> >> > + store i32 %3, i32 addrspace(1)* %out
>> > >> >> > + ret void
>> > >> >> > +}
>> > >> >> > +
>> > >> >> > +declare void @llvm.AMDGPU.barrier.local()
>> > >> >> > --
>> > >> >> > 1.7.11.4
>> > >> >> >
>> > >> >> >
> _______________________________________________
>> > >> >> > llvm-commits mailing list
>> > >> >> > llvm-commits at cs.uiuc.edu
>> > >> >> >
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> > >> >> >
>> > >> >
>> > >
>> > >> From 956c9986fcda84c4a4b0a555432319bbdfd98cfc Mon Sep 17
> 00:00:00 2001
>> > >> From: Vincent Lejeune <vljn at ovi.com>
>> > >> Date: Fri, 25 Oct 2013 14:32:35 +0200
>> > >> Subject: [PATCH] R600: Uses Assert for OQAP
>> > >>
>> > >> ---
>> > >> lib/Target/R600/R600EmitClauseMarkers.cpp | 31
>> > > +++++++++++++++++++++++++++++++
>> > >> lib/Target/R600/R600ISelLowering.cpp | 2 ++
>> > >> 2 files changed, 33 insertions(+)
>> > >>
>> > >> diff --git a/lib/Target/R600/R600EmitClauseMarkers.cpp
>> > > b/lib/Target/R600/R600EmitClauseMarkers.cpp
>> > >> index 928c0e3..9951b15 100644
>> > >> --- a/lib/Target/R600/R600EmitClauseMarkers.cpp
>> > >> +++ b/lib/Target/R600/R600EmitClauseMarkers.cpp
>> > >> @@ -33,6 +33,8 @@ private:
>> > >> static char ID;
>> > >> const R600InstrInfo *TII;
>> > >> int Address;
>> > >> + int OQAPLevel;
>> > >> + bool ArDefined;
>> > >>
>> > >> unsigned OccupiedDwords(MachineInstr *MI) const {
>> > >> switch (MI->getOpcode()) {
>> > >> @@ -160,8 +162,36 @@ private:
>> > >> return true;
>> > >> }
>> > >>
>> > >> + void AssertARAndOQAPCorrectness(MachineInstr *MI) {
>> > >> + for (MachineInstr::mop_iterator MOp =
> MI->operands_begin(),
>> > >> + E = MI->operands_end(); MOp != E; MOp++) {
>> > >> + MachineOperand &MO = *MOp;
>> > >> + if (!MO.isReg() || MO.isDef())
>> > >> + continue;
>> > >> + switch (MO.getReg()) {
>> > >> + default: break;
>> > >> + case AMDGPU::OQAP:
>> > >> + OQAPLevel--;
>> > >> + assert (OQAPLevel >= 0 && "OQAP not
> defined in
>> > > this clause !");
>> > >> + break;
>> > >> + }
>> > >> + }
>> > >> + switch (MI->getOpcode()) {
>> > >> + default: break;
>> > >> + case AMDGPU::MOVA_INT_eg:
>> > >> + ArDefined = true;
>> > >> + break;
>> > >> + case AMDGPU::LDS_READ_RET:
>> > >> + OQAPLevel++;
>> > >> + break;
>> > >> + }
>> > >> + }
>> > >> +
>> > >> MachineBasicBlock::iterator
>> > >> MakeALUClause(MachineBasicBlock &MBB,
> MachineBasicBlock::iterator I)
>> > > {
>> > >> + // LDS queue/ AR registers get resetted in new clause.
>> > >> + OQAPLevel = 0;
>> > >> + ArDefined = false;
>> > >> MachineBasicBlock::iterator ClauseHead = I;
>> > >> std::vector<std::pair<unsigned, unsigned> >
> KCacheBanks;
>> > >> bool PushBeforeModifier = false;
>> > >> @@ -205,6 +235,7 @@ private:
>> > >> !SubstituteKCacheBank(I, KCacheBanks))
>> > >> break;
>> > >> AluInstCount += OccupiedDwords(I);
>> > >> + AssertARAndOQAPCorrectness(I);
>> > >> }
>> > >> unsigned Opcode = PushBeforeModifier ?
>> > >> AMDGPU::CF_ALU_PUSH_BEFORE : AMDGPU::CF_ALU;
>> > >> diff --git a/lib/Target/R600/R600ISelLowering.cpp
>> > > b/lib/Target/R600/R600ISelLowering.cpp
>> > >> index 3c2e388..d8cbcae 100644
>> > >> --- a/lib/Target/R600/R600ISelLowering.cpp
>> > >> +++ b/lib/Target/R600/R600ISelLowering.cpp
>> > >> @@ -215,6 +215,7 @@ MachineBasicBlock *
>> > > R600TargetLowering::EmitInstrWithCustomInserter(
>> > >> }
>> > >>
>> > >> case AMDGPU::TXD: {
>> > >> + abort();
>> > >> unsigned T0 =
>> > > MRI.createVirtualRegister(&AMDGPU::R600_Reg128RegClass);
>> > >> unsigned T1 =
>> > > MRI.createVirtualRegister(&AMDGPU::R600_Reg128RegClass);
>> > >> MachineOperand &RID = MI->getOperand(4);
>> > >> @@ -316,6 +317,7 @@ MachineBasicBlock *
>> > > R600TargetLowering::EmitInstrWithCustomInserter(
>> > >> }
>> > >>
>> > >> case AMDGPU::TXD_SHADOW: {
>> > >> + abort();
>> > >> unsigned T0 =
>> > > MRI.createVirtualRegister(&AMDGPU::R600_Reg128RegClass);
>> > >> unsigned T1 =
>> > > MRI.createVirtualRegister(&AMDGPU::R600_Reg128RegClass);
>> > >> MachineOperand &RID = MI->getOperand(4);
>> > >> --
>> > >> 1.8.3.1
>> > >
>> > >>
>> > >
>
>> From fa5abaa99d923fd429029ed149805063848f3184 Mon Sep 17 00:00:00 2001
>> From: Tom Stellard <thomas.stellard at amd.com>
>> Date: Thu, 5 Sep 2013 08:59:32 -0700
>> Subject: [PATCH] R600: Fix scheduling of instructions that use the LDS
> output
>> queue
>>
>> The LDS output queue is accessed via the OQAP register. The OQAP
>> register cannot be live across clauses, so if value is written to the
>> output queue, it must be retrieved before the end of the clause.
>> With the machine scheduler, we cannot statisfy this constraint, because
>> it lacks proper alias analysis and it will mark some LDS accesses as
>> having a chain dependency on vertex fetches. Since vertex fetches
>> require a new clauses, the dependency may end up spiltting OQAP uses and
>> defs so the end up in different clauses. See the lds-output-queue.ll
>> test for a more detailed explanation.
>>
>> To work around this issue, we now combine the LDS read and the OQAP
>> copy into one instruction and expand it after register allocation.
>>
>> This patch also adds some checks to the EmitClauseMarker pass, so that
>> it doesn't end a clause with a value still in the output queue and
>> removes AR.X and OQAP handling from the scheduler (AR.X uses and defs
>> were already being expanded post-RA, so the scheduler will never see
>> them).
>> ---
>> lib/Target/R600/R600EmitClauseMarkers.cpp | 65 ++++++++++++++++--
>> lib/Target/R600/R600ExpandSpecialInstrs.cpp | 17 +++++
>> lib/Target/R600/R600ISelLowering.cpp | 20 +++---
>> lib/Target/R600/R600InstrInfo.cpp | 8 +++
>> lib/Target/R600/R600InstrInfo.h | 2 +
>> lib/Target/R600/R600MachineScheduler.cpp | 32 ---------
>> lib/Target/R600/R600MachineScheduler.h | 2 -
>> lib/Target/R600/R600RegisterInfo.cpp | 13 ++++
>> lib/Target/R600/R600RegisterInfo.h | 2 +
>> test/CodeGen/R600/lds-output-queue.ll | 99
> +++++++++++++++++++++++++++
>> test/CodeGen/R600/local-memory-two-objects.ll | 8 ++-
>> 11 files changed, 215 insertions(+), 53 deletions(-)
>> create mode 100644 test/CodeGen/R600/lds-output-queue.ll
>>
>> diff --git a/lib/Target/R600/R600EmitClauseMarkers.cpp
> b/lib/Target/R600/R600EmitClauseMarkers.cpp
>> index 928c0e3..881b4aa 100644
>> --- a/lib/Target/R600/R600EmitClauseMarkers.cpp
>> +++ b/lib/Target/R600/R600EmitClauseMarkers.cpp
>> @@ -47,6 +47,11 @@ private:
>> break;
>> }
>>
>> + // These will be expanded to two ALU instructions in the
>> + // ExpandSpecialInstructions pass.
>> + if (TII->isLDSRetInstr(MI->getOpcode()))
>> + return 2;
>> +
>> if(TII->isVector(*MI) ||
>> TII->isCubeOp(MI->getOpcode()) ||
>> TII->isReductionOp(MI->getOpcode()))
>> @@ -108,6 +113,10 @@ private:
>> bool SubstituteKCacheBank(MachineInstr *MI,
>> std::vector<std::pair<unsigned, unsigned> >
> &CachedConsts) const {
>> std::vector<std::pair<unsigned, unsigned> > UsedKCache;
>> +
>> + if (!TII->isALUInstr(MI->getOpcode()) &&
> MI->getOpcode() != AMDGPU::DOT_4)
>> + return true;
>> +
>> const SmallVectorImpl<std::pair<MachineOperand *, int64_t>
>> &Consts =
>> TII->getSrcs(MI);
>> assert((TII->isALUInstr(MI->getOpcode()) ||
>> @@ -160,6 +169,52 @@ private:
>> return true;
>> }
>>
>> + bool canClauseLocalKillFitInClause(
>> + unsigned AluInstCount,
>> + std::vector<std::pair<unsigned, unsigned>
>> KCacheBanks,
>> + MachineBasicBlock::iterator Def,
>> + MachineBasicBlock::iterator BBEnd) {
>> + const R600RegisterInfo &TRI = TII->getRegisterInfo();
>> + for (MachineInstr::const_mop_iterator
>> + MOI = Def->operands_begin(),
>> + MOE = Def->operands_end(); MOI != MOE; ++MOI) {
>> + if (!MOI->isReg() || !MOI->isDef() ||
>> + TRI.isPhysRegLiveAcrossClauses(MOI->getReg()))
>> + continue;
>> +
>> + // Def defines a clause local register, so check that its use will
> fit
>> + // in the clause.
>> + unsigned LastUseCount = 0;
>> + for (MachineBasicBlock::iterator UseI = Def; UseI != BBEnd; ++UseI)
> {
>> + AluInstCount += OccupiedDwords(UseI);
>> + // Make sure we won't need to end the clause due to KCache
> limitations.
>> + if (!SubstituteKCacheBank(UseI, KCacheBanks))
>> + return false;
>> +
>> + // We have reached the maximum instruction limit before finding
> the
>> + // use that kills this register, so we cannot use this def in the
>> + // current clause.
>> + if (AluInstCount >= TII->getMaxAlusPerClause())
>> + return false;
>> +
>> + // Register kill flags have been cleared by the time we get to
> this
>> + // pass, but it is safe to assume that all uses of this register
>> + // occur in the same basic block as its definition, because
>> + // it is illegal for the scheduler to schedule them in
>> + // different blocks.
>> + if (UseI->findRegisterUseOperandIdx(MOI->getReg()))
>> + LastUseCount = AluInstCount;
>> +
>> + if (UseI != Def &&
> UseI->findRegisterDefOperandIdx(MOI->getReg()) != -1)
>> + break;
>> + }
>> + if (LastUseCount)
>> + return LastUseCount <= TII->getMaxAlusPerClause();
>> + llvm_unreachable("Clause local register live at end of
> clause.");
>> + }
>> + return true;
>> + }
>> +
>> MachineBasicBlock::iterator
>> MakeALUClause(MachineBasicBlock &MBB, MachineBasicBlock::iterator I)
> {
>> MachineBasicBlock::iterator ClauseHead = I;
>> @@ -198,11 +253,13 @@ private:
>> I++;
>> break;
>> }
>> - if (TII->isALUInstr(I->getOpcode()) &&
>> - !SubstituteKCacheBank(I, KCacheBanks))
>> +
>> + // If this instruction defines a clause local register, make sure
>> + // its use can fit in this clause.
>> + if (!canClauseLocalKillFitInClause(AluInstCount, KCacheBanks, I, E))
>> break;
>> - if (I->getOpcode() == AMDGPU::DOT_4 &&
>> - !SubstituteKCacheBank(I, KCacheBanks))
>> +
>> + if (!SubstituteKCacheBank(I, KCacheBanks))
>> break;
>> AluInstCount += OccupiedDwords(I);
>> }
>> diff --git a/lib/Target/R600/R600ExpandSpecialInstrs.cpp
> b/lib/Target/R600/R600ExpandSpecialInstrs.cpp
>> index 67b42d7..aeee4aa 100644
>> --- a/lib/Target/R600/R600ExpandSpecialInstrs.cpp
>> +++ b/lib/Target/R600/R600ExpandSpecialInstrs.cpp
>> @@ -68,6 +68,23 @@ bool
> R600ExpandSpecialInstrsPass::runOnMachineFunction(MachineFunction &MF)
> {
>> MachineInstr &MI = *I;
>> I = llvm::next(I);
>>
>> + // Expand LDS_*_RET instructions
>> + if (TII->isLDSRetInstr(MI.getOpcode())) {
>> + int DstIdx = TII->getOperandIdx(MI.getOpcode(),
> AMDGPU::OpName::dst);
>> + assert(DstIdx != -1);
>> + MachineOperand &DstOp = MI.getOperand(DstIdx);
>> + MachineInstr *Mov = TII->buildMovInstr(&MBB, I,
>> + DstOp.getReg(),
> AMDGPU::OQAP);
>> + DstOp.setReg(AMDGPU::OQAP);
>> + int LDSPredSelIdx = TII->getOperandIdx(MI.getOpcode(),
>> + AMDGPU::OpName::pred_sel);
>> + int MovPredSelIdx = TII->getOperandIdx(Mov->getOpcode(),
>> + AMDGPU::OpName::pred_sel);
>> + // Copy the pred_sel bit
>> + Mov->getOperand(MovPredSelIdx).setReg(
>> + MI.getOperand(LDSPredSelIdx).getReg());
>> + }
>> +
>> switch (MI.getOpcode()) {
>> default: break;
>> // Expand PRED_X to one of the PRED_SET instructions.
>> diff --git a/lib/Target/R600/R600ISelLowering.cpp
> b/lib/Target/R600/R600ISelLowering.cpp
>> index 5bb8129..9845d12 100644
>> --- a/lib/Target/R600/R600ISelLowering.cpp
>> +++ b/lib/Target/R600/R600ISelLowering.cpp
>> @@ -128,21 +128,17 @@ MachineBasicBlock *
> R600TargetLowering::EmitInstrWithCustomInserter(
>>
>> switch (MI->getOpcode()) {
>> default:
>> - if (TII->isLDSInstr(MI->getOpcode()) &&
>> - TII->getOperandIdx(MI->getOpcode(), AMDGPU::OpName::dst) !=
> -1) {
>> + // Replace LDS_*_RET instruction that don't have any uses with the
>> + // equivalent LDS_*_NORET instruction.
>> + if (TII->isLDSRetInstr(MI->getOpcode())) {
>> int DstIdx = TII->getOperandIdx(MI->getOpcode(),
> AMDGPU::OpName::dst);
>> assert(DstIdx != -1);
>> MachineInstrBuilder NewMI;
>> - if (!MRI.use_empty(MI->getOperand(DstIdx).getReg())) {
>> - NewMI = BuildMI(*BB, I, BB->findDebugLoc(I),
> TII->get(MI->getOpcode()),
>> - AMDGPU::OQAP);
>> - TII->buildDefaultInstruction(*BB, I, AMDGPU::MOV,
>> - MI->getOperand(0).getReg(),
>> - AMDGPU::OQAP);
>> - } else {
>> - NewMI = BuildMI(*BB, I, BB->findDebugLoc(I),
>> -
> TII->get(AMDGPU::getLDSNoRetOp(MI->getOpcode())));
>> - }
>> + if (!MRI.use_empty(MI->getOperand(DstIdx).getReg()))
>> + return BB;
>> +
>> + NewMI = BuildMI(*BB, I, BB->findDebugLoc(I),
>> +
> TII->get(AMDGPU::getLDSNoRetOp(MI->getOpcode())));
>> for (unsigned i = 1, e = MI->getNumOperands(); i < e; ++i)
> {
>> NewMI.addOperand(MI->getOperand(i));
>> }
>> diff --git a/lib/Target/R600/R600InstrInfo.cpp
> b/lib/Target/R600/R600InstrInfo.cpp
>> index a11d54a..0ae4d0b 100644
>> --- a/lib/Target/R600/R600InstrInfo.cpp
>> +++ b/lib/Target/R600/R600InstrInfo.cpp
>> @@ -141,6 +141,14 @@ bool R600InstrInfo::isLDSInstr(unsigned Opcode) const
> {
>> (TargetFlags & R600_InstFlag::LDS_1A2D));
>> }
>>
>> +bool R600InstrInfo::isLDSNoRetInstr(unsigned Opcode) const {
>> + return isLDSInstr(Opcode) && getOperandIdx(Opcode,
> AMDGPU::OpName::dst) == -1;
>> +}
>> +
>> +bool R600InstrInfo::isLDSRetInstr(unsigned Opcode) const {
>> + return isLDSInstr(Opcode) && getOperandIdx(Opcode,
> AMDGPU::OpName::dst) != -1;
>> +}
>> +
>> bool R600InstrInfo::canBeConsideredALU(const MachineInstr *MI) const
> {
>> if (isALUInstr(MI->getOpcode()))
>> return true;
>> diff --git a/lib/Target/R600/R600InstrInfo.h
> b/lib/Target/R600/R600InstrInfo.h
>> index d7438ef..d6d1fe8 100644
>> --- a/lib/Target/R600/R600InstrInfo.h
>> +++ b/lib/Target/R600/R600InstrInfo.h
>> @@ -65,6 +65,8 @@ namespace llvm {
>> bool isALUInstr(unsigned Opcode) const;
>> bool hasInstrModifiers(unsigned Opcode) const;
>> bool isLDSInstr(unsigned Opcode) const;
>> + bool isLDSNoRetInstr(unsigned Opcode) const;
>> + bool isLDSRetInstr(unsigned Opcode) const;
>>
>> /// \returns true if this \p Opcode represents an ALU
> instruction or an
>> /// instruction that will be lowered in ExpandSpecialInstrs Pass.
>> diff --git a/lib/Target/R600/R600MachineScheduler.cpp
> b/lib/Target/R600/R600MachineScheduler.cpp
>> index 6c26d9e..da2a4d8 100644
>> --- a/lib/Target/R600/R600MachineScheduler.cpp
>> +++ b/lib/Target/R600/R600MachineScheduler.cpp
>> @@ -92,15 +92,6 @@ SUnit* R600SchedStrategy::pickNode(bool &IsTopNode)
> {
>> AllowSwitchFromAlu = true;
>> }
>>
>> -
>> - // We want to scheduled AR defs as soon as possible to make sure they
> aren't
>> - // put in a different ALU clause from their uses.
>> - if (!SU && !UnscheduledARDefs.empty()) {
>> - SU = UnscheduledARDefs[0];
>> - UnscheduledARDefs.erase(UnscheduledARDefs.begin());
>> - NextInstKind = IDAlu;
>> - }
>> -
>> if (!SU && ((AllowSwitchToAlu && CurInstKind != IDAlu)
> ||
>> (!AllowSwitchFromAlu && CurInstKind == IDAlu))) {
>> // try to pick ALU
>> @@ -130,15 +121,6 @@ SUnit* R600SchedStrategy::pickNode(bool
> &IsTopNode) {
>> NextInstKind = IDOther;
>> }
>>
>> - // We want to schedule the AR uses as late as possible to make sure that
>> - // the AR defs have been released.
>> - if (!SU && !UnscheduledARUses.empty()) {
>> - SU = UnscheduledARUses[0];
>> - UnscheduledARUses.erase(UnscheduledARUses.begin());
>> - NextInstKind = IDAlu;
>> - }
>> -
>> -
>> DEBUG(
>> if (SU) {
>> dbgs() << " ** Pick node **\n";
>> @@ -217,20 +199,6 @@ void R600SchedStrategy::releaseBottomNode(SUnit *SU)
> {
>>
>> int IK = getInstKind(SU);
>>
>> - // Check for AR register defines
>> - for (MachineInstr::const_mop_iterator I =
> SU->getInstr()->operands_begin(),
>> - E =
> SU->getInstr()->operands_end();
>> - I != E; ++I) {
>> - if (I->isReg() && I->getReg() == AMDGPU::AR_X) {
>> - if (I->isDef()) {
>> - UnscheduledARDefs.push_back(SU);
>> - } else {
>> - UnscheduledARUses.push_back(SU);
>> - }
>> - return;
>> - }
>> - }
>> -
>> // There is no export clause, we can schedule one as soon as its ready
>> if (IK == IDOther)
>> Available[IDOther].push_back(SU);
>> diff --git a/lib/Target/R600/R600MachineScheduler.h
> b/lib/Target/R600/R600MachineScheduler.h
>> index 0a6f120..97c8cde 100644
>> --- a/lib/Target/R600/R600MachineScheduler.h
>> +++ b/lib/Target/R600/R600MachineScheduler.h
>> @@ -53,8 +53,6 @@ class R600SchedStrategy : public MachineSchedStrategy
> {
>>
>> std::vector<SUnit *> Available[IDLast], Pending[IDLast];
>> std::vector<SUnit *> AvailableAlus[AluLast];
>> - std::vector<SUnit *> UnscheduledARDefs;
>> - std::vector<SUnit *> UnscheduledARUses;
>> std::vector<SUnit *> PhysicalRegCopy;
>>
>> InstKind CurInstKind;
>> diff --git a/lib/Target/R600/R600RegisterInfo.cpp
> b/lib/Target/R600/R600RegisterInfo.cpp
>> index dd8f3ef..83fc90b 100644
>> --- a/lib/Target/R600/R600RegisterInfo.cpp
>> +++ b/lib/Target/R600/R600RegisterInfo.cpp
>> @@ -85,3 +85,16 @@ const RegClassWeight
> &R600RegisterInfo::getRegClassWeight(
>> const TargetRegisterClass *RC) const {
>> return RCW;
>> }
>> +
>> +bool R600RegisterInfo::isPhysRegLiveAcrossClauses(unsigned Reg) const
> {
>> + assert(!TargetRegisterInfo::isVirtualRegister(Reg));
>> +
>> + switch (Reg) {
>> + case AMDGPU::OQAP:
>> + case AMDGPU::OQBP:
>> + case AMDGPU::AR_X:
>> + return false;
>> + default:
>> + return true;
>> + }
>> +}
>> diff --git a/lib/Target/R600/R600RegisterInfo.h
> b/lib/Target/R600/R600RegisterInfo.h
>> index d458e55..18f3b3e 100644
>> --- a/lib/Target/R600/R600RegisterInfo.h
>> +++ b/lib/Target/R600/R600RegisterInfo.h
>> @@ -45,6 +45,8 @@ struct R600RegisterInfo : public AMDGPURegisterInfo
> {
>>
>> virtual const RegClassWeight &getRegClassWeight(const
> TargetRegisterClass *RC) const;
>>
>> + // \returns true if \p Reg can be defined in one ALU caluse and
> used in another.
>> + virtual bool isPhysRegLiveAcrossClauses(unsigned Reg) const;
>> };
>>
>> } // End namespace llvm
>> diff --git a/test/CodeGen/R600/lds-output-queue.ll
> b/test/CodeGen/R600/lds-output-queue.ll
>> new file mode 100644
>> index 0000000..63a4332
>> --- /dev/null
>> +++ b/test/CodeGen/R600/lds-output-queue.ll
>> @@ -0,0 +1,99 @@
>> +; RUN: llc < %s -march=r600 -mcpu=redwood -verify-machineinstrs |
> FileCheck %s
>> +;
>> +; This test checks that the lds input queue will is empty at the end of
>> +; the ALU clause.
>> +
>> +; CHECK-LABEL: @lds_input_queue
>> +; CHECK: LDS_READ_RET * OQAP
>> +; CHECK-NOT: ALU clause
>> +; CHECK: MOV * T{{[0-9]\.[XYZW]}}, OQAP
>> +
>> + at local_mem = internal addrspace(3) unnamed_addr global [2 x i32] [i32 1,
> i32 2], align 4
>> +
>> +define void @lds_input_queue(i32 addrspace(1)* %out, i32 addrspace(1)*
> %in, i32 %index) {
>> +entry:
>> + %0 = getelementptr inbounds [2 x i32] addrspace(3)* @local_mem, i32 0,
> i32 %index
>> + %1 = load i32 addrspace(3)* %0
>> + call void @llvm.AMDGPU.barrier.local()
>> +
>> + ; This will start a new clause for the vertex fetch
>> + %2 = load i32 addrspace(1)* %in
>> + %3 = add i32 %1, %2
>> + store i32 %3, i32 addrspace(1)* %out
>> + ret void
>> +}
>> +
>> +declare void @llvm.AMDGPU.barrier.local()
>> +
>> +; The machine scheduler does not do proper alias analysis and assumes that
>> +; loads from global values (Note that a global value is different that a
>> +; value from global memory. A global value is a value that is declared
>> +; outside of a function, it can reside in any address space) alias with
>> +; all other loads.
>> +;
>> +; This is a problem for scheduling the reads from the local data share
> (lds).
>> +; These reads are implemented using two instructions. The first copies
> the
>> +; data from lds into the lds output queue, and the second moves the data
> from
>> +; the input queue into main memory. These two instructions don't have
> to be
>> +; scheduled one after the other, but they do need to be scheduled in the
> same
>> +; clause. The aliasing problem mentioned above causes problems when there
> is a
>> +; load from global memory which immediately follows a load from a global
> value that
>> +; has been declared in the local memory space:
>> +;
>> +; %0 = getelementptr inbounds [2 x i32] addrspace(3)* @local_mem, i32 0,
> i32 %index
>> +; %1 = load i32 addrspace(3)* %0
>> +; %2 = load i32 addrspace(1)* %in
>> +;
>> +; The instruction selection phase will generate ISA that looks like this:
>> +; %OQAP = LDS_READ_RET
>> +; %vreg0 = MOV %OQAP
>> +; %vreg1 = VTX_READ_32
>> +; %vreg2 = ADD_INT %vreg1, %vreg0
>> +;
>> +; The bottom scheduler will schedule the two ALU instructions first:
>> +;
>> +; UNSCHEDULED:
>> +; %OQAP = LDS_READ_RET
>> +; %vreg1 = VTX_READ_32
>> +;
>> +; SCHEDULED:
>> +;
>> +; vreg0 = MOV %OQAP
>> +; vreg2 = ADD_INT %vreg1, %vreg2
>> +;
>> +; The lack of proper aliasing results in the local memory read
> (LDS_READ_RET)
>> +; to consider the global memory read (VTX_READ_32) has a chain dependency,
> so
>> +; the global memory read will always be scheduled first. This will give
> us a
>> +; final program which looks like this:
>> +;
>> +; Alu clause:
>> +; %OQAP = LDS_READ_RET
>> +; VTX clause:
>> +; %vreg1 = VTX_READ_32
>> +; Alu clause:
>> +; vreg0 = MOV %OQAP
>> +; vreg2 = ADD_INT %vreg1, %vreg2
>> +;
>> +; This is an illegal program because the OQAP def and use know occur in
>> +; different ALU clauses.
>> +;
>> +; This test checks this scenario and makes sure it doesn't result in
> an
>> +; illegal program. For now, we have fixed this issue by merging the
>> +; LDS_READ_RET and MOV together during instruction selection and then
>> +; expanding them after scheduling. Once the scheduler has better alias
>> +; analysis, we should be able to keep these instructions sparate before
>> +; scheduling.
>> +;
>> +; CHECK-LABEL: @local_global_alias
>> +; CHECK: LDS_READ_RET
>> +; CHECK-NOT: ALU clause
>> +; CHECK MOV * T{{[0-9]\.[XYZW]}}, OQAP
>> +define void @local_global_alias(i32 addrspace(1)* %out, i32 addrspace(1)*
> %in) {
>> +entry:
>> + %0 = getelementptr inbounds [2 x i32] addrspace(3)* @local_mem, i32 0,
> i32 0
>> + %1 = load i32 addrspace(3)* %0
>> + %2 = load i32 addrspace(1)* %in
>> + %3 = add i32 %2, %1
>> + store i32 %3, i32 addrspace(1)* %out
>> + ret void
>> +}
>> diff --git a/test/CodeGen/R600/local-memory-two-objects.ll
> b/test/CodeGen/R600/local-memory-two-objects.ll
>> index b413fe3..e2d8406 100644
>> --- a/test/CodeGen/R600/local-memory-two-objects.ll
>> +++ b/test/CodeGen/R600/local-memory-two-objects.ll
>> @@ -12,9 +12,11 @@
>> ; SI-CHECK: .long 47180
>> ; SI-CHECK-NEXT: .long 32768
>>
>> -; Make sure the lds writes are using different addresses.
>> -; EG-CHECK: LDS_WRITE {{[*]*}}
> {{PV|T}}[[ADDRW:[0-9]*\.[XYZW]]]
>> -; EG-CHECK-NOT: LDS_WRITE {{[*]*}} T[[ADDRW]]
>> +; We would like to check the the lds writes are using different
>> +; addresses, but due to variations in the scheduler, we can't do
>> +; this consistently on evergreen GPUs.
>> +; EG-CHECK: LDS_WRITE
>> +; EG-CHECK: LDS_WRITE
>> ; SI-CHECK: DS_WRITE_B32 0, {{v[0-9]*}}, v[[ADDRW:[0-9]*]]
>> ; SI-CHECK-NOT: DS_WRITE_B32 0, {{v[0-9]*}}, v[[ADDRW]]
>>
>> --
>> 1.8.1.4
>>
>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
More information about the mesa-dev
mailing list